isciences / exactextract

Fast and accurate raster zonal statistics
Apache License 2.0
246 stars 32 forks source link

'variety' metric only works when its added first from the command line #48

Closed trcull closed 11 months ago

trcull commented 11 months ago

I've encountered a phenomena where the 'variety' metric (and minority, majority, etc) only works if it's the first command line argument but doesn't work if, say, the 'mean' is the first command line argument. My c++ is very rusty but I assume it's because of whatever's going on here in stats_registry.h:

class StatsRegistry { public: RasterStats &stats(const std::string &feature, const Operation &op) { // TODO come up with a better storage method. auto& stats_for_feature = m_feature_stats[feature];

        // can't use find because this requires RasterStats to be copy-constructible before C++ 17
        auto exists = stats_for_feature.count(op_key(op));
        if (!exists) {
            // can't use emplace because this requires RasterStats be copy-constructible before C++17
            RasterStats<double> new_stats(requires_stored_values(op.stat));
            stats_for_feature[op_key(op)] = std::move(new_stats);
        }

        return stats_for_feature[op_key(op)];
    }

If I'm understanding that code correctly, a new instance of RasterStats replaces an old instance. The problem is that some instances are initialized with "requires_stored_values" as true and some as false. Specifically, variety is true and mean is false. I suspect that if variety happens to be first in the command line, then it somehow "wins" but if its last then it loses and we end up with a RasterStats instance that doesn't store values. The result is that all variety results end up being zero in the output and all minority/majority results end up being nan.

Specifically, this gives me the right result: exactextract -r the_metric:/tmp/nlcd_2021_land_cover_l48_20230630_projected.tif[1] -p /tmp/h3_zoom4_part.1040.csv -f h3_index -s variety(the_metric) -s mean(the_metric) -s min(the_metric) -s max(the_metric) -s count(the_metric) -s sum(the_metric) -s minority(the_metric) -s majority(the_metric) -s variance(the_metric) -s stdev(the_metric) -s coefficient_of_variation(the_metric) -o /tmp/nlcd_2021_land_cover_l48_20230630_projected.tif.1.0.csv

This gives me the wrong result: exactextract -r the_metric:/tmp/nlcd_2021_land_cover_l48_20230630_projected.tif[1] -p /tmp/h3_zoom4_part.1040.csv -f h3_index -s mean(the_metric) -s min(the_metric) -s max(the_metric) -s count(the_metric) -s sum(the_metric) -s minority(the_metric) -s majority(the_metric) -s variety(the_metric) -s variance(the_metric) -s stdev(the_metric) -s coefficient_of_variation(the_metric) -o /tmp/nlcd_2021_land_cover_l48_20230630_projected.tif.1.0.csv

The only thing that changed between those two was where I put "variety(the_metric)" in the ordering of command line arguments.

fwiw, if you can't already tell, I'm actually only using exactextractr for the command line version of exactextract inside it, which appears to be kept more up to date. In the original exactextract project, variety/majority/minority don't seem to work at all, regardless of command line order. Searching the code in the other project, it looks like nothing appears to actually use the requires_stored_values function at all, which I assume is the problem.

dbaston commented 11 months ago

Thanks for the report. This is fixed in 5f99b1562d6b2e671b707f361b6fdeb689d4a3ac.

I'm actually only using exactextractr for the command line version of exactextract inside it, which appears to be kept more up to date

The code is maintained/developed in this repository and updated in the R package only when needed.

trcull commented 6 months ago

thank you!