lulc rasters must be saved as integer type in addition to holding integer values

jsta commented 3 years ago

I am running the NDR model and I recently had some difficulty because it seems that in addition to values in the lulc raster needing to be integers, the raster itself must be saved as an integer datatype (Type=Int32 not Type=Float32).

To reproduce with the example data: open the lulc raster in R, resave it (which by default creates a file containing integer values but with a float datatype), and run the following model run script:


```python
from natcap.invest.ndr import ndr
# help(ndr.execute)

data_dir = "data/NDR/"
args = {
    "workspace_dir": "workspace",
    "dem_path": data_dir + "DEM_gura.tif",
    "lulc_path": data_dir + "land_use_gura.tif",
    "runoff_proxy_path": data_dir + "precipitation_gura.tif",
    "watersheds_path": data_dir + "watershed_gura.shp",
    "biophysical_table_path": data_dir + "biophysical_table_gura.csv",
    "calc_p": True,
    "calc_n": False,
    "threshold_flow_accumulation": 1000,
    "k_param": 2,
    "subsurface_eff_p": 0,  # not used for p model
    "subsurface_critical_length_p": 0  # not used for p model
}
ndr.execute(args)
```

This throws the following error:

KeyError: 'lucode: -339999995214436424907732413799364296704 is present in the landuse raster but missing from the biophysical table'

Perhaps the datatype of the lulc raster could be checked at runtime so that a more informative error is thrown? Alternatively, maybe this info could be added to the user guide.

phargogh commented 3 years ago

Hi @jsta ,

While InVEST could indeed check for the datatype of certain raster inputs that do have a required data type, it isn't actually a strict requirement for reclassification. You can have a float32 raster with integer values and have reclassification work exactly as expected. Using the sample data for NDR as an example, if I convert the NDR landcover raster to a float32 type and run the model with the resulting landcover raster, the model works as intended. Here's how I translated the datatype:

$ gdal_translate -ot Float32 data/invest-sample-data/NDR/land_use_gura.tif land_use_gura_ndr_float32.tif

The specific error you mention makes me think that the R package you're using is manipulating pixel values when the raster is written out, and all the error is saying is that there exist one or more pixels with a value of -339999995214436424907732413799364296704 (a perfectly valid landcover code, by the way!) that isn't accounted for in your biophysical table. The model won't reclassify pixel values that match the nodata value, so maybe the R package is translating nodata pixel values to this new code, but then not changing the nodata value of the raster itself?
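To make that concrete, here is a minimal sketch of the behavior in plain Python. It is not InVEST's actual reclassification code: `reclassify`, its arguments, and the sample pixel values are hypothetical. It only illustrates why a pixel that does not equal the raster's declared nodata value ends up reported as a missing lucode.

```python
def reclassify(pixels, value_map, nodata):
    """Map each pixel through value_map; nodata pixels pass through unchanged.

    Hypothetical helper for illustration only, not part of natcap.invest.
    """
    out = []
    for value in pixels:
        if value == nodata:
            # Pixels equal to the declared nodata value are never looked up.
            out.append(nodata)
            continue
        if value not in value_map:
            # Anything else must have a row in the biophysical table.
            raise KeyError(
                f"lucode: {int(value)} is present in the landuse raster "
                "but missing from the biophysical table")
        out.append(value_map[value])
    return out


# Float pixels holding integer values reclassify fine (1.0 matches the key 1).
print(reclassify([1.0, 2.0, -1.0], {1: 0.1, 2: 0.5}, nodata=-1.0))

# A stray fill value that differs from the declared nodata raises KeyError.
try:
    reclassify([1.0, -3.4e38], {1: 0.1}, nodata=float("-inf"))
except KeyError as error:
    print(error)
```

In this sketch the float datatype is harmless on its own; the failure only appears once the fill value written into the pixels and the nodata value declared on the raster disagree.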

And as a related question, could you describe a little more about why the error message provided isn't informative? We'd like very much for the error to be as helpful as possible, so if there's a way that we can have the text of the exception better reflect what's going on, then of course we'd like to make that happen.

jsta commented 3 years ago

Looks like you are correct and R is writing that value to nodata cells. Still, a very difficult bug to track down given that the -3x10.... value is not shown prior to file writing. According to R, both the int and float versions have a nodata value of -Inf.

phargogh commented 3 years ago

> Still, a very difficult bug to track down given that the -3x10.... value is not shown prior to file writing. According to R, both the int and float versions have a nodata value of -Inf.

I'm with you there! I'm glad we were able to iron out where the failure is, but still, the R package probably shouldn't be changing things implicitly. -inf in an integer raster is definitely not the same as -inf in float32 and should probably not be silently translated.
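For anyone who hits the same lucode later, here is a small standard-library sketch for checking where the number might come from. It assumes, without confirmation in this thread, that the writer used -3.4e38 as its fill value for float32 output. `to_float32` is a hypothetical helper; the lucode is copied from the KeyError above.

```python
import struct


def to_float32(value):
    """Round a Python float to the nearest float32 and return it as a float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Copied from the KeyError message earlier in this thread.
reported_lucode = -339999995214436424907732413799364296704

# Assumption: -3.4e38 is the fill value the writer used for float32 rasters.
candidate = to_float32(-3.4e38)

print(int(candidate))                     # exact integer value of that float32
print(int(candidate) == reported_lucode)  # does it match the reported lucode?
print(candidate == float("-inf"))         # is it the same thing as -inf?
```

If the second comparison holds, the pixel value is a finite float32 number rather than -inf. A nodata value reported as -Inf would then never match it, which is consistent with the model treating those pixels as an ordinary landcover code.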

natcap / invest

lulc rasters must be saved as integer type in addition to holding integer values #359