natcap / invest

InVEST®: models that map and value the goods and services from nature that sustain and fulfill human life.

Should models have tighter raster input type contracts? #1085

Open dcdenu4 opened 2 years ago

dcdenu4 commented 2 years ago

I've noticed recently, on the forums and from @newtpatrol, that we often say a model's raster input should be of a certain type, but we don't enforce or validate that upfront. The models can usually handle a floating-point raster even when we ask for an integer raster. I'm starting to wonder whether being relaxed about this is actually more beneficial to the user and to us.

Should we have a more strict contract with raster input types?

Benefits:

Drawbacks:

newtpatrol commented 2 years ago

What I think is most important is that the requirements, meaning whatever is actually supported, are clearly stated in the User Guide. They should be stated directly in the model input description, not somewhere users need to click away to find, since most people probably won't do that. It is generally not more work for the user to adhere to a requirement like int for LULC classes (maybe a tiny bit to cast from float to int, which is not typically needed and not at all a big deal). If we make it clear what's required, it's easier for the user to adhere to it, and easier for us to support it.

davemfish commented 2 years ago

I think the costs outweigh the benefits for allowing a user to supply a float type raster for something categorical like LULC or soil type.

I guess the beneficial case is where the floats are all castable to int without any rounding (e.g. 1.0, 2.0, 3.0). But another common case would be float values created by a poorly chosen interpolation method or something like that, in which case we really can't make use of the data. Guarding against that is a cost.
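
To make the two cases concrete, here is a minimal sketch of the guard that relaxed handling would need. It is an illustration only, not InVEST code: the function name `all_values_integral` and its `nodata` parameter are hypothetical, and a real check would presumably run block by block over the raster rather than over a plain list of pixel values.

```python
import math


def all_values_integral(values, nodata=None):
    """Return True if every valid value is a whole number (e.g. 1.0, 2.0).

    `values` is any iterable of pixel values. Values equal to `nodata`
    and NaN values are skipped (treating NaN as missing is an assumption).
    """
    for value in values:
        if value == nodata or math.isnan(value):
            continue
        # float.is_integer() is False for 1.5 and also for +/- infinity.
        if not float(value).is_integer():
            return False
    return True
```

A raster holding 1.0, 2.0, 3.0 passes this check and could be cast safely, while one holding interpolated values such as 1.37 fails it. Running a check like this on every categorical input is the cost mentioned above.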

I agree with Stacie about clearly describing the requirement. I think input validation would meet that objective, since the feedback would go straight to the UI before the model even runs.
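
For comparison, a strict contract could be checked from the raster's declared type alone, before any pixels are read. The sketch below is hypothetical and does not use InVEST's actual validation API: `validate_categorical_raster_type` and `INTEGER_TYPE_NAMES` are made-up names, and the type names are assumed to follow GDAL's data type naming (for example "Int16", "Float32").

```python
# Assumed set of integer type names, modeled on GDAL's data type names.
INTEGER_TYPE_NAMES = {
    "Byte", "Int8", "UInt16", "Int16",
    "UInt32", "Int32", "UInt64", "Int64",
}


def validate_categorical_raster_type(type_name, input_label):
    """Return None if the type is an integer type, else a message string.

    The message is what a UI could show next to the input before the
    model runs.
    """
    if type_name in INTEGER_TYPE_NAMES:
        return None
    return (
        f"{input_label} must be an integer raster, "
        f"but its type is {type_name}."
    )
```

For example, `validate_categorical_raster_type("Float32", "Land use/land cover")` returns a message naming the offending type, while `"Int16"` returns `None`.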