multiply-org / multiply-core

The core functionality of the MULTIPLY platform

Rasterising state mask from vector file #8

Open jgomezdans opened 6 years ago

jgomezdans commented 6 years ago

@TonioF asked...

When you say that you resample it to the state mask, is that because you assume the state mask has the spatial extent and resolution specified by the user? I am asking because we might have to resample the state mask too, mightn't we? (If it is provided as vector data, we first have to bring it to a grid, so we can take the requested one right away, of course.)

The state mask is just a numpy array with True/False values for the individual pixels. However, because different observations and prior data can come with different projections, etc., it is important that the geographical reference is defined too. The easiest way is to store the state mask in a GDAL-compatible dataset. If the user wants to use a vector file, I have added a rasterise function to core for this purpose.
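To make the idea of rasterising a vector outline onto a georeferenced grid concrete, here is a minimal numpy-only sketch. It is not the rasterise function added to core: the function name `rasterise_polygon`, the vertex-list input and the even-odd point-in-polygon test are all assumptions made for illustration, and the grid is assumed to be north-up and described by a GDAL-style geotransform. In practice GDAL would read the vector file and do this step itself.

```python
import numpy as np


def rasterise_polygon(vertices, geotransform, shape):
    """Return a boolean mask, True where a pixel centre lies inside the polygon.

    vertices:     sequence of (x, y) map coordinates (hypothetical input format)
    geotransform: GDAL-ordered tuple (x_origin, pixel_width, 0, y_origin, 0, -pixel_height)
    shape:        (n_rows, n_cols) of the target grid
    """
    x0, dx, _, y0, _, dy = geotransform
    n_rows, n_cols = shape

    # Map coordinates of every pixel centre on the target grid.
    xs = x0 + (np.arange(n_cols) + 0.5) * dx
    ys = y0 + (np.arange(n_rows) + 0.5) * dy
    px, py = np.meshgrid(xs, ys)

    inside = np.zeros(shape, dtype=bool)
    verts = np.asarray(vertices, dtype=float)

    # Even-odd rule: cast a ray in the +x direction from each pixel centre
    # and toggle the pixel each time the ray crosses a polygon edge.
    for (x1, y1), (x2, y2) in zip(verts, np.roll(verts, -1, axis=0)):
        if y1 == y2:
            continue  # a horizontal edge cannot be crossed by a horizontal ray
        straddles = (y1 > py) != (y2 > py)
        x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        inside ^= straddles & (px < x_cross)
    return inside
```

For example, a square with corners (1, 1) and (3, 3) on a 4x4 grid of unit pixels whose top-left corner is at (0, 4) sets only the four central pixels to True. The mask and its geotransform together carry the geographical reference that the comment above asks for.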

NPounder commented 6 years ago

Just to be clear, is it currently the case that KaFKA determines the state grid from the state mask and then resamples everything else to match that?

So you currently define the spatial grid by specifying the state mask?

jgomezdans commented 6 years ago

Yeah, it's there in the code. I have used state mask and state grid interchangeably. This is for hi-res; coarse res is not something that we're considering here yet (and there's less need for it, as we don't have spatially explicit priors, and the idea was to process whole tiles, so the state mask is only there for convenience in testing).
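As an illustration of what "resample everything else to match the state grid" can mean, here is a minimal nearest-neighbour sketch. It is not KaFKA's implementation: the function name `resample_to_state_grid` and its arguments are hypothetical, and it assumes both grids are north-up, share one projection, and are described by GDAL-style geotransforms. Reprojection between different coordinate systems is left out entirely.

```python
import numpy as np


def resample_to_state_grid(data, data_gt, state_gt, state_shape, fill=np.nan):
    """Nearest-neighbour resampling of a 2D array onto the state grid.

    data_gt and state_gt are GDAL-ordered geotransforms; both grids are
    assumed to be north-up and in the same projection (an assumption here).
    """
    sx0, sdx, _, sy0, _, sdy = state_gt
    dx0, ddx, _, dy0, _, ddy = data_gt
    n_rows, n_cols = state_shape

    # Map coordinates of the state-grid pixel centres.
    xs = sx0 + (np.arange(n_cols) + 0.5) * sdx
    ys = sy0 + (np.arange(n_rows) + 0.5) * sdy

    # Row/column of the source pixel that contains each centre.
    cols = np.floor((xs - dx0) / ddx).astype(int)
    rows = np.floor((ys - dy0) / ddy).astype(int)

    # Centres that fall outside the source raster keep the fill value.
    valid = ((rows >= 0) & (rows < data.shape[0]))[:, None] & \
            ((cols >= 0) & (cols < data.shape[1]))[None, :]

    rr = np.clip(rows, 0, data.shape[0] - 1)
    cc = np.clip(cols, 0, data.shape[1] - 1)
    sampled = data[rr[:, None], cc[None, :]]

    out = np.full(state_shape, fill, dtype=float)
    out[valid] = sampled[valid]
    return out
```

Once an observation or prior layer is on the state grid, `resampled[state_mask]` selects the values for the pixels that are actually being inferred.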

TonioF commented 6 years ago

So, currently the state mask is required by the inference engine? I think it should be optional. In many cases, users will just want their roi. A 2D numpy array full of True's is simply not necessary then. As we are likely to run into memory problems we should try to save where we can.

jgomezdans commented 6 years ago

Possibly having it optional is an... option ;) However, the state needs to be tied to the observations, the prior and so on. The mask is also used heavily internally in the inference engine. I guess that my idea was to just pass in a GeoTIFF (for example) with all the ROI information. Reading the array in when needed isn't a big deal, as we're spatially constrained by the C/Fortran code underlying the sparse matrices (I haven't been able to work on spatial regions > ~1000x1000 pixels due to memory errors in the underlying libraries that scipy uses).

So my preferred solution would be to have the GeoTIFF (it defines the spatial resolution, which is important). That can be provided invisibly to the user using the rasterise code I PR'ed yesterday (I might need to add a spatial resolution option to that).

TonioF commented 6 years ago

We actually can't work on spatial regions of more than a million pixels? That's good to know, though with a tiling approach we would have aimed for smaller pixel sizes anyway. I think that for the state mask it would also work to simply pass in the ROI information (including spatial resolution) as a parameter and create the numpy array from that. The alternative would then be the GeoTIFF or Shapefile or whatever we choose to accept. The other data could then be resampled to the grid.
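The suggestion of building the mask directly from the ROI could be sketched roughly as follows. The function name `state_mask_from_roi` and its bounding-box arguments are hypothetical, the ROI is assumed to be an axis-aligned rectangle in the target projection, and the returned geotransform follows GDAL ordering for a north-up grid.

```python
import math

import numpy as np


def state_mask_from_roi(x_min, y_min, x_max, y_max, resolution):
    """Build an all-True state mask and its geotransform from a bounding box.

    The bounding box and resolution are in map units of the target
    projection; the grid is extended to whole pixels if needed.
    """
    n_cols = math.ceil((x_max - x_min) / resolution)
    n_rows = math.ceil((y_max - y_min) / resolution)

    # GDAL ordering: top-left corner, pixel width, no rotation, negative pixel height.
    geotransform = (x_min, resolution, 0.0, y_max, 0.0, -resolution)
    mask = np.ones((n_rows, n_cols), dtype=bool)
    return mask, geotransform
```

For a 10 km x 10 km ROI at 10 m resolution this gives a 1000x1000 array. A numpy boolean array uses one byte per element, so such a mask takes about 1 MB, which suggests the mask itself is small next to the memory limits mentioned earlier in the thread.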