jgomezdans opened this issue 6 years ago
Just to be clear, is it currently the case that KaFKA determines the state grid from the state mask and then resamples everything else to match it?
So you currently define the spatial grid by specifying the state mask?
Yeah, it's there in the code. I have used "state mask" and "state grid" interchangeably. This is for high resolution; coarse resolution is not something we're considering here yet (and there's less need for it, as we don't have spatially explicit priors, and the idea was to process whole tiles, so the state mask is only there for convenience in testing).
So is the state mask currently required by the inference engine? I think it should be optional. In many cases, users will just want their whole ROI processed, and then a 2D numpy array full of True values is simply unnecessary. As we are likely to run into memory problems, we should try to save memory where we can.
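One way to make the mask optional without paying for a full boolean array is to fall back to a broadcast view when no mask is given. This is a minimal sketch, not KaFKA's API: the helper name resolve_state_mask is hypothetical, and it assumes the engine only reads the mask (the broadcast view is read-only).

```python
import numpy as np


def resolve_state_mask(state_mask, shape):
    """Return a boolean state mask for a grid of the given (rows, cols) shape.

    Hypothetical helper. If ``state_mask`` is None, every pixel in the ROI
    is treated as active. ``np.broadcast_to`` returns a read-only view with
    zero strides, so the "all True" mask does not allocate rows * cols bytes.
    """
    if state_mask is None:
        return np.broadcast_to(np.True_, tuple(shape))
    state_mask = np.asarray(state_mask, dtype=bool)
    if state_mask.shape != tuple(shape):
        raise ValueError(
            f"mask shape {state_mask.shape} does not match grid {tuple(shape)}"
        )
    return state_mask


full = resolve_state_mask(None, (1000, 1000))
print(full.shape, full.all(), full.strides)  # (1000, 1000) True (0, 0)
```

Code that writes into the mask would need an explicit copy (e.g. np.ones(shape, dtype=bool)), so this trick only helps where the mask is used for selection or counting.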
Possibly having it optional is an... option ;) However, the state needs to be tied to the observations, the prior and so on, and the mask is also used heavily inside the inference engine. My idea was to just pass in, e.g., a GeoTIFF with all the ROI information. Reading the array in when needed isn't a big deal, as we're spatially constrained anyway by the C/Fortran code underlying the sparse matrix routines (I haven't been able to work on spatial regions larger than ~1000x1000 pixels due to memory errors in the underlying libraries that scipy uses).
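Given the ~1000x1000 pixel ceiling mentioned above, a larger ROI would have to be processed in windows. The sketch below (hypothetical helper iter_tiles, pure Python) splits a grid into row/column slices no bigger than a chosen tile size. It assumes tiles can be inverted independently, which only holds if there is no spatial coupling between pixels across tile edges.

```python
def iter_tiles(n_rows, n_cols, tile_rows=1000, tile_cols=1000):
    """Yield (row_slice, col_slice) windows covering an n_rows x n_cols grid.

    Hypothetical helper: edge tiles are clipped to the grid, so no window
    exceeds tile_rows x tile_cols pixels.
    """
    if tile_rows <= 0 or tile_cols <= 0:
        raise ValueError("tile sizes must be positive")
    for r0 in range(0, n_rows, tile_rows):
        for c0 in range(0, n_cols, tile_cols):
            yield (slice(r0, min(r0 + tile_rows, n_rows)),
                   slice(c0, min(c0 + tile_cols, n_cols)))


tiles = list(iter_tiles(2500, 1200))
print(len(tiles))  # 6
print(tiles[-1])   # (slice(2000, 2500, None), slice(1000, 1200, None))
```

Each window can be applied directly to numpy arrays (e.g. state_mask[rows, cols]), so the same slices index the mask, the observations and the prior for one tile.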
So my preferred solution would be to have the GeoTIFF (it defines the spatial resolution, which is important). It can be created transparently for the user with the rasterise code I submitted in a PR yesterday (which might need an option for setting the spatial resolution).
We actually can't work on spatial regions of more than a million pixels? That's good to know, though with a tiling approach we would have aimed for smaller pixel sizes anyway. I think that for the state mask it would also work to simply pass in the ROI information (including spatial resolution) as a parameter and create the numpy array from that. The alternative would then be a GeoTIFF, a Shapefile, or whatever formats we choose to accept. The other data could then be resampled to that grid.
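Passing the ROI as parameters could look roughly like this: bounds plus a pixel size are enough to derive both the mask shape and a GDAL-style geotransform, so the grid stays georeferenced without a file on disk. The function name state_grid_from_roi is hypothetical; it assumes bounds in projected map units of a single CRS, square pixels and a north-up grid.

```python
import math

import numpy as np


def state_grid_from_roi(xmin, ymin, xmax, ymax, resolution):
    """Build an all-True state mask and a GDAL-style geotransform for a ROI.

    Hypothetical helper. Bounds are in projected map units (e.g. metres) and
    pixels are square with side ``resolution``. The geotransform follows
    GDAL's north-up convention:
    (top-left x, pixel width, 0, top-left y, 0, -pixel height).
    Partial pixels at the right/bottom edge are rounded up to a full pixel.
    """
    if xmax <= xmin or ymax <= ymin or resolution <= 0:
        raise ValueError("invalid ROI bounds or resolution")
    n_cols = math.ceil((xmax - xmin) / resolution)
    n_rows = math.ceil((ymax - ymin) / resolution)
    geotransform = (xmin, resolution, 0.0, ymax, 0.0, -resolution)
    mask = np.ones((n_rows, n_cols), dtype=bool)
    return mask, geotransform


mask, gt = state_grid_from_roi(500_000, 4_000_000, 510_000, 4_005_000, 20)
print(mask.shape)  # (250, 500)
print(gt)          # (500000, 20, 0.0, 4005000, 0.0, -20)
```

The CRS itself is not captured here; it would still need to be passed alongside (for example as an EPSG code) for the other data to be resampled correctly.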
@TonioF asked...
The state mask is just a numpy array with True/False values for the different pixels. However, because different observations and prior data can come with different projections etc., it is important that the geographical reference is defined too. The easiest way is to store the state mask in a GDAL-compatible dataset. If the user wants to use a vector file, I have added a rasterise function to core for this purpose.
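With a geotransform attached to both the state mask and an input raster, the input can be mapped onto the state grid. The sketch below does nearest-neighbour resampling with plain numpy; the function names are hypothetical, and it assumes both rasters already share a CRS and are north-up (no rotation terms). Reprojection between different CRSs is not handled here; in practice GDAL's warping (e.g. gdal.Warp) would do that step.

```python
import numpy as np


def pixel_centres(geotransform, n_rows, n_cols):
    """Map x/y coordinates of pixel centres for a north-up GDAL geotransform."""
    x0, dx, _, y0, _, dy = geotransform
    xs = x0 + (np.arange(n_cols) + 0.5) * dx
    ys = y0 + (np.arange(n_rows) + 0.5) * dy
    return xs, ys


def resample_nearest(src, src_gt, dst_gt, dst_shape, fill=np.nan):
    """Nearest-neighbour resample ``src`` onto the destination (state) grid.

    Hypothetical helper. Assumes a shared CRS and north-up rasters.
    Destination pixels whose centre falls outside ``src`` get ``fill``.
    """
    n_rows, n_cols = dst_shape
    xs, ys = pixel_centres(dst_gt, n_rows, n_cols)
    sx0, sdx, _, sy0, _, sdy = src_gt
    # Source pixel index containing each destination pixel centre.
    cols = np.floor((xs - sx0) / sdx).astype(int)
    rows = np.floor((ys - sy0) / sdy).astype(int)
    valid_c = (cols >= 0) & (cols < src.shape[1])
    valid_r = (rows >= 0) & (rows < src.shape[0])
    out = np.full(dst_shape, fill, dtype=float)
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    ok = np.outer(valid_r, valid_c)
    out[ok] = src[rr[ok], cc[ok]]
    return out


# A 2x2 raster at 20 m resampled onto a 10 m state grid with one extra column.
src = np.array([[1.0, 2.0], [3.0, 4.0]])
out = resample_nearest(src, (0, 20, 0, 40, 0, -20), (0, 10, 0, 40, 0, -10), (4, 5))
print(out)
```

Nearest neighbour keeps the original observation values, which is usually what you want for reflectances and masks; averaging methods would be a different design choice.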