Open natjms opened 1 week ago
Thanks for looking into and documenting this. We (AirFire) don't use `fccs_canada.nc` (it was added by Miles a couple of years ago for use by UBC), so we're probably not going to spend time in the near future specifically getting it working. However, we should take a look at the potential issue in our use of `GeoDataFrame.to_crs`. Thanks for pointing that out.
Recently I've started working on integrating another dataset into our fork of fccsmap (also LCC) and was surprised to see that it worked flawlessly on the first try with no modifications. At this rate, it seems more likely that this isn't a problem with geopandas and rasterio after all, and is in fact another hidden issue with the `fccs_canada.nc` data set.
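If the data set itself is the suspect, one quick sanity check that doesn't involve fccsmap at all is to apply the file's geotransform to the corner pixels and see whether the resulting extent lands where you expect. Below is a minimal sketch of that idea in plain Python, assuming the usual GDAL-style 6-element geotransform ordering. The function names and the example numbers are hypothetical and are not taken from `fccs_canada.nc`.

```python
def pixel_to_coords(gt, col, row):
    """Apply a GDAL-style 6-element geotransform to a pixel position.

    gt = (origin_x, pixel_width, row_rotation,
          origin_y, col_rotation, pixel_height)
    """
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def raster_bounds(gt, width, height):
    """Return (min_x, min_y, max_x, max_y) computed from the four corners."""
    corners = [pixel_to_coords(gt, c, r) for c in (0, width) for r in (0, height)]
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical values for illustration only: a north-up grid with 1 km cells
# in a projected CRS (units of metres), 4000 columns by 3000 rows.
example_gt = (-2_000_000.0, 1000.0, 0.0, 3_000_000.0, 0.0, -1000.0)
bounds = raster_bounds(example_gt, width=4000, height=3000)
print(bounds)
```

If the extent computed this way, once converted back to longitude/latitude, sits well away from the area the data is supposed to cover, that points at the file's georeferencing rather than at the lookup code.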
Some useful context for this issue, as I'm going to describe it, is that the `fccs_canada.nc` data set is very broken. The short of it is that while it does have a geotransform described in its metadata, for one reason or another GDAL won't recognize it. You can force GDAL to recognize it by reading the affine transformation matrix in some other program and then writing the dataset to a new file, passing the geotransform manually:

Not that that really matters, because that transform maps points about 30 degrees east of where they should be. I decided to georeference the dataset from scratch to deal with this. If you'd like to include my rework of the `fccs_canada.nc` dataset, you're welcome to download it here.

But if you were to load the dataset into fccsmap manually, you'd have this problem:

I've come to assume that this error occurs when a provided point lands outside the domain. That seems to reliably be the case for
`fccs_fuelload.nc`, anyway. If you flip the coordinates around, as you'd need to do for the LCC projection this data set uses, you get this:

This error looks like it's a result of projection issues. Specifically, you can get these ridiculous attempted allocations when two data sets use different projections and rasterio tries to allocate enough space to cover both of them. That takes us to this method in
`fccsmap/baselookup.py`:

This is where I started thinking this might actually be an issue with fccsmap and not just me trying to cram a square into a circular hole. This method assumes the user provides input in the CRS EPSG:4326, and then attempts to translate it to the particular CRS of the dataset. The crux of this issue seems to be that
`GeoDataFrame.to_crs` does not actually reproject the data frame. I'm guessing it does in a sense, but not in a way recognized by rasterio and GDAL when running `zonal_stats`. If I'm understanding this correctly, it makes sense that this wouldn't be a problem for `fccs_fuelload.nc` because it uses the same projection, or at least one that's very similar(?)

GeoPandas has a page on how to reproject a GeoDataFrame using GDAL, which would be slower, but you'd think it'd do it in a way that GDAL would subsequently recognize:
I wasn't able to get this working, unfortunately. Running this modified method throws "invalid latitude" whenever you run `look_up`, on both `fccs_fuelload.nc` and `better_fccs_canada.nc`.

All that being said, if fccsmap is in fact trying to accept coordinates in EPSG:4326 and then map them to the corresponding data set CRS, it's not clear why I had to flip the coordinates around earlier. I'd expect it to have failed in the same way without me having to flip them around. I don't have a good answer for why that is.
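One possible, unconfirmed contributor to both the coordinate flipping and the "invalid latitude" message is axis order: if a longitude ends up where a latitude is expected, a point in Canada produces a "latitude" well outside [-90, 90]. The sketch below only illustrates that failure mode with a plain range check. `validate_lat_lng` is a hypothetical helper and not part of fccsmap, geopandas, or rasterio, and whether this is what actually happens inside `look_up` is an assumption.

```python
def validate_lat_lng(lat, lng):
    """Raise ValueError if (lat, lng) cannot be a valid EPSG:4326 position.

    Hypothetical helper for illustration; not part of fccsmap.
    """
    if not -90.0 <= lat <= 90.0:
        # If the "longitude" would have been a legal latitude, the two
        # values were quite possibly passed in the wrong order.
        hint = " (were latitude and longitude swapped?)" if -90.0 <= lng <= 90.0 else ""
        raise ValueError(f"invalid latitude {lat}{hint}")
    if not -180.0 <= lng <= 180.0:
        raise ValueError(f"invalid longitude {lng}")
    return lat, lng


# An arbitrary example point in western Canada, given as (lat, lng).
print(validate_lat_lng(53.5, -113.5))

# The same point with the axes swapped fails the latitude check.
try:
    validate_lat_lng(-113.5, 53.5)
except ValueError as err:
    print(err)
```

A check like this wouldn't fix the reprojection, but running it on the values right before they're handed to the CRS transformation would at least show which axis order that step is actually receiving.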
I was hoping to find a solution that works for all the data sets included in fccsmap, but so far it's eluded me. I have a working patch that lets us access `fccs_canada.nc` through fccsmap but basically breaks it for the others. We might use that internally at the WFRT, but at this rate it looks like we're going to take a different path entirely to do fuel bed lookup in Canada. If I do have the time to revisit this and come up with an elegant solution, I'll submit it as a PR.

I don't really expect AirFire to provide support for the Canadian aspect of fccsmap, but since this incredibly long journey led me to something that could hypothetically be an issue in the future if you folks introduce more data sets, I thought I'd take the time to document it here. Not to mention, I didn't want to leave this investigation as a complete red herring :)