ua-snap / rasdaman-ingest

Collection of ingredients/configurations + docs for ingesting data into Rasdaman
MIT License

ALFRESCO veg type and flammability datasets SNAP portal prep #63

Closed: kyleredilla closed this 1 year ago

kyleredilla commented 1 year ago

This PR adds a single Jupyter notebook (iem/alfresco/portal_prep.ipynb) that extracts metadata from, and zips, the recent (2022) summarized ALFRESCO data so that it can be hosted on the SNAP GeoNetwork data portal.

It also includes a zipit.sh script, executed from the notebook, that helps zip the data files. To review, feel free to just read through it without running anything, or run the whole notebook. Any feedback is welcome! No rush on this.
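The contents of zipit.sh aren't shown in this thread. For comparison, here is a rough, hypothetical Python analogue using the standard-library zipfile module. The function name zip_directory, the single-archive layout, and the default *.tif pattern are assumptions, not taken from the PR; the real script may zip files differently.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_directory(src_dir, zip_path, pattern="*.tif"):
    """Zip every file in src_dir matching pattern into one archive.

    Hypothetical helper; the *.tif default is an assumption about the data.
    Returns the list of archive member names written, in sorted order.
    """
    src_dir = Path(src_dir)
    members = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src_dir.glob(pattern)):
            # Store each file under its bare name so the archive does not
            # embed absolute or machine-specific paths.
            zf.write(path, arcname=path.name)
            members.append(path.name)
    return members


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data"
        data.mkdir()
        (data / "example.tif").write_bytes(b"placeholder")
        print(zip_directory(data, Path(tmp) / "example.zip"))  # ['example.tif']
```

Writing the archive outside the source directory keeps the glob from picking up the zip file itself.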

kyleredilla commented 1 year ago

I ran this notebook successfully and it produced the expected output. 👍 Nice work!

I found one mention of "degree day variable" that I think may be a copy-and-paste error, and made a suggestion to fix it.

Fixed!

Also, while running the notebook, the "WSEN bounds" output looked like this:

WSEN bounds: [-3.59532e+02  5.03484e+01 -5.90000e-02  7.29320e+01]

Um, something is up here: those values for the West and East bounds don't look correct. They also don't appear rounded, which is odd, because I have code that explicitly rounds them, and I thought that would make suppressing scientific notation unnecessary. I'd like to look into that some more. Could you send me an export of the conda env you used when you get a chance?
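A minimal sketch of why rounding alone doesn't prevent this output, assuming the bounds are held in a NumPy array (the bracketed, space-separated printout suggests so). Rounding changes the values, not how NumPy chooses to print them: NumPy switches to scientific notation when the smallest nonzero magnitude is below 1e-4 or the ratio of the largest to the smallest nonzero magnitude exceeds 1e3. Here 359.532 / 0.059 is roughly 6094, so the small East value alone triggers it. The variable names are illustrative.

```python
import numpy as np

# Bounds as printed in the thread (west, south, east, north), rounded to 4 places.
bounds = np.round(np.array([-359.532, 50.3484, -0.059, 72.932]), 4)

# Default formatting: the wide spread of magnitudes triggers scientific notation.
default_repr = np.array2string(bounds)

# suppress_small=True keeps fixed-point output regardless of that spread.
fixed_repr = np.array2string(bounds, suppress_small=True)

# Formatting each value explicitly sidesteps NumPy's heuristics entirely.
explicit = ", ".join(f"{v:.4f}" for v in bounds)

print("default:   ", default_repr)
print("suppressed:", fixed_repr)
print("explicit:  ", explicit)
```

np.set_printoptions(suppress=True) applies the same setting globally, which may be simpler in a notebook, but it also affects every other array printed afterwards.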

cstephen commented 1 year ago

I ran the portal_prep.ipynb notebook in a couple of different conda environments and got the same result each time (bounds in scientific notation). I've exported the most recent environment that produced this result to a file on Atlas:

/atlas_scratch/crstephenson/pr63_export.yml

This conda environment is based on the one from this repo (ingest_env.yml), but with jupyter and rasterio also installed.

I was too quick to call the numbers in scientific notation correct; I only meant that they looked plausible for a bounding box, although even for a bounding box, -359.532 might be a bit strange.

Thanks for taking a look at this, @kyleredilla !

kyleredilla commented 1 year ago

Yep, it looks like a bug was fixed in rasterio.crs, or in one of its dependencies, somewhere between the version specified in this repo's env and the one in the env I have been using. I need to finalize the snap-geo env and backtest it with this repo so that we can all be consistent.
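Since the difference comes down to package versions, one lightweight way to compare environments is to report the installed versions of the packages that matter instead of diffing full conda exports. A minimal sketch using the standard-library importlib.metadata; the helper names installed_versions and pin_mismatches are hypothetical, and the rasterio 1.2.10 pin is the snap-geo version mentioned later in this thread. rasterio's CRS handling is built on GDAL, so the rasterio package version alone may not capture every relevant difference.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in the active environment
    return versions


def pin_mismatches(expected):
    """Compare installed versions against exact expected pins.

    expected maps package name -> version string. Returns
    {package: (expected, installed)} for every package that differs.
    """
    found = installed_versions(expected)
    return {
        name: (want, found[name])
        for name, want in expected.items()
        if found[name] != want
    }


if __name__ == "__main__":
    print(installed_versions(["rasterio", "numpy"]))
    # An empty dict means the active env matches the pin.
    print(pin_mismatches({"rasterio": "1.2.10"}))
```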

Here is the issue, if you're curious: the hack for reprojecting to a horizontally rotated WGS84 grid was to change one parameter of the Well-Known Text (WKT) string, moving the prime meridian (PRIMEM) from 0 to 180. With the old version of rasterio used in this repo's env:

>>> from rasterio.crs import CRS
>>> from rasterio.warp import transform
>>> dst_crs = CRS.from_wkt(CRS.from_epsg(4326).to_wkt().replace('PRIMEM["Greenwich",0', 'PRIMEM["Greenwich",180'))
>>> print(transform(src_crs, dst_crs, [src_bounds.left], [src_bounds.bottom]))
([-179.0248958937951], [51.04609435844049])
>>> dst_crs = CRS.from_epsg(4326)
>>> print(transform(src_crs, dst_crs, [src_bounds.left], [src_bounds.bottom]))
([-179.0248958937951], [51.04609435844049])

And with the more recent version currently used in the snap-geo env:

>>> dst_crs = CRS.from_wkt(CRS.from_epsg(4326).to_wkt().replace('PRIMEM["Greenwich",0', 'PRIMEM["Greenwich",180'))
>>> print(transform(src_crs, dst_crs, [src_bounds.left], [src_bounds.bottom]))
([0.9751041062048982], [51.04609435844049])
>>> dst_crs = CRS.from_epsg(4326)
>>> print(transform(src_crs, dst_crs, [src_bounds.left], [src_bounds.bottom]))
([-179.0248958937951], [51.04609435844049])

I guess I'm not quite sure what the best path forward is yet.

kyleredilla commented 1 year ago

Alright @cstephen, sorry for the long delay on this! The scientific notation only shows up because the faulty rasterio version (mentioned above) produces very small values in the bounding box. Using the snap-geo env, which has rasterio v1.2.10 and is the env the README suggests for processing work, resolves it. I did make one change to this notebook: instead of writing outputs directly to permanent storage, it now follows the "write to a user-defined location, then copy to storage" philosophy. If you can have a look eventually, that would be great!
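A minimal sketch of the "write to a user-defined location, then copy to storage" pattern, using only the standard library. The function name write_then_copy and its parameters are hypothetical, not the notebook's actual code. Because the copy step runs only after writing succeeds, a run that fails while writing leaves nothing in permanent storage.

```python
import shutil
import tempfile
from pathlib import Path


def write_then_copy(write_outputs, work_dir, storage_dir):
    """Run write_outputs(work_dir), then copy the resulting files to storage_dir.

    write_outputs is any callable that writes files into the directory it is
    given. Returns the list of copied paths in storage_dir.
    """
    work_dir = Path(work_dir)
    storage_dir = Path(storage_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    # Step 1: write everything to the user-defined working location.
    write_outputs(work_dir)
    # Step 2: only after writing succeeds, copy the files to storage.
    storage_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(work_dir.iterdir()):
        if path.is_file():
            # copy2 copies file contents plus metadata such as timestamps.
            copied.append(Path(shutil.copy2(path, storage_dir / path.name)))
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        def demo_writer(out_dir):
            (out_dir / "example.zip").write_bytes(b"placeholder")

        result = write_then_copy(demo_writer, Path(tmp) / "work", Path(tmp) / "storage")
        print([p.name for p in result])  # ['example.zip']
```

Keeping the working copy also makes it easy to inspect outputs before anything touches shared storage.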