icesat2py / icepyx

Python tools for obtaining and working with ICESat-2 data
https://icepyx.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

spatial_subsetting_vis.ipynb #44

Closed arrran closed 4 years ago

arrran commented 4 years ago

Hi there, I'm trying to get elevation data for a small polygon over all available time.

I've followed spatial_subsetting_vis.ipynb and I'm wondering how you got to In [14], temp_gdf.head().

ATL08_to_dict() gives me an empty list, and I'm not sure whether that's a bug or my data. I just copied the dataset_dict from the tutorial.

Thanks very much! Very useful module and notebooks.

JessicaS11 commented 4 years ago

Hello @arrran! Thanks for getting in touch. Could you please provide some more information on the inputs you're using so I can recreate the problem you are having? For example, the exact commands you are running (including their inputs) that are giving you an empty output. gda_lib.ATL08_to_dict() takes two inputs, a list of files and a dictionary of dataset variable names/paths, but you only mentioned which values you used for the latter. In addition, the .head() method must be called on a dataframe (in this case, we're using a geodataframe), but gda_lib.ATL08_to_dict() returns a dictionary, not a dataframe.

Thanks for using icepyx - we'd love to have you join the team and learn more about your examples/use cases!

arrran commented 4 years ago

Hi Jessica, Thanks for the help!

I'm following your spatial_subsetting_vis.ipynb notebook, the only difference is I am using a different polygon (points attached in txt file), and I am not using a subregion.

If you look at your spatial_subsetting_vis.ipynb notebook here on GitHub, it already shows the same error message. Scroll down to Out [31]: UnboundLocalError: local variable 'df_final' referenced before assignment

I was wondering how, despite that error, you managed to jump to In [14], which has the data in a geodataframe. Did you load it with a command that later got deleted?

The Out [31] error is solved by adding df_final = [] after line 131 of gda_lib.py. With that fix, I still couldn't load the data into a geodataframe, because data_dict = ATL08_to_dict(ATL06_fn,dataset_dict) produces an empty list.
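The UnboundLocalError comes from a common Python pattern: a variable that is only assigned inside a loop or branch is returned unconditionally, so it is never bound when the loop body never runs. The sketch below is a hypothetical reduction of that pattern, not the actual gda_lib.py or topolib code; the function names and inputs are made up for illustration. It also shows why initializing the variable up front removes the exception but simply returns an empty result when nothing was found.

```python
def frames_to_final_buggy(frames):
    """Hypothetical reduction of the failing pattern: df_final is only
    bound inside the loop, so an empty input raises UnboundLocalError."""
    for i, frame in enumerate(frames):
        if i == 0:
            df_final = list(frame)
        else:
            df_final.extend(frame)
    return df_final


def frames_to_final_fixed(frames):
    """Same logic with df_final initialized first: no exception, but an
    empty input now silently yields an empty result."""
    df_final = []
    for frame in frames:
        df_final.extend(frame)
    return df_final


try:
    frames_to_final_buggy([])
except UnboundLocalError as err:
    print("buggy:", type(err).__name__)  # buggy: UnboundLocalError

print("fixed:", frames_to_final_fixed([]))  # fixed: []
```

The initialization only hides the symptom: the real question is why the input to the loop was empty in the first place.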

study_area_buffer_points.txt

Thanks very much for the help!! arran

arrran commented 4 years ago

Attached here is the code I followed, in case my description doesn't make sense. It's just the code from your Jupyter notebook spatial_subsetting_vis.ipynb, except that I have copied in the functions I was fiddling with.

Cheers!

JessicaS11 commented 4 years ago

Hello Arran, Of course! I think I see what the problem is (and why my initial answer may not have been super helpful). When you referenced the filename and commands you were getting errors on, I immediately looked where those functions are called in the relevant example (icepyx/doc/examples/ICESat-2_DEM_comparison_Colombia_working.ipynb). I failed to notice that you were referencing one of the dev-notebooks, which are not meant to be standalone/runnable examples. Rather, the dev-notebook directory is where we put our experimental notebooks as we're creating them and experimenting with and debugging the code. It sounds like we should do a better job hiding them in the directory structure!

That said, I looked into what's going on. Judging from the error message in the dev-notebook, the error is actually being produced in the ATL08_to_dict function of topolib, and the problem arises because an earlier step is erroneously producing an empty dataframe (which is why adding df_final=[] removed that error but didn't solve the underlying issue). Looking further up in the code you shared, it looks like you are searching for and downloading ATL06 data, but then trying to extract ATL08 parameters. My guess is that the ATL06 data doesn't contain the variables you are trying to extract using the dataset_dict variable, which is written for ATL08. Thus, an empty dataframe is returned and you ultimately run into the errors you've encountered.
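To see why the wrong product gives an empty result rather than an error, here is a minimal sketch that models a granule as a nested dictionary standing in for the HDF5 group tree. The variable paths are assumptions based on the general ATL06 (land_ice_segments) and ATL08 (land_segments) layouts and should be checked against the product data dictionaries; `extract_variables` is a hypothetical helper, not the topolib/gda_lib function, and the values are toy numbers.

```python
def get_path(tree, path):
    """Walk a '/'-separated path through nested dicts; return None if any
    group or variable along the way is missing."""
    node = tree
    for part in path.split("/"):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def extract_variables(granule, dataset_dict):
    """Hypothetical extractor: keep only the variables whose paths exist.
    Missing paths are skipped silently, which is how a mismatched product
    ends up as an empty result instead of an error."""
    out = {}
    for name, path in dataset_dict.items():
        value = get_path(granule, path)
        if value is not None:
            out[name] = value
    return out


# Toy ATL06-like granule (assumed layout, one beam, made-up values).
atl06_granule = {
    "gt1l": {
        "land_ice_segments": {
            "latitude": [-77.10, -77.11],
            "longitude": [163.50, 163.51],
            "h_li": [120.4, 121.0],
        }
    }
}

# ATL08-style paths (assumed) applied to the ATL06 granule.
atl08_dict = {
    "latitude": "gt1l/land_segments/latitude",
    "longitude": "gt1l/land_segments/longitude",
    "h_te_best_fit": "gt1l/land_segments/terrain/h_te_best_fit",
}

print(extract_variables(atl06_granule, atl08_dict))  # {}
```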

As for solutions, assuming that's the issue, your next step will depend on which dataset you'd actually like to use. If it's ATL08, you will need to edit your icepyx object to order ATL08 instead of ATL06. If it's ATL06, you'll have to figure out which variables and paths you'd like to extract into your dataframe and update the dataset_dict accordingly. We're in the process of working on some major code changes to make it easier to use and provide some default variable lists for each dataset, but we haven't finished compiling those yet. We'd love to have your input as a data user!
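One way to work out which ATL06 variables and paths to put in dataset_dict is to list every variable path in a downloaded granule and pick from that list. With h5py this kind of walk is typically done with `Group.visititems`; the self-contained sketch below does the same walk over a nested dictionary standing in for the file. The layout, the `build_dataset_dict` helper, and the chosen variable names are illustrative assumptions, not a definitive ATL06 variable list.

```python
def list_paths(tree, prefix=""):
    """Return every leaf path in a nested dict, analogous to walking an
    HDF5 file's groups and collecting dataset names."""
    paths = []
    for key in sorted(tree):
        path = f"{prefix}/{key}" if prefix else key
        node = tree[key]
        if isinstance(node, dict):
            paths.extend(list_paths(node, path))
        else:
            paths.append(path)
    return paths


def build_dataset_dict(paths, wanted, beam="gt1l"):
    """Hypothetical helper: map each wanted variable name to the first
    matching path under the chosen beam; names not found are left out."""
    dataset_dict = {}
    for name in wanted:
        for path in paths:
            if path.startswith(beam + "/") and path.split("/")[-1] == name:
                dataset_dict[name] = path
                break
    return dataset_dict


# Toy ATL06-like layout (assumed; check the ATL06 data dictionary).
granule = {
    "gt1l": {
        "land_ice_segments": {
            "latitude": [0.0],
            "longitude": [0.0],
            "h_li": [0.0],
            "delta_time": [0.0],
        }
    }
}

paths = list_paths(granule)
print(build_dataset_dict(paths, ["latitude", "longitude", "h_li"]))
```

Listing the paths first keeps the dictionary tied to what is actually in the file, rather than to a dictionary copied from a different product.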

arrran commented 4 years ago

Okay, great! Yeah, that makes sense. Whoops: as well as looking at the wrong notebooks, I had ATL08 and ATL06 mixed up.

I am using ATL06, just for the higher resolution (though I don't really know much).

I just need to set the right variables in the dictionary; the goal is to get the data I have downloaded into a geodataframe. Are there any notebooks that do this for ATL06? I can see the huge ATL06-data-dictionary-v001, but it would probably be easier to follow someone who knows what they are doing, if that's available.

Thanks very much for the help,

JessicaS11 commented 4 years ago

I'm glad to hear you were able to sort out the errors!

Currently there's no collated resource I can point you towards for ATL06, though the information does exist. Some potential places to look:

I hope this helps - we'd love for you to contribute whatever you come up with to icepyx!

arrran commented 4 years ago

Ah yep great thanks, and yes keen to contribute anything i write!

One thing I haven't figured out: in the notebooks, why do you often define a bounding box or boundary shapefile and then turn subset off (e.g. here)?

Also, with subset off, it doesn't seem to download all ICESat-2 data. What exactly is the subset?

JessicaS11 commented 4 years ago

Awesome - thanks, @arrran!

You've hit on a question that gets asked of the NSIDC a lot. If you take a look at the data access tutorial notebook, it outlines some of these steps in more detail. The gist of it is that there are actually several stages to data access. First is a simple query that runs quickly based on metadata (CMR format in this case) stored about each granule (without ever looking at any of the actual data files). Next is the actual data order step, which contacts the NSIDC and asks them to make sure the data you want is ready for download. This can include subsetting, which I'll explain more in a second. Third, you actually download the data that's been prepared for your order.
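The first stage can be pictured as a metadata-only filter: each granule record carries a footprint (simplified here to a bounding box), and the query keeps any granule whose footprint overlaps the search box without ever opening the data. This is a toy model under that simplifying assumption, not how CMR or icepyx actually implement search; the granule names and boxes are made up.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Axis-aligned box in degrees: (min_lon, min_lat, max_lon, max_lat)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


def intersects(a: BBox, b: BBox) -> bool:
    """True if two boxes overlap (touching edges count as overlap)."""
    return (a.min_lon <= b.max_lon and b.min_lon <= a.max_lon
            and a.min_lat <= b.max_lat and b.min_lat <= a.max_lat)


def query(granule_footprints, search_box):
    """Stage 1 (toy): return granule IDs whose metadata footprint overlaps
    the search box. No data files are read at this stage."""
    return [gid for gid, box in granule_footprints.items()
            if intersects(box, search_box)]


footprints = {
    "granule_A": BBox(-10.0, -10.0, 0.0, 0.0),
    "granule_B": BBox(-1.0, -1.0, 5.0, 5.0),
    "granule_C": BBox(20.0, 20.0, 30.0, 30.0),
}
search = BBox(-2.0, -2.0, 2.0, 2.0)
print(query(footprints, search))  # ['granule_A', 'granule_B']
```

The order and download stages would then operate only on the IDs returned here.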

You're probably familiar with subsetting from your own work - we're used to downloading geospatial data as files containing large granules or scenes. Then, we have to extract the actual area (or time period or variables) we want from each file locally. For ICESat-2 data, NSIDC has a subsetting service which will do this as part of the ordering process, eliminating this step from your later workflow and significantly decreasing your file size (which is obviously helpful for both download times and storage). The subsetting service can do spatial, temporal, and variable subsetting as well as a few other types of file conversions - I'd highly recommend taking advantage of it if you don't need full granules! Sometimes, this means that you might have x granules returned by the query but x-y (where y>0) returned in the order. This is because the metadata used to search for the granules isn't perfect, so sometimes when the subsetter starts it determines that there's not actually any data that meet your search criteria within a given granule (I'm sure you've seen this phenomenon happening in online data visualization centers, where a tiny corner of an image is in your search area so the whole granule/scene is given as a result, even though most of it is outside your area of interest).
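The x versus x-y discrepancy can be reproduced with a toy model: the query matches granules by footprint, but the subsetter looks at the actual point locations and drops any granule with no points inside the area of interest. The granules, coordinates, and function names below are hypothetical; only the logic mirrors the explanation above, and the real NSIDC subsetter works on full granule files rather than point lists.

```python
def in_box(lon, lat, box):
    """box = (min_lon, min_lat, max_lon, max_lat), edges inclusive."""
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def spatial_subset(granules, box):
    """Toy subsetter: keep only the points inside box and drop granules
    that end up empty, as the order step may do."""
    subset = {}
    for gid, points in granules.items():
        kept = [(lon, lat) for lon, lat in points if in_box(lon, lat, box)]
        if kept:
            subset[gid] = kept
    return subset


search_box = (-2.0, -2.0, 2.0, 2.0)

# Both granules were returned by a footprint-based query, but none of
# granule_A's points actually fall inside the search box.
queried = {
    "granule_A": [(-9.0, -9.0), (-5.0, -5.0), (-3.0, -2.5)],
    "granule_B": [(-1.0, -1.0), (0.5, 0.5), (4.0, 4.0)],
}

ordered = spatial_subset(queried, search_box)
print(len(queried), "queried ->", len(ordered), "ordered")  # 2 queried -> 1 ordered
print(ordered)  # {'granule_B': [(-1.0, -1.0), (0.5, 0.5)]}
```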