Open emanueleromito opened 9 months ago
Thanks for the report!
I think it'd be awesome if we provided an easily-accessible escape hatch to view the raw CMR results that earthaccess queried for situations like this where our assumptions don't line up with end-users' use cases. Without the escape hatch, users have to wait to use earthaccess until we adapt to support their use case. With the escape hatch, they can begin using earthaccess with a minor "hack" and later on remove it when we support their use case fully.
What do you all think? I don't think we currently support this, but maybe we do; I just didn't find it in the docs and am not planning on source diving today :)
I'm thinking the implementation might be DataGranule having a .raw or .cmr_json attribute/property that contains the parsed JSON from CMR for that granule. Same for collections!
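To make the idea concrete, here is a rough sketch of what such a property could look like. Everything in it is an assumption for illustration: GranuleSketch is a hypothetical stand-in class (not the real DataGranule), cmr_json is just one of the names floated above, and the record contents are made up.

```python
import copy
from typing import Any, Dict


class GranuleSketch(dict):
    """Hypothetical stand-in for a granule result; NOT the earthaccess class."""

    @property
    def cmr_json(self) -> Dict[str, Any]:
        # Return a deep copy of the parsed record so callers can inspect or
        # modify it without changing the search result itself.
        return copy.deepcopy(dict(self))


# Made-up record that only mimics a "meta"/"umm" layout.
granule = GranuleSketch(
    {
        "meta": {"concept-id": "G0000000000-EXAMPLE"},
        "umm": {"RelatedUrls": [{"URL": "https://example.com/a.hdf"}]},
    }
)

raw = granule.cmr_json
print(raw["umm"]["RelatedUrls"])
```

Returning a copy rather than the underlying dict is a design choice in this sketch, so that a "hack" on the raw record cannot accidentally alter what the rest of the library sees.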
Hi @emanueleromito,
All that information is still available in the results; earthaccess is only accessing part of it. To get to the XML companion files we can do something like this:
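import earthaccess

# Authenticate with Earthdata Login
earthaccess.login()

# Search for up to 10 MCD12Q1 granules
results = earthaccess.search_data(
    short_name="MCD12Q1",
    count=10,
)

# Print every related URL recorded for each granule
for granule in results:
    print(granule["umm"]["RelatedUrls"])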
All the granules have "meta" and "umm" dictionaries with all the data we need. If you want only the XML and HDF files, we can filter the links and download them with:
links = []
for granule in results:
    # Keep only HTTPS links that point to .xml or .hdf files
    urls = [
        link["URL"]
        for link in granule["umm"]["RelatedUrls"]
        if link["URL"].startswith("https") and link["URL"].endswith((".xml", ".hdf"))
    ]
    links.extend(urls)
earthaccess.download(links, "./MCD12Q1")
And that's it. Let us know if this works for you.
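If you need this in more than one place, the filtering step could be wrapped in a small helper. This is only a sketch: companion_urls is a hypothetical name (it is not part of earthaccess), and it assumes each granule behaves like a dictionary with the "umm" / "RelatedUrls" layout shown above. The sample records are hand-written so the snippet runs without logging in.

```python
from typing import Any, Dict, Iterable, List, Tuple


def companion_urls(
    granules: Iterable[Dict[str, Any]],
    suffixes: Tuple[str, ...] = (".xml", ".hdf"),
) -> List[str]:
    """Collect HTTPS URLs ending in one of `suffixes` from each granule."""
    links: List[str] = []
    for granule in granules:
        # .get() keeps the loop going if a granule lacks "umm" or "RelatedUrls".
        for link in granule.get("umm", {}).get("RelatedUrls", []):
            url = link.get("URL", "")
            if url.startswith("https") and url.endswith(suffixes):
                links.append(url)
    return links


# Hand-written records that only mimic the shape used above (not real results).
sample = [
    {
        "umm": {
            "RelatedUrls": [
                {"URL": "https://example.com/MCD12Q1.A2001001.h01v09.hdf"},
                {"URL": "https://example.com/MCD12Q1.A2001001.h01v09.hdf.xml"},
                {"URL": "s3://example-bucket/MCD12Q1.A2001001.h01v09.hdf"},
                {"URL": "https://example.com/browse.jpg"},
            ]
        }
    },
    {"meta": {}},  # a record with no "umm" key is skipped quietly
]

print(companion_urls(sample))
```

With real search results, the returned list could be passed to earthaccess.download in the same way as the links list above.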
all the granules have a "meta" and a "umm" dictionaries with all the data we need.
Awesome! This does appear to be undocumented. Or perhaps a limitation of search. I'm thinking we could use a how-to on this. Or perhaps we should expose those as properties that will be picked up by our API autodoc setup? Or both. #368
I'm currently using the earthaccess library to access MODIS data in my project. In my workflow, I use both HDF paths and the XML paths associated with the HDF files. However, when I use the search_data function from the library, the results only provide the HDF paths.
And what I get is:
['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/MCD12Q1.061/MCD12Q1.A2001001.h01v09.061.2022146025902/MCD12Q1.A2001001.h01v09.061.2022146025902.hdf']
This is certainly fine, but it would be nice to have an option that also gives access to the related .xml file, or at least the ability to download that file by passing the DataGranule for the corresponding HDF file.