sat-utils / sat-stac

Python library for creating and working with STAC catalogs
MIT License

[Enhancement] Add ability to get a specific child from a catalog or collection #64

Closed AtmaMani closed 3 years ago

AtmaMani commented 4 years ago

I use sat-stac to parse and discover data from the TROPOMI sensor on S3. To get a particular child from a Catalog or Collection, the current workflow is to iterate through the results of the children() generator. This takes a long time for my dataset. If there were a Catalog.get_child(id='value') API (similar to the one in pystac), I could get to a known child much more quickly. See below for some time profiling:

>>> from satstac import Catalog, Collection, Item
>>> coll = Collection.open('https://meeo-s5p.s3.amazonaws.com/catalog.json')
>>> %time coll_children = list(coll.children())
>>> print(coll_children)
# CPU times: user 70.8 ms, sys: 5.5 ms, total: 76.3 ms
# Wall time: 2.97 s
# [meeo-s5p-cog, NRTI, OFFL, RPRO]

>>> offl = coll_children[2]
>>> offl.links()
['https://meeo-s5p.s3.amazonaws.com/OFFL/catalog.json',
 'https://meeo-s5p.s3.amazonaws.com/catalog.json',
 'https://meeo-s5p.s3.amazonaws.com/catalog.json',
 'https://meeo-s5p.s3.amazonaws.com/OFFL/L2__AER_AI/catalog.json',
 'https://meeo-s5p.s3.amazonaws.com/OFFL/L2__CH4___/catalog.json',
 'https://meeo-s5p.s3.amazonaws.com/OFFL/L2__CLOUD_/catalog.json',
 'https://meeo-s5p.s3.amazonaws.com/OFFL/L2__CO____/catalog.json',
 'https://meeo-s5p.s3.amazonaws.com/OFFL/L2__HCHO__/catalog.json',
 'https://meeo-s5p.s3.amazonaws.com/OFFL/L2__NO2___/catalog.json',
 'https://meeo-s5p.s3.amazonaws.com/OFFL/L2__O3____/catalog.json',
 'https://meeo-s5p.s3.amazonaws.com/OFFL/L2__SO2___/catalog.json']

>>> %time offl_no2 = list(offl.children())[-3]
>>> offl_no2
# CPU times: user 144 ms, sys: 12.7 ms, total: 157 ms
# Wall time: 6.72 s

# L2__NO2___

>>> %time offl_no2_direct = Catalog.open(offl.links()[-3])
# CPU times: user 19.6 ms, sys: 3.33 ms, total: 22.9 ms
# Wall time: 831 ms

If I could open a child directly with get_child(), I could hypothetically get it in under a second (about 0.8 s for the direct Catalog.open above), compared to 6.7 s in the current workflow.
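As a rough illustration of the idea, here is a minimal sketch of such a lookup. It is not part of sat-stac: `guess_child_href` and `get_child` are hypothetical names, and the shortcut assumes that children are laid out as `<id>/catalog.json`, as the OFFL links above appear to be. STAC does not guarantee that layout, so the sketch verifies the id after opening and otherwise falls back to opening children one at a time, stopping at the first match.

```python
from typing import Any, Callable, Iterable, Optional
from urllib.parse import urlparse


def guess_child_href(child_hrefs: Iterable[str], child_id: str) -> Optional[str]:
    """Return the first href whose parent directory name equals child_id.

    Assumption: children live at <id>/catalog.json. This holds for the
    links shown in this issue but is not required by STAC.
    """
    for href in child_hrefs:
        parts = [p for p in urlparse(href).path.split("/") if p]
        if len(parts) >= 2 and parts[-2] == child_id:
            return href
    return None


def get_child(
    child_hrefs: Iterable[str],
    child_id: str,
    open_fn: Callable[[str], Any],
) -> Optional[Any]:
    """Hypothetical helper: open a single child by id.

    open_fn is whatever opens a catalog from an href and returns an
    object with an `id` attribute.
    """
    hrefs = list(child_hrefs)

    # Fast path: one request if the directory-name guess is right.
    guess = guess_child_href(hrefs, child_id)
    if guess is not None:
        child = open_fn(guess)
        if child.id == child_id:
            return child

    # Slow path: open children lazily and stop at the first match.
    for href in hrefs:
        if href == guess:
            continue  # already opened and rejected above
        child = open_fn(href)
        if child.id == child_id:
            return child
    return None
```

With sat-stac this might be called as `get_child(offl.links('child'), 'L2__NO2___', Catalog.open)`, assuming `links()` accepts a relation type to filter on; with the unfiltered `offl.links()` list shown above it should still work, at the cost of possibly opening the self, root and parent links in the fallback.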

matthewhanson commented 3 years ago

This is a bit overdue, sorry @AtmaMani.

This library is getting close to deprecated, since there is a much more feature-rich library called PySTAC, if you've not seen it.

sat-stac will have some minor maintenance updates but I'll not be adding new features.

AtmaMani commented 3 years ago

Thanks for the update, @matthewhanson. Yes, I am familiar with PySTAC and will switch over to it.