stac-utils / stactools

Command line utility and Python library for STAC
https://stactools.readthedocs.io/

New command: `stac archive <href> <outfile>` #342

Open gadomski opened 2 years ago

gadomski commented 2 years ago

As brought up in Gitter by @schwehr, it would be a neat feature to crawl an entire catalog/collection, make all hrefs relative, and save the result in an archive format (e.g. zip, tarball, etc.).

schwehr commented 2 years ago

Discussion is here: https://gitter.im/SpatioTemporal-Asset-Catalog/Lobby?at=62eaf7427ccf6b6d45c17803

Quoting myself:

I was playing with pystac_client earlier this week with Earth Engine's catalog. It's hard not to notice how long it takes to load the entire catalog (more than 90 seconds), which is only 28MB total. Aside from having a STAC API set up (which I would like to do), I was wondering what people would think of having a zip of the catalog next to the full tree (and/or .tar.gz, .tar.bz2, .tar.xz because they are even smaller... 640K for the xz)? This would not be replacing the regular STAC tree. Pulling a single file that small from GCS is pretty fast.

Thanks Pete for the feedback! async would definitely help too. I think the one thing I need to do is make sure I rewrite the JSON inside the zip so that all links are relative. Otherwise all clients are likely to go right back to the separate files once they see the top-level catalog with links to the children as separate, independent JSON files over HTTP.
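
For illustration, here is a rough sketch of the "rewrite links to be relative, then pack everything into one file" idea, using only the Python standard library. It is not stactools or pystac code: the names `member_name`, `relativize`, and `build_archive` are hypothetical, and it assumes the catalog has already been crawled into a dict mapping each document's absolute URL to its parsed JSON, with every document on the same host under the root catalog's directory. Dropping `self` links is also a choice made here, since they are absolute by definition.

```python
import io
import json
import posixpath
import tarfile
from urllib.parse import urlparse


def member_name(url: str, root_url: str) -> str:
    """Path of a document inside the archive, relative to the root catalog's directory."""
    root_dir = posixpath.dirname(urlparse(root_url).path)
    return posixpath.relpath(urlparse(url).path, root_dir)


def relativize(doc: dict, doc_url: str, docs: dict) -> dict:
    """Return a copy of `doc` whose links to other archived documents are relative."""
    here = posixpath.dirname(urlparse(doc_url).path)
    out = dict(doc)
    out["links"] = []
    for link in doc.get("links", []):
        if link.get("rel") == "self":
            continue  # assumption: a self link would point back at the remote copy
        link = dict(link)
        if link.get("href") in docs:
            # Only rewrite links whose target is also in the archive;
            # links to external resources stay absolute.
            target = urlparse(link["href"]).path
            link["href"] = posixpath.relpath(target, here)
        out["links"].append(link)
    return out


def build_archive(docs: dict, root_url: str, compression: str = "gz") -> bytes:
    """Pack all documents into an in-memory tarball ("gz", "bz2", or "xz")."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=f"w:{compression}") as tar:
        for url, doc in docs.items():
            data = json.dumps(relativize(doc, url, docs), indent=2).encode("utf-8")
            info = tarfile.TarInfo(name=member_name(url, root_url))
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Writing the returned bytes to `<outfile>` would give a single file that a client can unpack and read without following any link back to the per-file tree. Asset hrefs are left untouched in this sketch; whether those should also be rewritten is a separate question.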

digidude commented 2 years ago

Does the existing stac-fastapi server have compressed HTTP (gzip,...) enabled?

gadomski commented 2 years ago

I don't know, I'd recommend asking over there: https://github.com/stac-utils/stac-fastapi/issues