landsat-pds / landsat_ingestor

Scripts and other artifacts for landsat data ingestion into Amazon public hosting.
Apache License 2.0

add csv of scene_list with proper extension #6

Open auremoser opened 9 years ago

auremoser commented 9 years ago

Hello! I was wondering if you would mind adding a .csv extension to your scene_list data for easier uploads into mapping programs to visualize that data. I added it to your readme (the only change in this PR) to make things easy. :dancer:

Also, I'm curious whether there is a .py script to process the max/min lat/lon from .csv to .geojson? It would be 110% rad to point to the .geojson file from CartoDB and set up a publicly available continuous sync table for updates. If GeoJSON isn't on the books, do you accept contributions for those conversion scripts as PRs?

The maps are pretty... like this one of the max/min lat/lons and cloud cover. Thank you for reading!

[image: landsat]

warmerdam commented 9 years ago

@auremoser - I do not want to remove the existing scene_list.gz (and its updates), since that has been the existing contract with consumers, but we could add a scene_list.csv.gz (or scene_list.csv). Do you think we should try to host it uncompressed for more direct use?

This could be implemented in the upload_run_list() function in ingestor/puller.py.
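For what it's worth, here is a minimal sketch of what that addition might look like. The real upload_run_list() in ingestor/puller.py may be structured quite differently; the helper name, S3 keys, and use of boto3 below are assumptions for illustration only.

```python
# Sketch: publish an uncompressed scene_list.csv alongside a gzipped copy.
# Names and keys are hypothetical; adapt to the actual upload_run_list() code.
import gzip
import shutil

import boto3


def upload_scene_list_variants(local_csv_path, bucket='landsat-pds'):
    """Upload both scene_list.csv and scene_list.csv.gz to S3."""
    s3 = boto3.client('s3')

    # Uncompressed copy for direct use in mapping tools.
    s3.upload_file(local_csv_path, bucket, 'scene_list.csv',
                   ExtraArgs={'ContentType': 'text/csv'})

    # Gzipped copy to keep a compressed variant available as well.
    gz_path = local_csv_path + '.gz'
    with open(local_csv_path, 'rb') as src, gzip.open(gz_path, 'wb') as dst:
        shutil.copyfileobj(src, dst)
    s3.upload_file(gz_path, bucket, 'scene_list.csv.gz',
                   ExtraArgs={'ContentType': 'application/gzip'})
```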

Pull requests are welcome.

We did also at one point contemplate having a GeoJSON representation of the scene_list, but it was felt it would be a bit abstruse for some users. We could offer it as an alternate view, also possibly generated in the upload_run_list() function.
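As a sketch of that alternate view, the bounding boxes could be turned into GeoJSON polygon features roughly like this. The column names (entityId, cloudCover, min/max lat/lon) are assumed from the scene_list CSV header and may need adjusting.

```python
# Sketch: convert scene_list rows into a GeoJSON FeatureCollection of
# bounding-box polygons. Column names are assumptions.
import csv
import json


def scene_list_to_geojson(csv_path, geojson_path):
    features = []
    with open(csv_path) as f:
        for row in csv.DictReader(f):
            min_lon, min_lat = float(row['min_lon']), float(row['min_lat'])
            max_lon, max_lat = float(row['max_lon']), float(row['max_lat'])
            features.append({
                'type': 'Feature',
                'geometry': {
                    'type': 'Polygon',
                    'coordinates': [[
                        [min_lon, min_lat], [max_lon, min_lat],
                        [max_lon, max_lat], [min_lon, max_lat],
                        [min_lon, min_lat],
                    ]],
                },
                'properties': {
                    'entityId': row['entityId'],
                    'cloudCover': row['cloudCover'],
                },
            })
    with open(geojson_path, 'w') as out:
        json.dump({'type': 'FeatureCollection', 'features': features}, out)
```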

As for keeping an up to date cartodb table, that would be great. The dumping of run files is intended to make it easy to track new entries without having to refetch and reload the whole scene list each time. So one approach would be a script that just pulls outstanding run files to update a cartodb table. That could potentially be pushed from the ingestor scripts, or better yet (from my perspective), it could be run remotely with no special linkage to the ingestor scripts. I already have logic at Planet Labs to ingest new landsat scenes from the landsat-pds buckets based on the run files, and that is working quite well.
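A very rough sketch of that remote approach follows, assuming the run files are plain CSVs reachable by URL and that rows are appended through the CartoDB SQL API. The URL, table name, and column mapping are placeholders, and a real script would batch and properly escape the inserts.

```python
# Sketch: append one run file's scenes to a CartoDB table via the SQL API.
# Endpoint, credentials, table, and columns are placeholders.
import csv
import io

import requests

CARTODB_SQL_URL = 'https://YOUR_ACCOUNT.cartodb.com/api/v2/sql'
API_KEY = 'YOUR_API_KEY'


def append_run_to_cartodb(run_file_url, table='landsat_scenes'):
    resp = requests.get(run_file_url)
    resp.raise_for_status()
    for row in csv.DictReader(io.StringIO(resp.text)):
        # Illustrative only: real code should use batched, escaped inserts.
        query = (
            "INSERT INTO {table} (entity_id, acquisition_date, cloud_cover) "
            "VALUES ('{id}', '{date}', {cc})"
        ).format(table=table, id=row['entityId'],
                 date=row['acquisitionDate'], cc=row['cloudCover'])
        requests.post(CARTODB_SQL_URL,
                      data={'q': query, 'api_key': API_KEY}).raise_for_status()
```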

It would be fantastic for a public cartodb table to be one of the normal views into the scene list that we point people to!

auremoser commented 9 years ago

Yes, I think the unzipped one is a great idea!

Hmm, that function (upload_run_list()) is missing in my version of the repo :( I searched for it and did not find it.

If you point out to me how the scene_list.csv.gz is being generated, I can add something to generate a CSV and upload that. Where does the uncompressed version get generated in the code before you compress it? I don't want to step on toes, though, if you'd rather just build that in yourself.

Sounds cool about the remote run for GeoJSON; I'd love to set that up as a sync table. Can you point me to your logic from Planet Labs? :) We have a pretty slick import from a URL -> sync table (where you can set a sync time, like "every hour" or "every week"), and I could usher that into our common data.

Thank you for responding!!

jedsundwall commented 8 years ago

@warmerdam are you ok with me accepting this pull request?