landsat-pds / landsat_ingestor

Scripts and other artifacts for landsat data ingestion into Amazon public hosting.
Apache License 2.0

Duplicates in scene_list #18

Open jflasher opened 7 years ago

jflasher commented 7 years ago

Have seen a couple of duplicates showing up in scene_list.gz. Doesn't seem to be tied to date. Maybe items are getting queued up twice?

$ grep LC80200312015200LGN00 scene_list
LC80200312015200LGN00,2015-07-19 16:16:07.837833,65.39,L1T,20,31,40.62882,-85.17706,42.79844,-82.23444,https://s3-us-west-2.amazonaws.com/landsat-pds/L8/020/031/LC80200312015200LGN00/index.html
LC80200312015200LGN00,2015-07-19 16:16:07.837833,65.39,L1T,20,31,40.62882,-85.17706,42.79844,-82.23444,https://s3-us-west-2.amazonaws.com/landsat-pds/L8/020/031/LC80200312015200LGN00/index.html
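
The grep above checks one scene ID at a time. To look for repeats across the whole file, a minimal sketch follows. It assumes that scene_list.gz is a gzip-compressed CSV whose first comma-separated field is the scene ID, as in the output above. The function names and the default path are hypothetical and not part of landsat_ingestor.

```python
import gzip
from collections import Counter


def find_duplicate_scene_ids(lines):
    """Return {scene_id: count} for scene IDs that appear more than once.

    Assumes the scene ID is the first comma-separated field of each line.
    """
    counts = Counter(
        line.split(",", 1)[0] for line in lines if line.strip()
    )
    return {scene_id: n for scene_id, n in counts.items() if n > 1}


def find_duplicates_in_file(path="scene_list.gz"):
    # Hypothetical helper: "rt" opens the gzip file in text mode so that
    # iteration yields str lines rather than bytes.
    with gzip.open(path, "rt") as handle:
        return find_duplicate_scene_ids(handle)
```

Calling `find_duplicates_in_file()` on a local copy of the list would return a dictionary of repeated scene IDs and how often each occurs. It would be empty if there are no duplicates.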
kapadia commented 7 years ago

@jflasher Yes, it's likely that scenes are getting queued twice. We're currently backfilling the archive, and although there is some effort to avoid duplicates, it's still possible that we ingest the same scene multiple times.

After the back catalog is fully ingested, I'll prune the duplicates from the scene_list. It'll be about 2 months.
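
One possible way to do that pruning is sketched below. It is not the ingestor's actual cleanup code. It assumes that duplicates are byte-for-byte identical lines, as in the example reported above, and it keeps the first occurrence of each line so the original order is preserved. The function names and file paths are hypothetical.

```python
import gzip


def prune_duplicate_lines(lines):
    """Yield each distinct line once, in first-seen order.

    Only exact repeats are dropped; lines that differ in any field
    are treated as distinct entries.
    """
    seen = set()
    for line in lines:
        if line in seen:
            continue
        seen.add(line)
        yield line


def prune_scene_list(src="scene_list.gz", dst="scene_list.pruned.gz"):
    # Hypothetical paths. This writes to a new file instead of
    # overwriting the source in place.
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        fout.writelines(prune_duplicate_lines(fin))
```

The set of seen lines is held in memory, which should be manageable for a list of this kind. If the same scene could be re-ingested with different field values, matching on the scene ID alone would be a different design choice.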

jflasher commented 7 years ago

👍 thanks @kapadia.