podaac / data-subscriber

Subscribe and bulk download collections of data at PO.DAAC
Apache License 2.0

Refactor away from the .update file #174

Open joshgarde opened 2 months ago

joshgarde commented 2 months ago

Issue: The current solution for tracking the latest timestamp within a directory is the hidden .update file. While this works, the solution is neither portable nor self-evident to users.

Solution: Refactor data-subscriber to instead use file metadata within the directory to determine the next start datetime to fetch from. This removes the need to maintain a .update file, which may be lost if the user copies the granules from one directory to another without noticing the .update file. Potential issues may arise if the user is using the directory for other work and adds files after the subscriber runs, or if the user is subscribing to multiple datasets into the same directory.
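To make the file-metadata idea concrete, here is a minimal sketch of deriving the next start datetime from modification times. It is not the subscriber's actual code: the function name `next_start_from_directory` is hypothetical, and it assumes that the newest mtime among non-hidden regular files is a reasonable stand-in for the timestamp currently stored in .update.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def next_start_from_directory(directory: str) -> Optional[datetime]:
    """Hypothetical helper: newest file mtime in `directory` as a UTC datetime.

    Returns None when the directory holds no non-hidden regular files,
    in which case the caller would need some other default start time.
    """
    mtimes = [
        entry.stat().st_mtime
        for entry in Path(directory).iterdir()
        # Skip subdirectories and hidden files such as .update itself.
        if entry.is_file() and not entry.name.startswith(".")
    ]
    if not mtimes:
        return None
    return datetime.fromtimestamp(max(mtimes), tz=timezone.utc)
```

Note that this sketch shows the weakness mentioned above: any unrelated file written into the directory after a run moves the computed start time forward.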

An alternative solution may be to perform granule downloads in descending order of timestamps such that any granule that's not found already in the directory is downloaded, but once the subscriber hits a granule that does exist (implying that was the last stop point), it ends its execution. This solution would skip the need to look for file metadata which may change unbeknownst to the user and may be inconsistent across filesystems. It would also enable support for subscribing to multiple datasets within the same directory.
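A rough sketch of that descending-order approach, under stated assumptions: `download_until_known` and the `download` callback are hypothetical names, the granule list is assumed to already be sorted newest first, and a granule counts as present if a file with its name exists in the directory.

```python
import os
from typing import Callable, Iterable, List


def download_until_known(
    granule_names_newest_first: Iterable[str],
    directory: str,
    download: Callable[[str, str], None],
) -> List[str]:
    """Hypothetical helper: fetch granules newest first, stop at the first one on disk.

    `download(name, target_path)` is a caller-supplied function that writes
    the granule to `target_path`. Returns the names that were fetched.
    """
    fetched: List[str] = []
    for name in granule_names_newest_first:
        target = os.path.join(directory, name)
        if os.path.exists(target):
            # An existing file is treated as the previous run's stopping point.
            break
        download(name, target)
        fetched.append(name)
    return fetched
```

One caveat with this sketch: if a run is interrupted partway through, the newest granules are already on disk, so the next run would stop at them and never fetch the older ones that were skipped. A real implementation would likely need to handle that case.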

mike-gangl commented 2 months ago

it's been a while since i worked on this, but wanted to confirm: is this change only for the "downloader" tool, or is it for the subscribe tool as well? i'd be wary of changing the subscription feature because it's very purpose-built. it's not meant to get data from the past (only data that are newly ingested, which could be "in the past" but have been recently updated). If you want to download data from various time ranges, can't we just use the "data downloader" tool?

joshgarde commented 2 months ago

Reworked the ticket to something I think is more workable for subscriber specifically. Lmk your thoughts