A cron job scrapes the remote servers and updates the cache (YAML docs), keeping a running log file of updates and changes at the file level.
Local operation processes "new" files
Scan Log file as far back as required for relevant changes
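The log scan above could be sketched roughly as follows. The notes do not specify a log format, so the tab-separated line layout, the `LogEntry` fields, and the function names here are all hypothetical; the sketch also assumes the log is append-only and therefore in chronological order.

```python
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple


class LogEntry(NamedTuple):
    timestamp: datetime
    action: str  # assumed vocabulary, e.g. "added", "changed", "removed"
    path: str


def parse_line(line: str) -> LogEntry:
    # Assumed (hypothetical) line format: "<ISO timestamp>\t<action>\t<path>"
    stamp, action, path = line.rstrip("\n").split("\t", 2)
    return LogEntry(datetime.fromisoformat(stamp), action, path)


def changes_since(lines: Iterable[str], cutoff: datetime) -> Iterator[LogEntry]:
    """Yield entries newer than `cutoff`, newest first.

    Because the log is assumed to be chronological, the scan walks
    backwards and stops at the first entry at or before the cutoff.
    """
    for line in reversed(list(lines)):
        if not line.strip():
            continue  # skip blank lines
        entry = parse_line(line)
        if entry.timestamp <= cutoff:
            break
        yield entry
```

A local operation would call `changes_since(open(log_path), last_run_time)` and treat the returned paths as its "new" files.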
Request file(s) from Local cache
If the file(s) have already been downloaded then return the local paths
Else if the file(s) are not downloadable (remote access only) then return the remote paths (arguably the operation could already know this and so wouldn't go through the cache, but it's mentioned here for completeness)
Else the file(s) are downloadable but not in the local cache
Download to local cache
There will be a delay while the download occurs. How should this be managed?
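A minimal sketch of the three-way cache request described above. Everything named here is an assumption for illustration: `RemoteFile`, `resolve`, the shape of the `index` mapping, and the `fetch` callable that stands in for whatever download mechanism is used. The sketch simply blocks the caller during the download, which is only one possible answer to the question of how to manage the delay.

```python
from pathlib import Path
from typing import Callable, Dict, NamedTuple


class RemoteFile(NamedTuple):
    url: str
    downloadable: bool  # False means remote access only


def resolve(
    name: str,
    index: Dict[str, RemoteFile],
    cache_dir: Path,
    fetch: Callable[[str], bytes],
) -> str:
    """Return a local path if possible, otherwise the remote path."""
    local = cache_dir / name

    # Case 1: already downloaded, return the local path.
    if local.exists():
        return str(local)

    remote = index[name]

    # Case 2: not downloadable, return the remote path.
    if not remote.downloadable:
        return remote.url

    # Case 3: downloadable but not cached. Download to a temporary
    # name first and rename afterwards, so that a partially written
    # file is never mistaken for a cached one.
    local.parent.mkdir(parents=True, exist_ok=True)
    tmp = local.with_name(local.name + ".part")
    tmp.write_bytes(fetch(remote.url))  # blocks until the download finishes
    tmp.replace(local)
    return str(local)
```

Passing `fetch` in as a parameter keeps the branching logic separate from the transport, so the same function can be exercised with a stub instead of a real network call.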
Scrape remote server pages
Capture remote information
Store info in local index
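The scrape, capture, and store steps might look something like the sketch below. It assumes the remote pages are HTML listings whose links point at the files of interest, which the notes do not state; `LinkCollector`, `capture`, `update_index`, and the in-memory dict used as the index are hypothetical names. Fetching the page and writing the index out as YAML are left out.

```python
from html.parser import HTMLParser
from typing import Dict, List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def capture(page_url: str, html: str) -> List[str]:
    """Capture remote information: absolute URLs of all linked files."""
    parser = LinkCollector()
    parser.feed(html)
    return [urljoin(page_url, href) for href in parser.links]


def update_index(index: Dict[str, dict], page_url: str, html: str) -> List[str]:
    """Store captured URLs in the index and return the newly seen ones."""
    new = []
    for url in capture(page_url, html):
        if url not in index:
            index[url] = {"source_page": page_url}
            new.append(url)
    return new
```

The list returned by `update_index` is what the cron job would append to the running log file as new entries.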
Local cache