Data hub - Remote Indexer and Local Cache

ternaustralia / auscover-api

Ubiquitous access to the TERN AusCover data services and resources

Scrape remote server pages

Capture remote information

Product
Variant (flexible depth)
Remote path
File name
Time index/dimension
Server time stamp
Server file size (optional, may be required if server time stamp is not available)
File format and Access method
Local cache path
IsLocalCacheValid: whether the file can be downloaded and cached locally (HTML, FTP, THREDDS) or must be fetched from the remote source every time (WxS, NCML).
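As a rough sketch, the fields above could be held in one record per remote file. The class name, field names, types, and example values below are assumptions for illustration only; the issue does not define a schema.

```python
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import List, Optional


@dataclass
class IndexEntry:
    """One remote file as captured by the scraper (hypothetical field names)."""
    product: str
    variant: List[str]                        # flexible depth: one item per level
    remote_path: str
    file_name: str
    time_index: Optional[str] = None          # value along the time dimension
    server_timestamp: Optional[datetime] = None
    server_file_size: Optional[int] = None    # bytes; fallback when no time stamp
    file_format: str = ""
    access_method: str = ""                   # e.g. "ftp", "thredds", "wxs"
    local_cache_path: Optional[str] = None    # set once the file is cached
    is_local_cache_valid: bool = True         # False for remote-only access


# Placeholder values only; none of these names come from the real servers.
entry = IndexEntry(
    product="example_product",
    variant=["example_region", "example_tile"],
    remote_path="ftp://example.invalid/example_product/",
    file_name="example_2014.nc",
    server_file_size=1024,
    file_format="netcdf",
    access_method="ftp",
)

# A plain dict like this could then be written out as a YAML document.
record = asdict(entry)
```

Keeping the variant as a list is one way to allow the "flexible depth" mentioned above without fixing the number of levels in advance.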

Store info in local index
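One possible way to store scraped records and notice changes is sketched below. It assumes the index is a dict keyed by (remote path, file name), that records are plain dicts, and that changes are appended to an in-memory log list; all of these names and structures are hypothetical. The server time stamp is compared when both records have one, and the file size is used as the fallback, as suggested in the field list above.

```python
def has_changed(old, new):
    """Return True if the scraped record differs from the indexed one."""
    if old is None:
        return True  # not indexed yet
    old_ts = old.get("server_timestamp")
    new_ts = new.get("server_timestamp")
    if old_ts is not None and new_ts is not None:
        return old_ts != new_ts
    # No usable time stamp on one side: fall back to the file size.
    return old.get("server_file_size") != new.get("server_file_size")


def update_index(index, scraped, log):
    """Merge scraped records into the index, logging new and changed files."""
    for rec in scraped:
        key = (rec["remote_path"], rec["file_name"])
        old = index.get(key)
        if has_changed(old, rec):
            log.append(("new" if old is None else "changed", key))
            index[key] = rec
    return index
```

Records that are unchanged are left alone, so the log only grows when a file is new or its time stamp or size differs.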

Local cache

Query YAML (via any combination of fields) for File(s)
If a local copy exists, return the local path
Otherwise, download to the cache (if requested) and return the local path, or return the remote path
Configurable cache size
Cache clean up (remove "old" files when near/over cache size)
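The lookup and clean-up behaviour above might look roughly like the following. This is a minimal sketch under several assumptions: the class name `LocalCache`, the injected `fetch(remote_url, dest_path)` callable, the dict-style index entries, and the choice of "oldest modification time first" as the meaning of "old" are all hypothetical.

```python
import os
import tempfile
from pathlib import Path


class LocalCache:
    def __init__(self, root, max_bytes, fetch):
        self.root = Path(root)
        self.max_bytes = max_bytes      # configurable cache size
        self.fetch = fetch              # hypothetical callable(remote_url, dest_path)

    def resolve(self, entry, download=True):
        """Return a local path if available, else download or return the remote path."""
        local = self.root / entry["file_name"]
        if local.exists():
            os.utime(local)             # touch, so recently used files count as "new"
            return str(local)
        remote = entry["remote_path"] + entry["file_name"]
        if not download or not entry.get("is_local_cache_valid", True):
            return remote               # remote-only access, or no download requested
        self.root.mkdir(parents=True, exist_ok=True)
        self.fetch(remote, local)
        self.clean_up(keep=local)
        return str(local)

    def clean_up(self, keep=None):
        """Remove the oldest files until the cache is within max_bytes."""
        files = sorted((p for p in self.root.rglob("*") if p.is_file()),
                       key=lambda p: p.stat().st_mtime)
        total = sum(p.stat().st_size for p in files)
        for p in files:
            if total <= self.max_bytes:
                break
            if p == keep:
                continue                # never evict the file just requested
            total -= p.stat().st_size
            p.unlink()


# Demo with a fake fetch that writes 10 bytes instead of contacting a server.
def fake_fetch(url, dest):
    Path(dest).write_bytes(b"x" * 10)

cache = LocalCache(tempfile.mkdtemp(), max_bytes=25, fetch=fake_fetch)
base = {"remote_path": "ftp://example.invalid/", "is_local_cache_valid": True}
path_a = cache.resolve(dict(base, file_name="a.nc"))
os.utime(path_a, (1, 1))                # force a.nc to be the oldest file
path_b = cache.resolve(dict(base, file_name="b.nc"))
os.utime(path_b, (2, 2))
path_c = cache.resolve(dict(base, file_name="c.nc"))   # 30 bytes > 25: a.nc is evicted
remote_only = cache.resolve(dict(base, file_name="d.nc", is_local_cache_valid=False))
```

Passing the download function in keeps the sketch independent of any particular protocol; a real version would presumably pick one per access method.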

Use cases

Cronjob that scrapes remote servers and updates the cache (YAML docs) with a running Log file of updates and changes at the file level.
Local operation processes "new" files
- Scan Log file as far back as required for relevant changes
- Request file(s) from Local cache
- If the file(s) have already been downloaded then return the local paths
- Else if the file(s) are not downloadable (remote access only), return the remote paths. Arguably the operation could know this in advance and bypass the cache, but it's mentioned here for completeness.
- Else the file(s) are downloadable but not in the local cache
  - Download to local cache
  - There will be a delay while the download occurs; how should this be managed?
Local operation processes any files
- Search YAML docs on any combination of fields
- Return details for matching file(s)
- Request file(s) from Local cache
- Then proceed as in the 'Local operation processes "new" files' use case above
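The two lookup styles in these use cases, scanning the log for recent changes and searching the index on any combination of fields, can be sketched as below. The tab-separated log line format (`<ISO time stamp>`, action, remote path) and the function names are assumptions; the issue does not specify a log format.

```python
from datetime import datetime


def changes_since(log_lines, since):
    """Return (action, path) pairs for log lines at or after `since`.

    Assumed line format: '<ISO time stamp>\t<action>\t<remote path>'.
    """
    found = []
    for line in log_lines:
        stamp, action, path = line.rstrip("\n").split("\t")
        if datetime.fromisoformat(stamp) >= since:
            found.append((action, path))
    return found


def search(index, **criteria):
    """Return index records whose fields equal every given criterion."""
    return [rec for rec in index
            if all(rec.get(field) == value for field, value in criteria.items())]
```

For example, `search(index, product="example_product", file_format="netcdf")` would return only the records matching both fields, and each match could then be requested from the local cache.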
