ternaustralia / auscover-api

Ubiquitous access to the TERN AusCover data services and resources

Data hub - Remote Indexer and Local Cache #11

Open mpaget opened 8 years ago

mpaget commented 8 years ago

Scrape remote server pages

Capture remote information

Store info in local index

Local cache
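
A minimal sketch of how the "capture" and "store" steps above might fit together, assuming the scraper yields one record per remote file. The `FileInfo` fields and the `update_index` helper are hypothetical names chosen for illustration; nothing here is taken from the auscover-api code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """One scraped remote file (hypothetical record layout)."""
    url: str
    size: int
    modified: str  # timestamp string as reported by the remote server


def update_index(index, scraped):
    """Merge scraped records into `index` (a dict keyed by URL).

    Returns a list of (change, url) tuples describing what changed,
    where change is "added", "modified" or "removed".
    """
    changes = []
    for info in scraped:
        old = index.get(info.url)
        if old is None:
            changes.append(("added", info.url))
        elif old != info:
            changes.append(("modified", info.url))
        else:
            continue  # unchanged, nothing to record
        index[info.url] = info

    # Anything in the index that the scrape no longer sees is dropped.
    seen = {info.url for info in scraped}
    for url in sorted(set(index) - seen):
        del index[url]
        changes.append(("removed", url))
    return changes
```

The returned change list is the sort of thing that could be appended to a log of file-level updates; how the index itself is persisted is left open here.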

mpaget commented 8 years ago

Use cases

  1. A cron job scrapes the remote servers and updates the cache (YAML docs), keeping a running Log file of updates and changes at the file level.
  2. Local operation processes "new" files
    • Scan Log file as far back as required for relevant changes
    • Request file(s) from Local cache
    • If the file(s) have already been downloaded then return the local paths
    • Else if the file(s) are not downloadable (remote access only) then return the remote paths (arguably the operation could know this and so wouldn't go through the cache, but it's mentioned here for completeness)
    • Else the file(s) are downloadable but not in the local cache
      • Download to local cache
      • There will be a delay while the download occurs; how should this be managed?
  3. Local operation processes any files
    • Search YAML docs on any combination of fields
    • Return details for matching file(s)
    • Request file(s) from Local cache
    • ... as per 2)
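
The request logic in use cases 2 and 3 could be sketched roughly as below. This is only an illustration under stated assumptions: `Record`, `search`, `resolve` and the `fetch` callable are hypothetical names, the records are assumed to be already loaded from the YAML docs into memory, and the download simply blocks the caller, which is one possible answer to the delay question rather than a settled design.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Optional


@dataclass
class Record:
    """One indexed file (hypothetical layout of a YAML doc once loaded)."""
    remote_url: str
    downloadable: bool
    local_path: Optional[Path] = None
    fields: dict = field(default_factory=dict)


def search(records, **criteria):
    """Use case 3: return records whose fields match every criterion."""
    return [
        r for r in records
        if all(r.fields.get(key) == value for key, value in criteria.items())
    ]


def resolve(record: Record, cache_dir: Path,
            fetch: Callable[[str, Path], None]) -> str:
    """Use case 2: return a usable path for the record.

    - already downloaded: return the local path
    - remote access only: return the remote path
    - otherwise: download into the cache, then return the local path
    """
    if record.local_path is not None and record.local_path.exists():
        return str(record.local_path)
    if not record.downloadable:
        return record.remote_url

    # Name the cached copy after the last URL segment (an assumption).
    target = cache_dir / record.remote_url.rsplit("/", 1)[-1]
    fetch(record.remote_url, target)  # blocks until the download finishes
    record.local_path = target
    return str(target)
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular download mechanism. A non-blocking variant could instead return a placeholder and let the caller poll, but that choice is exactly what the open question above is asking.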