terraref / reference-data

Coordination of Data Products and Standards for TERRA reference data
https://terraref.org
BSD 3-Clause "New" or "Revised" License

Implement More Efficient Regular Cache Refresh #160

Closed: nheyek closed this issue 7 years ago

nheyek commented 7 years ago

Description

Implement a daily cache refresh that updates only new data and data that has changed (updated_at later than the last cache refresh), rather than rebuilding the entire cache.
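The incremental refresh described here could look roughly like the sketch below. It is written in Python as a language-neutral illustration, not the app's actual R code. The names `refresh_cache` and `fetch_changed_since`, and the cache layout (a dict holding `records` keyed by `id` plus a `last_refresh` timestamp), are assumptions made for the example; only the `updated_at` comparison comes from the description above.

```python
from datetime import datetime, timezone


def refresh_cache(cache, fetch_changed_since, now=None):
    """Merge new and changed records into `cache` in place.

    `fetch_changed_since(ts)` is a hypothetical callable that returns the
    records whose updated_at is at or after `ts`, or every record when
    `ts` is None (first run, empty cache).
    Returns the number of records that were inserted or replaced.
    """
    # Take the timestamp before fetching, so that records modified while
    # the fetch is running are picked up again by the next refresh.
    now = now or datetime.now(timezone.utc)
    last_refresh = cache.get("last_refresh")

    changed = list(fetch_changed_since(last_refresh))

    records = cache.setdefault("records", {})
    for rec in changed:
        # Upsert by id: new records are added, changed ones are replaced.
        records[rec["id"]] = rec

    cache["last_refresh"] = now
    return len(changed)
```

One limitation of this approach: rows deleted at the source are never removed from the cache, since a deleted row has no updated_at to match. An occasional full rebuild would be one way to cover that case.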

Completion Criteria

dlebauer commented 7 years ago

We should also consider the alternative of querying directly from the database since the API is so slow. Let me know if you need help with this.

nheyek commented 7 years ago

We can do that, but it's worth noting that the performance of the app itself doesn't depend on the speed of the API, since it gets all of the necessary data from the cache. Faster querying would speed up the cache refresh, but that's unlikely to matter much since the refresh runs in the background once per day.

dlebauer commented 7 years ago

My thought was that the database would take the place of the cache, which would reduce the complexity of the code.

nheyek commented 7 years ago

I see. I wonder whether there would be a significant slowdown: would it be an issue to query the database for hundreds of thousands of records on every page view? I radically changed the cache structure, if you want to take a look in the feature branch. It's a lot nicer than the previous method and lets the actual app.R script stay really clean, since the cache-update script does everything involving BETYdb and stores a single object with only the data the app needs.

dlebauer commented 7 years ago

Connecting to the database

  1. create an ssh tunnel: ssh -Nf -L 5432:localhost:5432 bety6.ncsa.illinois.edu
  2. in R, use connections like those in https://github.com/pi4-uiuc/2017-bootcamp/blob/master/content/post/2017-05-31-exploratory-data-analysis.Rmd
    • but use the credentials host: localhost, port: 5432, user: bety, database: bety; I've sent you the password separately
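
For anyone connecting from a non-R client, the same settings can be expressed as a libpq-style connection URI once the tunnel from step 1 is up. This is only a sketch: the function name `bety_connection_uri` and the `BETY_PASSWORD` environment variable are made up for the example, and the password itself is deliberately not hard-coded.

```python
import os
from urllib.parse import quote


def bety_connection_uri(password=None):
    """Build a PostgreSQL URI for the tunnelled BETYdb connection.

    Host, port, user and database are the values listed above. The
    password is read from the BETY_PASSWORD environment variable (a
    hypothetical name) when it is not passed in explicitly.
    """
    if password is None:
        password = os.environ["BETY_PASSWORD"]
    # Percent-encode the password so characters such as '@' or '/'
    # cannot be misread as URI delimiters.
    return "postgresql://bety:{}@localhost:5432/bety".format(
        quote(password, safe="")
    )
```

The host is localhost because the ssh tunnel forwards local port 5432 to the database server.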