sul-dlss-deprecated / rialto-etl

ETL tools for RIALTO, Stanford Libraries' research intelligence project
https://library.stanford.edu/projects/rialto
Apache License 2.0

Run one-off full load of publication data in prod (for all time) #322

Closed by peetucket 5 years ago

peetucket commented 5 years ago

TO-DO:

mjgiarlo commented 5 years ago

Researchers have now been extracted and transformed. They are being loaded now. Once done (in 7-8 hours), grants will be loaded, followed by publications.

mjgiarlo commented 5 years ago

Grants are being ETLed now.

mjgiarlo commented 5 years ago

There are two issues with grant ETL in prod at the moment:

  1. The rialto-etl-prod shared_configs branch is pointing at the dev entity resolver (see related PR); and
  2. Both the dev and prod entity resolvers are behaving badly. Connections to dev from the ETL are raising Faraday::ConnectionFailed, and connections to prod (in my browser, hitting the healthcheck endpoint) are returning 502 Bad Gateway or 504 Gateway Timeout errors.

I could use some assistance moving this forward when you are available, @aaron-collier.
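
For anyone reproducing the resolver check outside a browser, here is a minimal sketch of a health probe that retries on the 502/504 responses and connection failures described above. It uses only Ruby's standard library rather than Faraday; the method names, retry count, and wait time are assumptions for illustration and are not taken from rialto-etl, and the healthcheck URL must be supplied by the caller.

```ruby
require 'net/http'
require 'uri'

# Gateway statuses treated as transient (the 502/504 responses reported above).
TRANSIENT_STATUSES = [502, 504].freeze

# Default fetcher (hypothetical helper): returns the integer HTTP status of a GET.
# Raises (e.g. Errno::ECONNREFUSED, SocketError) if no connection can be made.
def http_status(url)
  Net::HTTP.get_response(URI(url)).code.to_i
end

# Probe a health check URL up to max_attempts times.
# Returns :healthy on a 2xx response, :unhealthy otherwise.
# `fetch` is injectable so the logic can be exercised without a network.
def resolver_health(url, max_attempts: 3, wait: 1, fetch: method(:http_status))
  max_attempts.times do |attempt|
    begin
      status = fetch.call(url)
      return :healthy if (200..299).cover?(status)
      # A non-transient failure (e.g. 404) will not improve with retries.
      return :unhealthy unless TRANSIENT_STATUSES.include?(status)
    rescue StandardError
      # Connection-level failure: fall through and retry.
    end
    sleep(wait) if attempt < max_attempts - 1
  end
  :unhealthy
end
```

Running a check like this against both resolvers before restarting the grants ETL would show whether the failures are intermittent or persistent.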

mjgiarlo commented 5 years ago

Before resuming grants ETL, drop the named graph for grants to ensure we don't have dev data in our prod canonical store. See comment in notes.
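
As a rough sketch of what dropping a named graph can look like, the snippet below builds a SPARQL 1.1 Update `DROP GRAPH` statement and posts it as `application/sparql-update`. The helper names are hypothetical, and the graph URI and endpoint are placeholders the operator must supply. Whether the prod store accepts updates in this form is an assumption that should be checked against its documentation first.

```ruby
require 'net/http'
require 'uri'

# Build a SPARQL 1.1 Update statement that drops one named graph.
# SILENT makes the operation report success even if the graph does not exist.
def drop_graph_update(graph_uri, silent: true)
  # Require an absolute IRI and reject characters that would break out of <...>.
  unless graph_uri.match?(/\A[a-z][a-z0-9+.-]*:[^<>"\s]+\z/i)
    raise ArgumentError, "not a usable graph URI: #{graph_uri.inspect}"
  end
  "DROP #{'SILENT ' if silent}GRAPH <#{graph_uri}>"
end

# Send the update to a SPARQL endpoint (hypothetical helper; the endpoint URL
# is supplied by the caller and is not taken from this issue).
def post_update(endpoint, update)
  Net::HTTP.post(URI(endpoint), update, 'Content-Type' => 'application/sparql-update')
end
```

For example, `drop_graph_update('http://example.org/graphs/grants')` returns `DROP SILENT GRAPH <http://example.org/graphs/grants>`, where the example.org URI is a placeholder and not the real grants graph name.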

mjgiarlo commented 5 years ago

Updated current state and TO-DOs in the description of this issue. See above.

mjgiarlo commented 5 years ago

@peetucket How would you feel about my reassigning this issue to you for now, for the sake of knowledge transfer and to model our ongoing mode of operation for this project? I will be happy to assist, pair, etc.