Closed: hackartisan closed this issue 2 years ago.
We think this is similar to the issue in Figgy: replication across data centers times out and breaks. I've removed lib-solr-prod4 as a replication target for Pulfalight and retried the jobs.
The retried jobs succeeded once that replication target was removed.
Ops has moved lib-solr-prod4 into the same data center as 5/6. Re-enable replication to it during maintenance week, then close this issue.
Timeout errors occur when indexing the following:
Each of these had anywhere from a handful to a dozen identical indexing jobs retrying in Sidekiq, so they appear to be enqueued repeatedly. I killed the duplicates for all of them, but I expect they will pile up again.
When these jobs fail, they appear to write the entire EAD record to the logs. This is causing a disk-space problem on pulfalight-worker1: its drive is currently full, holding 30G of production logs.
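One common way to stop identical jobs from piling up is to fingerprint each job (class name plus arguments) and refuse to enqueue a job whose fingerprint is already pending. The sketch below shows that idea with a hypothetical in-memory queue; `DedupQueue` and the job name `IndexJob` are illustrative assumptions, not Sidekiq's API or Pulfalight's actual job classes. A real deployment would need the pending-fingerprint set to live somewhere shared by all workers (for example, the same Redis that Sidekiq uses), which this sketch does not attempt.

```python
import hashlib
import json


class DedupQueue:
    """Hypothetical FIFO queue that skips a job if an identical one is already pending."""

    def __init__(self):
        self._pending = []   # (fingerprint, job_class, args) tuples in FIFO order
        self._keys = set()   # fingerprints of jobs that are still pending

    @staticmethod
    def fingerprint(job_class, args):
        # Same class name and same arguments produce the same fingerprint.
        payload = json.dumps([job_class, args], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def enqueue(self, job_class, args):
        """Add the job unless an identical one is pending; return True if it was added."""
        key = self.fingerprint(job_class, args)
        if key in self._keys:
            return False  # duplicate of a pending job, so drop it
        self._keys.add(key)
        self._pending.append((key, job_class, args))
        return True

    def pop(self):
        """Hand the oldest job to a worker and forget its fingerprint,
        so a later, genuine re-index request can be enqueued again."""
        key, job_class, args = self._pending.pop(0)
        self._keys.discard(key)
        return job_class, args

    def __len__(self):
        return len(self._pending)
```

For example, enqueueing `("IndexJob", ["MC016"])` twice leaves a single pending job; after a worker pops it, the same job can be enqueued again.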
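The error excerpt below shows that the exception message embeds the full request payload after `Request Data:`. One way to keep logs small is to truncate that portion before the message is logged. This is a minimal sketch assuming the message keeps that `Request Data: ` marker; the function names and the 200-character default are hypothetical choices, and where such a filter would hook into the application's (Ruby) error reporting is left open.

```python
def truncate(text, limit):
    """Return text unchanged if it fits, else its first `limit` chars plus a note."""
    if len(text) <= limit:
        return text
    omitted = len(text) - limit
    return f"{text[:limit]}... [{omitted} more characters omitted]"


def summarize_solr_error(message, limit=200):
    """Shorten only the request payload of an error message; keep URI and headers intact.

    Assumes the payload follows a 'Request Data: ' marker, as in the log excerpt.
    """
    marker = "Request Data: "
    head, sep, data = message.partition(marker)
    if not sep:
        # No payload marker found: fall back to truncating the whole message.
        return truncate(message, limit)
    return head + marker + truncate(data, limit)
```

With this in place, a failure for a large finding aid would log the URI, headers, and only the first few hundred characters of the EAD-derived document instead of the whole record.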
[PULFALight/production] RSolr::Error::Http: RSolr::Error::Http
URI: http://lib-solr-prod4.princeton.edu:8983/solr/pulfalight-production/update?wt=json
Request Headers: {"Content-Type"=>"application/json"}
Request Data: "[{\"id\":\"MC016\",\"ead_ssi\":\"MC016\",\"title_ssm\":\"John Foster Dulles Papers\",\"title_teim\":\"John Foster Dulles Papers\",\"subtitle_ssm\":\"John Foster Dulles Papers\",\"subtitle_teim\":\"John Foster Dulles Papers\",\"ark_tsim\":\"http://arks.princeton.edu/ark:/88435/br86b3576\",...
Sudden Priority Justification
Archivists are unable to see their changes until this is fixed.