pulibrary / pdc_discovery

Princeton Data Commons discovery portal for Research Data

Indexing troubleshooting #538

Closed by hectorcorrea 8 months ago

hectorcorrea commented 9 months ago

Troubleshooting issue https://github.com/pulibrary/pdc_discovery/issues/485

This branch tests whether the Solr reindex issue still happens when we don't reindex the data from DataSpace.

It looks to me like the DataSpace reindex is getting stuck (perhaps DataSpace is not responding?), so our reindex never finishes. Half an hour later the rake task that reindexes starts again, and Solr blocks that second reindex because the first one never finished.
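To make the hypothesis concrete, here is a small Python model. It is not code from this repository, and the names (`blocked_runs`, `INTERVAL_MINUTES`) are hypothetical. It assumes run `i` of the scheduled reindex starts at `i * 30` minutes, with the 30-minute interval taken from the observation above, and flags any run that starts while an earlier run is still active. A run that hangs (for example, waiting on DataSpace) is modeled with an infinite duration.

```python
# Hypothetical model of the scheduled reindex; not taken from pdc_discovery.
INTERVAL_MINUTES = 30  # the reindex rake task starts again roughly every half hour


def blocked_runs(durations: list[float], interval: int = INTERVAL_MINUTES) -> list[int]:
    """Return the indexes of scheduled runs that start while an earlier run is still active.

    Run i is assumed to start at i * interval and to last durations[i] minutes;
    float("inf") models a run that never finishes.
    """
    blocked = []
    latest_end = float("-inf")  # when the longest-running earlier run finishes
    for i, duration in enumerate(durations):
        start = i * interval
        if start < latest_end:
            blocked.append(i)  # an earlier run still holds Solr
        latest_end = max(latest_end, start + duration)
    return blocked
```

For example, `blocked_runs([5, float("inf"), 5, 5])` returns `[2, 3]`. A single stuck run blocks every run scheduled after it, which is consistent with the reindex failing indefinitely rather than once.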

I deployed this branch to staging on 11/30/2023 at 11:26 AM. We'll see whether the scheduled reindex keeps working for more than 24 hours this time.

Disabling DataSpace reindexing did not fix the issue.

12/4/2023 - Pointing directly to a single Solr node (lib-solr-staging4) instead of the default URL (lib-solr8-staging, configured via Ansible), which I believe goes through the load balancer/ZooKeeper (not confirmed).

12/7/2023 - Pointing to lib-solr-staging4 did not help. Reverting the change.

hectorcorrea commented 8 months ago

As of January 11, 2024, we are pursuing another approach. See https://docs.google.com/document/d/1Rgc7KIlF9QVkPq8JHJl0JR-_kY7015O3dzde_RteGZk/edit#heading=h.zgv1l31zpjk7