pulibrary / dpul

Princeton's digital collections: Digital PUL
https://dpul.princeton.edu/
Apache License 2.0

Reindexing jobs fail for certain exhibits #336

Closed jrgriffiniii closed 4 years ago

jrgriffiniii commented 6 years ago

Backtrace

line 33 of [PROJECT_ROOT]/app/jobs/spotlight/reindex_job.rb: block in perform
line 32 of [PROJECT_ROOT]/app/jobs/spotlight/reindex_job.rb: each
line 32 of [PROJECT_ROOT]/app/jobs/spotlight/reindex_job.rb: perform

View full backtrace and more info at honeybadger.io

RSolr::Error::Http: RSolr::Error::Http - 400 Bad Request
Error: { 'responseHeader'=>{ 'status'=>400, 'QTime'=>2}, 'error'=>{ 'metadata'=>[ 'error-class','org.apache.solr.common.SolrException', 'root-error-class','org.apache.solr.common.SolrException'], 'msg'=>'Error parsing JSON field value. Unexpected OBJECT_START at [1896], field=exhibit_lae_readonly_language_ssim', 'code'=>400}}
URI: [...]/update?wt=ruby&commitWithin=500
Request Headers: {"Content-Type"=>"application/json"}
Request Data: "[{\"id\":\"2212d08324c13baa1cb4e074c33b6b33\",\"exhibit_lae_public_bsi\":true,\"exhibit_lae_readonly_edm-rights_ssim\":[\"http://rightsstatements.org/vocab/CNE/1.0/\"],\"readonly_edm-rights_tesim\":[\"http://rightsstatements.org/vocab/CNE/1.0/\"],\"exhibit_lae_readonly_collections_ssim\":[null],\"readonly_collections_tesim\":[null],\"exhibit_lae_readonly_type_ssim\":[\"pcdm:Object\"],\"readonly_type_tesim\":[\"pcdm:Object\"],\"exhibit_lae_readonly_title_ssim\":[\"Estamos formando un nuevo partido. Súmate. Con creatividad, fuerza y esperanza, trabajemos por: ¡Un Chile sustentable, democrático, justo y solidario!\"],\"readonly_title_tesim\":[\"Estamos formando un nuevo partido. Súmate. Con creatividad, fuerza y esperanza, trabajemos por: ¡Un Chile sustentable, democrático, justo y solidario!\"],\"exhibit_lae_readonly_publisher_ssim\":[\"Partido Ecologista\"],\"readonly_publisher_tesim\":[\"Partido Ecologista\"],\"exhibit_lae_readonly_barcode_ssim\":[\"32101082723618\"],\"readonly_barcode_tesim\":[\"32101082723618\"],\"exhibit_lae_readonly_label_ssim\":[\"Folder 214\"],\"readonly_label_tesim\":[\"Folder 214\"],\"exhibit_lae_readonly_is-part-of_ssim\":[\"Latin American Ephemera\"],\"readonly_is-part-of_tesim\":[\"Latin American Ephemera\"],\"exhibit_lae_readonly_coverage_ssim\":[\"https://figgy.princeton.edu/catalog/c9713f6f-74c4-4211-a0f3-8844e9cdd2c9\"],\"readonly_coverage_tesim\":[\"https://figgy.princeton.edu/catalog/c9713f6f-74c4-4211-a0f[TRUNCATED]

jrgriffiniii commented 6 years ago

Problematic JSON values are being passed to the fields exhibit_msstreasures_readonly_member-of-collections_ssim and exhibit_music_readonly_member-of-collections_ssim

tpendragon commented 6 years ago

Sounds like our JSON-LD may not have been parsed in a reasonable way for member-of-collections?

jrgriffiniii commented 6 years ago

The log for Sneakers yielded the following:

"exhibit_slavic_readonly_member-of-collections_ssim": {
  "set": [
    {
      "internal_resource":"Collection",
      "created_at":"11/29/17 12:46:50 PM UTC",
      "updated_at":"11/30/17 09:46:05 PM UTC",
      "new_record":false,
      "read_groups":[],
      "read_users":[],
      "edit_users":[],
      "edit_groups":[],
      "id": {"id":"058c1862-30dc-431c-90b5-4e141282c7a1"},
      "title":"Early Soviet Illustrated Sheet Music",
      "slug":"TBD",
      "description":[],
      "visibility":["\u003cdiv class=\"label label-success\"\u003e\u003cspan class=\"icon\"\u003e\u003c/span\u003e\u003cspan class=\"text\"\u003eopen\u003c/span\u003e\u003c/div\u003e"],
      "local_identifier":["prx9170802"]
    },
    {
      "internal_resource":"Collection",
      "created_at":"03/02/18 08:49:51 PM UTC",
      "updated_at":"03/30/18 06:21:58 PM UTC",
      "new_record":false,
      "read_groups":[],
      "read_users":[],
      "edit_users":[],
      "edit_groups":[],
      "id":{"id":"d01c0b70-85b2-4b86-8f8e-b0541f9bfe96"},
      "title":"Music and Performing Arts Collections at Princeton",
      "slug":"music",
      "description":["Umbrella collection for music materials."],
      "visibility":["\u003cdiv class=\"label label-success\"\u003e\u003cspan class=\"icon\"\u003e\u003c/span\u003e\u003cspan class=\"text\"\u003eopen\u003c/span\u003e\u003c/div\u003e"],
      "local_identifier":[]
    },
    {
      "internal_resource":"Collection",
      "created_at":"11/29/17 12:47:44 PM UTC",
      "updated_at":"11/30/17 09:46:06 PM UTC",
      "new_record":false,
      "read_groups":[],
      "read_users":[],
      "edit_users":[],
      "edit_groups":[],
      "id":{"id":"38b9e0a1-974d-4bd9-924b-6a504479c87e"},
      "title":"Princeton Slavic Collections",
      "slug":"slavic",
      "description":[],
      "visibility":["\u003cdiv class=\"label label-success\"\u003e\u003cspan class=\"icon\"\u003e\u003c/span\u003e\u003cspan class=\"text\"\u003eopen\u003c/span\u003e\u003c/div\u003e"],
      "local_identifier":["pm900rs82s"]
    }
  ]
},

This value likely needs to be a string rather than a JSON object, or the ID for the Figgy Collection needs to be a string.
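To illustrate the idea of sending strings rather than objects, here is a minimal sketch in plain Ruby. The helper name stringify_collection_values is hypothetical and is not part of DPUL, Spotlight, or Figgy, and the choice to prefer the title over the nested ID is an assumption made only for the example.

```ruby
# Hypothetical helper, not part of DPUL or Spotlight: reduce each value to a
# plain string that a multivalued string field (such as *_ssim) can accept.
def stringify_collection_values(values)
  wrapped = values.is_a?(Array) ? values : [values]
  wrapped.compact.map do |value|
    next value.to_s unless value.is_a?(Hash)

    # Assumption: prefer the human-readable title, fall back to the nested ID.
    title = value["title"]
    id = value["id"]
    id = id["id"] if id.is_a?(Hash)
    (title || id).to_s
  end
end

# Sample data shaped like the logged Collection objects above.
collections = [
  { "internal_resource" => "Collection",
    "id" => { "id" => "38b9e0a1-974d-4bd9-924b-6a504479c87e" },
    "title" => "Princeton Slavic Collections",
    "slug" => "slavic" },
  { "id" => { "id" => "058c1862-30dc-431c-90b5-4e141282c7a1" } }
]

puts stringify_collection_values(collections).inspect
# => ["Princeton Slavic Collections", "058c1862-30dc-431c-90b5-4e141282c7a1"]
```

Whether the title or the ID is the right string to index depends on how the exhibit uses the field, so treat the fallback order as a placeholder.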

jrgriffiniii commented 6 years ago

Reopened in response to https://app.honeybadger.io/projects/51733/faults/35986626

tpendragon commented 6 years ago

Reopened. In the past this has been the side effect of bad metadata left in the exhibit.

tpendragon commented 5 years ago

Looks like this is a problem again with the Slavic collection.

hackartisan commented 5 years ago

See another instance of this bug: https://app.honeybadger.io/projects/51733/faults/42804704#notice-summary

jrgriffiniii commented 5 years ago

This still occurs for items in the Slavic Collection. It occurred when attempting to update https://dpul.princeton.edu/catalog/7d278w65f (please see https://app.honeybadger.io/projects/51733/faults/46031444#notice-summary).

tpendragon commented 5 years ago

The issue seems to be sidecars that have hashes in a field value. I fixed one resource with this:

sidecar = resources[0].solr_document_sidecars.first
# Find every field whose value is, or contains, a Hash
hash_fields = sidecar.data.select { |_k, v| Array.wrap(v).any? { |x| x.is_a?(Hash) } }
# Blank those fields, then save once
hash_fields.each_key { |k| sidecar.data[k] = [] }
sidecar.save

Iterating over all the sidecars and looking for hash values may be able to fix this?
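A rough sketch of what that iteration could look like, written in plain Ruby so it runs without Rails. FakeSidecar, wrap, hash_valued_keys, and clean_sidecars are all hypothetical names introduced for this example; in a real console the collection would presumably come from the sidecar model (for instance via ActiveRecord's find_each), which is an assumption and has not been tried against DPUL.

```ruby
# Stand-in for a sidecar record so the sketch runs without Rails (assumption:
# the real record exposes a Hash-like `data` attribute and a `save` method).
FakeSidecar = Struct.new(:data) do
  def save
    true
  end
end

# Mirrors ActiveSupport's Array.wrap: Array(hash) would split a Hash into pairs.
def wrap(value)
  return [] if value.nil?

  value.is_a?(Array) ? value : [value]
end

# Returns the keys whose value is, or contains, a Hash.
def hash_valued_keys(data)
  data.select { |_key, value| wrap(value).any? { |v| v.is_a?(Hash) } }.keys
end

# Blanks the offending fields and saves only the sidecars that changed.
# Returns the number of sidecars that were modified.
def clean_sidecars(sidecars)
  sidecars.count do |sidecar|
    keys = hash_valued_keys(sidecar.data)
    next false if keys.empty?

    keys.each { |key| sidecar.data[key] = [] }
    sidecar.save
  end
end

sidecars = [
  FakeSidecar.new({ "readonly_title_ssim" => ["A clean value"] }),
  FakeSidecar.new({ "readonly_member-of-collections_ssim" => [{ "title" => "Princeton Slavic Collections" }],
                    "readonly_label_ssim" => ["ok"] })
]

puts clean_sidecars(sidecars)
```

Blanking the field follows the one-off fix above; it discards the collection membership rather than converting it, so a reindex from the source metadata would still be needed to restore correct values.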

tpendragon commented 4 years ago

I'm closing this. Next time we have an indexing issue let's open up an issue with a specific reference to the exhibit that's failing. These super issues don't get closed otherwise.