pulibrary / orangetheses

Harvest PUL Senior Theses from DSpace

Export of senior theses records should not include pu. fields from DSpace #85

Closed christinach closed 1 month ago

christinach commented 1 month ago

Expected behavior

Example record from last successfully indexed export

{
    "id": "dsp01vh53wv957",
    "title_t": "Computer Analysis of the Transient Response of Pressure Transducers to Shock Inputs",
    "title_citation_display": "Computer Analysis of the Transient Response of Pressure Transducers to Shock Inputs",
    "title_display": "Computer Analysis of the Transient Response of Pressure Transducers to Shock Inputs",
    "title_sort": "computeranalysisofthetransientresponseofpressuretransducerstoshockinputs",
    "author_sort": "Pbi, W. C.",
    "electronic_access_1display": "{\"http://arks.princeton.edu/ark:/88435/dsp01vh53wv957\":[\"DataSpace\",\"Citation only\"]}",
    "restrictions_note_display": [
      "This thesis can be viewed in person at the <a href=http://mudd.princeton.edu>Mudd Manuscript Library</a>.  \nTo order a copy complete the <a href=\"http://rbsc.princeton.edu/senior-thesis-order-form\" target=\"_blank\">Senior Thesis Request Form</a>.  \nFor more information contact <a href=mailto:mudd@princeton.edu>mudd@princeton.edu</a>."
    ],
    "call_number_display": "AC102",
    "call_number_browse_s": "AC102",
    "language_facet": "English",
    "language_name_display": "English",
    "author_display": [
      "Pbi, W. C."
    ],
    "author_s": [
      "Pbi, W. C.",
      "Princeton University. Department of Aeronautical Engineering"
    ],
    "department_display": [
      "Princeton University. Department of Aeronautical Engineering"
    ],
    "location": "Mudd Manuscript Library",
    "location_display": "Mudd Manuscript Library",
    "location_code_s": "mudd$stacks",
    "advanced_location_s": [
      "mudd$stacks",
      "Mudd Manuscript Library"
    ],
    "access_facet": "In the Library",
    "holdings_1display": "{\"thesis\":{\"location\":\"Mudd Manuscript Library\",\"library\":\"Mudd Manuscript Library\",\"location_code\":\"mudd$stacks\",\"call_number\":\"AC102\",\"call_number_browse\":\"AC102\",\"dspace\":true}}",
    "class_year_s": [
      "1966"
    ],
    "pub_date_start_sort": [
      "1966"
    ],
    "pub_date_end_sort": [
      "1966"
    ],
    "format": "Senior thesis"
  },

Actual behavior

The same record using version v1.4.3

{
    "id": "dsp01vh53wv957",
    "title_t": "Computer Analysis of the Transient Response of Pressure Transducers to Shock Inputs",
    "title_citation_display": "Computer Analysis of the Transient Response of Pressure Transducers to Shock Inputs",
    "title_display": "Computer Analysis of the Transient Response of Pressure Transducers to Shock Inputs",
    "title_sort": "computeranalysisofthetransientresponseofpressuretransducerstoshockinputs",
    "author_sort": "Pbi, W. C.",
    "electronic_access_1display": "{\"http://arks.princeton.edu/ark:/88435/dsp01vh53wv957\":[\"DataSpace\",\"Full text\"]}",
    "pu.embargo.lift": null,
    "pu.embargo.terms": null,
    "pu.mudd.walkin": null,
    "pu.location": [
      "This thesis can be viewed in person at the <a href=http://mudd.princeton.edu>Mudd Manuscript Library</a>.  \nTo order a copy complete the <a href=\"http://rbsc.princeton.edu/senior-thesis-order-form\" target=\"_blank\">Senior Thesis Request Form</a>.  \nFor more information contact <a href=mailto:mudd@princeton.edu>mudd@princeton.edu</a>."
    ],
    "dc.rights.accessRights": null,
    "call_number_display": "AC102",
    "call_number_browse_s": "AC102",
    "language_facet": "English",
    "language_name_display": "English",
    "author_display": [
      "Pbi, W. C."
    ],
    "author_s": [
      "Pbi, W. C.",
      "Princeton University. Department of Aeronautical Engineering"
    ],
    "department_display": [
      "Princeton University. Department of Aeronautical Engineering"
    ],
    "access_facet": "Online",
    "electronic_portfolio_s": "{\"thesis\":{\"call_number\":\"AC102\",\"call_number_browse\":\"AC102\",\"dspace\":true}}",
    "class_year_s": [
      "1966"
    ],
    "pub_date_start_sort": [
      "1966"
    ],
    "pub_date_end_sort": [
      "1966"
    ],
    "format": "Senior thesis",
    "restrictions_note_display": [
      "This thesis can be viewed in person at the <a href=http://mudd.princeton.edu>Mudd Manuscript Library</a>.  \nTo order a copy complete the <a href=\"http://rbsc.princeton.edu/senior-thesis-order-form\" target=\"_blank\">Senior Thesis Request Form</a>.  \nFor more information contact <a href=mailto:mudd@princeton.edu>mudd@princeton.edu</a>."
    ]
  },
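A quick way to see how the two records above differ is to compare their key sets. The following is a minimal sketch, not code from orangetheses: `diff_records` is a hypothetical helper, and the two hashes are shortened stand-ins for the full records (only a few fields are kept, and the `pu.location` and `electronic_portfolio_s` values are abbreviated).

```ruby
# Shortened stand-ins for the expected and actual records shown above.
expected = {
  "id" => "dsp01vh53wv957",
  "call_number_display" => "AC102",
  "location" => "Mudd Manuscript Library",
  "location_code_s" => 'mudd$stacks',
  "access_facet" => "In the Library"
}
actual = {
  "id" => "dsp01vh53wv957",
  "call_number_display" => "AC102",
  "pu.location" => ["(restriction note text)"],
  "pu.mudd.walkin" => nil,
  "electronic_portfolio_s" => '{"thesis":{"dspace":true}}',
  "access_facet" => "Online"
}

# Hypothetical helper: report which keys were added, removed, or changed.
def diff_records(expected, actual)
  {
    added: actual.keys - expected.keys,
    removed: expected.keys - actual.keys,
    changed: (expected.keys & actual.keys).reject { |key| expected[key] == actual[key] }
  }
end

report = diff_records(expected, actual)
report.each { |kind, keys| puts "#{kind}: #{keys.join(', ')}" }
```

Run against the full records, a comparison like this would surface not only the extra pu. and dc. keys but also the missing location fields and the changed access_facet value.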

Steps to replicate

Run the rake task on bibdata-staging

Impact of this bug

The rake task fails to index because the export JSON now contains fields, such as pu.location, that the Solr schema does not define, so Solr rejects the update with a 400 error:

    "msg":"2 Async exceptions during distributed update: \nError from server at http://lib-solr-staging5d.princeton.edu:8983/solr/catalog-staging1_shard1_replica_n2/: null\n\n\n\nrequest: http://lib-solr-staging5d.princeton.edu:8983/solr/catalog-staging1_shard1_replica_n2/\nRemote error message: ERROR: [doc=dsp01vh53wv957] unknown field 'pu.location'\nError from server at http://lib-solr-staging5d.princeton.edu:8983/solr/catalog-staging1_shard2_replica_n8/: null\n\n\n\nrequest: http://lib-solr-staging5d.princeton.edu:8983/solr/catalog-staging1_shard2_replica_n8/\nRemote error message: ERROR: [doc=dsp0141687h67f] unknown field 'pu.location'",
    "code":400}}
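One way to avoid the unknown-field error is to drop the raw DSpace keys from each document before it is written to the export. The sketch below only illustrates that idea and is not the fix adopted in orangetheses: `strip_dspace_fields` and `DSPACE_PREFIXES` are hypothetical names, and the prefix list is an assumption based on the keys visible in the v1.4.3 record above.

```ruby
# Assumption: these prefixes cover the raw DSpace metadata keys seen in the
# v1.4.3 export; the real list may differ.
DSPACE_PREFIXES = ["pu.", "dc."].freeze

# Hypothetical helper: return a copy of the document without raw DSpace keys.
def strip_dspace_fields(doc)
  doc.reject { |key, _value| key.start_with?(*DSPACE_PREFIXES) }
end

doc = {
  "id" => "dsp01vh53wv957",
  "pu.location" => ["(restriction note text)"],
  "pu.embargo.lift" => nil,
  "dc.rights.accessRights" => nil,
  "format" => "Senior thesis"
}

cleaned = strip_dspace_fields(doc)
puts cleaned.keys.inspect
```

Filtering by prefix keeps the sketch short; an allow-list of the fields defined in the Solr schema would be stricter, at the cost of having to be kept in sync with that schema.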

Acceptance criteria

Implementation notes, if any

jrgriffiniii commented 1 month ago

Locally this seems to be functioning without any errors:

bundle exec rake oai:index_record[oai:dataspace.princeton.edu:88435/dsp01vh53wv957] SOLR="http://localhost:8983/solr/orangetheses-core-development"
[10:53:32] INFO: Adding dsp01vh53wv957

jrgriffiniii commented 1 month ago

I have just tested this against the staging environment for bibdata, and it completed successfully for the theses collection. However, I must defer to others in DACS to confirm that this indeed fixes the indexing errors.

christinach commented 1 month ago

@jrgriffiniii I posted an update on the PR that there is still a failure. The rake task can successfully export the records. However, because the field pu.location exists in the export, the POST to the Solr index fails.

The last orangetheses ref that works is 4ac8dc2bd04b10db764fc37df3261531c9937061 https://github.com/pulibrary/bibdata/blob/2adad5269031fd31a80d72f7e68bfb226d6f85ce/Gemfile#L50

jrgriffiniii commented 1 month ago

I am very sorry, but I may need to request assistance with this. When I invoke bundle exec rake oai:index_all[com_88435_dsp019c67wm88m] SOLR="http://lib-solr8-prod.princeton.edu:8983/solr/catalog-alma-production", it succeeds. I assumed that this transmitted a POST request to the Solr endpoint.

jrgriffiniii commented 1 month ago

I was corrected and I am now testing against the following:

bundle exec rake orangetheses:cache_theses

jrgriffiniii commented 1 month ago

I have tested the following successfully:

RAILS_ENV=staging SOLR="http://lib-solr8d-staging.princeton.edu:8983/solr/catalog-staging" bundle exec rake orangetheses:cache_collection[361]

christinach commented 1 month ago

bundle exec rake orangetheses:cache_theses creates a JSON file with the desired records from DSpace. I'm happy to test the changes on bibdata staging, or, if you wish to try indexing on staging yourself, please follow these steps:

  1. Create a bibdata branch whose Gemfile points to the specific orangetheses branch
  2. Deploy the bibdata branch to the bibdata staging environment
  3. ssh deploy@bibdata-worker-staging1.lib.princeton.edu
  4. cd /opt/bibdata/current
  5. FILEPATH=/home/deploy/theses.json bundle exec rake orangetheses:cache_theses (this will overwrite the existing theses.json file, which is OK.)
  6. curl 'http://lib-solr8d-staging.princeton.edu:8983/solr/catalog-staging/update?commit=true' --data-binary @/home/deploy/theses.json -H 'Content-type:application/json'
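Between steps 5 and 6 it may help to check the generated file before POSTing it to Solr. The following is a rough sketch under two assumptions: that theses.json is a JSON array of documents, and that, as in the expected record above, valid Solr field names never contain a dot. `dotted_keys_by_id` is a hypothetical helper, not part of orangetheses or bibdata.

```ruby
require "json"

# Hypothetical helper: map each document id to the keys that contain a dot.
def dotted_keys_by_id(docs)
  docs.each_with_object({}) do |doc, found|
    bad = doc.keys.select { |key| key.include?(".") }
    found[doc["id"]] = bad unless bad.empty?
  end
end

# Path taken from step 5; set FILEPATH if the export lives elsewhere.
path = ENV.fetch("FILEPATH", "/home/deploy/theses.json")

if File.exist?(path)
  # Assumption: the file holds a JSON array of Solr documents.
  offenders = dotted_keys_by_id(JSON.parse(File.read(path)))
  if offenders.empty?
    puts "No dotted keys found."
  else
    offenders.each { |id, keys| puts "#{id}: #{keys.join(', ')}" }
  end
end
```

If any ids are listed, the curl command in step 6 would be expected to fail with the same unknown-field error reported above.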