sul-dlss / exhibits

Stanford University Libraries online exhibits showcase
https://exhibits.stanford.edu

Indexing failed for Virtual Tribunals exhibit #2461

Open laurensorensen opened 1 month ago

laurensorensen commented 1 month ago

As mentioned in the exhibits planning stand-up, indexing failed when I added druids to the Spotlight VT exhibit on Thursday. A CSV of the druids that were added is attached.

Thanks for any help.

first_edition_STL.csv

corylown commented 1 month ago

We are now getting useful error messages from indexing. The error I'm seeing is from Solr, and it's complaining about an empty geo field:

Error adding field 'geographic_srpt'='[ ]' msg=empty string shape URI

It affects the following druids:

bb455gs3711
fc092gx4720
fc887cx1232
gy350wq7934
hw662xk3431
qn242sh2366
sk422jd2844

I looked at https://purl.stanford.edu/bb455gs3711 and found what looks like a whitespace-only value in the geo element gml:pos:

<extension displayLabel="geo">
  <rdf:RDF xmlns:gml="http://www.opengis.net/gml/3.2/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:gmd="http://www.isotc211.org/2005/gmd">
    <rdf:Description rdf:about="http://purl.stanford.edu/bb455gs3711">
      <dc:format>application/pdf</dc:format>
      <dc:type>Image</dc:type>
      <gmd:centerPoint>
        <gml:Point>
          <gml:pos> </gml:pos>
        </gml:Point>
      </gmd:centerPoint>
    </rdf:Description>
  </rdf:RDF>
</extension>
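
In case it helps with checking other records, here is a minimal Python sketch of how one might look for blank gml:pos values like the one above. It is only an illustration, not how exhibits or Solr indexing actually does it. The function name `blank_pos_elements` and the `SAMPLE` document are made up for this example, and the sample declares the `rdf` namespace on the outer element (an assumption) so that it parses on its own.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the snippet above (note the trailing slash).
GML_NS = "http://www.opengis.net/gml/3.2/"


def blank_pos_elements(xml_text):
    """Return gml:pos elements whose text is missing or whitespace-only."""
    root = ET.fromstring(xml_text)
    return [
        el
        for el in root.iter(f"{{{GML_NS}}}pos")
        if el.text is None or not el.text.strip()
    ]


# Hypothetical, trimmed-down record; the rdf namespace declaration is assumed.
SAMPLE = """<extension displayLabel="geo"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:gml="http://www.opengis.net/gml/3.2/"
    xmlns:gmd="http://www.isotc211.org/2005/gmd">
  <rdf:RDF>
    <rdf:Description rdf:about="http://purl.stanford.edu/bb455gs3711">
      <gmd:centerPoint>
        <gml:Point>
          <gml:pos> </gml:pos>
        </gml:Point>
      </gmd:centerPoint>
    </rdf:Description>
  </rdf:RDF>
</extension>"""

print(len(blank_pos_elements(SAMPLE)))  # prints 1
```

A record with real coordinates in gml:pos would produce an empty list, so the same check could be run over each druid's XML before indexing.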

laurensorensen commented 1 month ago

There isn't any latitude/longitude information in the SDR/Cocina metadata for the druids above, but the related fields around it are filled out. For every other grouping of fields, the extraneous fields are ignored if the main data point (in this case latitude and longitude) isn't there. I think this might be a Cocina issue?

laurensorensen commented 1 month ago

I deleted all the geo-related metadata from these and am re-uploading.

laurensorensen commented 1 month ago

I wrote to @arcadiafalcone today about the possibility that this is an error in validation or in the relationship between Cocina and how Spotlight reads data, since no errors showed up when I ran validation or ingested to SDR, only upon indexing.

arcadiafalcone commented 1 month ago

I suspect there was a space in the field which got picked up as a value, while the ones that worked had the field completely blank (errant spaces have caused trouble before). I'll make a ticket to treat \s+ as null.
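
As a rough illustration of what "treat \s+ as null" could mean, here is a small Python sketch. It is not the actual spreadsheet-to-Cocina code; `normalize_cell` is a hypothetical helper name, and the real fix may live somewhere quite different.

```python
import re

# Matches values made up entirely of one or more whitespace characters.
WHITESPACE_ONLY = re.compile(r"\s+")


def normalize_cell(value):
    """Return None for missing, empty, or whitespace-only values.

    Any other value is returned unchanged, including its surrounding spaces.
    """
    if value is None or value == "" or WHITESPACE_ONLY.fullmatch(value):
        return None
    return value


print(normalize_cell(" "))        # prints None
print(normalize_cell("37.4271"))  # prints 37.4271
```

With something like this applied to each cell, a stray space and a completely blank field would be handled the same way downstream.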

@laurensorensen Could you send me one of the ones that had a blank value and worked? I'd like to see if the extraneous data was dropped in spreadsheet>Cocina or Cocina>MODS.

laurensorensen commented 1 month ago

Sure, thanks. I didn't notice any spaces in the Google Sheet that the CSV was exported from (the original issue above has that data as a CSV). yf788ff2222 is an example.
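
Since stray spaces are hard to spot by eye in a sheet, a quick script over the exported CSV might help. This is just a sketch: the column names and rows in `SAMPLE` are invented for illustration and are not taken from first_edition_STL.csv, and `whitespace_only_cells` is a hypothetical helper.

```python
import csv
import io


def whitespace_only_cells(csv_text):
    """Return (row_number, column_name) pairs for cells containing only whitespace."""
    reader = csv.DictReader(io.StringIO(csv_text))
    hits = []
    # Row 1 is the header, so data rows start at 2 (matching spreadsheet numbering).
    for row_number, row in enumerate(reader, start=2):
        for column, value in row.items():
            # Non-empty strings that strip down to nothing are whitespace-only.
            if isinstance(value, str) and value and not value.strip():
                hits.append((row_number, column))
    return hits


# Invented data: the first row has a single space under "latitude".
SAMPLE = "druid,latitude,longitude\nexample-1, ,\nexample-2,,\n"

print(whitespace_only_cells(SAMPLE))  # prints [(2, 'latitude')]
```

Truly empty cells are not reported, so any hits would point at the kind of invisible value suspected above.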