sul-dlss / dlme-transform

Transforms raw DLME metadata to DLME intermediate representation
Apache License 2.0

Error transforming Cambridge data #440

Closed jacobthill closed 4 years ago

jacobthill commented 4 years ago

    while executing (to_field "agg_is_shown_at" at traject_configs/cambridge_config.rb:91)

    Record: <?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
   <teiHeader>
      <fileDesc>
         <titleStmt>
            <title>T-S NS 196.20</title>
            <funder>The digitisation of the Taylor-Schechter Cairo Genizah Collection has been sponsored by the <ref target="http://www.jewishmanuscripts.org/">Jewish Manuscript Preservation Society</ref>, the <ref target="http://www.genizah.org/">Friedberg Genizah Project Inc.</ref>, and the <ref target="http://www.ahrc.ac.uk/Pages/default.aspx">Arts and Humanities Research Council, UK</ref>.</funder>
            <principal>Ben Outhwaite</principal>
         </titleStmt>
         <publicationStmt>
            <date calendar="Gregorian">[Date when first made available]</date>
            <publisher>Cambridge University Library</publisher>
            <pubPlace>
               <address>
                  <addrLine>Cambridge University Library</addrLine>
                  <street>West Road</street>
                  <settlement>Cambridge</settlement>
                  <postCode>CB3 9DR</postCode>
                  <addrLine>
                     <ref target="http://www.lib.cam.ac.uk">Cambridge University Library
                                </ref>
                  </addrLine>
                  <addrLine>
                     <email>library@lib.cam.ac.uk</email>
                  </addrLine>
               </address>
            </pubPlace>
            <availability xml:id="displayImageRights" status="restricted">
               <p>Zooming image © Cambridge University Library, All rights reserved.</p>
            </availability>
            <availability xml:id="downloadImageRights" status="restricted">
               <licence>This image may be used in accord with fair use and fair dealing provisions, including teaching and research. If you wish to reproduce it within publications or on the public web, please contact &lt;a href='mailto:genizah@lib.cam.ac.uk'&gt;genizah@lib.cam.ac.uk&lt;/a&gt;.</licence>
            </availability>
            <availability xml:id="metadataRights" status="restricted">
               <licence>This metadata is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.</licence>
            </availability>
         </publicationStmt>
         <sourceDesc>
            <msDesc xml:id="n1" xml:lang="eng">
               <msIdentifier>
                  <country>United Kingdom</country>
                  <region type="county">Cambridgeshire</region>
                  <settlement>Cambridge</settlement>
                  <institution>Cambridge University</institution>
                  <repository>Cambridge University Library</repository>
                  <idno>T-S NS 196.20</idno>
               </msIdentifier>
               <msContents>
                  <summary>ʿAmida for the additional service on Roš ha-Šana.
    </summary>
                  <msItem n="1">
                     <title>Liturgy</title>
                     <textLang mainLang="jrb" otherLangs="heb">Judaeo-Arabic; Hebrew (isolated Tiberian vocalisation)</textLang>
                     <author/>
                     <respStmt>
                        <name ref="http://viaf.org/viaf/32795172" type="person" role="dnr">
                           <persName type="standard">Schechter, S. (Solomon), 1847-1915</persName>
                           <persName type="display">Solomon Schechter</persName>
                        </name>
                        <name ref="http://viaf.org/viaf/51729236" type="person" role="dnr">
                           <persName type="standard">Taylor, Charles, 1840-1908</persName>
                           <persName type="display">Charles Taylor</persName>
                        </name>
                        <resp/>
                     </respStmt>
                  </msItem>
               </msContents>
               <physDesc>
                  <objectDesc>
                     <supportDesc material="paper">
                        <support>Paper</support>
                        <extent>2 leaves (bifolium).<dimensions type="leaf" unit="cm">
                              <height>16.5</height>
                              <width>12.8</width>
                           </dimensions>
                        </extent>
                        <condition>torn, rubbed</condition>
                     </supportDesc>
                     <layoutDesc>
                        <layout columns="1">12–13 lines + marginalia</layout>
                     </layoutDesc>
                  </objectDesc>
               </physDesc>
               <history>
                  <origin>
                     <origDate calendar="Gregorian" notBefore="0500-01-01" notAfter="1899-12-31">6th-19th century</origDate>
                  </origin>
                  <provenance>Donated by Dr Solomon Schechter and his patron Dr Charles Taylor in 1898 as part of the Taylor-Schechter Genizah Collection</provenance>
               </history>
               <additional>
                  <adminInfo>
                     <availability status="restricted">
                        <p>Entry to read in the Library is permitted only on presentation of
                                    a valid reader's card for admissions procedures contact <ref target="http://www.lib.cam.ac.uk/cgi-bin/eligibility.cgi/">Cambridge University Library Admissions</ref>).</p>
                     </availability>
                     <note/>
                  </adminInfo>
               </additional>
            </msDesc>
         </sourceDesc>
      </fileDesc>
      <encodingDesc>
         <classDecl>
            <taxonomy xml:id="LCSH">
               <bibl>
                  <ref target="http://id.loc.gov/authorities/about.html#lcsh">Library of
                            Congress Subject Headings</ref>
               </bibl>
            </taxonomy>
         </classDecl>
      </encodingDesc>
      <profileDesc>
         <textClass>
            <keywords scheme="#LCSH">
               <list>
                  <item>
                     <ref target="http://id.loc.gov/authorities/subjects/sh85018717.html">
                  Cairo Genizah
               </ref>
                  </item>
               </list>
            </keywords>
         </textClass>
      </profileDesc>
      <revisionDesc>
         <change when="2012-04-10">
            Uri Ehrlich, the Liturgy Research Project, Dept. of Jewish Thought, Ben Gurion University of the Negev; and the Genizah Research Unit
         </change>
      </revisionDesc>
   </teiHeader>
   <facsimile>
      <graphic decls="#document-thumbnail" rend="landscape" url="http://cudl.lib.cam.ac.uk/content/images/MS-TS-NS-00196-00020-000-00002_files/8/0_0.jpg"/>
      <surface n="1r" xml:id="i1">
         <graphic height="4657px" width="6521px" decls="#downloadImageRights #download" url="http://cudl.lib.cam.ac.uk/content/images/MS-TS-NS-00196-00020-000-00002.jpg"/>
         <graphic decls="#thumbnail" url="http://cudl.lib.cam.ac.uk/content/images/MS-TS-NS-00196-00020-000-00002_files/8/0_0.jpg" rend="landscape"/>
      </surface>
   </facsimile>
   <text>
      <body>
         <div>
            <pb xml:id="Lg-1r" n="1r" facs="#i1"/>
         </div>
      </body>
   </text>
</TEI>

    Exception: TypeError: no implicit conversion of nil into String
    /usr/local/bundle/gems/traject-3.2.0/lib/traject/macros/transformation.rb:150:in `+'

[ERROR] no implicit conversion of nil into String
/usr/local/bundle/gems/traject-3.2.0/lib/traject/macros/transformation.rb:150:in `+': no implicit conversion of nil into String (TypeError)
    from /usr/local/bundle/gems/traject-3.2.0/lib/traject/macros/transformation.rb:150:in `block (2 levels) in prepend'
    from /usr/local/bundle/gems/traject-3.2.0/lib/traject/macros/transformation.rb:150:in `collect!'
    from /usr/local/bundle/gems/traject-3.2.0/lib/traject/macros/transformation.rb:150:in `block in prepend'
    from /usr/local/bundle/gems/traject_plus-1.3.0/lib/traject_plus/macros.rb:15:in `block (2 levels) in transform_values'
    from /usr/local/bundle/gems/traject_plus-1.3.0/lib/traject_plus/macros.rb:13:in `each'
    from /usr/local/bundle/gems/traject_plus-1.3.0/lib/traject_plus/macros.rb:13:in `block in transform_values'
    from /usr/local/bundle/gems/traject_plus-1.3.0/lib/traject_plus/macros.rb:11:in `transform_values'
    from /usr/local/bundle/gems/traject_plus-1.3.0/lib/traject_plus/macros.rb:11:in `transform_values'
    from traject_configs/cambridge_config.rb:92:in `block (2 levels) in load_config_file'
    from /usr/local/bundle/gems/traject-3.2.0/lib/traject/indexer/step.rb:140:in `block in execute'
    from /usr/local/bundle/gems/traject-3.2.0/lib/traject/indexer/step.rb:135:in `each'
    from /usr/local/bundle/gems/traject-3.2.0/lib/traject/indexer/step.rb:135:in `execute'
    from /usr/local/bundle/gems/traject-3.2.0/lib/traject/indexer.rb:464:in `block (2 levels) in map_to_context!'
    from /usr/local/bundle/gems/traject-3.2.0/lib/traject/indexer.rb:504:in `handle_mapping_errors'
    from /usr/local/bundle/gems/traject-3.2.0/lib/traject/indexer.rb:463:in `block in map_to_context!'
    from /usr/local/bundle/gems/traject-3.2.0/lib/traject/indexer.rb:457:in `each'
    from /usr/local/bundle/gems/traject-3.2.0/lib/traject/indexer.rb:457:in `map_to_context!'
    from /usr/local/bundle/gems/traject-3.2.0/lib/traject/indexer.rb:582:in `block (3 levels) in process'
    from /usr/local/bundle/gems/traject-3.2.0/lib/traject/thread_pool.rb:123:in `block in maybe_in_thread_pool'
    from /usr/local/bundle/gems/concurrent-ruby-1.1.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:348:in `run_task'
    from /usr/local/bundle/gems/concurrent-ruby-1.1.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:337:in `block (3 levels) in create_worker'
    from /usr/local/bundle/gems/concurrent-ruby-1.1.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:320:in `loop'
    from /usr/local/bundle/gems/concurrent-ruby-1.1.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:320:in `block (2 levels) in create_worker'
    from /usr/local/bundle/gems/concurrent-ruby-1.1.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:319:in `catch'
    from /usr/local/bundle/gems/concurrent-ruby-1.1.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:319:in `block in create_worker'

{"cho_title":{"en":["Letter"]},"cho_creator":{"en":["Avon b. Ṣedaqa\n                           Avon b. Ṣedaqa"]},"cho_date":{"en":["1065 CE"]},"cho_date_range_norm":[1065],"cho_date_range_hijri":[457,458],"cho_dc_rights":{"en":["This image may be used in accord with fair use and fair dealing provisions, including teaching and research. If you wish to reproduce it within publications or on the public web, please contact genizah@lib.cam.ac.uk.","This metadata is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License."]},"cho_description":{"en":["Letter from Avon b. Ṣedaqa (probably in Jerusalem) to Nahray b. Nissim, 1065 CE."]},"cho_edm_type":{"en":["Text"],"ar-Arab":["نص"]},"cho_extent":{"en":["1 leaf.\n                              12.5\n                              17.3"]},"cho_has_type":{"en":["Manuscript"],"ar-Arab":["مخطوط"]},"cho_language":{"en":["Judeo-Arabic","Arabic"],"ar-Arab":["العربية اليهودية","العربية"]},"cho_provenance":{"en":["Donated by Dr Solomon Schechter and his patron Dr Charles Taylor in 1898 as part of the Taylor-Schechter Genizah Collection"]},"cho_publisher":{"en":["Cambridge University Library"]},"cho_spatial":{"en":["Jerusalem"]},"agg_data_provider":{"en":["Cambridge University Library"],"ar-Arab":["مكتبة جامعة كامبريدج"]},"agg_data_provider_country":{"en":["United Kingdom"],"ar-Arab":["المملكة المتحدة"]},"agg_provider":{"en":["Cambridge University Library"],"ar-Arab":["مكتبة جامعة كامبريدج"]},"agg_provider_country":{"en":["United Kingdom"],"ar-Arab":["المملكة المتحدة"]},"cho_type_facet":{"en":["Text","Text:Manuscript"],"ar-Arab":["نص","نص:مخطوط"]},"id":"cambridge_genizah-2088","transform_version":"562301e","transform_timestamp":"2019-12-05 16:29:21 +0000","agg_data_provider_collection":"cambridge/genizah","agg_is_shown_at":{"wr_is_referenced_by":["https://cudl.lib.cam.ac.uk/iiif/MS-TS-00008-00257"],"wr_id":"https://cudl.lib.cam.ac.uk/view/MS-TS-00008-00257/1"},"agg_preview":{"wr_id":"https://image01.cudl.lib.cam.ac.uk/content/images/MS-TS-00008-00257-000-00001_files/8/0_0.jpg"}}

aaron-collier commented 4 years ago

Confirmed with @jacobthill that this was a one-off.

jacobthill commented 4 years ago

Actually, I did get an error; it just happened after 10 minutes.

aaron-collier commented 4 years ago

I don't think this should be a dlme-transform ticket, as the error indicates there is a record whose ID does not match the expected pattern.

Perhaps move this into dlme-metadata and analyze the data for the problematic record(s).
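
A minimal sketch of how a non-matching record leads to the TypeError in the trace above. The exact expression used in traject_configs/cambridge_config.rb is not shown in this thread, so `FIRST_IMAGE_PATTERN`, `extract_id`, and `prepend_prefix` are hypothetical stand-ins; the two URLs are taken from the failing record and from the successfully transformed record in the output.

```ruby
# Hypothetical pattern: an assumption that the config only recognizes
# thumbnail URLs containing "-000-00001_files". Not the actual config code.
FIRST_IMAGE_PATTERN = %r{/images/(?<id>.+)-000-00001_files}

# Returns the captured item ID, or nil when the URL does not match.
def extract_id(url)
  match = FIRST_IMAGE_PATTERN.match(url)
  match && match[:id]
end

# Mimics the failing step: a prefix is concatenated with each extracted value.
def prepend_prefix(prefix, values)
  values.map { |value| prefix + value }
end

# Returns the URLs that yield no ID, to help locate problematic records.
def unmatched_urls(urls)
  urls.reject { |url| extract_id(url) }
end

good = 'https://image01.cudl.lib.cam.ac.uk/content/images/MS-TS-00008-00257-000-00001_files/8/0_0.jpg'
bad  = 'http://cudl.lib.cam.ac.uk/content/images/MS-TS-NS-00196-00020-000-00002_files/8/0_0.jpg'

extract_id(good) # => "MS-TS-00008-00257"
extract_id(bad)  # => nil

begin
  prepend_prefix('https://cudl.lib.cam.ac.uk/view/', [extract_id(bad)])
rescue TypeError => e
  puts e.message # no implicit conversion of nil into String
end
```

Running `unmatched_urls` over the thumbnail URLs harvested from the source data would list the records that need attention before the transform is run.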

aaron-collier commented 4 years ago

Confirmed that the record currently in error has:

http://cudl.lib.cam.ac.uk/content/images/MS-TS-NS-00196-00020-000-00002_files/8/0_0.jpg

in the field, which is pattern-matched on:

-000-00001_files.... so the -000-00002_files... suffix does not match, and that is the problem.

This raises a larger question, which I believe is also a metadata problem: how to extract the ID, since this is the only ID (in the existing data) that can be matched to an IIIF manifest.
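
One possible direction, sketched under assumptions: accept any image sequence number in the thumbnail URL rather than only 00001, and drop values that yield no ID instead of passing nil to a prepend step. `ANY_IMAGE_PATTERN`, `extract_item_id`, and `view_urls` are hypothetical names, and the view URL shape is copied from the successful record's output; whether a manifest actually exists for every ID extracted this way is not confirmed in this thread.

```ruby
# Hypothetical, more tolerant pattern: any three-digit group followed by any
# five-digit image sequence number. An assumption, not the actual config.
ANY_IMAGE_PATTERN = %r{/images/(?<id>.+)-\d{3}-\d{5}_files}

# Returns the item ID, or nil when the URL is missing or does not match.
def extract_item_id(url)
  match = ANY_IMAGE_PATTERN.match(url.to_s)
  match && match[:id]
end

# Builds view URLs only for thumbnails that yielded an ID; nils are dropped
# so no later string concatenation receives nil.
def view_urls(thumbnail_urls, prefix: 'https://cudl.lib.cam.ac.uk/view/')
  thumbnail_urls.map { |url| extract_item_id(url) }
                .compact
                .map { |id| "#{prefix}#{id}/1" }
end

bad = 'http://cudl.lib.cam.ac.uk/content/images/MS-TS-NS-00196-00020-000-00002_files/8/0_0.jpg'

extract_item_id(bad)  # => "MS-TS-NS-00196-00020"
view_urls([bad, nil]) # => ["https://cudl.lib.cam.ac.uk/view/MS-TS-NS-00196-00020/1"]
```

Dropping unmatched values keeps the transform running but leaves the record without an agg_is_shown_at value, so logging the skipped records would still be needed to follow up on the metadata side.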