ncbo / bioportal-project

Serves to consolidate (in Zenhub) all public issues in BioPortal

METABUS: fails to generate CSV / index with undefined method error #298

Open jvendetti opened 5 months ago

jvendetti commented 5 months ago

The latest version of METABUS (submission ID 11) fails to index. Reproducible with the following command on the production parsing box:

bin/ncbo_ontology_process -o METABUS -t index_search

Full stack trace:

E, [2024-01-19T14:35:07.891966 #17923] ERROR -- : ["NoMethodError: undefined method `id' for \"99301\":String
/srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/bundler/gems/ontologies_linked_data-ee0013f0ee23/lib/ontologies_linked_data/utils/ontology_csv_writer.rb:88:in `block in get_parent_ids'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/bundler/gems/ontologies_linked_data-ee0013f0ee23/lib/ontologies_linked_data/utils/ontology_csv_writer.rb:87:in `each'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/bundler/gems/ontologies_linked_data-ee0013f0ee23/lib/ontologies_linked_data/utils/ontology_csv_writer.rb:87:in `get_parent_ids'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/bundler/gems/ontologies_linked_data-ee0013f0ee23/lib/ontologies_linked_data/utils/ontology_csv_writer.rb:64:in `write_class'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/bundler/gems/ontologies_linked_data-ee0013f0ee23/lib/ontologies_linked_data/models/ontology_submission.rb:1312:in `block (5 levels) in index'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/bundler/gems/ontologies_linked_data-ee0013f0ee23/lib/ontologies_linked_data/models/ontology_submission.rb:88:in `synchronize'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/bundler/gems/ontologies_linked_data-ee0013f0ee23/lib/ontologies_linked_data/models/ontology_submission.rb:88:in `synchronize'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/bundler/gems/ontologies_linked_data-ee0013f0ee23/lib/ontologies_linked_data/models/ontology_submission.rb:1311:in `block (4 levels) in index'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/bundler/gems/ontologies_linked_data-ee0013f0ee23/lib/ontologies_linked_data/models/ontology_submission.rb:1281:in `each'
    /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/bundler/gems/ontologies_linked_data-ee0013f0ee23/lib/ontologies_linked_data/models/ontology_submission.rb:1281:in `block (3 levels) in index'"]

jvendetti commented 5 months ago

@martinjoconnor mentioned that the voaf namespace declaration on line 26 of the RDF source file looked odd:

xmlns:vaof="http://purl.org/vocommons/voaf#"

The authors likely meant the namespace prefix to be voaf rather than vaof, to match the end of the vocommons URL. I corrected the prefix, and I also changed line 50 from this:

<metadataVoc xmlns="voaf:" rdf:resource="https://www.isibang.ac.in/"/>

... to this:

<voaf:metadataVoc rdf:resource="https://www.isibang.ac.in/"/>

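... because xmlns="voaf:" on the property element does not refer to the voaf prefix. It sets the element's default namespace to the literal string "voaf:", which is not what the authors intended.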
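To illustrate the difference between the two forms of line 50, here is a minimal Python sketch that parses each one and prints the expanded element name the parser sees. The rdf:RDF wrapper and the helper name expanded_tag are assumptions made for the example; the actual root element of the METABUS file is not shown in this issue.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper so each fragment parses on its own; it declares the
# corrected voaf prefix from line 26.
WRAPPER = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:voaf="http://purl.org/vocommons/voaf#">{}</rdf:RDF>'
)


def expanded_tag(fragment):
    """Return the namespace-expanded tag of the single child element."""
    root = ET.fromstring(WRAPPER.format(fragment))
    return root[0].tag


original = expanded_tag(
    '<metadataVoc xmlns="voaf:" rdf:resource="https://www.isibang.ac.in/"/>'
)
corrected = expanded_tag(
    '<voaf:metadataVoc rdf:resource="https://www.isibang.ac.in/"/>'
)

print(original)   # {voaf:}metadataVoc
print(corrected)  # {http://purl.org/vocommons/voaf#}metadataVoc
```

In the original form the property lands in a namespace named "voaf:", so it is a different property from the one in the vocommons vocabulary.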

I uploaded a new version of the ontology to our staging environment, but even with these corrections the ontology errors out with the same stack trace as listed above.

jvendetti commented 5 months ago

The code fails when it tries to get the ID of a parent concept: per the stack trace, the parent it receives is the string "99301" rather than a concept object. I searched the RDF source file for concepts whose parent declaration contains "99301" and found two:

<rdf:Description rdf:about="http://purl.org/m4m/99313">
    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
    <skos:prefLabel xml:lang="en">normal</skos:prefLabel>
    <skos:altLabel xml:lang="en">99313</skos:altLabel>
    <skos:broader xml:lang="en">99301</skos:broader>
    <dct:created rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2013-01-01</dct:created>
    <dct:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2013-01-01</dct:modified>
    <skos:inScheme rdf:resource="http://purl.org/m4m"/>
</rdf:Description>

<rdf:Description rdf:about="http://purl.org/m4m/99314">
    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
    <skos:prefLabel xml:lang="en">low</skos:prefLabel>
    <skos:altLabel xml:lang="en">99314</skos:altLabel>
    <skos:broader xml:lang="en">99301</skos:broader>
    <dct:created rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2013-01-01</dct:created>
    <dct:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2013-01-01</dct:modified>
    <skos:inScheme rdf:resource="http://purl.org/m4m"/>
</rdf:Description>

It turns out that both of these concepts declare skos:broader as a string literal instead of referencing the broader concept via rdf:resource. In other words, the declarations should read:

<skos:broader rdf:resource="http://purl.org/m4m/99301"/>

I made this modification and uploaded another new version of the ontology to our staging environment. With this modification, indexing and CSV generation complete successfully.
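A check like the following could catch this kind of problem before upload. It is a rough sketch, not part of the BioPortal tooling: the function name find_literal_broader and the trimmed SAMPLE document are made up for illustration, and it only handles rdf:Description blocks written in the style shown above (it does not account for rdf:nodeID or nested node elements).

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


def find_literal_broader(rdf_xml):
    """Return (concept IRI, literal text) pairs for every skos:broader
    element that has no rdf:resource attribute."""
    root = ET.fromstring(rdf_xml)
    problems = []
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for broader in description.findall(f"{{{SKOS_NS}}}broader"):
            if broader.get(f"{{{RDF_NS}}}resource") is None:
                problems.append((subject, (broader.text or "").strip()))
    return problems


# Trimmed, hypothetical sample: one concept with the literal form from this
# issue and one with the corrected rdf:resource form.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://purl.org/m4m/99313">
    <skos:broader xml:lang="en">99301</skos:broader>
  </rdf:Description>
  <rdf:Description rdf:about="http://purl.org/m4m/99314">
    <skos:broader rdf:resource="http://purl.org/m4m/99301"/>
  </rdf:Description>
</rdf:RDF>"""

if __name__ == "__main__":
    for subject, literal in find_literal_broader(SAMPLE):
        print(f"{subject}: skos:broader is the literal {literal!r}")
```

To run it against a real file, read the file contents and pass them to find_literal_broader in place of SAMPLE.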