ncbo / bioportal-project

Serves to consolidate (in Zenhub) all public issues in BioPortal
BSD 2-Clause "Simplified" License

CL-SIMPLE: orphaned submission object is preventing new submissions #292

Closed: jvendetti closed this issue 9 months ago

jvendetti commented 9 months ago

From a user via the BioPortal support list:

We are having some issues uploading CL-SIMPLE (Cell Ontology Simple) into BioPortal: it is stuck trying to upload the metadata to something that is already minted, or something along those lines.

The new submission can't be processed because of this underlying error:

There is already a persistent resource with id http://data.bioontology.org/ontologies/CL-SIMPLE/submissions/9

... indicating that an orphaned object exists in the triplestore.
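One plausible reading of why the orphan blocks new uploads, assuming the next submission ID is computed as one more than the highest ID among the submissions still attached to the ontology: the orphan is invisible to that computation, so the new submission is assigned the orphan's ID and collides with it. The sketch below models only that arithmetic; `next_submission_id` and `colliding_id` are hypothetical names, not the ontologies_linked_data API, and the sample IDs are illustrative.

```ruby
# HYPOTHETICAL model of new-submission ID assignment; the real logic lives in
# ontologies_linked_data and may differ.
def next_submission_id(attached_ids)
  (attached_ids.max || 0) + 1
end

# Returns the candidate ID if it collides with an existing (e.g. orphaned)
# resource in the store, or nil when the candidate ID is free.
def colliding_id(attached_ids, persisted_ids)
  candidate = next_submission_id(attached_ids)
  persisted_ids.include?(candidate) ? candidate : nil
end

# Illustrative data: assume submissions 1-8 are still linked to CL-SIMPLE,
# while submission 9 exists in the triplestore but is not linked to it.
attached  = (1..8).to_a
persisted = attached + [9]
puts colliding_id(attached, persisted).inspect # prints 9
```

Under this assumption, every retry recomputes the same ID, which matches the upload being stuck rather than failing once.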

You can't use the Admin tab in BioPortal to delete the orphaned object, because it's not attached to the relevant ontology:

[Screenshot, 2023-12-07 11:56 AM: CL-SIMPLE submissions listed in the BioPortal Admin tab]

Submission 9 isn't listed in the screenshot above.

Attempting to manually delete the orphaned object via an ncbo_cron console session also fails:

[1] pry(main)> sub = LinkedData::Models::OntologySubmission.find(RDF::URI.new("http://data.bioontology.org/ontologies/CL-SIMPLE/submissions/9")).first

[2] pry(main)> sub.bring_remaining

[3] pry(main)> sub.valid?
=> false

[4] pry(main)> sub.errors
=> {:hasOntologyLanguage=>{:existence=>"`` value cannot be nil"}, :contact=>{:existence=>"`[]` value cannot be nil"}, :released=>{:existence=>"`` value cannot be nil"}, :ontology=>{:existence=>"`` value cannot be nil"}}

[5] pry(main)> sub.delete
NoMethodError: undefined method `loaded_attributes' for nil:NilClass
from /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/bundler/gems/ontologies_linked_data-7783784f9d2c/lib/ontologies_linked_data/models/ontology_submission.rb:109:in `segment_instance'

[6] pry(main)> sub.ontology
=> nil

[7] pry(main)> sub.bring(:ontology)
ArgumentError: To load attributes the resource must be persistent
from /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/bundler/gems/goo-657149d6b338/lib/goo/sparql/loader.rb:292:in `block in raise_resource_must_persistent_error'

When BioPortal ran on 4store, there was a documented way to delete orphaned records: issue a curl command directly against our production triplestore. There is currently no documentation for the equivalent procedure in AllegroGraph (AG).
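For comparison, a direct-triplestore deletion can be expressed as a standard SPARQL 1.1 Update sent over the SPARQL 1.1 Protocol. This is a minimal sketch, not the documented 4store procedure and not a tested AG recipe: the endpoint URL, the `SPARQL_UPDATE_URL` / `SPARQL_USER` / `SPARQL_PASSWORD` environment variables, and the helper names are all assumptions. It removes only triples whose subject is the orphaned resource; triples elsewhere that point to it are left alone.

```ruby
require 'net/http'
require 'uri'

# ASSUMPTION: the update endpoint. 10035 is AllegroGraph's usual default port,
# but the repository name and path here are hypothetical.
DEFAULT_UPDATE_URL = 'http://localhost:10035/repositories/bioportal/statements'

# Build a SPARQL 1.1 DELETE WHERE that removes every triple whose subject is
# the given resource, in whichever named graph it is stored.
def delete_orphan_query(resource_uri)
  # Basic guard: refuse anything that could break out of the <...> IRI.
  unless resource_uri.match?(%r{\Ahttps?://[^\s<>"]+\z})
    raise ArgumentError, "not an absolute http(s) IRI: #{resource_uri.inspect}"
  end
  "DELETE WHERE { GRAPH ?g { <#{resource_uri}> ?p ?o } }"
end

# POST the update as a form-encoded `update` parameter (SPARQL 1.1 Protocol).
def delete_orphan!(resource_uri, endpoint: ENV.fetch('SPARQL_UPDATE_URL', DEFAULT_UPDATE_URL))
  uri = URI(endpoint)
  req = Net::HTTP::Post.new(uri)
  req.basic_auth(ENV['SPARQL_USER'], ENV['SPARQL_PASSWORD']) if ENV['SPARQL_USER']
  req.set_form_data('update' => delete_orphan_query(resource_uri))
  res = Net::HTTP.start(uri.hostname, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(req)
  end
  raise "update failed: HTTP #{res.code} #{res.body}" unless res.is_a?(Net::HTTPSuccess)
  res
end

# Only touch a real store when an endpoint is explicitly configured.
if __FILE__ == $PROGRAM_NAME && ENV['SPARQL_UPDATE_URL']
  delete_orphan!('http://data.bioontology.org/ontologies/CL-SIMPLE/submissions/9')
end
```

Running it against production without first confirming the graph layout and credentials would be risky; the query builder is kept separate so the generated update can be inspected before anything is sent.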

syphax-bouazzouni commented 9 months ago

You can try this; maybe it will work.

# Reattach the orphan to its ontology (an Ontology, not an OntologySubmission)
sub.ontology = LinkedData::Models::Ontology.find(RDF::URI.new("http://data.bioontology.org/ontologies/CL-SIMPLE")).first
# Fill in the remaining required attributes reported by sub.errors
sub.hasOntologyLanguage = LinkedData::Models::OntologyFormat.where(acronym: 'OWL').first
sub.contact = [LinkedData::Models::Contact.new(email: 'test@test.com', name: 'test').save] # placeholder contact
sub.released = DateTime.now
sub.save

jvendetti commented 9 months ago

Hi @syphax-bouazzouni, thanks for the comment. I was hoping to avoid correcting all of the invalid properties and simply delete the corrupt submission instead.

I'm also seeing some odd behavior in production. When I tried a second time to access this submission from a console session, I just got nil:

$ pry(main)> sub = LinkedData::Models::OntologySubmission.find(RDF::URI.new("http://data.bioontology.org/ontologies/CL-SIMPLE/submissions/9")).first
=> nil

I checked the file system on the production parsing box, and there's a directory for submission 9, so it definitely existed as of April 2023:

[ncbo-deployer@ncbo-prd-app-31 CL-SIMPLE]$ ls -l
total 36
drwxrwsrwx. 2 ncbo-deployer ncbo 4096 Oct 28  2022 1
drwxrwsrwx. 2 ncbo-deployer ncbo 4096 Nov 25  2022 2
drwxrwsrwx. 2 ncbo-deployer ncbo 4096 Dec 15  2022 3
drwxrwsrwx. 2 ncbo-deployer ncbo 4096 Jan  9  2023 4
drwxrwsrwx. 2 ncbo-deployer ncbo 4096 Feb 15  2023 5
drwxrwsrwx. 2 ncbo-deployer ncbo 4096 Feb 19  2023 6
drwxrwsrwx. 2 ncbo-deployer ncbo 4096 Mar 21  2023 7
drwxrwsrwx. 2 ncbo-deployer ncbo 4096 Nov 16 00:24 8
drwxrwsrwx. 2 ncbo-deployer ncbo 4096 Apr 20  2023 9

Since I can no longer access the submission programmatically from a console session, I tried an on-demand pull:

[ncbo-deployer@ncbo-prd-app-31 ncbo_cron]$ bin/ncbo_ontology_pull -o CL-SIMPLE

... which seems to have mostly worked, except properties are failing to index:

E, [2023-12-07T14:15:23.822809 #28829] ERROR -- : Failed property indexing with exception: RSolr::Error::Http: RSolr::Error::Http - 500 Internal Server Error

jvendetti commented 9 months ago

OK, I now understand the "odd behavior" in production that I mentioned above. I had assumed AG behaved the same way as 4store, but it doesn't. When you call sub.delete on an orphaned submission object in AG, the console shows an error:

NoMethodError: undefined method `loaded_attributes' for nil:NilClass

... but the deletion actually succeeds. In 4store this wasn't the case: the orphaned submission still existed afterward, and deleting it required an additional curl command.
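Given that behavior, a console-side workaround could treat this particular NoMethodError as non-fatal and confirm the deletion by looking the resource up again, as is done by hand with APO further down. A minimal sketch, assuming duck-typed inputs (any object with `delete`, plus a lambda that returns the resource or nil); `delete_orphan_and_verify` is a hypothetical helper, and the commented usage assumes an ncbo_cron console with the LinkedData models loaded.

```ruby
# HYPOTHETICAL helper. Per this thread, sub.delete on an orphaned submission in
# AG raises NoMethodError (from segment_instance) yet still removes the object,
# so that error is logged rather than re-raised, and the outcome is checked by
# looking the resource up again.
def delete_orphan_and_verify(resource, lookup)
  begin
    resource.delete
  rescue NoMethodError => e
    warn "delete raised #{e.class} (#{e.message}); verifying whether it took effect"
  end
  raise 'resource is still persisted after delete' unless lookup.call.nil?
  true
end

# Example use in an ncbo_cron console (assumes the LinkedData models are loaded):
#   uri    = RDF::URI.new("http://data.bioontology.org/ontologies/APO/submissions/33")
#   lookup = -> { LinkedData::Models::OntologySubmission.find(uri).first }
#   delete_orphan_and_verify(lookup.call, lookup)
```

Swallowing NoMethodError broadly could hide unrelated bugs; the follow-up lookup is what keeps a genuinely failed delete from passing silently.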

Here is an example with a different ontology. Our nightly pull log contains several errors like this one:

{:proc_naming=>{:duplicate=>"There is already a persistent resource with id `http://data.bioontology.org/ontologies/APO/submissions/33`"}
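To list every affected submission instead of reading the log by eye, the duplicate-ID messages can be extracted with a regular expression. A minimal sketch, assuming the message text matches the two forms quoted in this thread (with or without backticks around the URI); `orphaned_submissions` is a hypothetical helper, and the real log format may vary.

```ruby
# Matches the duplicate-ID message as quoted in this thread; backticks optional.
DUPLICATE_RE = %r{There is already a persistent resource with id `?(https?://data\.bioontology\.org/ontologies/([^/`]+)/submissions/(\d+))`?}

# Returns one hash per distinct orphaned submission mentioned in the log text.
def orphaned_submissions(log_text)
  log_text.scan(DUPLICATE_RE).map do |uri, acronym, id|
    { uri: uri, acronym: acronym, submission_id: id.to_i }
  end.uniq
end
```

Each resulting URI could then be checked in a console session before any deletion is attempted.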

This is console output showing that deletion of APO submission 33 is possible, despite the presence of a NoMethodError:

$ pry(main)> sub = LinkedData::Models::OntologySubmission.find(RDF::URI.new("http://data.bioontology.org/ontologies/APO/submissions/33")).first

$ pry(main)> sub.bring_remaining

$ pry(main)> sub.valid?
=> false

$ pry(main)> sub.errors
=> {:hasOntologyLanguage=>{:existence=>"`` value cannot be nil"}, :contact=>{:existence=>"`[]` value cannot be nil"}, :released=>{:existence=>"`` value cannot be nil"}, :ontology=>{:existence=>"`` value cannot be nil"}}

$ pry(main)> sub.delete
NoMethodError: undefined method `loaded_attributes' for nil:NilClass
from /srv/ncbo/ncbo_cron/vendor/bundle/ruby/2.7.0/bundler/gems/ontologies_linked_data-7783784f9d2c/lib/ontologies_linked_data/models/ontology_submission.rb:109:in `segment_instance'

$ pry(main)> sub = LinkedData::Models::OntologySubmission.find(RDF::URI.new("http://data.bioontology.org/ontologies/APO/submissions/33")).first
=> nil

The NoMethodError raised on an otherwise successful deletion is a bug in our code and should be tracked in a separate ticket.

Closing this particular issue as CL-SIMPLE is fully functioning in production now.