ncbo / bioportal-project

Serves to consolidate (in Zenhub) all public issues in BioPortal

GO: top-level mappings tab has inaccuracies #297

Open jvendetti opened 5 months ago

jvendetti commented 5 months ago

From @caufieldjh:

Some mappings are not accessible? E.g. if I am on https://bioportal.bioontology.org/ontologies/GO/?p=mappings and I click CYTOKINE I get “No mappings found”


The top-level Mappings tab in BioPortal for the GO ontology shows a mapping count of 59 for mappings between GO and CYTOKINE:

Screenshot 2024-01-09 at 11 06 52 AM

There is no such CYTOKINE ontology in BioPortal currently. I checked the production server and there is no physical directory that matches this ontology acronym.

jvendetti commented 5 months ago

@alexskr - with regard to the above problem description, I'm wondering what the status is of the script that generates mapping counts? I know that we resurrected this process now that we're on AllegroGraph, and on December 15th you mentioned in the bioportal-operations Slack channel that it was running in production. Did it run to completion? Are we back to executing once weekly on Saturday nights?

My first two ideas here are that 1) there's an issue in our mapping count generation code, or 2) if an ontology is deleted, somehow the associated triples that materialize mappings aren't cleanly removed from the triplestore.

alexskr commented 5 months ago

The mapping counts job is enabled and completes successfully according to the logs.

BioPortal has a CYTOKINE ontology with the CYTO acronym.

caufieldjh commented 5 months ago

So CYTOKINE as an ontology is here: https://bioportal.bioontology.org/ontologies/CYTO but the mapping link above (https://bioportal.bioontology.org/mappings/GO?target=https%3A%2F%2Fdata.bioontology.org%2Fontologies%2FCYTO) specifies CYTO has no mappings

jvendetti commented 5 months ago

Ugh. Not enough caffeine this morning. :weary:

Indeed CYTOKINE is not an ontology acronym, but rather the name of the ontology, and the acronym is CYTO, accessible here: https://bioportal.bioontology.org/ontologies/CYTO. This doesn't appear to be an issue with the Rails application, as the relevant REST call to retrieve the mappings between GO and CYTO is returning a total count value of 59 along with an empty collection.

Screenshot 2024-01-09 at 2 33 05 PM

alexskr commented 5 months ago

Mappings between CYTO and a few other ontologies similarly have a positive total count value but return an empty collection:

https://bioportal.bioontology.org/mappings/CYTO?target=https%3A%2F%2Fdata.bioontology.org%2Fontologies%2FGO-EXT
https://bioportal.bioontology.org/mappings/CYTO?target=https%3A%2F%2Fdata.bioontology.org%2Fontologies%2FCL

jvendetti commented 5 months ago

I took a detailed look at the log file for the last run of the cron_mapping_counts job, in particular the section that contains log output for calculation of the pairwise mapping counts. For any given ontology, it looks like this:

I, [2024-01-06T06:55:42.922878 #24967]  INFO -- : Ontology: GO. 539 mapping pair counts to record...
I, [2024-01-06T06:55:42.922926 #24967]  INFO -- : ------------------------------------------------
I, [2024-01-06T06:55:42.922949 #24967]  INFO -- : Mapping count saved for the pair [GO, PW]: 46. 538 counts remaining for GO...
I, [2024-01-06T06:55:42.922967 #24967]  INFO -- : Mapping count saved for the pair [GO, OHMI]: 5. 537 counts remaining for GO...
I, [2024-01-06T06:55:42.922981 #24967]  INFO -- : Mapping count saved for the pair [GO, CIDIT_V1_2]: 731. 536 counts remaining for GO...

# ... and so on

If you spot check any of the pairwise counts that appear in the log, BioPortal returns mapping data. For example - for the first log entry above, the relevant REST call would be https://data.bioontology.org/mappings?ontologies=GO,PW, and it returns the expected collection of mappings with 46 elements.
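A rough sketch of how such a spot check could be scripted, in Python rather than anything from our codebase. The helper names (mappings_url, check_page) are hypothetical, and the totalCount and collection field names are my assumption about the shape of the paged response, so adjust them if the payload differs. Actually fetching the URL also requires an API key, which is left out here.

```python
from urllib.parse import urlencode

API_ROOT = "https://data.bioontology.org"


def mappings_url(source, target):
    """Build the REST URL for the mappings between two ontology acronyms."""
    # Keep the comma unescaped so the URL matches the form used in this thread.
    query = urlencode({"ontologies": f"{source},{target}"}, safe=",")
    return f"{API_ROOT}/mappings?{query}"


def check_page(page):
    """Classify one decoded response page.

    Assumes (not confirmed here) that the page carries a "totalCount"
    number and a "collection" list.
    """
    total = page.get("totalCount") or 0
    items = page.get("collection") or []
    if total > 0 and not items:
        # The symptom reported for GO/CYTO: a count but nothing materialized.
        return "count without mappings"
    if total == 0 and not items:
        return "no mappings"
    return "ok"


if __name__ == "__main__":
    print(mappings_url("GO", "PW"))
    print(check_page({"totalCount": 59, "collection": []}))
```

The check only looks at a single page, so it flags the "positive count, empty collection" symptom without trying to verify that the count matches the full set of mappings.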

For the cases mentioned in previous comments where BioPortal shows a mapping count, but no mappings are materialized, the log file shows no entries for pairwise count calculation. In other words, I searched the log file for entries like this:

Mapping count saved for the pair [GO, CYTO]
Mapping count saved for the pair [CYTO, GO-EXT]
Mapping count saved for the pair [CYTO, CL]

... and came up with nothing.
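For anyone who wants to repeat the search, here is a minimal Python sketch of it. The regular expression is written against the log format shown above; the function names (logged_pair_counts, missing_pairs) are made up for illustration and are not part of the cron job.

```python
import re

# Matches lines such as:
#   ... Mapping count saved for the pair [GO, PW]: 46. 538 counts remaining for GO...
PAIR_LINE = re.compile(
    r"Mapping count saved for the pair \[([^,\]]+), ([^\]]+)\]: (\d+)\."
)


def logged_pair_counts(lines):
    """Return {(source, target): count} for every pair recorded in the log."""
    counts = {}
    for line in lines:
        match = PAIR_LINE.search(line)
        if match:
            source, target, count = match.groups()
            counts[(source.strip(), target.strip())] = int(count)
    return counts


def missing_pairs(lines, pairs_to_check):
    """Return the pairs from pairs_to_check that never appear in the log."""
    logged = logged_pair_counts(lines)
    return [pair for pair in pairs_to_check if pair not in logged]


if __name__ == "__main__":
    sample = [
        "I, [2024-01-06T06:55:42.922949 #24967]  INFO -- : Mapping count saved "
        "for the pair [GO, PW]: 46. 538 counts remaining for GO...",
    ]
    suspects = [("GO", "CYTO"), ("CYTO", "GO-EXT"), ("CYTO", "CL")]
    print(missing_pairs(sample, suspects))
```

To run it against a real log, pass an open file handle as lines; the path to the cron_mapping_counts log is deliberately not hard-coded here.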

CYTO is a very old ontology, last uploaded in 2015. One possible scenario is that mappings between CYTO and these other ontologies existed at some point in the past and the counts were persisted in the triplestore. It doesn't look like queries issued against the current triplestore content locate mappings between these ontologies. I don't see any logic in the codebase that would handle the case where a MappingCount object gets removed because mappings that once existed between two ontologies no longer exist.
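To make the missing cleanup step concrete, here is a small Python sketch of the kind of reconciliation I have in mind. It is not how our code works today: the two dictionaries are hypothetical stand-ins for the persisted MappingCount objects and for the counts produced by the latest run, and stale_pairs is an invented name.

```python
def stale_pairs(persisted, recomputed):
    """Return pairs that still have a persisted count but no current mappings.

    Both arguments map (source, target) acronym tuples to counts. A pair is
    stale when the latest run found no mappings for it, i.e. it is absent
    from recomputed or recorded there with a count of zero.
    """
    return sorted(pair for pair in persisted if recomputed.get(pair, 0) == 0)


if __name__ == "__main__":
    # Hypothetical snapshot: GO/CYTO was counted in the past, but the
    # latest run only produced a count for GO/PW.
    persisted = {("GO", "CYTO"): 59, ("GO", "PW"): 46}
    recomputed = {("GO", "PW"): 46}
    for pair in stale_pairs(persisted, recomputed):
        print("persisted count should be removed for", pair)
```

Whatever form the real fix takes, the idea is the same: after each run, any persisted count without a matching pair in the fresh results gets deleted rather than left behind.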

Persisting mapping counts in the triplestore is suboptimal and was developed to work around 4store scaling issues. Now that BioPortal runs on AllegroGraph, it would be ideal to see if we could return to using COUNT queries in our live system, rather than relying on persisted counts.