owlcollab / owltools

OWLTools
BSD 3-Clause "New" or "Revised" License
108 stars 33 forks source link

Ensure owltools produces correct aspect when generating predicted GAFs #237

Open cmungall opened 6 years ago

cmungall commented 6 years ago

See https://github.com/geneontology/go-site/issues/524

What appears to be happening is this. OWLTools is making an assumption that the 'namespace' tag is present, and uses this to make the aspect.

In the context of the pipeline we make a temporary version of the GO imports closure that has this stripped out. We should change this, but owltools should not silently continue if this is not present.

cmungall commented 6 years ago

Example

In this run https://build.berkeleybop.org/view/GAF/job/gaf-check-pombase/135/artifact/gene_association.pombase.inf.gaf/*view*/

PomBase SPAC1006.07 tif1 GO:0006412 PMID:10462529 IMP P translation initiation factor eIF4A tif1 protein taxon:4896 20170518 GOC

which is correct

now gives

PomBase SPAC1006.07 tif1 GO:0006412 PMID:10462529 IMP C translation initiation factor eIF4A tif1 protein taxon:4896 20170518 GOC

yy20716 commented 6 years ago

From my observation, it seems that your speculation is correct. When I ran owltools with the following option,

target/go-gaf.owl  --gaf target/groups/pombase/pombase.gaf --createReport --gaf-report-file target/groups/pombase/pombase-owltools-check.txt --gaf-report-summary-file target/groups/pombase/pombase-summary.txt --gaf-prediction-file target/groups/pombase/pombase-prediction.gaf --gaf-prediction-report-file target/groups/pombase/pombase-prediction-report.txt --gaf-validation-unsatisfiable-module target/groups/pombase/pombase-incoherent.owl --experimental-gaf-prediction-file target/groups/pombase/pombase-prediction-experimental.gaf --experimental-gaf-prediction-report-file target/groups/pombase/pombase-prediction-experimental-report.txt  --gaf-run-checks

The owltools internally calls Init() of BasicAnnotationPropagator. In this method, there's a code that creates an aspectMap.

aspectMap = createDefaultAspectMap(graph);

What happens in this method is that map right now is first to fill the map with <null, "F">, then <null, "P">, and then <null, "C">. This happens because getGoSubOntology(cc, graph) returns the null all the time.

protected Map<String, String> createDefaultAspectMap(OWLGraphWrapper graph) {
    Map<String, String> map = new HashMap<String, String>();

    OWLClass mf = graph.getOWLClassByIdentifier("GO:0003674"); // molecular_function
    if (mf != null) {
        map.put(getGoSubOntology(mf, graph), "F");
    }

    OWLClass bp = graph.getOWLClassByIdentifier("GO:0008150"); // biological_process
    if (bp != null) {
        map.put(getGoSubOntology(bp, graph), "P");
    }

    OWLClass cc = graph.getOWLClassByIdentifier("GO:0005575"); // cellular_component
    if (cc != null) {
        map.put(getGoSubOntology(cc, graph), "C");
    }

    if (map.isEmpty()) {
        // only fail if there are mappings
        // the test case uses a custom ontology, which has no cc branch
        throw new RuntimeException("Could not create any valid aspect mappings. Is the correct ontology (GO) loaded?");
    }

    return Collections.unmodifiableMap(map);
}

Later, this aspectMap is used in predictForBioEntity of BasicAnnotationPropagator and getSubOntology(linkedClass) still returns null, and the aspectMap has only one key-value set, <null, "C">, which is why all entries in the prediction gaf files are filled up with "C".

for (OWLClass linkedClass : linkedClasses) {
    String aspect = aspectMap.get(getSubOntology(linkedClass));
    Prediction p = createPrediction(linkedClass, aspect, cid, ann);
    p.setReason(createReason(linkedClass, aspect, cid, evidenceCls, g));
    allPredictions.add(linkedClass, p);
}

So what happens in getSubOntology? It appears that getSubOntology internally calls getNamespace.

public String getNamespace(OWLObject c) {
    OWLAnnotationProperty lap = getAnnotationProperty(OboFormatTag.TAG_NAMESPACE.getTag());
    return getAnnotationValue(c, lap);
}

When OWLObject c is http://purl.obolibrary.org/obo/GO_0006412, the lap is http://www.geneontology.org/formats/oboInOwl#hasOBONamespace and getAnnotationValue with these parameters returns null, thus getSubOntology gets null. At this point I am not sure what is supposed to be happening but I see that GO_0006412 is NOT described using this OBONamespace in the current go-gaf.owl.

Anyway, it may be sufficient to put new conditions that check null keys in createDefaultAspectMap and throw exceptions but please let me know if you have any other suggestions. Thank you.