owlcs / owlapi

OWL API main repository
826 stars 314 forks source link

Major bug in punning detection, concealing major performance problem in filterAxioms #363

Closed sesuncedu closed 9 years ago

sesuncedu commented 9 years ago

Whilst looking in to the issue of why some parts of ontology rendering were non-deterministic, I discovered a major problem in the faster punning detection code

public Set<IRI> getPunnedIRIs(Imports includeImportsClosure) {
        Set<IRI> punned = new HashSet<>();
        Set<IRI> test = new HashSet<>();
  [includeImportsClosure case omitted]
       for (OWLEntity e : getClassesInSignature(EXCLUDED)) {
            punned.add(e.getIRI());
        }
        for (OWLEntity e : getDataPropertiesInSignature(EXCLUDED)) {
            if (test.add(e.getIRI())) {
                punned.add(e.getIRI());
            }
        }
        for (OWLEntity e : getObjectPropertiesInSignature(EXCLUDED)) {
            if (test.add(e.getIRI())) {
                punned.add(e.getIRI());
            }
        }
        [etc.]
        return punned;

This defines defines a punned IRI as one that names a class, a data property, an object property that is not also a data property, etc.

In some ways this is a good thing, as correcting the error causes saving the Gene Ontology as RDF/XML to take half an hour, almost all of the time being in MapPointer.filterAxioms.

Investigation continues.

Also, Annotation Assertions are not sorted. Also, predicates in triples are not totally ordered.

sesuncedu commented 9 years ago

Ah... this would be the filter Axioms by doing a linear search through every single annotation property assertion axiom instead of using the index approach.

sesuncedu commented 9 years ago

Internals::filterAxioms is about to get a minimalist hack.

sesuncedu commented 9 years ago

After, 3.5s. That's a bit better. Now to see what broke

ignazio1977 commented 9 years ago

Rings a bell, I think I fixed something similar a month or two ago. Core Developer Misusing Own API Badge Of Shame: ASSIGNED :-(

ignazio1977 commented 9 years ago

I've changed a number of calls to Filters.annotations that I had missed first time around. Really that filter should go. I won't touch the area at the moment, if you're making changes as well.

sesuncedu commented 9 years ago

Change made. I didn't pull request because I was feeling too tired, and I hit a weird writer bug in a non test suite run against some modded translated obo - FSS files with imports that didn't fire on related files without imports, and the was too tired to test.

Also: A pun? I mean a palindrome. A palindrome of Bolton would be Notlob.

ignazio1977 commented 9 years ago

I've had a peek on your repo, everything looks fine to me. Big boner, the punning test that does the opposite check <shame/>

sesuncedu commented 9 years ago

I'm going to see what a merge looks like .

sesuncedu commented 9 years ago

And I see why the RDF/XML rendering is exploding:

In RDFRenderBase, we create a graph for all annotations for this entity, from anywhere in the imports closure.

    private boolean createGraph(@Nonnull OWLEntity entity,
            Collection<IRI> illegalPuns) {
            ...
            for (OWLOntology o : ontology.getImportsClosure()) {
                axioms.addAll(o.getAnnotationAssertionAxioms(entity.getIRI()));
            }
           ...

In RDFXMLNamespaceManager we get all the annotation properties in the axiom signature that we are going to define namespaces for, but with imports EXCLUDED .

    protected Set<OWLEntity> getEntitiesThatRequireNamespaces() {
        ...
        entities.addAll(getOntology().getAnnotationPropertiesInSignature(
                EXCLUDED));
        ...
    }

And in XMLWriterImpl, we explode if there is no namespace defined (or for absolutely no reason, if there is a namespace defined whose length is equal to the prefix part of the iri)

    public void writeStartElement(IRI name) throws IOException {
        String qName = xmlWriterNamespaceManager.getQName(name);
        if (qName.length() == name.length()) {
            // Could not generate a valid QName, therefore, we cannot
            // write valid XML - just throw an exception!
            throw new IllegalElementNameException(name.toString());
        }

Cause it's all about the base, the base, the base no prefix.

sesuncedu commented 9 years ago

This will move to a new issue and fix after I've had my supper.

ignazio1977 commented 9 years ago

Fixed last month.