owlcollab / owltools

OWLTools
BSD 3-Clause "New" or "Revised" License
108 stars 33 forks source link

Unstable output for owltools --save-closure-for-chado #256

Open kimrutherford opened 6 years ago

kimrutherford commented 6 years ago

This is a follow on from https://github.com/pombase/pombase-chado/issues/678

We're using owltools --save-closure-for-chado to generate the closure we load into Chado.

@ValWood noticed that the output changes regularly for parts of the ontology that aren't changing.

We have been using these two files for testing because they are very similar: https://curation.pombase.org/kmr44/go-basic-2018-05-07.obo https://curation.pombase.org/kmr44/go-basic-2018-05-08.obo

We ran these commands:

owltools ./go-basic-2018-05-07.obo --save-closure-for-chado 2018-05-07.out
owltools ./go-basic-2018-05-08.obo --save-closure-for-chado 2018-05-08.out

Using owltools from the master branch, the output from ~go-basic-2018-05-08.obo~ go-basic-2018-05-07.obo includes these lines:

GO:1904788    RO:0002213    4    GO:0000747
GO:1904788    RO:0002211    5    GO:0000747

but the output from ~go-basic-2018-05-07.obo~ go-basic-2018-05-08.obo doesn't have any lines that mention those two GO terms. We can't see any ontology changes that could cause this difference.

With the owltools from May 2017 (version 447e4156287) that we have been using to load Canto, the output is quite different. For go-basic-2018-05-07.obo, the output is:

GO:1904788   regulates   5       GO:0000747
GO:1904788   positively_regulates        4       GO:0000747

and for go-basic-2018-05-08.obo the output is:

GO:1904788   regulates   4       GO:0000747

In case it's useful here are the four output files: https://curation.pombase.org/kmr44/owltools-chado-closure.tar.xz

(Also the recent version of owltools uses "RO:..." IDs for the relations but the older owltools uses term names like "positively_regulates". Is that a bug or should we upload our Chado loading code?)

@cmungall

kimrutherford commented 6 years ago

Sorry, I got my file names swapped. I've edited the issue.

ValWood commented 6 years ago

This probably isn't very helpful, but I think it's something to do with "regulation of regulation " terms. The terms which disappear and reappear in the cytoskeleton organization branch are in the "septation initiation signalling" area. This is an is_a child of some regulation terms (this parentage should probably be removed and captured at annotation time, but we can deal with that later).

kimrutherford commented 5 years ago

Hi.

In case it helps, we've found another example of this problem. The output of owltools --save-closure-for-chado is inconsistent for:

regulates:

Sometimes we get this line in the output, sometimes not:

GO:1905318    RO:0002211      10      GO:0000819

It changes when go-basic.obo changes, but not in a predictable way.

The command line is:

owltools go-basic.obo --save-closure-for-chado  closure.out

The version of OWLTools seems to make a difference too. We've tried with the current master and v 0.3.0. The version build from the master branch doesn't give us the line above, v0.3.0 does. This was using the go-basic.obo with data-version: releases/2019-01-27 downloaded from http://purl.obolibrary.org/obo/go/snapshot/go-basic.obo.

Is there any work-around for this problem that we could try?

kimrutherford commented 5 years ago

There's another example in this issue: https://github.com/geneontology/go-ontology/issues/17171

balhoff commented 5 years ago

@cmungall @fbastian the behavior of this method is unstable (see above) across different Java versions (8 vs 11) and also sometimes jar packaging schemes:

https://github.com/owlcollab/owltools/blob/3c7852e0fabb5d291dc33a4c8f51963af48aa534/OWLTools-Core/src/main/java/owltools/graph/OWLGraphWrapperEdges.java#L820

Curious if you have any idea why that might be. I have verified that the same number of axioms are parsed when loading the file in all situations.

balhoff commented 5 years ago

I may have this figured out—I think it's to do with the usage of synchronized. I will try to understand why on Monday.

balhoff commented 5 years ago

When I removed the synchronized usages from this code, it works as expected on Java 11. Initially I thought this made sense, because there are methods here that call each other and they are locking on the same object, which seemed like it might cause issues. But I am not very experienced with synchronized, and based on my reading it sounds like it is okay to do this, because a given thread can repeatedly obtain the same lock and shouldn't deadlock itself. So I don't know why getting rid of synchronized seems to fix the issue.

balhoff commented 4 years ago

We are working around this for PomBase by using a different tool to compute the closures: https://github.com/geneontology/go-ontology/issues/17171#issuecomment-702852561