legumeinfo / mine-issues

Report ALL issues on LIS mines here! Regardless of which mine you found it on!

handling of multi-parent exons broken in 5.1.0.4 #163

Closed adf-ncgr closed 2 months ago

adf-ncgr commented 2 months ago

We do not have many gff files that use multi-parent exons but attempting to load a 5.1.0.4 version of arachismine I got this exception:

java.lang.RuntimeException: Exon arahy.Tifrunner.gnm1.ann1.GHMM2H.1:exon:163 parent mRNA arahy.Tifrunner.gnm1.ann1.GHMM2H.1,arahy.Tifrunner.gnm1.ann1.GHMM2H.2 <has not yet been loaded. Is the GFF sorted?

which derives from: lis-bio-sources/lis-annotation/src/main/java/org/intermine/bio/dataconversion/AnnotationFileConverter.java

I was puzzled as to why the annotation in question seems to have loaded with the shared exon in the current ArachisMine, but it looks like there are a number of changes to the relevant file between 5.1.0.3 and 5.1.0.4, so this was likely an unintended consequence. I'll probably try to sort it out (no pun intended, especially as it really has nothing to do with sorting).
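For context, GFF3 allows a feature to list several parents as a comma-separated `Parent` value, and the exception above shows the two mRNA IDs joined by a comma as if they were one ID. The following is only a minimal sketch, in Python rather than the converter's Java, of looking up each parent separately. The function names and the `EXON_ATTRS` string are hypothetical (the attribute string is reconstructed from the IDs in the exception, not copied from the GFF file), and this is not the code in AnnotationFileConverter.java.

```python
def parse_attributes(column9: str) -> dict[str, str]:
    """Parse a GFF3 attributes column into a key -> raw value mapping."""
    attrs = {}
    for field in column9.strip().split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        attrs[key] = value
    return attrs


def parent_ids(column9: str) -> list[str]:
    """Return every parent ID; GFF3 separates multiple parents with commas."""
    raw = parse_attributes(column9).get("Parent", "")
    return [p for p in raw.split(",") if p]


def missing_parents(column9: str, loaded_ids: set[str]) -> list[str]:
    """List the parents of a feature that have not been seen yet."""
    return [p for p in parent_ids(column9) if p not in loaded_ids]


# Hypothetical attribute string, rebuilt from the IDs in the exception.
EXON_ATTRS = (
    "ID=arahy.Tifrunner.gnm1.ann1.GHMM2H.1:exon:163;"
    "Parent=arahy.Tifrunner.gnm1.ann1.GHMM2H.1,arahy.Tifrunner.gnm1.ann1.GHMM2H.2"
)

# Both mRNAs are assumed to have been loaded before the exon is reached.
LOADED = {
    "arahy.Tifrunner.gnm1.ann1.GHMM2H.1",
    "arahy.Tifrunner.gnm1.ann1.GHMM2H.2",
}

if __name__ == "__main__":
    # The unsplit value is not a known ID, but each split ID is.
    print(parse_attributes(EXON_ATTRS)["Parent"] in LOADED)
    print(missing_parents(EXON_ATTRS, LOADED))
```

Looking up the raw `Parent` value as a single key would fail even when both mRNAs are present, which matches the symptom here and would explain why sorting the GFF makes no difference.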

svengato commented 2 months ago

Any updates? I am seeing this in ViciaMine as well.

adf-ncgr commented 2 months ago

No updates, but can you send me exact details on the error you are seeing? I'd be surprised if it is the same issue, though it's possible.

svengato commented 2 months ago

Task :dbmodel:integrateMultipleSources FAILED
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':dbmodel:integrateMultipleSources'.
> java.lang.RuntimeException: Exon vicfa.Hedin2.gnm1.ann1.1g204440.1-exon-1 parent mRNA vicfa.Hedin2.gnm1.ann1.1g204440.1 <has not yet been loaded. Is the GFF sorted?

adf-ncgr commented 2 months ago

I think that is likely not the same issue, as it does not look like a multi-parent exon. My guess from looking at the annotation file:

vicfa.Hedin2.gnm1.chr1L Liftoff ncRNA   1107816 1108061 .   +   .   ID=vicfa.Hedin2.gnm1.ann1.1g204440.1;Parent=vicfa.Hedin2.gnm1.ann1.1g204440
vicfa.Hedin2.gnm1.chr1L Liftoff exon    1107816 1108061 .   +   .   ID=vicfa.Hedin2.gnm1.ann1.1g204440.1-exon-1;Parent=vicfa.Hedin2.gnm1.ann1.1g204440.1

is that the code is expecting there to be an mRNA parent for the exon whereas this one has an ncRNA parent. In other annotations we have partitioned the non-coding stuff into its own file which would at least work around the issue (assuming we also don't try to load that partitioned-off non-coding file). Someday we likely will want to load non-coding data into the mines as well but for now I think we could avoid it.
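The partitioning workaround described above could be sketched roughly as follows. This is a hypothetical Python helper, not the script actually used for the other annotations. It assumes tab-separated GFF3 columns, that parents appear before their children, and that `ncRNA` is the only non-coding transcript type of interest; `NONCODING_TYPES` and `partition_noncoding` are names made up for illustration.

```python
# Assumption: extend this set with other non-coding types as needed.
NONCODING_TYPES = {"ncRNA"}


def partition_noncoding(lines):
    """Split GFF3 lines into (coding, noncoding) lists.

    A feature goes to the non-coding list if its type is in NONCODING_TYPES
    or if all of its parents were already routed there. Comment and blank
    lines are kept in the coding list.
    """
    coding, noncoding = [], []
    noncoding_ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            coding.append(line)
            continue
        cols = line.rstrip("\n").split("\t")
        ftype, column9 = cols[2], cols[8]
        attrs = dict(
            field.partition("=")[::2] for field in column9.split(";") if field
        )
        parents = [p for p in attrs.get("Parent", "").split(",") if p]
        is_noncoding = ftype in NONCODING_TYPES or (
            bool(parents) and all(p in noncoding_ids for p in parents)
        )
        if is_noncoding:
            noncoding.append(line)
            if "ID" in attrs:
                noncoding_ids.add(attrs["ID"])
        else:
            coding.append(line)
    return coding, noncoding


# The two records quoted above, rebuilt with explicit tab separators.
SAMPLE = [
    "\t".join([
        "vicfa.Hedin2.gnm1.chr1L", "Liftoff", "ncRNA", "1107816", "1108061",
        ".", "+", ".",
        "ID=vicfa.Hedin2.gnm1.ann1.1g204440.1;Parent=vicfa.Hedin2.gnm1.ann1.1g204440",
    ]),
    "\t".join([
        "vicfa.Hedin2.gnm1.chr1L", "Liftoff", "exon", "1107816", "1108061",
        ".", "+", ".",
        "ID=vicfa.Hedin2.gnm1.ann1.1g204440.1-exon-1;Parent=vicfa.Hedin2.gnm1.ann1.1g204440.1",
    ]),
]

if __name__ == "__main__":
    coding, noncoding = partition_noncoding(SAMPLE)
    print(len(coding), len(noncoding))
```

In this sketch the gene-level parent of the ncRNA is not followed, so a gene record that has only non-coding transcripts would stay in the coding list; whether that matters depends on how the loader treats childless genes.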

adf-ncgr commented 2 months ago

Fixed in eae4e7b, but note that this doesn't address the Vicia issue, which, as I noted above, is a different problem even though the error reported is the same.