Open scottcain opened 5 years ago
Hi Scott.
This is due to the presence of overlaps between subfeatures in the underlying annotation. Overlapping subfeatures are fundamentally problematic for this program, and that's unlikely to change. I couldn't find a way to consistently and correctly highlight/hide/etc overlapping subfeatures without a bunch of extra logic and headaches regarding the differing possibilities inherent with nested boundaries.
Knowing this, I did two things for when overlapping subfeatures are detected:
1. In the common case of a CDS/UTR overlapping with an exon, I included some logic to check that the boundaries make sense and then basically exclude the exon data (it is redundant in that case). Once this is completed, the overlaps should no longer exist.
2. If overlaps still persisted, I (perhaps foolishly?) still allowed FeatureSequence to continue with an older (slower and not recommended) implementation of the viewer that ignores the overlaps and instead warns the user that problems are likely. When you see the warning dialog which shows up in the referenced videos, this is what the end of the full text reads:
Overlapping subfeatures will cause problems in viewing their boundaries. This may also cause the Feature Sequence Viewer to respond slowly.
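For readers following along, the two-step handling described above could be sketched roughly as follows. This is only an illustration in Python, not the plugin's actual code (FeatureSequence is a JavaScript JBrowse plugin); the names `Subfeature`, `find_overlaps`, `exon_is_covered`, and `drop_redundant_exons` are hypothetical, and "boundaries make sense" is assumed here to mean that the CDS/UTR pieces tile the exon exactly.

```python
from typing import List, NamedTuple, Tuple


class Subfeature(NamedTuple):
    type: str
    start: int  # 1-based, inclusive, as in GFF3
    end: int


# Assumed set of types that can make an exon record redundant.
PART_TYPES = {"CDS", "UTR", "five_prime_UTR", "three_prime_UTR"}


def find_overlaps(feats: List[Subfeature]) -> List[Tuple[Subfeature, Subfeature]]:
    """Return every pair of subfeatures that shares at least one base."""
    ordered = sorted(feats, key=lambda f: (f.start, f.end))
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start > a.end:  # sorted by start, so no later feature overlaps a
                break
            pairs.append((a, b))
    return pairs


def exon_is_covered(exon: Subfeature, parts: List[Subfeature]) -> bool:
    """True if the CDS/UTR parts touching the exon tile it exactly, end to end."""
    next_base = exon.start
    for p in sorted(parts, key=lambda f: f.start):
        if p.end < exon.start or p.start > exon.end:
            continue  # part belongs to a different exon
        if p.start != next_base or p.end > exon.end:
            return False  # gap, overlap, or part crossing the exon boundary
        next_base = p.end + 1
    return next_base == exon.end + 1


def drop_redundant_exons(feats: List[Subfeature]) -> List[Subfeature]:
    """Remove exons whose sequence is fully described by CDS/UTR parts."""
    parts = [f for f in feats if f.type in PART_TYPES]
    return [f for f in feats
            if not (f.type == "exon" and exon_is_covered(f, parts))]


# Exons fully described by UTR + CDS records: the exons can be dropped.
full = [Subfeature("exon", 1, 100), Subfeature("five_prime_UTR", 1, 30),
        Subfeature("CDS", 31, 100), Subfeature("exon", 200, 300),
        Subfeature("CDS", 200, 300)]
print(len(find_overlaps(drop_redundant_exons(full))))  # no overlaps remain

# Exon + CDS only, with no explicit UTR record: the exon is not redundant,
# so an overlap remains and the fallback (warning) path would be taken.
partial = [Subfeature("exon", 1, 100), Subfeature("CDS", 31, 100)]
print(len(find_overlaps(drop_redundant_exons(partial))))
```

Under these assumptions, annotation that lists only exon and CDS records (no UTR lines) never passes the tiling check for exons that carry UTR sequence, which would leave overlaps in place.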
Is it reasonable for you to modify the annotation data? If so, perhaps you could provide a relevant portion of your annotation file for the track in question and we could figure it out? I think that the automatic CDS/UTR/exon parsing is failing, and that's why you're seeing this.
Hi @tsaari88,
Thanks for looking at this. Here is a sample of GFF that is causing problems:
https://gist.github.com/scottcain/8945cfa9ca820b5d287dd0c428785264
and the corresponding JBrowse track for the B0304.1c.1 transcript:
A particular oddity is the warning message in the dialog between five_prime_UTR_4 and exon_5, since there aren't 4 5' UTR lines in this GFF.
Hi @twsaari
WormBase folks have again been pointing out problems with the FeatureSequence plugin. Is this something you have time to look at? If so, please use these transcripts:
Unfortunately, I can't link to the bug report because we've taken it private, but this is the most recent comment:
It seems there are still some errors with the "FeatureSequence Viewer" tool in JBrowse. The latest example I came across while looking into a help desk ticket about transcript C53D5.6.2. Here's a summary of the behavior of the tool with regards to this transcript:
Track: Curated Genes
Action: right-click on imb-3 gene, click "View Sequence", select transcript C53D5.6.2, and view sequence
Issues:
- "CDSs" button: OK
- "UTRs" button: Highlights the portion of the 3'UTR contained on the next-to-last exon, but doesn't include 3'UTR sequence on the final exon; misses the 5'UTR entirely
- "Exons" button: Only highlights the first and last exon, none of the middle exons
- "Five_prime_UTRs" button: OK
- "Introns" button: OK
- "Others" button: not applicable?
- "Three_prime_UTRs" button: OK
- "Upstream" button: OK
- "Downstream" button: OK
Same issues appear on the "Curated genes (protein coding)" track.
Checking other genes in JBrowse, it is clear that the "UTRs" button and the "Exons" button consistently have problems. For one gene/transcript (marc-4/C53D5.2.1), the "Exon" button doesn't appear at all (yes, it is spliced).
So, it seems that the "Exon" button, if it appears, consistently highlights only a small portion of the exon sequence for the transcript. The "UTR" button usually misses most or all of the 5'UTR and sometimes the 3'UTR, but always misses some UTR sequence, particularly when UTRs contain introns.
Hello,
I believe I have been having the same issue with the FeatureSequence viewer: consistently, only the first exon is highlighted. All of the CDS sequences are highlighted properly from what I can see. Yes, this is a case of a GFF with overlapping CDS/exon subfeatures. I would add, though, that this gene model structure isn't really going away: I've noticed that when NCBI releases new gene annotations for different genomes, they tend to use overlapping CDS/exon subfeatures, most likely so that the exon regions not covered by a CDS indicate the UTRs at the beginning and end of the transcripts.
I just wanted to check whether this has been updated or fixed yet. I understand if it hasn't, as it sounds like there's a lot to deal with in these scenarios in terms of your logic. If it hasn't been resolved, is it true, per your earlier comment, that when there are overlapping subfeatures, all exons are treated as redundant and excluded from highlighting/lowercase features?
What do you suggest for users in this scenario? Should they focus only on CDS segments when both are present?
Thanks,
Vaneet
Hi, FeatureSequence only highlights 2 exons but there are 11.
NbLab350C17 scallop gene 92773802 92784469 . + . ID=NbL17g15920
NbLab350C17 scallop mRNA 92773802 92784469 . + . ID=NbL17g15920.1;Parent=NbL17g15920;RPKM=3.7278;Note=uncharacterized LOC109243108 transcript variant X4 XP_019265549.1;evalue=0.00;cov=303.4095
NbLab350C17 scallop exon 92773802 92773977 . + . ID=NbL17g15920.1.exon.0;Parent=NbL17g15920.1;exon=1;gene_id=gene.134870.0;transcript_id=gene.134870.0.7
NbLab350C17 scallop exon 92774154 92774303 . + . ID=NbL17g15920.1.exon.1;Parent=NbL17g15920.1;exon=2;gene_id=gene.134870.0;transcript_id=gene.134870.0.7
NbLab350C17 scallop exon 92775176 92775522 . + . ID=NbL17g15920.1.exon.2;Parent=NbL17g15920.1;exon=3;gene_id=gene.134870.0;transcript_id=gene.134870.0.7
NbLab350C17 scallop CDS 92775427 92775522 . + . ID=NbL17g15920.1.CDS.0;Parent=NbL17g15920.1;exon=3;gene_id=gene.134870.0;transcript_id=gene.134870.0.7
NbLab350C17 scallop exon 92776156 92776314 . + . ID=NbL17g15920.1.exon.3;Parent=NbL17g15920.1;exon=4;gene_id=gene.134870.0;transcript_id=gene.134870.0.7
NbLab350C17 scallop CDS 92776156 92776314 . + . ID=NbL17g15920.1.CDS.1;Parent=NbL17g15920.1;exon=4;gene_id=gene.134870.0;transcript_id=gene.134870.0.7
NbLab350C17 scallop exon 92777391 92777471 . + . ID=NbL17g15920.1.exon.4;Parent=NbL17g15920.1;exon=5;gene_id=gene.134870.0;transcript_id=gene.134870.0.7
NbLab350C17 scallop CDS 92777391 92777471 . + . ID=NbL17g15920.1.CDS.2;Parent=NbL17g15920.1;exon=5;gene_id=gene.134870.0;transcript_id=gene.134870.0.7
NbLab350C17 scallop exon 92781380 92781472 . + . ID=NbL17g15920.1.exon.5;Parent=NbL17g15920.1;exon=6;gene_id=gene.134870.0;transcript_id=gene.134870.0.7
NbLab350C17 scallop CDS 92781380 92781472 . + . ID=NbL17g15920.1.CDS.3;Parent=NbL17g15920.1;exon=6;gene_id=gene.134870.0;transcript_id=gene.134870.0.7
NbLab350C17 scallop exon 92781705 92781797 . + . ID=NbL17g15920.1.exon.6;Parent=NbL17g15920.1;exon=7;gene_id=gene.134870.0;transcript_id=gene.134870.0.7
NbLab350C17 scallop CDS 92781705 92781797 . + . ID=NbL17g15920.1.CDS.4;Parent=NbL17g15920.1;exon=7;gene_id=gene.134870.0;transcript_id=gene.134870.0.7
NbLab350C17 scallop exon 92781877 92782020 . + . ID=NbL17g15920.1.exon.7;Parent=NbL17g15920.1;exon=8;gene_id=gene.134870.0;transcript_id=gene.134870.0.7
NbLab350C17 scallop CDS 92781877 92782020 . + . ID=NbL17g15920.1.CDS.5;Parent=NbL17g15920.1;exon=8;gene_id=gene.134870.0;transcript_id=gene.134870.0.7
NbLab350C17 scallop exon 92783207 92783314 . + . ID=NbL17g15920.1.exon.8;Parent=NbL17g15920.1;exon=9;gene_id=gene.134870.0;transcript_id=gene.134870.0.7
NbLab350C17 scallop CDS 92783207 92783314 . + . ID=NbL17g15920.1.CDS.6;Parent=NbL17g15920.1;exon=9;gene_id=gene.134870.0;transcript_id=gene.134870.0.7
NbLab350C17 scallop exon 92783491 92783542 . + . ID=NbL17g15920.1.exon.9;Parent=NbL17g15920.1;exon=10;gene_id=gene.134870.0;transcript_id=gene.134870.0.7
NbLab350C17 scallop CDS 92783491 92783542 . + . ID=NbL17g15920.1.CDS.7;Parent=NbL17g15920.1;exon=10;gene_id=gene.134870.0;transcript_id=gene.134870.0.7
NbLab350C17 scallop CDS 92783630 92783661 . + . ID=NbL17g15920.1.CDS.8;Parent=NbL17g15920.1;exon=11;gene_id=gene.134870.0;transcript_id=gene.134870.0.7
NbLab350C17 scallop exon 92783630 92784469 . + . ID=NbL17g15920.1.exon.10;Parent=NbL17g15920.1;exon=11;gene_id=gene.134870.0;transcript_id=gene.134870.0.7
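As a possible workaround for annotation like the sample above, where only exon and CDS records are present, the UTR intervals could be derived by subtracting the CDS span from the exons and then written out as explicit UTR lines. The following is a minimal Python sketch under that assumption; `derive_utrs` is a hypothetical helper, not part of FeatureSequence or JBrowse, and the coordinates are a subset taken from the GFF lines above.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive as in GFF3


def derive_utrs(exons: List[Interval], cds: List[Interval],
                strand: str = "+") -> Tuple[List[Interval], List[Interval]]:
    """Return (five_prime_utrs, three_prime_utrs) as exon sequence outside the CDS span.

    Assumes every CDS segment lies inside an exon of the same transcript.
    """
    cds_start = min(start for start, _ in cds)
    cds_end = max(end for _, end in cds)
    left, right = [], []
    for start, end in sorted(exons):
        if start < cds_start:  # exon sequence before the first coding base
            left.append((start, min(end, cds_start - 1)))
        if end > cds_end:      # exon sequence after the last coding base
            right.append((max(start, cds_end + 1), end))
    # On the minus strand the genomic "left" side is the 3' end.
    return (left, right) if strand == "+" else (right, left)


# Exons 1-4, 10 and 11 of NbL17g15920.1, and the CDS segments within them.
exons = [(92773802, 92773977), (92774154, 92774303), (92775176, 92775522),
         (92776156, 92776314), (92783491, 92783542), (92783630, 92784469)]
cds = [(92775427, 92775522), (92776156, 92776314),
       (92783491, 92783542), (92783630, 92783661)]

five_prime, three_prime = derive_utrs(exons, cds, strand="+")
print(five_prime)   # [(92773802, 92773977), (92774154, 92774303), (92775176, 92775426)]
print(three_prime)  # [(92783662, 92784469)]
```

For this transcript, the sketch yields a 5' UTR spread over the first three exons and a 3' UTR on the final exon, which matches the regions where the exon and CDS records do not coincide.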
Has anyone found an alternative tool?
Hi @tsaari88,
I have a case at WormBase where FeatureSequence doesn't seem to get the subparts to highlight correctly. Could you take a look at our bug https://github.com/WormBase/website/issues/6819 and let me know if you have any thoughts?
Thanks, Scott