Closed keiranmraine closed 5 years ago
Hi Keiran,
This is a scenario which I didn't foresee. In the GFF you provided, the UTRs are both explicitly and implicitly defined. By implicitly defined I mean that a UTR is determined by taking the 'exon' and 'CDS' features together and finding the parts of the exons that the CDS does not cover. By explicitly defined I mean that there's also an actual line in the GFF for each three_prime_UTR or five_prime_UTR. This is a problem because it's two ways of encoding the same information. You might want to check whether the two sets of information are identical.
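To make the implicit definition concrete, here is a minimal Python sketch of the exon-minus-CDS calculation. It is not the plugin's code; the function names and the coordinates in the usage note are made up for illustration, and it assumes 1-based inclusive coordinates as in GFF3 and a single transcript at a time.

```python
def subtract_intervals(exons, cds):
    """Return the parts of each exon not covered by any CDS interval.

    Intervals are (start, end) tuples, 1-based and inclusive, as in GFF3.
    """
    pieces = []
    for ex_start, ex_end in sorted(exons):
        cursor = ex_start  # first exon base not yet accounted for
        for c_start, c_end in sorted(cds):
            if c_end < cursor or c_start > ex_end:
                continue  # no overlap with what is left of this exon
            if c_start > cursor:
                pieces.append((cursor, c_start - 1))
            cursor = max(cursor, c_end + 1)
        if cursor <= ex_end:
            pieces.append((cursor, ex_end))
    return pieces


def implicit_utrs(exons, cds, strand):
    """Split the exon-minus-CDS pieces into 5' and 3' UTRs by strand."""
    pieces = subtract_intervals(exons, cds)
    cds_start = min(start for start, _ in cds)
    cds_end = max(end for _, end in cds)
    upstream = [p for p in pieces if p[1] < cds_start]
    downstream = [p for p in pieces if p[0] > cds_end]
    if strand == "+":
        return {"five_prime_UTR": upstream, "three_prime_UTR": downstream}
    # On the minus strand the 5' end is at the higher coordinates.
    return {"five_prime_UTR": downstream, "three_prime_UTR": upstream}
```

With hypothetical exons at (100, 200), (300, 400), (500, 600) and CDS at (150, 200), (300, 400), (500, 550) on the plus strand, this gives a 5' UTR of (100, 149) and a 3' UTR of (551, 600). Comparing that result against the coordinates on the explicit three_prime_UTR and five_prime_UTR lines is one way to check whether the two encodings agree.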
Removing either one of these will solve the problem, but I don't know if that's something you want to do. If I remove the three_prime_UTR and five_prime_UTR features from the GFF, then the plugin still recognizes the UTRs from their implicit definitions. Conversely, if I remove the exons from the GFF, then the plugin goes by the explicit definitions (as removing the exon features also removes the implicitly defined UTR information contained within them).
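If you want to try either variant on a copy of the file, a small filter along these lines would do it. This is only a sketch: `drop_feature_types` is a hypothetical helper, not part of the plugin or of JBrowse, and it assumes tab-separated GFF3 with the feature type in the third column.

```python
def drop_feature_types(gff_lines, types_to_drop):
    """Return the GFF3 lines whose feature type is not in types_to_drop.

    Comment lines, directives, blank lines, and anything without at least
    three tab-separated columns are passed through unchanged.
    """
    kept = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            kept.append(line)
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) >= 3 and columns[2] in types_to_drop:
            continue  # skip this feature line
        kept.append(line)
    return kept


# Variant 1: keep exon and CDS, drop the explicit UTR lines.
EXPLICIT_UTR_TYPES = {"five_prime_UTR", "three_prime_UTR"}
# Variant 2: keep the explicit UTR lines, drop the exons.
EXON_TYPES = {"exon"}
```

Calling `drop_feature_types(lines, EXPLICIT_UTR_TYPES)` leaves only the implicit definition, and `drop_feature_types(lines, EXON_TYPES)` leaves only the explicit one, so the original GFF stays untouched while you test.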
I'm somewhat hesitant to implement a generalized logic for this type of thing, as it's dependent on making more assumptions, e.g. which information to ignore. And you know what they say about assumptions...
There was also a small bug with sorting, which is now fixed on both branches. If you do a git pull, you'll find that the error message will better reflect reality now.
Hi, I see the problem. Is it possible to provide a configuration option to ignore implicit UTRs? It would make this very easy to test, and it could possibly be left as an interface option so that data generation can remain 'unaltered' from the original.
It seems that the issue may be in bin/flatfile-to-json.pl; perhaps this should be filtering this data appropriately. It displays fine in the browser.
FYI, the error is far less verbose with this sort fix. An initial dig through a few genes, setting a highlight on 5'UTR and lowercase on UTR, shows it all lining up.
Thanks for the new version, works great with GFF3 files produced by gmap on non-model organisms.
These files do typically have (mostly) overlapping CDS and exon features.
Hi,
Really happy that this now handles transcripts, but can you confirm that the dialog for overlapping sub-features is working correctly?
I have a genome build of Caenorhabditis_elegans - WBcel235 using the Ensembl GFF3 found here as the source (filtered down to protein_coding only):
ftp://ftp.ensembl.org/pub/release-85/gff3/caenorhabditis_elegans/Caenorhabditis_elegans.WBcel235.85.gff3.gz
Using the gene hmr-1 as an example, I get the following if I select the first transcript (WB02B9.1a.1) in the drop-down:
Are you able to verify this is the correct behaviour?
Thanks, Keiran