Open olgabot opened 11 years ago
Hi Olga, Thank you for the report, and sorry for the very delayed reply. The short answer is that this happens because the older version of the annotations (generated in Wang et al. (2008), labeled 'Ver 1' on the MISO site) was made for older genomes, like mm9 and hg18. These were converted to hg19 by liftOver, but their old names (in the IDs) were kept. I completely agree that this is very confusing, so I'll fix it and upload a new version of the annotations. This bug should not occur in annotations that were made using the hg19 genome to start with, labeled 'Ver 2' on the MISO website, since these did not involve liftOver.
Best, --Yarden
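For anyone who wants to check whether a given file is affected, here is a minimal sketch that compares the coordinates embedded in an ID against the `start`/`end` columns of the same record. It is not part of MISO: the function names are hypothetical, and it assumes SE-style IDs of the form `up@se@dn` with a trailing `.A.up`, `.A.se`, or `.A.dn` suffix naming the exon's role.

```python
import re

# One coordinate block of an ID, e.g. "chr1:6717:6918:-", optionally
# followed by a suffix such as ".A.dn" (only expected on the last block).
_BLOCK = re.compile(r"^([^:@]+):(\d+):(\d+):([+-])((?:\..*)?)$")

# Assumed position of each exon role inside an SE-style ID (up@se@dn).
ROLE_INDEX = {"up": 0, "se": 1, "dn": 2}


def parse_event_id(feature_id):
    """Split an ID into (chrom, start, end, strand) blocks plus its suffix."""
    parts = []
    suffix = ""
    for block in feature_id.split("@"):
        match = _BLOCK.match(block)
        if match is None:
            raise ValueError("unrecognized ID block: %r" % block)
        chrom, start, end, strand, tail = match.groups()
        parts.append((chrom, int(start), int(end), strand))
        suffix = tail or suffix
    return parts, suffix


def id_matches_columns(feature_id, start, end):
    """Return True if the ID block for this exon's role equals (start, end)."""
    parts, suffix = parse_event_id(feature_id)
    role = suffix.rsplit(".", 1)[-1]
    if role not in ROLE_INDEX:
        raise ValueError("ID has no up/se/dn suffix: %r" % feature_id)
    _, id_start, id_end, _ = parts[ROLE_INDEX[role]]
    return (id_start, id_end) == (start, end)


# The exon quoted in the original report: columns say 16854..17055,
# but the "dn" block of the ID says 6717..6918.
example_id = "chr1:7778:7924:-@chr1:7096:7605:-@chr1:6717:6918:-.A.dn"
stale = not id_matches_columns(example_id, 16854, 17055)
```

In that example the exon length is the same in both places (6918 - 6717 = 17055 - 16854 = 201), which is consistent with the coordinates having been shifted while the ID kept the old values.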
By the way, there's a notice describing this on the annotations page:
Hello Yarden et al, This is more of a feature request than a bug.
I'm trying to understand the ID scheme of the provided gff3 files. The documentation says, as an example, that the ID of one SE entry was "arbitrarily" chosen to be the coordinates of the 5' upstream exon, the SE itself, and its 3' downstream exon. However, I don't see this in the current SE.hg19.gff3 file:
For example, the first exon is on the negative strand and has a `start` and `stop` of `16854` and `17055`. However, its ID is `chr1:7778:7924:-@chr1:7096:7605:-@chr1:6717:6918:-.A.dn`, which doesn't include either of those numbers!
This has been especially confusing when attempting to interpret MISO output: I go to the middle chromosome location in the ID and find no reads there. The location specified by the `start` and `stop` columns in the original `.gff3` file is correct (which is comforting), but it's kind of a pain to have to grep for this arbitrary ID every time.
Is it possible for these `.gff3` files to be updated such that the ID matches the chromosome location?
FWIW, this seems to also be an issue in `A3SS.hg19.gff3`, `A5SS.hg19.gff3`, `MXE.hg19.gff3`, `RI.hg19.gff3`, and `TandemUTR.hg19.gff3`.
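As a stopgap for the grep step, the ID-to-coordinate lookup can be sketched as a small in-memory index. This is only an illustration under stated assumptions: `index_gff3_by_id` and `parse_attributes` are hypothetical helpers (not MISO functions), the input is assumed to be standard 9-column tab-separated GFF3 with an `ID=` attribute, and the sample line below is synthetic (built from the coordinates quoted above, with made-up source and Parent values).

```python
def parse_attributes(field):
    """Parse a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def index_gff3_by_id(lines):
    """Map each record's ID to (chrom, start, end, strand, feature_type)."""
    index = {}
    for line in lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 record
        chrom, _source, ftype, start, end, _score, strand, _phase, attr_field = cols
        feature_id = parse_attributes(attr_field).get("ID")
        if feature_id is not None:
            index[feature_id] = (chrom, int(start), int(end), strand, ftype)
    return index


# Synthetic record: source "SE" and the Parent value are placeholders.
example_id = "chr1:7778:7924:-@chr1:7096:7605:-@chr1:6717:6918:-.A.dn"
sample_line = "\t".join(
    ["chr1", "SE", "exon", "16854", "17055", ".", "-", ".",
     "ID=" + example_id + ";Parent=placeholder"]
) + "\n"

index = index_gff3_by_id(["##gff-version 3\n", sample_line])
chrom, start, end, strand, ftype = index[example_id]

# With a real file, the same function accepts an open file handle:
# with open("SE.hg19.gff3") as handle:
#     index = index_gff3_by_id(handle)
```

The index returns the coordinates from the `start`/`stop` columns, which are the ones that match the reads, regardless of what the ID string says.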