Closed mplass closed 8 years ago
Yes, this seems to be a formatting issue. The columns from the Ensembl GTFs look like this (numbered from zero):
0 chrGL000213.1
1 protein_coding
2 exon
3 138767
4 139339
5 .
6 -
7 .
8 gene_id "ENSG00000237375"; transcript_id "ENST00000327822"; exon_number "1"; gene_name "BX072566.1"; gene_biotype "protein_coding"; transcript_name "BX072566.1-201";
In terms of format, it's the last column that varies between the different groups that produce GTF files. IsoSCM expects attributes to be separated by semicolons, the attribute id to be separated from the attribute value by a space, and the attribute values to be quoted. IsoSCM will only analyze features that have both a gene_id and a transcript_id attribute.
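To make the expected attribute layout concrete, here is a minimal Python sketch of a parser for the last column. It is not IsoSCM's actual parser (IsoSCM is written in Java); the names `ATTRIBUTE_PATTERN`, `parse_attributes` and `is_analyzable` are hypothetical and only illustrate the three rules above: semicolon-separated attributes, a space between id and value, and quoted values.

```python
import re

# One attribute of the form: key "value"
# Hypothetical pattern for illustration; IsoSCM's own parser may differ.
ATTRIBUTE_PATTERN = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(column):
    """Parse a GTF attribute column into a dict of attribute id -> value."""
    attributes = {}
    for field in column.split(";"):
        field = field.strip()
        if not field:
            continue  # empty piece after the trailing semicolon
        match = ATTRIBUTE_PATTERN.fullmatch(field)
        if match is None:
            raise ValueError(f"unexpected attribute format: {field!r}")
        attributes[match.group(1)] = match.group(2)
    return attributes


def is_analyzable(attributes):
    """True only if both gene_id and transcript_id are present."""
    return "gene_id" in attributes and "transcript_id" in attributes


example = ('gene_id "ENSG00000237375"; transcript_id "ENST00000327822"; '
           'exon_number "1"; gene_name "BX072566.1"; '
           'gene_biotype "protein_coding"; transcript_name "BX072566.1-201";')
attrs = parse_attributes(example)
print(attrs["gene_id"], is_analyzable(attrs))
```

Splitting on every semicolon is a simplification: it would break on a quoted value that itself contains a semicolon, which the example line above does not.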
The error message says that it reached a line in the GTF file that doesn't have the last column. I usually download the GTFs from this site: http://useast.ensembl.org/info/data/ftp/index.html. Is this where you downloaded the Ensembl78 GTF from?
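One way to locate such a line is to count the columns on every line of the file. The sketch below assumes the columns are tab-separated, as is usual for GTF; `find_short_lines` is a hypothetical helper, not part of IsoSCM, and the sample data reuses the feature line above plus a header-style line like the one quoted later in this thread.

```python
def find_short_lines(lines, expected_columns=9):
    """Return (line_number, column_count) for lines with too few columns."""
    short = []
    for number, line in enumerate(lines, start=1):
        columns = line.rstrip("\n").split("\t")
        if len(columns) < expected_columns:
            short.append((number, len(columns)))
    return short


sample = [
    "!genome-build GRCh38.p5\n",
    "chrGL000213.1\tprotein_coding\texon\t138767\t139339\t.\t-\t.\t"
    'gene_id "ENSG00000237375"; transcript_id "ENST00000327822";\n',
]
print(find_short_lines(sample))
```

Because the function accepts any iterable of lines, an open file handle can be passed in directly, for example `with open(path) as handle: find_short_lines(handle)`, where `path` points to the GTF file being checked.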
I just downloaded it from the Ensembl FTP site: http://www.ensembl.org/info/data/ftp/index.html
Mireya Plass, PhD
Systems Biology of Gene Regulatory Elements (Nikolaus Rajewsky lab) Max Delbrück Center for Molecular Medicine Robert-Rössle Str. 10 13092 Berlin, Germany Tel: +493094064248 e-mail:mireya.plassportulas@mdc-berlin.de
Ah, I think it might be thrown off by the metadata at the top. Do you still get the error if you delete these lines from the top of the file?
!genome-build GRCh38.p5
!genome-version GRCh38
!genome-date 2013-12
!genome-build-accession NCBI:GCA_000001405.20
!genebuild-last-updated 2015-10
You are right! Now it works fine.
Thanks.
Great, I'll add an update so that these comment lines are ignored instead of causing an error.
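Until such an update is available, the same effect can be approximated by filtering the file before handing it to IsoSCM. The following Python sketch is not the actual IsoSCM fix; `iter_gtf_records` is a hypothetical helper, and treating both "#" and "!" as comment prefixes is an assumption made so that the header lines exactly as quoted in this thread are skipped.

```python
def iter_gtf_records(lines, comment_prefixes=("#", "!")):
    """Yield the columns of each feature line, skipping blank and comment lines."""
    for line in lines:
        if not line.strip():
            continue  # blank line
        if line.startswith(comment_prefixes):
            continue  # header or comment line, e.g. "!genome-build GRCh38.p5"
        yield line.rstrip("\n").split("\t")


gtf_lines = [
    "!genome-build GRCh38.p5\n",
    "!genome-version GRCh38\n",
    "chrGL000213.1\tprotein_coding\texon\t138767\t139339\t.\t-\t.\t"
    'gene_id "ENSG00000237375"; transcript_id "ENST00000327822";\n',
]
records = list(iter_gtf_records(gtf_lines))
print(len(records), records[0][2])
```

Skipping by prefix keeps every feature line byte for byte, so a filtered copy of the file should contain the same annotation as the original.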
Hi, I'm trying to compare the results from compare with the Ensembl annotation (Ensembl78). The program works perfectly fine when I use one of the GTF files produced by IsoSCM, so it seems to be a formatting issue with the Ensembl GTF file. However, I can't find the GTF format specifications anywhere.
Executed command:
java -Xmx2048m -jar IsoSCM-2.0.11.jar diff -x compare_parameters.xml -G Homo_sapiens.GRCh38.78.chr.gtf
Error:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 8
at tools.ParseGTF$TranscriptIterator.next(ParseGTF.java:280)
at tools.ParseGTF$TranscriptIterator.next(ParseGTF.java:1)
at tools.GTFTools$AnnotationParser.<init>(GTFTools.java:74)
at processing.DiffReference.diff(DiffReference.java:48)
at executable.IsoSCM.main(IsoSCM.java:622)
Thanks!