timbitz / Whippet.jl

Lightweight and Fast; RNA-seq quantification at the event-level
MIT License

whippet index failed, link to download the hg38 GTF2.2 on this website does not work #149

Open santataRU opened 5 days ago

santataRU commented 5 days ago

Dear All,

I encountered an error while running whippet-index.jl. The error appears to be due to my GTF file not being in the 2.2 format. Could you please let me know where to download the correct GTF format (GTF2.2) for hg38? The link provided on this website does not work. Is there an updated link available?

Thank you,


PS: error messages:

(base) xiaolei@Xiaos-Laptop bin % julia ./whippet-index.jl --fasta /Users/xiaolei/Whippet.jl/anno/hg38.fa.gz --gtf /Users/xiaolei/Whippet.jl/anno/gencode.v45.annotation_sorted.gtf.gz
Whippet v1.6.2 loading...
Activating environment at ~/Whippet.jl/Project.toml
5.640663 seconds.
Loading GTF file: /Users/xiaolei/Whippet.jl/anno/gencode.v45.annotation_sorted.gtf.gz

┌ Warning: Using low quality Transcript Support Levels (TSL 3+) in your GTF file is not recommended!
│ For more information on TSL, see: http://www.ensembl.org/Help/Glossary?id=492
│
│ If you would like Whippet to ignore these when building its index, use --suppress-low-tsl option!
│
└ @ Whippet ~/Whippet.jl/src/refset.jl:159
ERROR: LoadError: ERROR: GTF file is not in valid GTF2.2 format!

ERROR: Annotation entries for 'transcript_id' ENST00000430923.7 has already been fully processed and closed.
HINT: All GTF lines with the same 'transcript_id' must be adjacent in the GTF file and referring to the same transcript and gene!
Stacktrace:
[1] error(s::String) @ Base ./error.jl:33
[2] load_gtf(fh::BufferedStreams.BufferedInputStream{Libz.Source{:inflate, BufferedStreams.BufferedInputStream{IOStream}}}; txbool::Bool, suppress::Bool, usebam::Bool, bamreader::Nullable{XAM.BAM.Reader}, bamreads::Int64, bamoneknown::Bool) @ Whippet ~/Whippet.jl/src/refset.jl:165
[3] macro expansion @ ~/Whippet.jl/src/timer.jl:5 [inlined]
[4] main() @ Main ~/Whippet.jl/bin/whippet-index.jl:91
[5] top-level scope @ ~/Whippet.jl/src/timer.jl:5
in expression starting at /Users/xiaolei/Whippet.jl/bin/whippet-index.jl:108

santataRU commented 5 days ago

I just found the cause of the problem. I had used a sorted hg38 Ensembl GTF file prepared for the IGV browser instead of the original unsorted GTF file. In the sorted file, the entries for a given transcript are no longer in one contiguous block, which breaks index building.
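For anyone hitting the same error, the adjacency requirement can be checked before indexing. The following is a minimal Python sketch, not Whippet's own validation: the function names `find_split_transcripts` and `check_gtf` are hypothetical, and it assumes that lines without a `transcript_id` attribute (for example gene-level lines) do not end a transcript's block, which may differ from how Whippet treats them.

```python
import gzip
import re
from typing import Iterable, List, Tuple

# Matches the transcript_id attribute in the ninth GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def find_split_transcripts(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (transcript_id, line_number) for each ID that reappears
    after its block of lines has already ended."""
    closed = set()   # transcript IDs whose block has already ended
    current = None   # transcript ID of the block currently being read
    problems = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # header or comment line
        match = TRANSCRIPT_ID.search(line)
        if match is None:
            continue  # assumption: lines without transcript_id are ignored
        tid = match.group(1)
        if tid == current:
            continue  # still inside the same block
        if current is not None:
            closed.add(current)
        if tid in closed:
            problems.append((tid, number))
        current = tid
    return problems


def check_gtf(path: str) -> List[Tuple[str, int]]:
    """Run the check on a plain or gzip-compressed GTF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return find_split_transcripts(handle)
```

Calling `check_gtf` on a GTF path returns an empty list when every transcript's lines are adjacent. Any entries it does return give the line numbers at which a transcript ID reappears, which is the situation the HINT in the error message describes.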

After I ran whippet-index with the original unsorted GTF, I got a warning message: "Using low quality Transcript Support Levels (TSL 3+) in your GTF file is not recommended!" and "If you would like Whippet to ignore these when building its index, use the --suppress-low-tsl option!"

Is it better to use the --suppress-low-tsl option?
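One way to inform that decision is to see how many annotated transcripts fall into each support level. The sketch below is only a tally and not how Whippet evaluates TSL: the function name `tally_tsl` is hypothetical, and it assumes the GTF stores the level in a `transcript_support_level` attribute on `transcript` feature lines, so it is worth confirming that against the file in use.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed attribute name; check that it matches the GTF being used.
TSL = re.compile(r'transcript_support_level "([^"]*)"')


def tally_tsl(lines: Iterable[str]) -> Counter:
    """Count transcript lines per transcript support level."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only count each transcript once
        match = TSL.search(fields[8])
        if match and match.group(1).strip():
            # Keep only the leading token, e.g. "1" from "1 (assigned to ...)".
            level = match.group(1).split()[0]
        else:
            level = "missing"
        counts[level] += 1
    return counts
```

Passing an open file handle (for a compressed file, one opened with `gzip.open(path, "rt")`) gives a `Counter` keyed by level. A large share of transcripts at level 3 or above means `--suppress-low-tsl` would drop a substantial part of the annotation from the index.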



PS: warning message with unsorted hg38 GTF file.

(base) xiaolei@Xiaos-Laptop bin % julia ./whippet-index.jl --fasta /Users/xiaolei/Whippet.jl/anno/hg38.fa.gz --gtf /Users/xiaolei/Whippet.jl/anno/gencode.v45.annotation.gtf.gz
Whippet v1.6.2 loading...
Activating environment at ~/Whippet.jl/Project.toml
5.724591 seconds.
Loading GTF file: /Users/xiaolei/Whippet.jl/anno/gencode.v45.annotation.gtf.gz

┌ Warning: Using low quality Transcript Support Levels (TSL 3+) in your GTF file is not recommended!
│ For more information on TSL, see: http://www.ensembl.org/Help/Glossary?id=492
│
│ If you would like Whippet to ignore these when building its index, use --suppress-low-tsl option!
│
└ @ Whippet ~/Whippet.jl/src/refset.jl:159
Loaded 643514 annotated splice-sites from GTF file..
122.786011 seconds (209.44 M allocations: 20.170 GiB, 2.98% gc time)
Indexing transcriptome...
Decompressing and Indexing /Users/xiaolei/Whippet.jl/anno/hg38.fa.gz...
Building Splice Graphs for chr1..
8.388163 seconds (44.79 M allocations: 22.590 GiB, 5.43% gc time)
