Open DarioS opened 10 years ago
Hi DarioS,
Thanks for reporting this. This error seems very similar to this previous (closed) issue. In that case the issue was caused by a combination of duplicate transcripts in the input file, and was resolved by removing the duplicates. As I commented in that ticket; crashing is obviously not the desired behavior, and the suggested behavior in future versions is to ignore all but one of the duplicates, report their existence to the user, and continue building the index.
After encountering the same issue, I looked into this in depth and have worked out why this happens.
e.g. If you use the ensembl GTF (Homo_sapiens.GRCh38.77.gtf
) and the reference genome (Homo_sapiens.GRCh38.77.dna.toplevel.fa
) (downloadable from here)
and you extract your transcripts using gffread
thus
gffread -w Homo_sapiens.GRCh38.77.transcripts.fa -g Homo_sapiens.GRCh38.77.dna.toplevel.fa Homo_sapiens.GRCh38.77.gtf
you end up with duplicate transcripts because of extraneous annotations (it appears that all the ones that are duplicates in the outputs are those with selenocysteine annotations). You can get around this by stripping the GTF file down to just the CDS and exons first,
awk '($3 == "exon" || $3 == "CDS")' Homo_sapiens.GRCh38.77.gtf > Homo_sapiens.GRCh38.77.filtered.gtf
then extract your transcripts
gffread -w Homo_sapiens.GRCh38.77.transcripts.fa -g Homo_sapiens.GRCh38.77.dna.toplevel.fa Homo_sapiens.GRCh38.77.filtered.gtf
then run sailfish
sailfish index -m Homo_sapiens.GRCh38.77.filtered.gtf -t Homo_sapiens.GRCh38.77.transcripts.fa -o Sailfish_Index_Human_GRCh38.77 -k 20
tada :tada: !
(p.s. this is also a handy tutorial for those wanting to build a sailfish index for Ensembl GRCh38 :smile: )
I can't tell if this is a problem with gffread
or if this is to be expected - I'm not an expect on the GTF format, but hopefully this information is helpful to others.
Hi @jpiper,
Thanks for the detailed response here! I'm going to see if the same issue affects Salmon. It also probably makes sense (as hinted above) to flag, remove and notify the user of duplicates. However, for the time being, would you mind if used your example in our documentation (with attribution, of course)?
Hey Rob, no problem whatsoever - happy to help!
The program runs some steps, then an error occurs.
after relabling, there are 140259 eq classes Building the k-mer equiv. class <=> transcript mappings 0% [> ] ETA > 1 week
* FATAL TRIGGER RECEIVED *** Received fatal signal: SIGSEGV(11) PID: 31416 stack dump [1] /lib/x86_64-linux-gnu/libpthread.so.0+0xf030 [0x7fa38c39e030] stack dump [2] /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/lib/libsailfish_core.so+0xfbf8e [0x7fa38da5df8e] stack dump [3] /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/lib/libsailfish_core.so+0xfc6cd [0x7fa38da5e6cd] stack dump [4] /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/lib/libtbb.so.2+0x2203a [0x7fa38bf6803a] stack dump [5] /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/lib/libtbb.so.2+0x1dd96 [0x7fa38bf63d96] stack dump [6] /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/lib/libtbb.so.2+0x1d45b [0x7fa38bf6345b] stack dump [7] /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/lib/libtbb.so.2+0x1ae5f [0x7fa38bf60e5f] stack dump [8] /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/lib/libtbb.so.2+0x1b059 [0x7fa38bf61059] stack dump [9] /lib/x86_64-linux-gnu/libpthread.so.0+0x6b50 [0x7fa38c395b50] stack dump [10] /lib/x86_64-linux-gnu/libc.so.6clone+0x6d [0x7fa38b2af0ed]
***\ RETHROWING SIGNAL SIGSEGV(11)
g2log exiting after receiving fatal event
This problem does not happen with the sample data transcripts file.
My command was $ /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/bin/sailfish index -k 30 -t data/sequence/hg19/hg19genes.fasta -o hg19
The transcripts file I obtained from UCSC Genome Browser looks like $ head data/sequence/hg19/hg19genes.fasta
If the code is rerun, the program thinks it completed successfully.
Checking that jellyfish hash is up to date All index files seem up-to-date.
I ran quant next, but it seems that the indexes weren't completely created.
Creating optimizer . . .done optimizing using iterative optimization [1000] iterations reading Kmer equivalence classes updating Kmer group counts updating transcript map
* FATAL TRIGGER RECEIVED *** Received fatal signal: SIGSEGV(11) PID: 32239 stack dump [1] /lib/x86_64-linux-gnu/libpthread.so.0+0xf030 [0x7fcaf0aeb030] stack dump [2] /verona/nobackup/dario/Sailfish-0.6.3-Linuxx86-64/bin/sailfish : CollapsedIterativeOptimizer::prepareCollapsedMaps (std::string const&, bool)+0x223 [0x460393]
stack dump [3] /verona/nobackup/dario/Sailfish-0.6.3-Linuxx86-64/bin/sailfish : CollapsedIterativeOptimizer::initialize (std::string const&, std::string const&, std::string const&, bool)+0xb2 [0x460932]
stack dump [4] /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/bin/sailfish : CollapsedIterativeOptimizer::optimize(std::string const&, std::string const&, std::string const&, unsigned long, double, double)+0x4a [0x46838a]
stack dump [5] /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/bin/sailfish : runIterativeOptimizer(int, char**)+0x17a1 [0x44ed21]
stack dump [6] /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/lib/libsailfish_core.so+0xdfe32 [0x7fcaf218ee32]
stack dump [7] /verona/nobackup/dario/Sailfish-0.6.3-Linux_x86-64/lib/libstdc++.so.6+0xb4210 [0x7fcaf01f7210]
stack dump [8] /lib/x86_64-linux-gnu/libpthread.so.0+0x6b50 [0x7fcaf0ae2b50]
stack dump [9] /lib/x86_64-linux-gnu/libc.so.6clone+0x6d [0x7fcaef9fc0ed]
***\ RETHROWING SIGNAL SIGSEGV(11)