pachterlab / kallisto

Near-optimal RNA-Seq quantification
https://pachterlab.github.io/kallisto
BSD 2-Clause "Simplified" License

Kallisto occasionally freezing at BAM indexing #177

Open ulah opened 6 years ago

ulah commented 6 years ago

Hi there, I'm currently evaluating whether we could use kallisto/pizzly for fusion gene prediction. However, for some samples kallisto appears to freeze at the BAM indexing step (I waited for more than 12 h). Unfortunately, the behavior is not reproducible: rerunning the same command with the same available resources may finish without problems. Any ideas why this happens?

If it helps, here is my command line:

kallistoIdx="/.../Ensembl_GRCh38_v86/kallistoIdx/v0.44.0/GRCh38_cDNA_all_k29"
genomeGtf="/.../Ensembl_GRCh38_v86/Homo_sapiens.GRCh38.86.gtf"
genomeSizes="/.../Ensembl_GRCh38_v86/Homo_sapiens.GRCh38.dna.primary_assembly.chrSizes.txt"

kallisto quant --threads 12 --genomebam --gtf "$genomeGtf" --chromosomes "$genomeSizes" --index "$kallistoIdx" --fusion --output-dir "$outDirK" "$forReads" "$revReads"

And here the output from stdout:

[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 29
[index] number of targets: 178,136
[index] number of k-mers: 105,160,906
[index] number of equivalence classes: 739,685
Warning: 34964 transcripts were defined in GTF file, but not in the index
[quant] running in paired-end mode
[quant] will process pair 1: /xxx_R1.fastq.gz
                             /xxx_R2.fastq.gz
[quant] finding pseudoalignments for the reads ... done
[quant] processed 124,190,815 reads, 108,875,156 reads pseudoaligned
[quant] estimated average fragment length: 175.248
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 1,535 rounds
[  bam] writing pseudoalignments to BAM format .. done
[  bam] sorting BAM files .. done
[  bam] indexing BAM file .. 
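Since a repeated run may finish normally, one possible stopgap is to run the command under a time limit and retry when it is exceeded. The sketch below is only an illustration of that idea and does not address why the indexing step hangs. The function name `run_with_timeout` and its parameters are hypothetical, not part of kallisto.

```python
import subprocess


def run_with_timeout(cmd, timeout_s, max_attempts=3):
    """Run cmd (a list of arguments), retrying if it exceeds timeout_s seconds.

    Returns the number of the attempt that finished in time, or None if
    every attempt timed out. A non-zero exit status is not retried: it
    raises subprocess.CalledProcessError, because that is a real failure
    rather than a hang.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # subprocess.run kills the child process when the timeout expires
            subprocess.run(cmd, check=True, timeout=timeout_s)
            return attempt
        except subprocess.TimeoutExpired:
            print(f"attempt {attempt} exceeded {timeout_s} s, retrying")
    return None
```

Here `cmd` would be the `kallisto quant` invocation above split into a list of arguments, and `timeout_s` a limit comfortably above the runtime of a normal run. Each retry repeats the whole quantification, so this is costly for large samples.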

pabloiturralde commented 5 years ago

Hi, I get this exact same warning:

Warning: 34964 transcripts were defined in GTF file, but not in the index

When I run:

kallisto quant -i ~/kallisto_index/bdgp6.93_kallisto_index.fa \
  -o /volumes/piturral/fastq/learning/kallisto_output/C02plusO \
  -b 100 --genomebam --gtf ~/gtf/Drosophila_melanogaster.BDGP6.93.gtf \
  /volumes/piturral/fastq/learning/untrimmed/C02plusO_S4_L001_R1_001.fastq.gz \
  /volumes/piturral/fastq/learning/untrimmed/C02plusO_S4_L001_R2_001.fastq.gz \
  /volumes/piturral/fastq/learning/untrimmed/C02plusO_S4_L002_R1_001.fastq.gz \
  /volumes/piturral/fastq/learning/untrimmed/C02plusO_S4_L002_R2_001.fastq.gz \
  /volumes/piturral/fastq/learning/untrimmed/C02plusO_S4_L003_R1_001.fastq.gz \
  /volumes/piturral/fastq/learning/untrimmed/C02plusO_S4_L003_R2_001.fastq.gz \
  /volumes/piturral/fastq/learning/untrimmed/C02plusO_S4_L004_R1_001.fastq.gz \
  /volumes/piturral/fastq/learning/untrimmed/C02plusO_S4_L004_R2_001.fastq.gz

And I also get these results:

[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 3,739
[index] number of k-mers: 173,304,639
[index] number of equivalence classes: 16,422
Warning: 34767 transcripts were defined in GTF file, but not in the index
[quant] running in paired-end mode
[quant] will process pair 1: /volumes/piturral/fastq/learning/untrimmed/J02O_S1_L001_R1_001.fastq.gz
                             /volumes/piturral/fastq/learning/untrimmed/J02O_S1_L001_R2_001.fastq.gz
[quant] will process pair 2: /volumes/piturral/fastq/learning/untrimmed/J02O_S1_L002_R1_001.fastq.gz
                             /volumes/piturral/fastq/learning/untrimmed/J02O_S1_L002_R2_001.fastq.gz
[quant] will process pair 3: /volumes/piturral/fastq/learning/untrimmed/J02O_S1_L003_R1_001.fastq.gz
                             /volumes/piturral/fastq/learning/untrimmed/J02O_S1_L003_R2_001.fastq.gz
[quant] will process pair 4: /volumes/piturral/fastq/learning/untrimmed/J02O_S1_L004_R1_001.fastq.gz
                             /volumes/piturral/fastq/learning/untrimmed/J02O_S1_L004_R2_001.fastq.gz
[quant] finding pseudoalignments for the reads ... done
[quant] processed 44,717,805 reads, 40,364,816 reads pseudoaligned
[quant] estimated average fragment length: 187.16
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 132 rounds
[bstrp] running EM for the bootstrap: 100
[  bam] writing pseudoalignments to BAM format .. done
[  bam] sorting BAM files .. done
[  bam] indexing BAM file .. done

Can someone please explain what the warning means?

Thanks! P.
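One way to look into the warning is to list which transcript IDs appear in the GTF but not in the transcriptome FASTA that the index was built from. The sketch below assumes a standard nine-column GTF with `transcript` feature rows and a `transcript_id "..."` attribute, and a FASTA whose header lines start with the transcript ID. All function names are hypothetical, and stripping a trailing `.N` version suffix from FASTA IDs is an assumption that may not match how kallisto itself compares names.

```python
import gzip
import re


def _open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def gtf_transcript_ids(gtf_path):
    """Collect transcript_id values from 'transcript' rows of a GTF file."""
    pattern = re.compile(r'transcript_id "([^"]+)"')
    ids = set()
    with _open_text(gtf_path) as fh:
        for line in fh:
            if line.startswith("#"):
                continue  # header or comment line
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9 or fields[2] != "transcript":
                continue
            match = pattern.search(fields[8])
            if match:
                ids.add(match.group(1))
    return ids


def fasta_transcript_ids(fasta_path, strip_version=True):
    """Collect sequence IDs (first token of each '>' header) from a FASTA file."""
    ids = set()
    with _open_text(fasta_path) as fh:
        for line in fh:
            if not line.startswith(">"):
                continue
            seq_id = line[1:].split()[0]
            if strip_version:
                # Assumption: IDs look like 'ENST00000456328.2'; drop '.2'
                seq_id = seq_id.rsplit(".", 1)[0]
            ids.add(seq_id)
    return ids


def missing_from_index(gtf_path, fasta_path, strip_version=True):
    """Return transcript IDs present in the GTF but absent from the FASTA."""
    return gtf_transcript_ids(gtf_path) - fasta_transcript_ids(fasta_path, strip_version)
```

Calling `missing_from_index` with the GTF passed to `--gtf` and the FASTA used for `kallisto index` returns a set whose size can be compared with the number in the warning. Set `strip_version=False` if the transcript IDs themselves contain dots.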