oschakoory / RiboTaxa

RiboTaxa: combined approaches for rRNA genes taxonomic resolution down to the species level from metagenomics data revealing novelties.
https://academic.oup.com/nargab/article/4/3/lqac070/6708509
GNU Affero General Public License v3.0

Emirge related error #6

Open RichStack opened 2 days ago

RichStack commented 2 days ago

Hi, thanks for making the RiboTaxa pipeline. The pipeline stalls on my first set of reads at the EMIRGE step. This is the error from the stderr file that stops the pipeline:

SHORTNAME = BC430_RPL
Rewriting reads with indices in headers at Fri Sep 20 19:50:51 2024...
DONE Rewriting reads with indexes in headers at Fri Sep 20 19:50:52 2024 [0:00:00.062254]...
Number of reads (or read pairs) in input file(s): 8340
Preallocating reads and quals in memory at Fri Sep 20 19:50:52 2024...
DONE Preallocating reads and quals in memory at Fri Sep 20 19:50:52 2024 [0:00:00.051172]...
Performing initial mapping with command:
cat /home/rjs202/RNAseq/RiboTaxa/Results/BC430_RPL/SSU_sequences/output_emirge/BC430_RPL_amplicon_16S18S_recons/emirge_tmp_reads_1.fastq | bowtie --phred33-quals -t -p 16 -n 3 -l 20 -e 300 --best --sam --chunkmbs 128 --minins 150 --maxins 350 /home/rjs202/RNAseq/rRNA_databases/RiboTaxa/bowtie_indexed_DB/SILVA_138.2_SSURef_NR99_tax_silva_bowtie_indexed -1 - -2 /home/rjs202/RNAseq/RiboTaxa/Results/BC430_RPL/SSU_sequences/output_emirge/BC430_RPL_amplicon_16S18S_recons/emirge_tmp_reads_2.fastq | samtools view -S -h -u -b -F 0x0004 - > /home/rjs202/RNAseq/RiboTaxa/Results/BC430_RPL/SSU_sequences/output_emirge/BC430_RPL_amplicon_16S18S_recons/initial_mapping/initial_bowtie_mapping.PE.u.bam
Beginning initialization at Fri Sep 20 19:50:53 2024...
Reading bam file /home/rjs202/RNAseq/RiboTaxa/Results/BC430_RPL/SSU_sequences/output_emirge/BC430_RPL_amplicon_16S18S_recons/initial_mapping/initial_bowtie_mapping.PE.u.bam at Fri Sep 20 19:50:53 2024...
Culled 67 sequences in iteration -1 due to low fraction of reference sequence bases covered by >= 1 reads
DONE Reading bam file /home/rjs202/RNAseq/RiboTaxa/Results/BC430_RPL/SSU_sequences/output_emirge/BC430_RPL_amplicon_16S18S_recons/initial_mapping/initial_bowtie_mapping.PE.u.bam at Fri Sep 20 19:50:53 2024 [0:00:00.393227]...
DONE with initialization at Fri Sep 20 19:50:53 2024...
Starting iteration 0 at Fri Sep 20 19:50:53 2024...
Reading bam file /home/rjs202/RNAseq/RiboTaxa/Results/BC430_RPL/SSU_sequences/output_emirge/BC430_RPL_amplicon_16S18S_recons/initial_mapping/initial_bowtie_mapping.PE.u.bam at Fri Sep 20 19:50:53 2024...
Culled 67 sequences in iteration 00 due to low fraction of reference sequence bases covered by >= 1 reads
DONE Reading bam file /home/rjs202/RNAseq/RiboTaxa/Results/BC430_RPL/SSU_sequences/output_emirge/BC430_RPL_amplicon_16S18S_recons/initial_mapping/initial_bowtie_mapping.PE.u.bam at Fri Sep 20 19:50:54 2024 [0:00:00.378175]...
Calculating likelihood (77, 8340) for iteration 0 at Fri Sep 20 19:50:54 2024...
Calculating Pr(N=n) for iteration 0 at Fri Sep 20 19:50:54 2024...
Traceback (most recent call last):
  File "/home/rjs202/miniconda3/envs/RiboTaxa_py27/bin/emirge_amplicon.py", line 1580, in <module>
    main()
  File "/home/rjs202/miniconda3/envs/RiboTaxa_py27/bin/emirge_amplicon.py", line 1572, in main
    do_iterations(em, max_iter = options.iterations, save_every = None)
  File "/home/rjs202/miniconda3/envs/RiboTaxa_py27/bin/emirge_amplicon.py", line 1184, in do_iterations
    em.do_iteration(em.current_bam_filename, em.current_reference_fasta_filename)
  File "/home/rjs202/miniconda3/envs/RiboTaxa_py27/bin/emirge_amplicon.py", line 426, in do_iteration
    self.calc_likelihoods()
  File "/home/rjs202/miniconda3/envs/RiboTaxa_py27/bin/emirge_amplicon.py", line 1123, in calc_likelihoods
    self.calc_probN() # (handles initial iteration differently within this method)
  File "/home/rjs202/miniconda3/envs/RiboTaxa_py27/bin/emirge_amplicon.py", line 1148, in calc_probN
    _emirge._calc_probN(self)
  File "_emirge_amplicon.pyx", line 363, in _emirge_amplicon._calc_probN (_emirge_amplicon.c:5671)
  File "pysam/libcfaidx.pyx", line 303, in pysam.libcfaidx.FastaFile.fetch
KeyError: "sequence 'MBFK01000143.1.1414' not present"

I've searched to see whether others have encountered this problem before, and there is one unanswered question on the EMIRGE Google group where someone apparently has the same error. Any ideas how to proceed? Thanks, R

oschakoory commented 12 hours ago

Hi, can you give some details about the nature of your data (paired-end or single-end, read length, max length) as well as the parameters that you are using for EMIRGE (from the config file)?

Thank you. OC

RichStack commented 9 hours ago

Hi there, thanks for getting back to me. I actually managed to solve this - there was an issue with the database indexing (I used the more recent SILVA 138.2 files for this). SortMeRNA db creation worked well, but at the EMIRGE step, something went wrong with the filtering command. I can't tell you what happened, unfortunately. I just ran that part of the code separately and checked that the identifiers now matched (they previously didn't). I then had to build the bowtie database from the corrected file. It all works well now, thank you. Sorry I couldn't be more specific about the error though - I have no idea why the code didn't work the first time around. I'll close the issue now - but thanks very much for your attention.
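
For anyone who lands here with the same KeyError, the identifier check described above can be sketched roughly as follows. This is only an illustration, not code from RiboTaxa or EMIRGE: the function names are hypothetical, and it assumes you have the reference FASTA on one side and a plain-text list of the index's sequence names (one per line, for example the output of `bowtie-inspect -n`) on the other. It compares only the first whitespace-delimited token of each name.

```python
def fasta_ids(lines):
    """Collect sequence IDs from FASTA header lines.

    The ID is taken to be the first whitespace-delimited token after '>'
    (an assumption; adjust if your headers are structured differently).
    """
    ids = set()
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                ids.add(header.split()[0])
    return ids


def index_names(lines):
    """Collect sequence names from a one-name-per-line listing."""
    names = set()
    for line in lines:
        name = line.strip()
        if name:
            names.add(name.split()[0])
    return names


def compare_ids(fasta_lines, index_lines):
    """Return (names only in the index, IDs only in the FASTA), both sorted."""
    in_fasta = fasta_ids(fasta_lines)
    in_index = index_names(index_lines)
    return sorted(in_index - in_fasta), sorted(in_fasta - in_index)
```

Both arguments can be open file handles, e.g. `compare_ids(open("reference.fasta"), open("index_names.txt"))`, where the two file names are placeholders. If either returned list is non-empty, the FASTA and the index disagree on identifiers, which is the kind of mismatch that was fixed here by re-running the filtering step and rebuilding the bowtie database.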