khyox / recentrifuge

Recentrifuge: robust comparative analysis and contamination removal for metagenomics
http://www.recentrifuge.org

ZeroDivisionError #34

Closed jagos01 closed 3 years ago

jagos01 commented 3 years ago

Bug report

Hello @khyox, This is the same error as post #22. I believe I am using the correct centrifuge file.

Bug summary

ZeroDivisionError: division by zero when using rextract with centrifuge output file.

Running Centrifuge/Recentrifuge

Command line - centrifuge

>centrifuge --verbose -p 22 -x /home/Data1/Centrifuge_index/centrifuge-abv-univec-bp -U /home/Desktop/seq_analysis/results_162853/combined/barcode_07.fastq > /home/Desktop/seq_analysis/results_162853/centrifuge/barcode_07/barcode_07.out

Command line - rextract

>rextract -f /home/Desktop/seq_analysis/results_162853/centrifuge/barcode_07/barcode_07.out -n /home/Data1/Centrifuge_Index/taxonomy -i 621 -q /home/Desktop/seq_analysis/results_162853/combined/barcode_07.fastq

Data

Actual outcome

Centrifuge output file:

head -n 20 /home/Desktop/seq_analysis/results_162853/centrifuge/barcode_07/barcode_07.out
Input bt2 file: "/home/Data1/Centrifuge_Index/centrifuge-abv-univec-bp"
Query inputs (DNA, FASTQ):
  /home/Desktop/seq_analysis/results_162853/combined/barcode_07.fastq
Quality inputs:
Output file: ""
Local endianness: little
Sanity checking: disabled
Assertions: disabled
Trying /home/Data1/Centrifuge_Index/centrifuge-abv-univec-bp
readID  seqID   taxID   score   2ndBestScore    hitLength   queryLength numMatches
d03898bd-1221-486a-a664-7b1d07c8e9c0    CP049598.1  1406    81  81  24  530 2
d03898bd-1221-486a-a664-7b1d07c8e9c0    CP049783.1  1406    81  81  24  530 2
a35cca09-8af6-4ead-9c4e-be8ec5c77b8d    genus   561 121 121 26  686 5
a35cca09-8af6-4ead-9c4e-be8ec5c77b8d    species 32630   121 121 26  686 5
a35cca09-8af6-4ead-9c4e-be8ec5c77b8d    species 703 121 121 26  686 5
a35cca09-8af6-4ead-9c4e-be8ec5c77b8d    genus   186777  121 121 26  686 5
a35cca09-8af6-4ead-9c4e-be8ec5c77b8d    genus   620 121 121 26  686 5
14f82998-810e-49ee-bd85-2006c73d5f03    species 1423    4225    0   121 849 1
2f6f45b8-b733-4fa7-9e5a-1de37b12f042    CP057475.1  562 289 0   32  2301    1
443fd5a8-f51e-4152-a3cb-ca370335bdb2    CP013187.1  161398  361 121 34  3941    1

rextract output:

 =-= /home/miniconda3/envs/recentrifuge/bin/rextract =-= v1.3.3 - May 2021 =-= by Jose Manuel Martí =-=

Loading NCBI nodes... OK! 
Loading NCBI names... OK! 
Building dict of parent to children taxa... OK! 
List of taxa (and below) to be explicitly included:
        Id  Scientific Name
        621 Shigella boydii
Building taxonomy tree... OK!
Filtering taxa... OK!
  15 taxid selected in 2 different taxonomical levels:
  Number of different SPECIES: 1
  Number of different NO_RANK: 14
Loading output file /home/Desktop/seq_analysis/results_162853/centrifuge/barcode_07/barcode_07.out... OK!
  Load elapsed time: 0.00303 sec
Traceback (most recent call last):
  File "/home/miniconda3/envs/recentrifuge/bin/rextract", line 347, in <module>
    main()
  File "/home/miniconda3/envs/recentrifuge/bin/rextract", line 241, in main
    print(f'  \033[90mMatching reads: \033[0m{len(records):_d} \033[90m\t'
ZeroDivisionError: division by zero

Expected outcome

Versions

khyox commented 3 years ago

Hi @jagos01,

Thanks for the comprehensive bug report. I think the problem arises from using --verbose in your call to centrifuge. It adds debug lines to the output before the expected content of the file. You can either run centrifuge again without that flag or, quicker, just remove all the lines in the output file that come before the table header:

readID seqID taxID score 2ndBestScore hitLength queryLength numMatches

...and try rextract again.
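The quicker fix (dropping everything before the table header) could be scripted roughly as follows. This is only a minimal sketch, not part of Recentrifuge: the function names `strip_preamble` and `clean_file` are hypothetical, and it assumes the header line is the first line whose first three whitespace-separated fields are readID, seqID and taxID, as in the output shown in this issue.

```python
def strip_preamble(lines):
    """Return the lines from the Centrifuge table header onward.

    Assumption: the header is the first line whose first three
    whitespace-separated fields are readID, seqID, taxID.
    Raises ValueError if no such line is found.
    """
    for i, line in enumerate(lines):
        if line.split()[:3] == ["readID", "seqID", "taxID"]:
            return lines[i:]
    raise ValueError("Centrifuge table header not found")


def clean_file(src, dst):
    """Write a copy of src to dst without the lines preceding the header."""
    with open(src) as fin:
        lines = fin.readlines()
    with open(dst, "w") as fout:
        fout.writelines(strip_preamble(lines))
```

For example, `clean_file("barcode_07.out", "barcode_07.clean.out")` (the second file name is just a placeholder) would write a cleaned copy that can then be passed to rextract with -f. Writing to a new file rather than editing in place keeps the original output around in case the header is not where the sketch expects it.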

Please let me know if either of these solutions solves your problem, and feel free to reopen the issue if not. I will update the instructions on the wiki to recommend avoiding --verbose when launching centrifuge.

Thanks!

jagos01 commented 3 years ago

Hello @khyox, rextract works fine when the debug lines are removed. Thank you for your help. Scott