khyox / recentrifuge

Recentrifuge: robust comparative analysis and contamination removal for metagenomics
http://www.recentrifuge.org

ZeroDivisionError/Centrifuge #22

Closed ganiatgithub closed 4 years ago

ganiatgithub commented 4 years ago

Bug report

Hi @khyox,

Thanks for the nice format. This is related to post #18.

Bug summary

ZeroDivisionError: division by zero

How I got here

Command line (running centrifuge)

> centrifuge -q -x /home/Staff/uqgni1/tools/centrifuge/hvc -U Run02_filtered.fastq -p 16 --report-file centrifuge-hvc.txt

Command line (running rextract)

> rextract -f centrifuge-hvc.txt -i 694009 -q Run02_filtered.fastq -n ~/miniconda2/envs/recentrifuge/bin/taxdump/

centrifuge output (centrifuge-hvc.txt)

head centrifuge-hvc.txt
name    taxID   taxRank genomeSize  numReads    numUniqueReads  abundance
Homo sapiens    9606    species 3272089205  11164   8884    0.0
Human alphaherpesvirus 2    10310   species 154675  1   1   0.0
Cercopithecine alphaherpesvirus 2   10317   species 150715  1   0   0.0
Bovine alphaherpesvirus 1   10320   species 135301  8   0   0.0
Suid alphaherpesvirus 1 10345   species 143461  1   1   0.0
Murid betaherpesvirus 1 10366   species 230278  1   1   0.0
Tupaiid betaherpesvirus 1   10397   species 195859  1   1   0.0
Ovine gammaherpesvirus 2    10398   species 135135  1   1   0.0
Human adenovirus 2  10515   leaf    35937   11  0   0.0

Error message outcome (Slurm system)

Loading NCBI nodes... OK!
Loading NCBI names... OK!
Building dict of parent to children taxa... OK!
List of taxa (and below) to be explicitly included:
        Id  Scientific Name
        694009  Severe acute respiratory syndrome-related coronavirus
Building taxonomy tree... OK!
Filtering taxa... OK!
  261 taxid selected in 2 different taxonomical levels:
  Number of different SPECIES: 1
  Number of different NO_RANK: 260
Loading output file centrifuge-hvc.txt... OK!
  Load elapsed time: 0.0314 sec
Traceback (most recent call last):
  File "/home/Staff/uqgni1/miniconda2/envs/recentrifuge/bin/rextract", line 347, in <module>
    main()
  File "/home/Staff/uqgni1/miniconda2/envs/recentrifuge/bin/rextract", line 241, in main
    print(f'  \033[90mMatching reads: \033[0m{len(records):_d} \033[90m\t'
ZeroDivisionError: division by zero

khyox commented 4 years ago

Hi @ganiatgithub,

Thanks for the issue report.

Now I see the problem clearly: you are passing a Centrifuge summary (report) file to rextract instead of the Centrifuge output. The summary file lacks essential per-read data that the Centrifuge output file provides, such as the classification score. Because Recentrifuge aims to enable robust downstream analysis, it requires the complete Centrifuge output for each sample.
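To make the difference between the two files concrete, here is a minimal Python sketch that guesses which kind of file it is given from the header line. The report header is the one shown in the `head` listing earlier in this thread; the classification-output header (`readID`, `seqID`, `taxID`, `score`, ...) is what Centrifuge is understood to write to stdout by default, so treat it as an assumption and check it against your own file. The function name `centrifuge_file_kind` is hypothetical and is not part of Recentrifuge.

```python
import io

# Header of the summary written by --report-file (as in the head listing above).
REPORT_COLUMNS = ["name", "taxID", "taxRank", "genomeSize",
                  "numReads", "numUniqueReads", "abundance"]
# Assumed leading columns of the per-read classification output.
OUTPUT_COLUMNS = ["readID", "seqID", "taxID", "score"]


def centrifuge_file_kind(handle):
    """Return 'report', 'output' or 'unknown' based on the first line."""
    # Column names contain no spaces, so splitting on any whitespace
    # works for tab-separated files and for space-aligned copies.
    header = handle.readline().split()
    if header[:len(REPORT_COLUMNS)] == REPORT_COLUMNS:
        return "report"
    if header[:len(OUTPUT_COLUMNS)] == OUTPUT_COLUMNS:
        return "output"
    return "unknown"


# Small demonstration with the report header from this issue.
sample = io.StringIO(
    "name\ttaxID\ttaxRank\tgenomeSize\tnumReads\tnumUniqueReads\tabundance\n"
)
print(centrifuge_file_kind(sample))  # prints: report
```

A check like this, run before rextract, would flag a report file such as centrifuge-hvc.txt early instead of failing later with an unrelated-looking error.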

In your case, if you did not save the output when you ran Centrifuge, you can regenerate it with:

> centrifuge -q -x /home/Staff/uqgni1/tools/centrifuge/hvc -U Run02_filtered.fastq -p 16 > centrifuge-hvc.out

Once you have this file, you can use rcf to get an interactive, Krona-like scored visualization of your sample, and rextract to extract the reads as you tried before.
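If you prefer to script the two steps, the following Python sketch may help. It uses only the options that already appear in this thread (`-q`, `-x`, `-U`, `-p` for centrifuge; `-f`, `-i`, `-q`, `-n` for rextract) and redirects Centrifuge's stdout to a file. The helper names `centrifuge_cmd`, `rextract_cmd` and `run_pipeline` are hypothetical, and the sketch assumes both tools are on your `PATH`.

```python
import subprocess


def centrifuge_cmd(index, reads, threads=16):
    """Build the centrifuge command; classification results go to stdout."""
    return ["centrifuge", "-q", "-x", index, "-U", reads, "-p", str(threads)]


def rextract_cmd(output_file, taxid, reads, taxdump):
    """Build the rextract command, pointing -f at the Centrifuge output file."""
    return ["rextract", "-f", output_file, "-i", str(taxid),
            "-q", reads, "-n", taxdump]


def run_pipeline(index, reads, taxid, taxdump,
                 output_file="centrifuge-hvc.out"):
    """Run centrifuge, save its stdout to output_file, then run rextract."""
    with open(output_file, "w") as out:
        subprocess.run(centrifuge_cmd(index, reads), stdout=out, check=True)
    subprocess.run(rextract_cmd(output_file, taxid, reads, taxdump),
                   check=True)
```

Note that `subprocess` does not expand `~`, so pass paths through `os.path.expanduser` before calling `run_pipeline` if you use that shorthand.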

Let me know if this makes sense to you. Thanks.

ganiatgithub commented 4 years ago

Thanks, it works!

khyox commented 4 years ago

Thanks for the feedback!