khyox / recentrifuge

Recentrifuge: robust comparative analysis and contamination removal for metagenomics
http://www.recentrifuge.org

No sequence passed the filter error #13

Closed: mhyleung closed this issue 3 years ago

mhyleung commented 5 years ago

Dear all

I ran rcf on Kraken2 output files and encountered the following error:

Loading NCBI nodes... OK!
Loading NCBI names... OK!
Building dict of parent to children taxa... OK!

Please, wait, processing files in parallel...

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/mhyleung/workspace/anaconda3/envs/py36/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/mhyleung/workspace/anaconda3/envs/py36/lib/python3.6/site-packages/recentrifuge/taxclass.py", line 86, in process_output
    log, stat, counts, scores = read_method(target_file, scoring, minscore)
  File "/home/mhyleung/workspace/anaconda3/envs/py36/lib/python3.6/site-packages/recentrifuge/kraken.py", line 135, in read_kraken_output
    raise Exception(red('\nERROR! ') + 'No sequence passed the filter!')
Exception:
ERROR! No sequence passed the filter!
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/mhyleung/workspace/anaconda3/envs/py36/bin/rcf", line 812, in <module>
    main()
  File "/home/mhyleung/workspace/anaconda3/envs/py36/bin/rcf", line 771, in main
    read_samples()
  File "/home/mhyleung/workspace/anaconda3/envs/py36/bin/rcf", line 450, in read_samples
    input_files, [r.get() for r in async_results]):
  File "/home/mhyleung/workspace/anaconda3/envs/py36/bin/rcf", line 450, in <listcomp>
    input_files, [r.get() for r in async_results]):
  File "/home/mhyleung/workspace/anaconda3/envs/py36/lib/python3.6/multiprocessing/pool.py", line 670, in get
    raise self._value
Exception:
ERROR! No sequence passed the filter!

My command is

rcf -k control1.krk -k control2.krk -k control3.krk -k control4.krk -k sample1.krk -k sample2.krk -k /sample3.krk -k sample4.krk -c 4 -o rcf_output.html -s KRAKEN -y 25

Thank you so much

Regards

Marcus

mhyleung commented 5 years ago

Dear all

It appears the issue was that my Kraken2 output contained species/taxon names instead of taxids, because I added the --use-names option when I ran kraken2. RCF therefore could not identify the reads, as there was no taxid match. Moderators might want to close this thread. Thanks again!
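For anyone who wants to check their own files for this before running rcf, here is a minimal Python sketch. It assumes the standard tab-separated Kraken2 output, where the third column holds the taxid, and that --use-names replaces it with something like "Homo sapiens (taxid 9606)". The function name and the sample lines are made up for illustration; this is not part of Recentrifuge.

```python
from typing import Iterable, List, Tuple


def find_non_numeric_taxids(lines: Iterable[str],
                            limit: int = 5) -> List[Tuple[int, str]]:
    """Return up to `limit` (line_number, field) pairs whose third
    column is not a bare integer taxid (hypothetical helper)."""
    offenders: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip empty or malformed lines
        taxid_field = fields[2].strip()
        if not taxid_field.isdigit():
            offenders.append((number, taxid_field))
            if len(offenders) >= limit:
                break
    return offenders


# Made-up example lines: the second one mimics --use-names output.
sample = [
    "C\tread1\t9606\t150\t9606:116\n",
    "C\tread2\tHomo sapiens (taxid 9606)\t150\t9606:116\n",
]
print(find_non_numeric_taxids(sample))
```

To check a real file, pass an open file handle instead of the list, for example `find_non_numeric_taxids(open("sample1.krk"))`. An empty result suggests the taxid column is fine and the cause lies elsewhere.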

khyox commented 5 years ago

Dear @mhyleung, thank you very much for reporting back once you solved the issue. It will be useful to other Kraken users. Thanks!

YiweiNiu commented 3 years ago

Hello,

Sorry to comment on a closed issue.

I encountered the same error, but I did not use --use-names when running Kraken2.

Here is my command to run Kraken2

kraken2 --threads 8 --db $kraken2_db --quick --classified-out k2.classified --unclassified-out k2.unclassified --output k2.krk --report k2.report --gzip-compressed minimap2.bad.fa.gz

and the command to run rcf:

rcf -n /home/wangj/RefData/taxdump -k k2.krk > rcf.log

The error:

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/wangj/anaconda3/envs/kraken2/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/wangj/anaconda3/envs/kraken2/lib/python3.6/site-packages/recentrifuge/taxclass.py", line 86, in process_output
    log, stat, counts, scores = read_method(target_file, scoring, minscore)
  File "/home/wangj/anaconda3/envs/kraken2/lib/python3.6/site-packages/recentrifuge/kraken.py", line 154, in read_kraken_output
    raise Exception(red('\nERROR! ') + 'No sequence passed the filter!')
Exception: 
ERROR! No sequence passed the filter!
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/wangj/anaconda3/envs/kraken2/bin/rcf", line 836, in <module>
    main()
  File "/home/wangj/anaconda3/envs/kraken2/bin/rcf", line 795, in main
    read_samples()
  File "/home/wangj/anaconda3/envs/kraken2/bin/rcf", line 469, in read_samples
    input_files, [r.get() for r in async_results]):
  File "/home/wangj/anaconda3/envs/kraken2/bin/rcf", line 469, in <listcomp>
    input_files, [r.get() for r in async_results]):
  File "/home/wangj/anaconda3/envs/kraken2/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
Exception: 
ERROR! No sequence passed the filter!

Here are the Kraken2 output (k2.zip) and the complete rcf log (rcf.log.gz).

Thank you for your time and effort.

Best, Yiwei

khyox commented 3 years ago

@YiweiNiu, thanks for reporting this problem!

The error occurs because you are using the --quick flag in the Kraken2 command. With this flag, the Kraken output lacks any data about the score of the taxonomic assignment. Since assignment scores are one of the pillars of Recentrifuge's reliable and robust downstream processing and visualization, the code does not find any acceptable sequence and raises that error. I will now update the wiki documentation to mention this flag too, since it will be useful for the Kraken community using Recentrifuge. Many thanks!
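As a rough way to spot this in an existing file, the Python sketch below counts classified reads whose fifth column carries no per-k-mer hit counts. It assumes the usual Kraken2 layout, where that column is a space-separated list of `taxid:count` pairs, and it assumes that --quick writes a non-numeric token such as `562:Q` in their place. That token format is an assumption to verify against your own output, and the function names are hypothetical rather than anything from Recentrifuge.

```python
from typing import Iterable


def has_kmer_counts(lca_field: str) -> bool:
    """True if any token in the LCA column looks like `label:integer`."""
    for token in lca_field.split():
        _, sep, count = token.rpartition(":")
        if sep and count.isdigit():
            return True
    return False


def count_classified_without_counts(lines: Iterable[str]) -> int:
    """Count classified ('C') lines whose fifth column has no k-mer counts."""
    missing = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or fields[0] != "C":
            continue  # ignore unclassified or malformed lines
        if not has_kmer_counts(fields[4]):
            missing += 1
    return missing


# Made-up lines: the first mimics the assumed --quick output,
# the second a regular run with per-k-mer counts.
quick_like = ["C\tread1\t562\t150\t562:Q\n"]
regular = ["C\tread1\t562\t150\t562:13 561:4 A:31 0:1 562:3\n"]
print(count_classified_without_counts(quick_like),
      count_classified_without_counts(regular))
```

If nearly every classified read is reported as missing counts, rerunning Kraken2 without --quick is the likely fix.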

YiweiNiu commented 3 years ago

Thank you for your prompt reply. You are right. I reran rcf after removing --quick from the Kraken2 command and got the correct output.

khyox commented 3 years ago

Thanks for the feedback, @YiweiNiu. I am glad it worked for you. Happy recentrifuging!