khyox / recentrifuge

Recentrifuge: robust comparative analysis and contamination removal for metagenomics
http://www.recentrifuge.org

rextract: read retrieval failed #27

Closed webbchen closed 3 years ago

webbchen commented 3 years ago

Bug report

rextract fails with "this object should be subclassed" error when trying to extract reads of oomycetes

Running Recentrifuge

Command line

scratch/software/miniconda3/bin/rextract \
 -f /tmp/annew/centrifuge_Pf_Sb3.results \
 -i 4762 \
 -n /scratch/public_data/nt-centrifuge_12jan2021 \
 -1 /tmp/annew/al-conc-mate.1.fastq \
 -2 /tmp/annew/al-conc-mate.2.fastq

Data

-1 and -2 are the FASTQ files output by Centrifuge v1.0.4; they contain all the reads that could be classified. -f is the Centrifuge output file produced by the Centrifuge run (not the summary report).

# Stdout:
=-= /scratch/software/miniconda3/bin/rextract =-= v1.3.1 - Jan 2021 =-= by Jose Manuel Martí =-=

Loading NCBI nodes... OK!
Loading NCBI names... OK!
Building dict of parent to children taxa... OK!
List of taxa (and below) to be explicitly included:
                Id      Scientific Name
                4762    Oomycota
Building taxonomy tree... OK!
Filtering taxa... OK!
  3383 taxid selected in 13 different taxonomical levels:
  Number of different PHYLUM: 1
  Number of different ORDER: 11
  Number of different FAMILY: 19
  Number of different GENUS: 81
  Number of different SPECIES_GROUP: 1
  Number of different SPECIES: 3124
  Number of different SUBSPECIES: 3
  Number of different FORMA_SPECIALIS: 7
  Number of different VARIETY: 18
  Number of different FORMA: 6
  Number of different STRAIN: 40
  Number of different ISOLATE: 3
  Number of different NO_RANK: 69
Loading output file /tmp/annew/centrifuge_Pf_Sb3.results... OK!
  Load elapsed time: 868 sec
  Matching reads: 11_344_941  (10.2115% of sample)
Loading FASTQ files /tmp/annew/al-conc-mate.1.fastq and /tmp/annew/al-conc-mate.2.fastq...
Mseqs: 0.........1.........2.........3.........4.........5.........6.........7.........8.........9.........10.........11.........12.........13.........14.........15.........16.........17.........18.........19.........20.........21.........22.........23.........24.........25.........26.........27.........28.........29.........30.........31.........32.........33.........34.........35.........36.........37.........38.........39.........40.........41.........42.........43.........44.........45.........46.........47.........48.........49.........50.........51.........52.........53.........54.........55.........56.........57.........58.........59.........60.........61.........62.........63.........64.........65.........66.........67.........68.........69.........70.........71.........72.........73.........74.........75.........76. 76.2 Mseqs OK!

# Stderr:
Traceback (most recent call last):
  File "/scratch/software/miniconda3/bin/rextract", line 347, in <module>
    main()
  File "/scratch/software/miniconda3/bin/rextract", line 333, in main
    SeqIO.write(seqs1, filename1, 'quickfastq')
  File "/scratch/software/miniconda3/lib/python3.7/site-packages/Bio/SeqIO/__init__.py", line 561, in write
    count = writer_class(fp).write_file(sequences)
  File "/scratch/software/miniconda3/lib/python3.7/site-packages/Bio/SeqIO/Interfaces.py", line 139, in write_file
    raise NotImplementedError("This object should be subclassed")
NotImplementedError: This object should be subclassed

Expected outcome

I expected two fastq files with paired reads identified as taxon 4762 or members of that order (all oomycetes).

Versions

khyox commented 3 years ago

Hi @webbchen, thanks for the complete bug report! Could you please check which version of Biopython you are using and let me know? In the same environment that you use to run Recentrifuge, you can get it by launching python and then:

>>> import Bio
>>> Bio.__version__
'1.78'

Thanks.

khyox commented 3 years ago

I cannot reproduce this issue in my test environment. Given the error message and the current files in the Biopython GitHub repo, I suspect a problem with the Biopython version installed in your miniconda environment.
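
To illustrate how an error like the one in the traceback can depend on the installed library version, here is a minimal sketch of the general pattern. All class names are hypothetical and this is not Biopython's or Recentrifuge's actual code: a base writer whose `write_file` only raises `NotImplementedError`, a subclass that does not override it, and a subclass that does. If a newer release of a library supplied a working `write_file` in its base class, a subclass like `QuickWriter` would run under that release and fail under an older one. Whether that is exactly what happened here is an assumption.

```python
import io


class BaseWriter:
    """Hypothetical base class standing in for a library's writer interface."""

    def __init__(self, handle):
        self.handle = handle

    def write_file(self, records):
        # Placeholder only: subclasses are expected to override this method.
        raise NotImplementedError("This object should be subclassed")


class QuickWriter(BaseWriter):
    """Hypothetical subclass that only defines how to write a single record."""

    def write_record(self, record):
        self.handle.write(f"@{record}\n")


class CompleteWriter(QuickWriter):
    """Same subclass, but overriding write_file so it no longer relies on the base."""

    def write_file(self, records):
        count = 0
        for record in records:
            self.write_record(record)
            count += 1
        return count


if __name__ == "__main__":
    buffer = io.StringIO()
    try:
        QuickWriter(buffer).write_file(["read1"])
    except NotImplementedError as err:
        print("QuickWriter failed:", err)
    print("CompleteWriter wrote", CompleteWriter(buffer).write_file(["read1", "read2"]), "records")
```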

webbchen commented 3 years ago

Good morning.

It's version 1.7.6.

webbchen commented 3 years ago

Good evening. I updated Biopython and that seems to have resolved the issue. I've got my read files! Many thanks for the suggestion!

Anne

khyox commented 3 years ago

I am glad to read that, Anne! Thanks for the feedback. I thought the miniconda installation should have complied with Recentrifuge's requirements regarding dependency version numbers, but it seems it didn't. Good to know for the future.
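
A dependency mismatch like this one can be caught early with a small version guard. The following is only a sketch: the helper names are hypothetical, and the minimum version `1.78` is an assumption taken from the version shown in the Python session earlier in this thread, not from Recentrifuge's documented requirements.

```python
import re

# Assumption: 1.78 is the version shown working earlier in this thread;
# check Recentrifuge's own requirements for the authoritative minimum.
MINIMUM_BIOPYTHON = "1.78"


def parse_version(text):
    """Turn a version string such as '1.78' into a tuple of ints such as (1, 78)."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    try:
        import Bio
    except ImportError:
        print("Biopython is not installed in this environment")
    else:
        if meets_minimum(Bio.__version__, MINIMUM_BIOPYTHON):
            print(f"Biopython {Bio.__version__} meets the assumed minimum")
        else:
            print(f"Biopython {Bio.__version__} is older than {MINIMUM_BIOPYTHON}")
```

Comparing tuples of integers rather than raw strings avoids mistakes such as treating `1.9` as newer than `1.78` purely because of character order.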

khyox commented 3 years ago

BTW, happy to know that you are using Recentrifuge in research about oomycetes. Here in Northern California, the sudden oak death disease is devastating. Phytophthora is transforming the natural landscape very quickly, and it is very sad.

webbchen commented 3 years ago

Another aside from me while we're at it: Would it be possible to extract unclassified reads? Regarding Phytophthora: same here in ol' Blighty. It strips the Welsh hills of their larch forests.

khyox commented 3 years ago

> Another aside from me while we're at it: Would it be possible to extract unclassified reads?

That possibility would definitely be a nice addition to rextract!
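
Until such an option exists, one possible workaround outside rextract is to collect the IDs of unclassified reads directly from the classifier output. The sketch below assumes a tab-separated file with a header row containing `readID` and `taxID` columns, and assumes that unclassified reads carry taxID `0`. Both points are assumptions about the Centrifuge output format and should be checked against the actual file. The function name and the sample rows are made up for illustration. Note that the FASTQ files in this report hold only classified reads, so the IDs would have to be matched against the original input reads.

```python
import csv
import io


def unclassified_read_ids(handle):
    """Collect the IDs of reads whose taxID column is '0' in a tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["readID"] for row in reader if row["taxID"] == "0"}


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    sample = io.StringIO(
        "readID\tseqID\ttaxID\n"
        "read_a\tseq_x\t4762\n"
        "read_b\tunclassified\t0\n"
    )
    print(unclassified_read_ids(sample))
```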