torognes / vsearch

Versatile open-source tool for microbiome analysis

Issue encountered when using vsearch --usearch_global to generate OTU frequency table #541

Closed timz0605 closed 10 months ago

timz0605 commented 1 year ago

The command I use is

vsearch --usearch_global ../5_quality_control/combined.fasta --threads 8 --db zotus.fasta --strand plus --id 0.97 --otutabout zotutable.txt

I am processing COI metabarcoding data for ~50 sites. I have finished all the preprocessing steps (merging, filtering, denoising, etc.), and the zotus.fasta file is my ZOTU database for this study. However, I encountered an issue when trying to generate the OTU frequency table.

The size of combined.fasta is approximately 500 MB, and the size of zotus.fasta is about 600 KB. However, the process was taking very long, and the generated zotutable.txt was around 30 GB, which obviously does not make sense. Could you take a look and see why I might have encountered this problem?

timz0605 commented 1 year ago

The zotus.fasta file contains approx. 2000 sequences from a 313-base partial COI gene region.

torognes commented 1 year ago

Hi and thanks for reporting this problem.

I agree that the size of the output file seems too big.

Are you sure that vsearch has properly parsed the OTU and sample identifiers from your input files?
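One rough way to check this is sketched below. It assumes that sample identifiers are carried in the query headers as ;sample=NAME or ;barcodelabel=NAME annotations, which is how I understand the vsearch manual; headers without either annotation are simply counted as "(no annotation)". The function name sample_labels is made up for this example. If the number of distinct labels is far larger than the expected number of samples, the table would get one column per label, which might explain a very large output file.

```shell
# Count reads per sample label in a FASTA file.
# Assumption: labels appear in headers as ";sample=NAME" or ";barcodelabel=NAME".
# Headers with neither annotation are grouped under "(no annotation)".
# Usage: sample_labels ../5_quality_control/combined.fasta | head
sample_labels() {
    awk '/^>/ {
        label = "(no annotation)"
        if (match($0, /;(sample|barcodelabel)=[^;]*/)) {
            label = substr($0, RSTART, RLENGTH)
            sub(/^;[a-z]*=/, "", label)   # strip the ";sample=" or ";barcodelabel=" prefix
        }
        count[label]++
    }
    END { for (l in count) print count[l] "\t" l }' "$1" | sort -rn
}
```

Each output line is a read count followed by a tab and the label, most frequent first. Piping the result through wc -l gives the number of distinct labels.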

It is difficult to pinpoint what is wrong without more information. If possible, could you provide, e.g., the vsearch version and computer platform, the exact sizes of the files, the number of sequences, etc.?
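Something along these lines could collect most of that information. It is only a sketch: fasta_summary is a hypothetical helper name, and it counts sequences simply as the number of lines starting with ">".

```shell
# Print the exact size in bytes and the number of sequences of a FASTA file.
# Usage: fasta_summary ../5_quality_control/combined.fasta; fasta_summary zotus.fasta
fasta_summary() {
    bytes=$(wc -c < "$1" | tr -d ' ')
    seqs=$(grep -c '^>' "$1")
    printf '%s\t%s bytes\t%s sequences\n' "$1" "$bytes" "$seqs"
}

# Version and platform; the guard skips the call if vsearch is not on PATH.
if command -v vsearch >/dev/null 2>&1; then vsearch --version; fi
uname -a
```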

If possible, extracts with a small portion of the input files and result files (e.g. the first few lines) could be valuable. If you do not want to publish them here, you could alternatively send them to me by email.
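A small helper like the following could produce such extracts. It is a sketch with a made-up name (preview); it keeps only the first 10 tab-separated columns of each line, in case the table turns out to have very many columns and therefore very long lines. Lines without tabs, such as FASTA lines, pass through unchanged.

```shell
# Show the first N lines of a file (default 5), limited to the first 10
# tab-separated columns so that very wide tables stay readable.
# Usage: preview zotutable.txt 3
#        preview ../5_quality_control/combined.fasta 10
preview() {
    head -n "${2:-5}" "$1" | cut -f 1-10
}
```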

torognes commented 10 months ago

No response - closing.