weizhongli / cdhit

Automatically exported from code.google.com/p/cdhit
GNU General Public License v2.0

cd-hit-dup Segmentation fault (core dumped) #139

Open scross92 opened 1 year ago

scross92 commented 1 year ago

I installed cd-hit-auxtools via conda. However, when I run cd-hit-dup, I get a Segmentation fault (core dumped) error. I have seen in previous issues #27 and #97 that users hit segmentation faults, but those were with cd-hit-est, and neither thread reached a clear conclusion on how to fix them.

I tried removing and reinstalling the cd-hit-auxtools in my conda env but that did not resolve the issue.

I feel it may be a relatively easy fix that I am overlooking. I appreciate any help!

Here is the output when I run cd-hit-dup:

```
$ cd-hit-dup -i /path/to/fastq -o output.fq -u 30 -e 2
Read 100000 sequences ...
From input: /path/to/fastq
Total number of sequences: 125659
Longest: 76
Shortest: 75
Start clustering duplicated sequences ...
primer = 0
Clustered 10000 sequences with 9987 clusters ...
Clustered 20000 sequences with 19938 clusters ...
Clustered 30000 sequences with 29863 clusters ...
Clustered 40000 sequences with 39766 clusters ...
Clustered 50000 sequences with 49655 clusters ...
Clustered 60000 sequences with 59512 clusters ...
Clustered 70000 sequences with 69342 clusters ...
Clustered 80000 sequences with 79195 clusters ...
Clustered 90000 sequences with 89032 clusters ...
Clustered 100000 sequences with 98821 clusters ...
Clustered 110000 sequences with 108635 clusters ...
Clustered 120000 sequences with 118421 clusters ...
Number of reads: 125659
Number of clusters found: 123942
Number of clusters with abundance above the cutoff (=1): 123942
Number of clusters with abundance below the cutoff (=1): 0
Writing clusters to files ...
Segmentation fault (core dumped)
```
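For readers unfamiliar with the `-u 30` option in the command above: it tells cd-hit-dup to compare only a length-30 prefix of each read when clustering duplicates. The sketch below is purely illustrative and is NOT cd-hit-dup's actual implementation; it shows the effect of exact prefix-based clustering (the mismatch tolerance from `-e 2` is omitted for brevity, and all names are invented for this example).

```python
from collections import defaultdict

PREFIX_LEN = 30  # corresponds to -u 30 in the command above


def cluster_by_prefix(reads, prefix_len=PREFIX_LEN):
    """Group reads whose first prefix_len bases are identical.

    reads: iterable of (read_id, sequence) pairs.
    Returns a dict mapping each distinct prefix to the read ids in
    that cluster.
    """
    clusters = defaultdict(list)
    for read_id, seq in reads:
        clusters[seq[:prefix_len]].append(read_id)
    return clusters


reads = [
    ("r1", "ACGT" * 19),           # 76 bp, like the reads in the log
    ("r2", "ACGT" * 19),           # exact duplicate of r1
    ("r3", "TTTT" + "ACGT" * 18),  # different prefix -> its own cluster
]
clusters = cluster_by_prefix(reads)
print(len(clusters))  # 2 clusters: {r1, r2} and {r3}
```

Note that the segfault in the log occurs after clustering finishes, during "Writing clusters to files", so the clustering logic itself appears to complete.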

replikation commented 1 year ago

A core dump is a hardware issue: too little disk space or RAM.

scross92 commented 1 year ago

Normally I would agree. However, I have plenty of disk space (nearly 20TB) and RAM (250GB) available, so that shouldn't be the issue. Interestingly, when I run cd-hit-dup from a Singularity container, it works fine on the same machine. I may adjust my pipeline to use a Singularity container, as the problem may lie with the current conda release of cd-hit-auxtools.

sapoudel commented 8 months ago

@scross92 Which container are you using? I am having the same issue with both the conda build and biocontainer's image: https://quay.io/repository/biocontainers/cd-hit-auxtools?tab=tags&tag=latest

I also have plenty of disk space and RAM available, and subsampling to 25k reads still gives the error, so it's not an out-of-memory issue. However, it works if I only pass one read direction at a time, e.g. `cd-hit-dup -i <(zcat fastq/fastq_1.fastq.gz) -o output.fq1`
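The `<(zcat ...)` in the workaround above is bash process substitution: it exposes the decompressed stream as a file-like path, so a tool that expects an uncompressed input path can read a `.gz` file. The sketch below demonstrates the mechanism without cd-hit-dup (`wc -l` stands in for any tool reading from a path argument; the file name is invented for this demo).

```shell
#!/usr/bin/env bash
# Demo of the process-substitution pattern used in the workaround.
set -euo pipefail

# Make a tiny gzipped "FASTQ" (4 lines = 1 read) for the demo.
printf '@r1\nACGT\n+\nIIII\n' | gzip > demo_1.fastq.gz

# <(zcat ...) yields a path (e.g. /dev/fd/63) whose contents are the
# decompressed stream; wc -l reads it like a regular file.
wc -l < <(zcat demo_1.fastq.gz)   # prints 4

rm -f demo_1.fastq.gz
```

One caveat: process substitution is a bash/zsh feature, so a pipeline step using it must not run under plain `sh`.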