wwood / singlem

Novelty-inclusive microbial community profiling of shotgun metagenomes
http://wwood.github.io/singlem/
GNU General Public License v3.0

Multiple DIAMOND best hits detected for 'CL100090007L2C013R069_386072/1'. #188

Closed by chenjh356 4 months ago

chenjh356 commented 5 months ago

SingleM version: 0.18.0
Database: S4.3.0.GTDB_r220.metapackage_20240523.smpkg.zb
Error:

Traceback (most recent call last):
  File "/singlem/bin/singlem", line 709, in singlem.pipe.SearchPipe().run(
  File "/singlem/bin/../singlem/pipe.py", line 69, in run
    otu_table_object = self.run_to_otu_table(**kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/singlem/bin/../singlem/pipe.py", line 373, in run_to_otu_table
    self._num_threads, self._working_directory).run_diamond(
                                               ^^^^^^^^^^^^
  File "/singlem/bin/../singlem/diamond_spkg_searcher.py", line 34, in run_diamond
    fwds = self._prefilter(dmnd, forward_read_files, False, performance_parameters)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/singlem/bin/../singlem/diamond_spkg_searcher.py", line 103, in _prefilter
    raise Exception("Multiple DIAMOND best hits detected for '{}'. This likely indicates that the input reads have non-unique names, possibly due to the same read appearing twice in a single input file".format(qseqid))
Exception: Multiple DIAMOND best hits detected for 'CL100090007L2C013R069_386072/1'. This likely indicates that the input reads have non-unique names, possibly due to the same read appearing twice in a single input file

How should I handle this?

wwood commented 5 months ago

Hi,

That's an odd one. I think this might be caused by the naming convention in the read names. Maybe you could change the / to a space in all read names and try again.
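As a rough illustration of the renaming suggested above, here is a minimal Python sketch. It assumes an uncompressed FASTQ file with strict 4-line records and read names ending in /1 or /2; the function name slash_to_space is hypothetical and is not part of SingleM.

```python
import io


def slash_to_space(fastq_in, fastq_out):
    """Rewrite FASTQ header lines so a trailing '/1' or '/2' becomes ' 1' or ' 2'.

    Assumes strict 4-line FASTQ records (no wrapped sequence lines).
    """
    for i, line in enumerate(fastq_in):
        # Every 4th line, starting from the first, is a header line.
        if i % 4 == 0 and line.startswith("@"):
            name = line.rstrip("\n")
            if name.endswith(("/1", "/2")):
                name = name[:-2] + " " + name[-1]
            line = name + "\n"
        fastq_out.write(line)


# Small in-memory demonstration using the read name from the error message.
demo = io.StringIO("@CL100090007L2C013R069_386072/1\nACGT\n+\nIIII\n")
out = io.StringIO()
slash_to_space(demo, out)
print(out.getvalue(), end="")
```

For real data, the two StringIO objects would be replaced by opened input and output files.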

Another possibility is that your readset does indeed contain non-unique sequence names. If so, grepping for 'CL100090007L2C013R069_386072' will turn up multiple hits.
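To check the whole file rather than a single read name, something like the following Python sketch could be used. It assumes uncompressed FASTQ with strict 4-line records and treats the first whitespace-delimited token of each header as the read name; duplicate_read_names is a hypothetical helper, not a SingleM function.

```python
import io
from collections import Counter


def duplicate_read_names(fastq):
    """Return {read_name: count} for every read name seen more than once.

    Assumes strict 4-line FASTQ records.
    """
    counts = Counter()
    for i, line in enumerate(fastq):
        if i % 4 == 0:
            # Header line: drop the leading '@' and keep the first token.
            fields = line[1:].split()
            if fields:
                counts[fields[0]] += 1
    return {name: n for name, n in counts.items() if n > 1}


# In-memory demonstration: the first read name appears twice.
demo = io.StringIO(
    "@CL100090007L2C013R069_386072/1\nACGT\n+\nIIII\n"
    "@CL100090007L2C013R069_386073/1\nTTGA\n+\nIIII\n"
    "@CL100090007L2C013R069_386072/1\nACGT\n+\nIIII\n"
)
dups = duplicate_read_names(demo)
print(dups)
```

An empty result would suggest the names are unique and the problem lies elsewhere, for example in the naming convention.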

chenjh356 commented 4 months ago

After running "seqkit rmdup -n -i " on my data, it works well now. Thanks.

chenjh356 commented 4 months ago

Use seqkit rmdup -n -i to handle the non-unique read names problem.
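For readers without seqkit, the idea behind that command (keep the first record for each read name, comparing names case-insensitively) can be sketched in Python. This is only an approximation under stated assumptions: uncompressed FASTQ, strict 4-line records, and the full header line used as the name. It is not seqkit's implementation, and drop_duplicate_names is a hypothetical helper.

```python
import io


def drop_duplicate_names(fastq_in, fastq_out):
    """Copy FASTQ records, keeping only the first record for each name.

    Names are compared case-insensitively using the full header line.
    Returns the number of records that were dropped.
    """
    seen = set()
    removed = 0
    while True:
        record = [fastq_in.readline() for _ in range(4)]
        if not record[0]:
            break  # end of input
        key = record[0][1:].strip().lower()
        if key in seen:
            removed += 1
            continue
        seen.add(key)
        fastq_out.writelines(record)
    return removed


# In-memory demonstration: the third record repeats the first name.
demo = io.StringIO(
    "@CL100090007L2C013R069_386072/1\nACGT\n+\nIIII\n"
    "@CL100090007L2C013R069_386073/1\nTTGA\n+\nIIII\n"
    "@CL100090007L2C013R069_386072/1\nACGT\n+\nIIII\n"
)
deduped = io.StringIO()
n_removed = drop_duplicate_names(demo, deduped)
print(n_removed)
print(deduped.getvalue(), end="")
```

For paired-end data, the same filtering would need to be applied consistently to both files so that the pairs stay in sync.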