Closed by chenjh356 4 months ago
Hi,
That's an odd one. I think this might be caused by the naming convention in the read names. Maybe you could change the / to a space in all read names and try again.
Another possibility is that your read set does indeed contain non-unique sequence names. If so, grepping for 'CL100090007L2C013R069_386072' will turn up multiple hits.
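To check for non-unique names across the whole file rather than grepping for one read at a time, something like the following might help. It is only a rough sketch: it assumes a plain, uncompressed FASTQ with exactly four lines per record, and it treats the first whitespace-delimited token of each header as the read ID. The function name and the small in-memory example (including the 386073 read) are made up for illustration and are not part of SingleM.

```python
from collections import Counter
import io


def duplicate_read_ids(handle):
    """Return {read_id: count} for IDs seen more than once.

    Assumes a 4-line-per-record FASTQ stream (an assumption, not
    something SingleM requires).
    """
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 != 0:
            continue  # only the first line of each record is a header
        # Drop the leading "@" and keep the first whitespace-delimited token.
        parts = line.rstrip("\n")[1:].split()
        if parts:
            counts[parts[0]] += 1
    return {read_id: n for read_id, n in counts.items() if n > 1}


# Hypothetical three-read example; the first and third records share a name.
example = io.StringIO(
    "@CL100090007L2C013R069_386072/1\nACGT\n+\nIIII\n"
    "@CL100090007L2C013R069_386073/1\nTTGA\n+\nIIII\n"
    "@CL100090007L2C013R069_386072/1\nACGT\n+\nIIII\n"
)
dups = duplicate_read_ids(example)
print(dups)
```

For a real file, pass an open file handle instead of the `io.StringIO` example. An empty result would suggest the names are unique and the problem lies elsewhere.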
When I use "seqkit rmdup -n -i" to deal with my data, it works well now. Thanks.
Use seqkit rmdup -n -i to handle the non-unique names problem
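For anyone wondering what that step does to the reads: as I understand the flags, -n compares records by their full name and -i ignores case, and only the first record for each name is kept. The snippet below is a plain-Python sketch of that idea, not seqkit's actual implementation. It assumes an uncompressed FASTQ with four lines per record, and the function name and example reads are hypothetical.

```python
import io


def drop_duplicate_names(src, dst, ignore_case=True):
    """Copy FASTQ records from src to dst, keeping the first record per name.

    Returns the number of records dropped. Assumes 4 lines per record.
    """
    seen = set()
    removed = 0
    while True:
        record = [src.readline() for _ in range(4)]
        if not record[0].strip():
            break  # end of input (or a trailing blank line)
        name = record[0].rstrip("\n")[1:]  # full header without the "@"
        key = name.lower() if ignore_case else name
        if key in seen:
            removed += 1
            continue
        seen.add(key)
        dst.writelines(record)
    return removed


# Hypothetical input: the third record repeats the name of the first.
reads = io.StringIO(
    "@read_1/1\nACGT\n+\nIIII\n"
    "@read_2/1\nTTGA\n+\nIIII\n"
    "@READ_1/1\nACGT\n+\nIIII\n"
)
deduplicated = io.StringIO()
n_removed = drop_duplicate_names(reads, deduplicated)
print(n_removed)
```

For paired-end data the forward and reverse files would need to stay in sync, which this sketch does not attempt to handle.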
SingleM version: 0.18.0
Database: S4.3.0.GTDB_r220.metapackage_20240523.smpkg.zb
Error:
Traceback (most recent call last):
File "/singlem/bin/singlem", line 709, in
singlem.pipe.SearchPipe().run(
File "/singlem/bin/../singlem/pipe.py", line 69, in run
otu_table_object = self.run_to_otu_table(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/singlem/bin/../singlem/pipe.py", line 373, in run_to_otu_table
self._num_threads, self._working_directory).run_diamond(
^^^^^^^^^^^^
File "/singlem/bin/../singlem/diamond_spkg_searcher.py", line 34, in run_diamond
fwds = self._prefilter(dmnd, forward_read_files, False, performance_parameters)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/singlem/bin/../singlem/diamond_spkg_searcher.py", line 103, in _prefilter
raise Exception("Multiple DIAMOND best hits detected for '{}'. This likely indicates that the input reads have non-unique names, possibly due to the same read appearing twice in a single input file".format(qseqid))
Exception: Multiple DIAMOND best hits detected for 'CL100090007L2C013R069_386072/1'. This likely indicates that the input reads have non-unique names, possibly due to the same read appearing twice in a single input file
How should I handle this?