Open alexzrren opened 3 months ago
We are aware of this issue and are developing a fix.
You can work around this issue in the nucleotide search/clustering by disabling spaced k-mers with --spaced-kmer-mode 0.
This parameter does not resolve the issue. I reran the clustering on the same dataset, and the result remains the same.
We ran the following command:
mmseqs easy-cluster orig_seqs.fasta 80ANI_cluster_nospace tmp --spaced-kmer-mode 0 --min-seq-id 0.8 --cov-mode 1 -c 0.8
And it looks fine. Could you please post the whole log of the new run?
Expected Behavior
dataset.zip
I have a group of sequences that align to each other over almost their full length with >95% identity using BLASTN, so they should all fall into one cluster when using the --min-seq-id 0.8 parameter.
Current Behavior
My self-assembled sequences (names start with SRS-* and known to be on the reverse strand) are not properly clustered with the public sequences.
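The behavior described above can be checked directly from the cluster TSV. The following is a minimal sketch, assuming the usual two-column layout (representative, then member, separated by a tab) and assuming that the self-assembled sequences are identified by the `SRS-` prefix. The function names and the example rows are hypothetical and are not taken from the attached dataset.

```python
from collections import defaultdict


def load_clusters(tsv_lines):
    """Group members by representative from 'representative<TAB>member' lines."""
    clusters = defaultdict(list)
    for line in tsv_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        rep, member = line.split("\t")[:2]
        clusters[rep].append(member)
    return dict(clusters)


def isolated_prefix_clusters(clusters, prefix="SRS-"):
    """Return representatives of clusters that contain only prefixed sequences."""
    return sorted(
        rep
        for rep, members in clusters.items()
        if all(member.startswith(prefix) for member in members)
    )


if __name__ == "__main__":
    # Hypothetical rows for illustration only.
    example = [
        "pub_1\tpub_1",
        "pub_1\tpub_2",
        "SRS-001\tSRS-001",
        "SRS-001\tSRS-002",
    ]
    print(isolated_prefix_clusters(load_clusters(example)))
```

A cluster listed by `isolated_prefix_clusters` contains no public sequence, which is the symptom reported here. To run it on a real result, pass an open file handle for the cluster TSV instead of the example list.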
Steps to Reproduce (for bugs)
1. Clustering using easy-linclust or easy-cluster (not correctly clustered)
2. Checked the result
The clustering result shows that, as described in Current Behavior, the sequences on the forward and reverse strands are not clustered into one cluster, although they are known to be very close, with high identity.
3. Manually convert the sequences into their reverse complement
I used the seqkit seq function to convert my self-assembled sequences into their reverse complements, keeping the public sequences in their original orientation.
4. Try to cluster the manually processed sequences (showing easy-cluster as an example)
Then I checked the clustered TSV: these sequences were clustered into one cluster.
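For readers without seqkit, the manual conversion in step 3 can be approximated in plain Python. This is only a sketch under stated assumptions: records to flip are selected by the `SRS-` header prefix, the input is nucleotide FASTA, and only A, C, G, T, and N (either case) are complemented, with any other character left unchanged. The function names are hypothetical, and this is not how seqkit or MMseqs2 handle strands internally.

```python
# Complement table for DNA; characters outside this table are left as-is.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def flip_prefixed(lines, prefix="SRS-"):
    """Reverse-complement records whose header starts with prefix; keep the rest."""
    for header, seq in parse_fasta(lines):
        if header.startswith(prefix):
            seq = reverse_complement(seq)
        yield header, seq


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    example = [">SRS-001", "AACG", ">pub_1", "AACG"]
    for header, seq in flip_prefixed(example):
        print(f">{header}\n{seq}")
```

Writing the printed records to a new FASTA file and clustering that file corresponds to step 4 above.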
MMseqs Output (for bugs)
Please refer to Steps to Reproduce.
Context
NA
Your Environment