Closed deweihu96 closed 1 year ago
This is indeed strange behaviour. A result like this implies that the sequences you are trying to cluster are too distant from each other in terms of Hamming distance (HD). If ClusTCR cannot detect enough pairs of sequences with HD = 1, the resulting network will be very small, and so will the number of clusters. This problem is especially apparent when working with small data sets of long sequences.
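To check whether this is what is happening in your data, you could count the HD = 1 pairs yourself. The snippet below is only a minimal sketch in plain Python; it does not use or reproduce ClusTCR's internals, and the function name `hd1_pairs` and the example sequences are made up for illustration. It masks one position at a time and groups sequences that are identical everywhere else, which avoids comparing all pairs directly.

```python
from collections import defaultdict
from itertools import combinations


def hd1_pairs(sequences):
    """Return the set of (a, b) pairs, with a < b, whose Hamming distance is exactly 1.

    Hypothetical helper for illustration; not part of ClusTCR.
    """
    buckets = defaultdict(set)
    for seq in set(sequences):  # duplicates would not form HD = 1 pairs
        for i in range(len(seq)):
            # Key = the sequence with position i masked out. The length is
            # included so that only equal-length sequences share a bucket.
            key = (len(seq), i, seq[:i], seq[i + 1:])
            buckets[key].add(seq)

    pairs = set()
    for group in buckets.values():
        # Distinct sequences in one bucket differ only at the masked position.
        for a, b in combinations(sorted(group), 2):
            pairs.add((a, b))
    return pairs


if __name__ == "__main__":
    # Toy CDR3-like sequences, invented for this example.
    cdr3 = ["CASSLGQAYEQYF", "CASSLGQAYEQFF", "CASSPGQAYEQYF", "CASRTDSGNTIYF"]
    pairs = hd1_pairs(cdr3)
    connected = {s for pair in pairs for s in pair}
    print(f"{len(pairs)} HD = 1 pairs, {len(connected)} of {len(set(cdr3))} sequences connected")
```

If only a small fraction of your 10,000 sequences ends up in at least one pair, a handful of small clusters is the expected outcome rather than a bug.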
You can try using the MCL method only, which may slightly improve your clustering results. In addition, we will consider more flexible solutions in future releases, where the allowed edit distance is larger for longer sequences. I would also gladly look at the problem in more detail if you could provide me with (a sample of) the data you are using.
Closing this issue due to inactivity.
Dear author,
I have 10,000 distinct CDR3 sequences, all of length 15. I ran the code on them as shown below:
However, the output file shows only 5 clusters. Within each cluster, the sequences differ by only one amino acid.