Open Ch-rode opened 1 year ago
Hello! Thanks for this amazing library. Is there a way to retrieve only the centroids for each cluster? Are they perhaps the first sequence in each cluster (i.e. row 0 of cluster 0)? Thanks a lot.
Hi, thank you for using ClusTCR! The centroids are computed during the first step of the algorithm (i.e. the K-means), which uses a vectorized representation of each sequence to group them in Euclidean space. The centroids are initialized randomly in this space and their locations are optimized over the iterations of the algorithm. As such, they shouldn't be viewed as sequences, but rather as vectors in the n-dimensional space (where n is the number of features of each sequence). So the first sequence in a cluster is not its centroid.
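Since the K-means centroids are vectors rather than sequences, one possible workaround if you need a representative sequence per cluster is to pick a medoid: the member whose summed distance to the other members of its cluster is smallest. The snippet below is only a rough sketch of that idea. It does not use ClusTCR's API, and it swaps in plain edit distance rather than the vectorized representation the K-means step works on. The function names (`edit_distance`, `medoid`, `representatives`), the `(sequence, cluster_id)` input format, and the example sequences are all made up for illustration.

```python
from collections import defaultdict


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def medoid(sequences):
    """Return the sequence with the smallest summed distance to all members.

    Ties go to the sequence that appears first in the input.
    """
    return min(sequences, key=lambda s: sum(edit_distance(s, t) for t in sequences))


def representatives(assignments):
    """Map each cluster id to its medoid, given (sequence, cluster_id) pairs."""
    clusters = defaultdict(list)
    for seq, cluster_id in assignments:
        clusters[cluster_id].append(seq)
    return {cluster_id: medoid(seqs) for cluster_id, seqs in clusters.items()}


# Hypothetical input: in practice these pairs would come from your clustering output.
example = [
    ("CASSLGQAYEQFF", 0),
    ("CASSLGQAYEQYF", 0),
    ("CASSPGQAYEQYF", 0),
    ("CAWSVGNTEAFF", 1),
]

reps = representatives(example)
print(reps)  # {0: 'CASSLGQAYEQYF', 1: 'CAWSVGNTEAFF'}
```

Note that the representative of cluster 0 here is the second row, not the first, so row order in a cluster says nothing about centrality. The pairwise comparison is quadratic in cluster size, which is fine for small clusters but may be slow for very large ones.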