Hello @sidhomj,

I used the unsupervised part of DeepTCR to cluster TCR sequences, but when I let the method determine the optimal threshold parameter with the following command, I got this error:
```python
DTCRU_test.Cluster(clustering_method="hierarchical", linkage_method="ward", criterion="distance", write_to_sheets=False)
```

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/.conda/envs/DeepTCR_env/lib/python3.7/site-packages/DeepTCR/DeepTCR.py", line 1054, in Cluster
    IDX = hierarchical_optimization(distances, features, method=linkage_method, criterion=criterion)
  File "/home/ubuntu/.conda/envs/DeepTCR_env/lib/python3.7/site-packages/DeepTCR/functions/utils_u.py", line 52, in hierarchical_optimization
    sil.append(skmetrics.silhouette_score(features[sel, :], IDX[sel]))
  File "/home/ubuntu/.conda/envs/DeepTCR_env/lib/python3.7/site-packages/sklearn/metrics/cluster/_unsupervised.py", line 118, in silhouette_score
    return np.mean(silhouette_samples(X, labels, metric=metric, **kwds))
  File "/home/ubuntu/.conda/envs/DeepTCR_env/lib/python3.7/site-packages/sklearn/metrics/cluster/_unsupervised.py", line 229, in silhouette_samples
    check_number_of_labels(len(le.classes_), n_samples)
  File "/home/ubuntu/.conda/envs/DeepTCR_env/lib/python3.7/site-packages/sklearn/metrics/cluster/_unsupervised.py", line 35, in check_number_of_labels
    % n_labels
ValueError: Number of labels is 2876. Valid values are 2 to n_samples - 1 (inclusive)
```
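For reference, the error can be reproduced outside DeepTCR with a minimal sketch on hypothetical toy data (random 2-D points standing in for the learned features). My assumption is that the failing threshold is t = 0, the first value of the original `t_list`: with `criterion="distance"`, `fcluster` only merges samples whose cophenetic distance is at most 0, so every distinct sample ends up in its own cluster, and `silhouette_score` rejects a labelling with as many labels as samples.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from sklearn.metrics import silhouette_score

# Hypothetical toy data standing in for DeepTCR's features:
# 6 random (hence distinct) points in 2-D, so no pair is at distance 0.
rng = np.random.default_rng(0)
features = rng.normal(size=(6, 2))

# Ward linkage on the condensed Euclidean distance vector.
Z = linkage(pdist(features), method="ward")

# With criterion="distance" and t = 0, no merge height is <= 0,
# so every sample becomes its own flat cluster.
labels = fcluster(Z, 0, criterion="distance")
n_clusters = len(np.unique(labels))
print(n_clusters)  # same as the number of samples

try:
    silhouette_score(features, labels)
except ValueError as err:
    # Same kind of ValueError as in the traceback above.
    print(err)
```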
To work around this, I modified the function `hierarchical_optimization` in `utils_u.py` (in the `DeepTCR/functions` folder, line 44) so that the threshold search starts at 1 instead of 0:
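```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn import metrics as skmetrics


def hierarchical_optimization(distances, features, method, criterion):
    # Build the hierarchical tree from the condensed form of the distance matrix.
    Z = linkage(squareform(distances), method=method)
    # Start at t = 1 instead of t = 0 (original: t_list = np.arange(0, 100, 1)).
    t_list = np.arange(1, 100, 1)
    sil = []
    for t in t_list:
        IDX = fcluster(Z, t, criterion=criterion)
        # A single cluster has no silhouette score; record 0 instead.
        if len(np.unique(IDX[IDX >= 0])) == 1:
            sil.append(0.0)
            continue
        sel = IDX >= 0
        sil.append(skmetrics.silhouette_score(features[sel, :], IDX[sel]))
    # Re-cluster at the threshold with the best mean silhouette score.
    IDX = fcluster(Z, t_list[np.argmax(sil)], criterion=criterion)
    return IDX
```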
And it works!
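A slightly more general guard might be to skip any threshold for which `silhouette_score` cannot be computed (fewer than 2 clusters, or as many clusters as samples), so the search can keep starting at 0. This is only a sketch, not DeepTCR code and not tested inside DeepTCR: the name `hierarchical_optimization_guarded`, the optional `t_list` argument, the `-np.inf` sentinel for invalid thresholds and the toy data are all my own hypothetical choices.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform
from sklearn import metrics as skmetrics


def hierarchical_optimization_guarded(distances, features, method, criterion, t_list=None):
    """Hypothetical variant of hierarchical_optimization (not DeepTCR code)."""
    if t_list is None:
        t_list = np.arange(0, 100, 1)  # same range as the original code
    Z = linkage(squareform(distances), method=method)
    sil = []
    for t in t_list:
        IDX = fcluster(Z, t, criterion=criterion)
        sel = IDX >= 0  # kept from the original; fcluster labels start at 1
        n_samples = int(sel.sum())
        n_clusters = len(np.unique(IDX[sel]))
        # silhouette_score only accepts 2 <= n_clusters <= n_samples - 1,
        # so thresholds giving one cluster or all singletons are never chosen.
        if not 2 <= n_clusters <= n_samples - 1:
            sil.append(-np.inf)
            continue
        sil.append(skmetrics.silhouette_score(features[sel, :], IDX[sel]))
    # np.argmax returns the first (smallest) threshold with the best score.
    IDX = fcluster(Z, t_list[int(np.argmax(sil))], criterion=criterion)
    return IDX


# Usage on hypothetical toy data: two well-separated blobs of 20 points each.
rng = np.random.default_rng(1)
features = np.vstack([
    rng.normal(0.0, 0.5, size=(20, 2)),
    rng.normal(10.0, 0.5, size=(20, 2)),
])
distances = squareform(pdist(features))  # square matrix, as the function expects
IDX = hierarchical_optimization_guarded(distances, features, method="ward", criterion="distance")
print(len(np.unique(IDX)))
```

Using `-np.inf` rather than `0.0` for invalid thresholds means they cannot win even when every valid silhouette score is negative; if no threshold is valid at all, `np.argmax` falls back to the first value of `t_list`.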