svalkiers / clusTCR

CDR3 clustering module providing a new method for fast and accurate clustering of large data sets of CDR3 amino acid sequences, and offering functionalities for downstream analysis of clustering results.

Error message using .summary() #31

Closed dstcr closed 2 years ago

dstcr commented 2 years ago

This only happens when I use my own data; it works with the provided dataset. Clustering produces the initial output CSV, but neither the summary nor the features can be computed.

output.summary()
Traceback (most recent call last):
  File "", line 1, in
  File "/home//miniconda3/lib/python3.9/site-packages/clustcr/clustering/clustering.py", line 32, in summary
    motifs = FeatureGenerator(self.clusters_df).clustermotif(cutoff=motifcutoff)
  File "/home/____/miniconda3/lib/python3.9/site-packages/clustcr/analysis/features.py", line 184, in clustermotif
    profile = profile_matrix(sequences)
  File "/home/__/miniconda3/lib/python3.9/site-packages/clustcr/analysis/tools.py", line 47, in profilematrix
    profile[i][pos] = np.round(psc.loc[i] / len(sequences),2)
KeyError: ''

features = output.compute_features(computepgen=True)
/home/____/miniconda3/lib/python3.9/site-packages/numpy/lib/function_base.py:380: RuntimeWarning: Mean of empty slice.
  avg = a.mean(axis)
/home//miniconda3/lib/python3.9/site-packages/numpy/core/_methods.py:188: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "", line 1, in
  File "/home//miniconda3/lib/python3.9/site-packages/clustcr/clustering/clustering.py", line 49, in compute_features
    return FeatureGenerator(self.clusters_df).get_features(compute_pgen=computepgen)
  File "/home/____/miniconda3/lib/python3.9/site-packages/clustcr/analysis/features.py", line 167, in get_features
    pchem = self._calc_physchem()
  File "/home//miniconda3/lib/python3.9/site-packages/clustcr/analysis/features.py", line 106, in _calc_physchem
    properties[prop].append(np.average([physchemproperties[prop][aa] for aa in seq]))
  File "/home//miniconda3/lib/python3.9/site-packages/clustcr/analysis/features.py", line 106, in
    properties[prop].append(np.average([physchemproperties[prop][aa] for aa in seq]))
KeyError: ''

Appreciate any help!

MaxVanHoucke commented 2 years ago

Hi there!

Thanks for letting us know about this error. Your sequences might contain some non-amino-acid characters, such as an apostrophe, which would cause the error. Filtering out sequences that contain such characters before clustering should fix it. If the issue persists, feel free to reach out again!

All the best,
Max
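The filtering suggested above could look roughly like the sketch below. It is plain Python and not part of clusTCR: the names `AMINO_ACIDS`, `is_valid_cdr3` and `split_valid_invalid` are hypothetical, and the example sequences are made up. It assumes the CDR3 sequences are available as an iterable of strings and that only the 20 standard one-letter amino acid codes (uppercase) are acceptable. Because the "Mean of empty slice" warning in the traceback hints that empty entries may also be present, empty and non-string values are rejected as well.

```python
# The 20 standard amino acids as one-letter codes (assumed to be the only valid characters).
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_valid_cdr3(seq):
    """Return True if seq is a non-empty string made only of standard amino acids."""
    return isinstance(seq, str) and len(seq) > 0 and set(seq) <= AMINO_ACIDS


def split_valid_invalid(sequences):
    """Split sequences into (valid, invalid) lists, preserving input order."""
    valid, invalid = [], []
    for seq in sequences:
        (valid if is_valid_cdr3(seq) else invalid).append(seq)
    return valid, invalid


# Made-up example input: one clean sequence and several problematic entries.
raw = ["CASSLGTDTQYF", "CASS'LGETQYF", "", None, "CASSL*GYEQYF", "casslgtdtqyf"]

clean, rejected = split_valid_invalid(raw)
print("kept:", clean)
print("rejected:", rejected)
```

Inspecting `rejected` before discarding it shows which characters are actually causing the problem. Note that this sketch treats lowercase sequences as invalid; if your data uses lowercase letters, uppercasing them first may be preferable to dropping them. The cleaned list can then be converted back to whatever input type you were passing to the clustering step.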