CDR3 clustering module providing a new method for fast and accurate clustering of large data sets of CDR3 amino acid sequences, and offering functionalities for downstream analysis of clustering results.
Examples in 'Clustering' raise different errors #50
It seems that clustering with any method other than faiss halts with the error '0-dimensional array given. Array must be at least two-dimensional' when using Python 3.11.4 and the latest conda version of clustcr.
#!/usr/bin/env python3
from clustcr import Clustering, datasets
# This works
clustering = Clustering(method='faiss')
cdr3 = datasets.test_cdr3()
output = clustering.fit(cdr3['junction_aa'])
output = clustering.fit(cdr3, include_vgene=True, cdr3_col="junction_aa", v_gene_col="v_call")
data = datasets.vdjdb_paired()
cdr3, alpha = data['CDR3_beta'], data['CDR3_alpha']
output = clustering.fit(cdr3, alpha=alpha)
# This fails with 'Wrong input. Please provide an iterable object containing CDR3 amino acid sequences.'
clustering = Clustering()
cdr3 = datasets.test_cdr3()
output = clustering.fit(cdr3)
# MCL and two-step methods both fail with '0-dimensional array given. Array must be at least two-dimensional'
mcl_clustering = Clustering(method='mcl')
output = mcl_clustering.fit(cdr3)
ts_clustering = Clustering(method='two-step')
output = ts_clustering.fit(cdr3)
It works under Python 3.10.12 and the latest conda version of clustcr (the clustering completes with all methods), though SciPy complains: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.25.1.
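Because the behaviour differs between Python 3.10.12 and 3.11.4, and SciPy warns about the NumPy version, it may help to record the exact versions in each environment. The following is a minimal sketch using only the standard library. The helper names (`installed_version`, `parse_release`, `in_range`, `environment_report`) are made up for illustration, and the distribution name `clustcr` is an assumption that may differ from the name the package is actually installed under. The bounds passed to `in_range` are copied from the SciPy warning above.

```python
import sys
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def parse_release(version):
    """Turn '1.25.1' into (1, 25, 1); suffixes such as 'rc1' are ignored."""
    numbers = []
    for piece in version.split(".")[:3]:
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        numbers.append(int(digits) if digits else 0)
    # Pad to three components so '1.25' compares equal to '1.25.0'.
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def in_range(version, lower, upper):
    """True if lower <= version < upper, compared on the numeric release parts."""
    return parse_release(lower) <= parse_release(version) < parse_release(upper)


def environment_report(distributions=("numpy", "scipy", "clustcr")):
    """Collect the Python version and the versions of the given distributions."""
    report = {"python": sys.version.split()[0]}
    for name in distributions:
        report[name] = installed_version(name)
    return report


if __name__ == "__main__":
    report = environment_report()
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
    numpy_version = report["numpy"]
    if numpy_version is not None:
        # Bounds copied from the SciPy warning quoted in the report.
        print("NumPy within 1.17.3 <= v < 1.25.0:",
              in_range(numpy_version, "1.17.3", "1.25.0"))
```

Running this in both conda environments and pasting the output into the thread would make it easier to tell whether the failures follow the Python version or the NumPy/SciPy combination.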
I used the following commands to create the conda environments:
Thanks for raising this issue. It seems like the introduction of the V gene clustering functionality has broken some of the original examples. I hope to fix this in the next release.
Also, the example in Clustering/Usage fails with the error 'Wrong input. Please provide an iterable object containing CDR3 amino acid sequences.' This is irrespective of the Python version. It seems that fit() ignores the cdr3_col argument if include_vgene=False, as the calls in the complete example above show.
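Until a fix is released, one possible workaround is to hand fit() the CDR3 column itself rather than the whole table, since the report shows `clustering.fit(cdr3['junction_aa'])` succeeding where `clustering.fit(cdr3)` fails. The sketch below is only an illustration of that idea: `select_cdr3` is a hypothetical helper, not part of clustcr, and the sequences in the usage example are illustrative strings rather than clustcr test data. Whether this also avoids the '0-dimensional array' error under Python 3.11.4 is not established by the report.

```python
from collections.abc import Mapping


def select_cdr3(data, cdr3_col="junction_aa"):
    """Return the CDR3 column of a table-like input, or the input itself otherwise.

    Table-like means a mapping or any object with a `columns` attribute
    (for example a pandas DataFrame). Anything else, such as a list or a
    pandas Series, is assumed to already hold the sequences.
    """
    is_table = isinstance(data, Mapping) or hasattr(data, "columns")
    if not is_table:
        return data
    if cdr3_col not in data:
        raise KeyError(f"column {cdr3_col!r} not found in input table")
    return data[cdr3_col]


if __name__ == "__main__":
    # A plain dict stands in for the DataFrame returned by datasets.test_cdr3().
    table = {
        "junction_aa": ["CASSLGTDTQYF", "CASSPGQGNYGYTF"],
        "v_call": ["TRBV7-9", "TRBV5-1"],
    }
    print(select_cdr3(table))

    # With clustcr installed, the call reported as working would become:
    #   from clustcr import Clustering, datasets
    #   output = Clustering(method='faiss').fit(select_cdr3(datasets.test_cdr3()))
```

The helper returns the column unchanged instead of converting it to a list, so that fit() receives the same kind of object as in the call the report describes as working.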