svalkiers / clusTCR

CDR3 clustering module providing a new method for fast and accurate clustering of large data sets of CDR3 amino acid sequences, and offering functionalities for downstream analysis of clustering results.

No objects to concatenate in clustering.fit #54

Open mrbarbitoff opened 5 months ago

mrbarbitoff commented 5 months ago

Hi!

After updating to the latest release of clusTCR, I get an error when fitting the clustering to my data (please see the complete traceback below). The same calls worked perfectly with the previous version. I initialize the clustering object with clustering = Clustering(n_cpus=24, chain='A'); the same error occurs if I don't specify the chain, for both TRA and TRB input data. I'd be grateful for your help with this issue.

ValueError                                Traceback (most recent call last)
Cell In[9], line 1
----> 1 output = clustering.fit(tra_data, include_vgene = True, 
      2                         cdr3_col="aaSeqCDR3", 
      3                         v_gene_col="vGene")

File ~/anaconda3/envs/clustcr_103/lib/python3.10/site-packages/clustcr/clustering/tools.py:96, in timeit.<locals>.timed(*args, **kwargs)
     94 def timed(*args, **kwargs):
     95     start = time.time()
---> 96     result = myfunc(*args, **kwargs)
     97     end = time.time()
     98     print(f'Total time to run ClusTCR: {(end-start):.3f}s')

File ~/anaconda3/envs/clustcr_103/lib/python3.10/site-packages/clustcr/clustering/clustering.py:429, in Clustering.fit(self, data, include_vgene, cdr3_col, v_gene_col, alpha)
    425 """
    426 Function that calls the indicated clustering method and returns clusters in a ClusteringResult
    427 """
    428 if include_vgene:
--> 429     return self._vgene_clustering(data, cdr3_col, v_gene_col)
    430 else:
    431     try:

File ~/anaconda3/envs/clustcr_103/lib/python3.10/site-packages/clustcr/clustering/clustering.py:346, in Clustering._vgene_clustering(self, data, cdr3_col, v_gene_col)
    343 super_clusters = self._faiss(subset["junction_aa"])
    344 # Second clustering step
    345 clusters = ClusteringResult(
--> 346     MCL_multiprocessing_from_preclusters(
    347         super_clusters, self.mcl_params, self.n_cpus
    348         ), chain=self.chain
    349                             ).clusters_df
    350 clusters.cluster += c # adjust cluster identifiers to ensure they stay unique
    351 subset = subset.merge(clusters, left_on="junction_aa", right_on="junction_aa")

File ~/anaconda3/envs/clustcr_103/lib/python3.10/site-packages/clustcr/clustering/methods.py:139, in MCL_multiprocessing_from_preclusters(preclust, mcl_hyper, n_cpus)
    137     if c != 0:
    138         nodelist[c]['cluster'] += nodelist[c - 1]['cluster'].max() + 1
--> 139 return pd.concat(nodelist, ignore_index=True)

File ~/anaconda3/envs/clustcr_103/lib/python3.10/site-packages/pandas/core/reshape/concat.py:382, in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    379 elif copy and using_copy_on_write():
    380     copy = False
--> 382 op = _Concatenator(
    383     objs,
    384     axis=axis,
    385     ignore_index=ignore_index,
    386     join=join,
    387     keys=keys,
    388     levels=levels,
    389     names=names,
    390     verify_integrity=verify_integrity,
    391     copy=copy,
    392     sort=sort,
    393 )
    395 return op.get_result()

File ~/anaconda3/envs/clustcr_103/lib/python3.10/site-packages/pandas/core/reshape/concat.py:445, in _Concatenator.__init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    442 self.verify_integrity = verify_integrity
    443 self.copy = copy
--> 445 objs, keys = self._clean_keys_and_objs(objs, keys)
    447 # figure out what our result ndim is going to be
    448 ndims = self._get_ndims(objs)

File ~/anaconda3/envs/clustcr_103/lib/python3.10/site-packages/pandas/core/reshape/concat.py:507, in _Concatenator._clean_keys_and_objs(self, objs, keys)
    504     objs_list = list(objs)
    506 if len(objs_list) == 0:
--> 507     raise ValueError("No objects to concatenate")
    509 if keys is None:
    510     objs_list = list(com.not_none(*objs_list))

ValueError: No objects to concatenate

Yury

svalkiers commented 5 months ago

Hi Yury,

Sorry for the inconvenience. I believe this error indicates that your clustering result is empty (i.e. no clusters were detected), so there is nothing to concatenate. I will update the script to return None in that case instead of raising.
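For anyone hitting the same traceback, the failure mode can be reproduced outside clusTCR: pandas raises this ValueError whenever pd.concat receives an empty list. The snippet below is only a minimal sketch of the kind of guard described above, not the actual clusTCR patch; the helper name concat_or_none and the demo sequences are made up for illustration.

```python
import pandas as pd


def concat_or_none(frames):
    """Concatenate per-precluster tables, or return None if there are none.

    Hypothetical helper for illustration only; not part of clusTCR.
    """
    frames = list(frames)
    if not frames:
        # pd.concat([]) would raise ValueError("No objects to concatenate")
        return None
    return pd.concat(frames, ignore_index=True)


# Empty input: no clusters were produced, so there is nothing to concatenate.
print(concat_or_none([]))

# Non-empty input behaves like a plain pd.concat call.
demo = [
    pd.DataFrame({"junction_aa": ["CAVRDSNYQLIW"], "cluster": [0]}),
    pd.DataFrame({"junction_aa": ["CAVSDSNYQLIW"], "cluster": [1]}),
]
print(concat_or_none(demo))
```

Code that calls such a guard then has to handle the None case explicitly, for example by skipping the merge step for that subset.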

Another solution would be to loosen the stringency by clustering on the CDR3 amino acid sequence only, i.e. without include_vgene = True.

Best, Sebastiaan

svalkiers commented 5 months ago

The issue should be fixed in the latest build (clustcr-1.0.3+3.g5fa6b46). Let me know if you encounter any further problems.

Cheers, Sebastiaan

mrbarbitoff commented 5 months ago

Hi @svalkiers

Thank you for your reply! It turns out the lack of clustering results was because I had accidentally installed the GPU version instead of the regular one during the update. Sorry about that. The issue is resolved.
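In case it helps others check for the same mix-up, here is a small sketch that lists which faiss distributions are visible in the active environment. It assumes that "GPU version" refers to the faiss dependency (packages such as faiss-gpu versus faiss-cpu), which is my reading of the thread rather than something stated explicitly. The function name installed_faiss_packages is hypothetical, and depending on how the package was installed (pip or conda) the reported names may differ.

```python
from importlib import metadata


def installed_faiss_packages():
    """Return {distribution name: version} for installed packages whose
    name starts with 'faiss' (e.g. faiss-cpu or faiss-gpu).

    Hypothetical helper for illustration; relies only on the standard library.
    """
    found = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta["Name"] if meta is not None else None
        if name and name.lower().startswith("faiss"):
            found[name] = dist.version
    return found


# An empty dict means no faiss distribution was found via package metadata.
print(installed_faiss_packages())
```

If a GPU build shows up where the CPU build was expected, reinstalling the intended variant in a clean environment is probably the simplest way to rule this out.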