svalkiers / clusTCR

CDR3 clustering module providing a new method for fast and accurate clustering of large data sets of CDR3 amino acid sequences, and offering functionalities for downstream analysis of clustering results.
Other
48 stars 9 forks source link

Floats in slice indices during pgen computation #55

Open mrbarbitoff opened 5 months ago

mrbarbitoff commented 5 months ago

Hi!

I was testing the pgen calculation after the recent updates allowing the selection of chains, and bumped into a bug related to python2 integer division within OLGA. It seems that, while in some functions you have added div3int() to avoid the errors, some of the functions within OLGA (seemingly, the ones that are called for alpha-chain data) still contain "unprotected" python2 integer divisions, for example, in the compute_Pi_V_insVJ_given_J function, lines 1832, 1850, 1867 (I noticed these ones, but I'm not sure there aren't any more). The issue leads to the following stack trace when attempting to compute features for the alpha chain (never mind occasional lines I inserted while debugging):

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[18], line 1
----> 1 features = output.compute_features(compute_pgen=True)
      2 features.head()

File ~/anaconda3/envs/clustcr-new/lib/python3.10/site-packages/clustcr/clustering/clustering.py:43, in ClusteringResult.compute_features(self, compute_pgen)
     42 def compute_features(self, compute_pgen=True):
---> 43     return FeatureGenerator(self.clusters_df).get_features(
     44         chain=self.chain,
     45         compute_pgen=compute_pgen
     46         )

File ~/anaconda3/envs/clustcr-new/lib/python3.10/site-packages/clustcr/analysis/features.py:153, in FeatureGenerator.get_features(self, chain, compute_pgen)
    151 pchem = self._calc_physchem()
    152 if compute_pgen:
--> 153     pgen = self._calc_pgen(chain=chain)
    154     return self._combine(aavar, pchem, pgen)
    155 else:

File ~/anaconda3/envs/clustcr-new/lib/python3.10/site-packages/clustcr/analysis/features.py:125, in FeatureGenerator._calc_pgen(self, chain)
    123 pgen_model = get_olga_model(chain=chain)
    124 # Compute Pgen
--> 125 p = [pgen_model.compute_aa_CDR3_pgen(seq) for seq in self.nodes["junction_aa"]]
    126 # Format results
    127 self.nodes["pgen"] = p

File ~/anaconda3/envs/clustcr-new/lib/python3.10/site-packages/clustcr/analysis/features.py:125, in <listcomp>(.0)
    123 pgen_model = get_olga_model(chain=chain)
    124 # Compute Pgen
--> 125 p = [pgen_model.compute_aa_CDR3_pgen(seq) for seq in self.nodes["junction_aa"]]
    126 # Format results
    127 self.nodes["pgen"] = p

File ~/anaconda3/envs/clustcr-new/lib/python3.10/site-packages/clustcr/modules/olga/generation_probability.py:274, in GenerationProbability.compute_aa_CDR3_pgen(self, CDR3_seq, V_usage_mask_in, J_usage_mask_in, print_warnings)
    272 print(CDR3_seq, file=log_file)
    273 log_file.close()
--> 274 return self.compute_CDR3_pgen(CDR3_seq, V_usage_mask, J_usage_mask)

File ~/anaconda3/envs/clustcr-new/lib/python3.10/site-packages/clustcr/modules/olga/generation_probability.py:1679, in GenerationProbabilityVJ.compute_CDR3_pgen(self, CDR3_seq, V_usage_mask, J_usage_mask)
   1676 Pi_V_given_J, max_V_align = self.compute_Pi_V_given_J(CDR3_seq, V_usage_mask, r_J_usage_mask)
   1678 #Include insertions (R and PinsVJ) to get the total contribution from the left (3') side conditioned on J gene. Return Pi_V_insVJ_given_J
-> 1679 Pi_V_insVJ_given_J = self.compute_Pi_V_insVJ_given_J(CDR3_seq, Pi_V_given_J, max_V_align)
   1681 pgen = 0
   1682 #zip Pi_V_insVJ_given_J and Pi_J together for each J gene to get total pgen

File ~/anaconda3/envs/clustcr-new/lib/python3.10/site-packages/clustcr/modules/olga/generation_probability.py:1856, in GenerationProbabilityVJ.compute_Pi_V_insVJ_given_J(self, CDR3_seq, Pi_V_given_J, max_V_align)
   1853 base_ins = 1
   1855 #Loop over all other insertions using base_nt_vec
-> 1856 for aa in CDR3_seq[init_pos/3 + 1: init_pos/3 + max_insertions/3]:
   1857     Pi_V_insVJ_given_J[j][:, init_pos+base_ins+1] += self.PinsVJ[base_ins + 1]*np.dot(self.Svj[aa], current_base_nt_vec)
   1858     Pi_V_insVJ_given_J[j][:, init_pos+base_ins+2] += self.PinsVJ[base_ins + 2]*np.dot(self.Dvj[aa], current_base_nt_vec)

TypeError: slice indices must be integers or None or have an __index__ method

Could you please fix this problem?

Thanks in advance and sorry for bothering you again :)

Best, Yury