CDR3 clustering module providing a new method for fast and accurate clustering of large data sets of CDR3 amino acid sequences, and offering functionalities for downstream analysis of clustering results.
Other
48
stars
9
forks
source link
Floats in slice indices during pgen computation #55
I was testing the pgen calculation after the recent updates allowing the selection of chains, and bumped into a bug related to python2 integer division within OLGA. It seems that, while in some functions you have added div3int() to avoid the errors, some of the functions within OLGA (seemingly, the ones that are called for alpha-chain data) still contain "unprotected" python2 integer divisions, for example, in the compute_Pi_V_insVJ_given_J function, lines 1832, 1850, 1867 (I noticed these ones, but I'm not sure there aren't any more).
The issue leads to the following stack trace when attempting to compute features for the alpha chain (never mind occasional lines I inserted while debugging):
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[18], line 1
----> 1 features = output.compute_features(compute_pgen=True)
2 features.head()
File ~/anaconda3/envs/clustcr-new/lib/python3.10/site-packages/clustcr/clustering/clustering.py:43, in ClusteringResult.compute_features(self, compute_pgen)
42 def compute_features(self, compute_pgen=True):
---> 43 return FeatureGenerator(self.clusters_df).get_features(
44 chain=self.chain,
45 compute_pgen=compute_pgen
46 )
File ~/anaconda3/envs/clustcr-new/lib/python3.10/site-packages/clustcr/analysis/features.py:153, in FeatureGenerator.get_features(self, chain, compute_pgen)
151 pchem = self._calc_physchem()
152 if compute_pgen:
--> 153 pgen = self._calc_pgen(chain=chain)
154 return self._combine(aavar, pchem, pgen)
155 else:
File ~/anaconda3/envs/clustcr-new/lib/python3.10/site-packages/clustcr/analysis/features.py:125, in FeatureGenerator._calc_pgen(self, chain)
123 pgen_model = get_olga_model(chain=chain)
124 # Compute Pgen
--> 125 p = [pgen_model.compute_aa_CDR3_pgen(seq) for seq in self.nodes["junction_aa"]]
126 # Format results
127 self.nodes["pgen"] = p
File ~/anaconda3/envs/clustcr-new/lib/python3.10/site-packages/clustcr/analysis/features.py:125, in <listcomp>(.0)
123 pgen_model = get_olga_model(chain=chain)
124 # Compute Pgen
--> 125 p = [pgen_model.compute_aa_CDR3_pgen(seq) for seq in self.nodes["junction_aa"]]
126 # Format results
127 self.nodes["pgen"] = p
File ~/anaconda3/envs/clustcr-new/lib/python3.10/site-packages/clustcr/modules/olga/generation_probability.py:274, in GenerationProbability.compute_aa_CDR3_pgen(self, CDR3_seq, V_usage_mask_in, J_usage_mask_in, print_warnings)
272 print(CDR3_seq, file=log_file)
273 log_file.close()
--> 274 return self.compute_CDR3_pgen(CDR3_seq, V_usage_mask, J_usage_mask)
File ~/anaconda3/envs/clustcr-new/lib/python3.10/site-packages/clustcr/modules/olga/generation_probability.py:1679, in GenerationProbabilityVJ.compute_CDR3_pgen(self, CDR3_seq, V_usage_mask, J_usage_mask)
1676 Pi_V_given_J, max_V_align = self.compute_Pi_V_given_J(CDR3_seq, V_usage_mask, r_J_usage_mask)
1678 #Include insertions (R and PinsVJ) to get the total contribution from the left (3') side conditioned on J gene. Return Pi_V_insVJ_given_J
-> 1679 Pi_V_insVJ_given_J = self.compute_Pi_V_insVJ_given_J(CDR3_seq, Pi_V_given_J, max_V_align)
1681 pgen = 0
1682 #zip Pi_V_insVJ_given_J and Pi_J together for each J gene to get total pgen
File ~/anaconda3/envs/clustcr-new/lib/python3.10/site-packages/clustcr/modules/olga/generation_probability.py:1856, in GenerationProbabilityVJ.compute_Pi_V_insVJ_given_J(self, CDR3_seq, Pi_V_given_J, max_V_align)
1853 base_ins = 1
1855 #Loop over all other insertions using base_nt_vec
-> 1856 for aa in CDR3_seq[init_pos/3 + 1: init_pos/3 + max_insertions/3]:
1857 Pi_V_insVJ_given_J[j][:, init_pos+base_ins+1] += self.PinsVJ[base_ins + 1]*np.dot(self.Svj[aa], current_base_nt_vec)
1858 Pi_V_insVJ_given_J[j][:, init_pos+base_ins+2] += self.PinsVJ[base_ins + 2]*np.dot(self.Dvj[aa], current_base_nt_vec)
TypeError: slice indices must be integers or None or have an __index__ method
Could you please fix this problem?
Thanks in advance and sorry for bothering you again :)
Hi!
I was testing the pgen calculation after the recent updates allowing the selection of chains, and bumped into a bug related to python2 integer division within OLGA. It seems that, while in some functions you have added
div3int()
to avoid the errors, some of the functions within OLGA (seemingly, the ones that are called for alpha-chain data) still contain "unprotected" python2 integer divisions, for example, in thecompute_Pi_V_insVJ_given_J
function, lines 1832, 1850, 1867 (I noticed these ones, but I'm not sure there aren't any more). The issue leads to the following stack trace when attempting to compute features for the alpha chain (never mind occasional lines I inserted while debugging):Could you please fix this problem?
Thanks in advance and sorry for bothering you again :)
Best, Yury