svalkiers / clusTCR

CDR3 clustering module providing a new method for fast and accurate clustering of large data sets of CDR3 amino acid sequences, and offering functionalities for downstream analysis of clustering results.

pgen calculation is hard-coded to use human TRB model #52

Closed mrbarbitoff closed 5 months ago

mrbarbitoff commented 5 months ago

Hi!

We recently noticed a discrepancy between the pgen scores calculated by OLGA and those produced by clusTCR for the human TCR alpha chain. While investigating, I found that the pgen calculation in clusTCR is hard-coded to use the human beta-chain model:

Method _calc_pgen() in features.py, lines 128-131:

params_file_name = path.join(DIR,'modules/olga/default_models/human_T_beta/model_params.txt')
marginals_file_name = path.join(DIR,'modules/olga/default_models/human_T_beta/model_marginals.txt')
V_anchor_pos_file = path.join(DIR,'modules/olga/default_models/human_T_beta/V_gene_CDR3_anchors.csv')
J_anchor_pos_file = path.join(DIR,'modules/olga/default_models/human_T_beta/J_gene_CDR3_anchors.csv')

Is there any motivation for this limitation? If not, could you please add an argument to the compute_features method that lets the user choose the model used for the pgen calculation?
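To illustrate the request, here is a minimal sketch of how the four paths could be derived from a chain argument instead of being fixed. The function name olga_model_files, the OLGA_MODEL_DIRS mapping, the base_dir parameter and the 'A'/'B' labels are all hypothetical and not part of clusTCR; the sketch only assumes that the bundled OLGA default models include a human_T_alpha directory next to human_T_beta, as they do in OLGA itself.

```python
from os import path

# Hypothetical mapping from a chain label to an OLGA default-model directory.
# Assumption: both directories are shipped under modules/olga/default_models.
OLGA_MODEL_DIRS = {
    "A": "human_T_alpha",
    "B": "human_T_beta",
}


def olga_model_files(base_dir, chain="B"):
    """Return the four OLGA model file paths for the requested chain."""
    if chain not in OLGA_MODEL_DIRS:
        raise ValueError(
            f"Unsupported chain {chain!r}; expected one of {sorted(OLGA_MODEL_DIRS)}"
        )
    root = path.join(base_dir, "modules/olga/default_models", OLGA_MODEL_DIRS[chain])
    # Keys mirror the variable names used in _calc_pgen().
    return {
        "params_file_name": path.join(root, "model_params.txt"),
        "marginals_file_name": path.join(root, "model_marginals.txt"),
        "V_anchor_pos_file": path.join(root, "V_gene_CDR3_anchors.csv"),
        "J_anchor_pos_file": path.join(root, "J_gene_CDR3_anchors.csv"),
    }
```

Swapping the paths alone would probably not be enough: in OLGA the alpha chain is a VJ recombination model, so it is loaded and evaluated with the VJ classes (GenomicDataVJ, GenerativeModelVJ, GenerationProbabilityVJ) rather than the VDJ classes used for the beta chain.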

svalkiers commented 5 months ago

Hi, thank you for bringing attention to this issue. There was no particular reason for hardcoding the OLGA models other than the fact that we primarily designed ClusTCR for our own TRB data sets.

I am happy to inform you that we have solved the issue: users can now specify the TCR chain (Clustering(chain='A')). By default, this is set to Clustering(chain='B').

Once I have updated the documentation accordingly, I will push these changes to conda. In the meantime, you can use a local install if you want to use the new functionality right away.

Cheers, Sebastiaan

svalkiers commented 5 months ago

Update -- I have pushed a new 1.0.3 build (clustcr-1.0.3+2.g0f57fe4) which now contains the functionality described in the previous comment. Let me know if you experience any issues with this new build.

Closing this for now.

mrbarbitoff commented 5 months ago

Hi @svalkiers

Thank you for these updates!