nygenome / Conpair

Concordance and contamination estimator for tumor–normal pairs

Title: Missing Conditional Likelihood Functions for Certain Genotype Combinations #25

Closed chaopower-zhang closed 4 weeks ago

chaopower-zhang commented 4 weeks ago

Description:

In the create_conditional_likelihood_of_base_dict function, only a subset of the possible genotype combinations appears to have a conditional likelihood function defined. With two bases (A and B), each of the two genotypes in a combination can be AA, AB, or BB, and the observed base can be A or B, so in theory there should be 3×3×2=18 conditional likelihood functions to cover every combination.

However, combinations such as AAAB_A, AAAB_B, ABBB_A, and ABBB_B are missing in the current implementation.
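
To make the count concrete, here is a minimal sketch that enumerates all 18 keys and reports which ones a given dictionary lacks. It assumes keys follow the `<genotype><genotype>_<base>` naming used above; `all_keys` and `missing_keys` are hypothetical helper names for illustration, not part of Conpair.

```python
from itertools import product

GENOTYPES = ("AA", "AB", "BB")  # possible genotypes with two alleles
BASES = ("A", "B")              # possible observed bases


def all_keys():
    """Return every key of the form '<genotype><genotype>_<base>'."""
    return [f"{g1}{g2}_{b}" for g1, g2, b in product(GENOTYPES, GENOTYPES, BASES)]


def missing_keys(likelihood_dict):
    """Return the keys that are absent from a likelihood dictionary."""
    return [k for k in all_keys() if k not in likelihood_dict]


if __name__ == "__main__":
    keys = all_keys()
    print(len(keys))
    for k in ("AAAB_A", "AAAB_B", "ABBB_A", "ABBB_B"):
        print(k, k in keys)
```

Passing the dictionary returned by create_conditional_likelihood_of_base_dict to `missing_keys` would list exactly which combinations are undefined.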

For instance, in the function we have:

[image]

However, functions for genotype combinations such as AAAB and ABBB are not defined, even though these combinations could be relevant under certain conditions.

Questions:

1. Is there a specific reason why some genotype combinations were excluded?
2. Would adding these missing combinations (such as AAAB_A, AAAB_B, ABBB_A, and ABBB_B) improve the robustness of the likelihood calculations, especially in cases where these genotypes might occur?

Proposed Solution:

If there are no specific constraints, I suggest defining conditional likelihood functions for all 18 combinations to ensure comprehensive genotype likelihood coverage.

Thank you for your time. I look forward to your insights on this!

chaopower-zhang commented 4 weeks ago

[image]

Oh, I see. Does this mean that ABAA_B and ABBB_A are equivalent?
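
One way to check this kind of equivalence is to swap the allele labels A and B in a key and see which key comes out. The sketch below assumes the model treats the two alleles symmetrically and that a heterozygote is always written as AB; `swap_alleles` is a hypothetical helper, not a Conpair function.

```python
def swap_alleles(key):
    """Relabel A<->B in a '<genotype><genotype>_<base>' key."""
    genotypes, base = key.split("_")
    table = str.maketrans("AB", "BA")
    swapped = genotypes.translate(table)
    # Split into the two genotypes and rewrite "BA" as "AB".
    first = "".join(sorted(swapped[:2]))
    second = "".join(sorted(swapped[2:]))
    return f"{first}{second}_{base.translate(table)}"


if __name__ == "__main__":
    print(swap_alleles("ABAA_B"))  # ABBB_A
```

Under that symmetry assumption, each of the 18 keys pairs with exactly one other key, so only 9 distinct functions would be needed.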