qmarcou / IGoR

IGoR is a C++ software designed to infer V(D)J recombination related processes from sequencing data. Find full documentation at:
https://qmarcou.github.io/IGoR/
GNU General Public License v3.0

Model edge gene choice relations differ #50

Closed penuts7644 closed 4 years ago

penuts7644 commented 4 years ago

Hi Quentin,

I was recently looking into some model comparisons and noticed that the default IGoR human TCRB model has edges from the V-gene choice to both the D-gene and J-gene choices, as well as from the J-gene choice to the D-gene choice:

%GeneChoice_V_gene_Undefined_side_prio7_size89;GeneChoice_D_gene_Undefined_side_prio6_size3
%GeneChoice_V_gene_Undefined_side_prio7_size89;GeneChoice_J_gene_Undefined_side_prio7_size15
%GeneChoice_J_gene_Undefined_side_prio7_size15;GeneChoice_D_gene_Undefined_side_prio6_size3

However, when I compare this to the human TCRB model that OLGA supplies by default, or to the ones I'm constructing locally, those only have the edge from the J-gene choice to the D-gene choice:

%GeneChoice_J_gene_Undefined_side_prio7_size15;GeneChoice_D_gene_Undefined_side_prio6_size3

Do you have any idea why this is and how it is possible to make a model with the additional gene choice edges?

Cheers, Wout

qmarcou commented 4 years ago

Hi Wout, this combination of three edges in the graph allows the model to learn the joint probability P(V,D,J). Because IGoR treats all alleles as different genes, this joint probability is useful to capture the fact that some alleles might not be able to recombine together because they lie on different chromosomes. In a nutshell, this is needed to fine-tune a model to an individual of interest.
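
For reference, the two graph structures can be written out as factorizations. This is a sketch that assumes each edge line in the model file reads `%parent;child`, so the three edges are V→D, V→J and J→D, while the single-edge model keeps only J→D:

```latex
% Three edges (V -> J, V -> D, J -> D): the chain rule gives the full joint
P(V, D, J) = P(V)\, P(J \mid V)\, P(D \mid V, J)

% Single edge (J -> D): V is independent of (D, J)
P(V, D, J) \approx P(V)\, P(J)\, P(D \mid J) = P(V)\, P(D, J)
```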

The version used by OLGA learns the factorized version P(V)P(D,J) of the gene usage probability. It will miss all information about how the alleles are organised/partitioned across chromosomes. Learning at least the joint probability P(D,J) is essential to capture the biological impossibility of recombining D2 with J1 genes (D2 lies downstream of the J1 cluster). As I am not responsible for OLGA support/development, I cannot tell why they chose to keep only this conditional dependence; it might be due to some algorithmic issue...
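
To make the difference concrete, here is a small toy sketch in Python. It does not use IGoR or OLGA; the allele labels and probabilities are invented for illustration. It assumes an individual with two chromosomes, each carrying its own V and J allele, so that only same-chromosome combinations can occur. The factorized form P(V)P(D,J) then assigns probability to a cross-chromosome combination that the joint P(V,D,J) correctly rules out.

```python
from itertools import product

# Hypothetical toy data: chromosome A carries V*01 and J*01,
# chromosome B carries V*02 and J*02; a single D gene for simplicity.
# Only same-chromosome (V, D, J) combinations have non-zero probability.
joint = {
    ("V*01", "D1", "J*01"): 0.5,
    ("V*02", "D1", "J*02"): 0.5,
}


def marginal(dist, keep):
    """Sum a distribution over all positions not listed in `keep`."""
    out = {}
    for combo, p in dist.items():
        key = tuple(combo[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


p_v = marginal(joint, (0,))      # P(V)
p_dj = marginal(joint, (1, 2))   # P(D, J)

# Factorized model: P(V) * P(D, J), i.e. V independent of (D, J).
factorized = {
    (v, d, j): pv * pdj
    for ((v,), pv), ((d, j), pdj) in product(p_v.items(), p_dj.items())
}

# A V allele from chromosome A paired with a J allele from chromosome B.
cross = ("V*01", "D1", "J*02")
print("joint      :", joint.get(cross, 0.0))
print("factorized :", factorized[cross])
```

In this toy case the joint gives the cross-chromosome combination probability 0, while the factorized model gives it 0.25, which is the kind of individual-specific information the extra V edges are meant to retain.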