qmarcou / IGoR

IGoR is a C++ software designed to infer V(D)J recombination related processes from sequencing data. Find full documentation at:
https://qmarcou.github.io/IGoR/
GNU General Public License v3.0

About "iteration" of -evaluate #46

Closed decenwang closed 3 years ago

decenwang commented 5 years ago

Hi Quentin,

When I tried IGoR on a small test dataset, I found that -evaluate has no option to set the number of iterations "N" (as shown on the webpage), unlike -infer --N_iter 5, which applies only to inference. If so, -evaluate iterates only once, judging by my results: the mean log-likelihood from -evaluate is not the best (about -17), whereas after 5 iterations of -infer the likelihood reaches a plateau (about -13, better than -17). How can I set the number of iterations for -evaluate? Or should I pass final_parms.txt and final_marginals.txt to -evaluate via -set_custom_model? In your web documentation, -set_custom_model is described as:

Use a custom model as a baseline for inference or evaluation. Note that this will override custom genomic templates for inference and evaluation. Alternatively, providing only the model parameters file will lead IGoR to create model marginals initialized to a uniform distribution.

I understand this to mean a custom model supplied by the user. Thanks a lot!

Best,

Decen

qmarcou commented 5 years ago

Dear @decenwang, as mentioned in this section, -infer and -evaluate trigger the exact same actions (at least for now): with -evaluate, IGoR performs the Expectation step of the Expectation-Maximization (EM) algorithm, while with -infer --N_iter N it performs N EM steps. (Note that the Expectation step is computationally intensive in our setting, while the Maximization step is close to instantaneous.) In a nutshell, -evaluate is the same as -infer --N_iter 1. The two commands exist only for the sake of clarity for users working with predefined recombination models, and to allow future improvements to the model inference strategy without backward-compatibility issues for command-line users.

To answer your final question: yes, you should run the evaluation with

-evaluate -set_custom_model /your/working/dir/inference/final_parms.txt /your/working/dir/inference/final_marginals.txt

As a shortcut, you could directly run 6 iterations of EM and would get the same result, as explained above: the Expectation step of the sixth iteration is computed with the model obtained after the fifth. This is not guaranteed to work in future releases of IGoR, though.
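To make the "6 iterations" shortcut concrete, here is a toy sketch in Python. It is not IGoR's code: it runs EM on a two-coin mixture, and the names `e_step`, `m_step`, `infer` and `evaluate` are hypothetical stand-ins. It assumes, as described above, that the likelihood is a by-product of the Expectation step and is therefore computed with the model as it stood before that iteration's Maximization step.

```python
import math

# Toy data: (heads, tosses) for runs made with one of two hidden coins.
DATA = [(5, 10), (9, 10), (8, 10), (4, 10), (7, 10)]

def binom_pmf(h, n, p):
    """Probability of h heads in n tosses of a coin with bias p."""
    return math.comb(n, h) * p**h * (1 - p)**(n - h)

def e_step(data, theta):
    """Expectation step: mean log-likelihood under theta, plus expected counts."""
    p_a, p_b = theta
    log_lik = 0.0
    heads_a = tosses_a = heads_b = tosses_b = 0.0
    for h, n in data:
        like_a = 0.5 * binom_pmf(h, n, p_a)  # equal prior on each coin
        like_b = 0.5 * binom_pmf(h, n, p_b)
        total = like_a + like_b
        log_lik += math.log(total)
        w_a = like_a / total                 # posterior weight of coin A
        heads_a += w_a * h
        tosses_a += w_a * n
        heads_b += (1 - w_a) * h
        tosses_b += (1 - w_a) * n
    return log_lik / len(data), (heads_a, tosses_a, heads_b, tosses_b)

def m_step(stats):
    """Maximization step: cheap re-estimation from the expected counts."""
    heads_a, tosses_a, heads_b, tosses_b = stats
    return (heads_a / tosses_a, heads_b / tosses_b)

def infer(data, theta, n_iter):
    """Run n_iter EM iterations; return final model and per-iteration likelihoods."""
    history = []
    for _ in range(n_iter):
        mean_ll, stats = e_step(data, theta)  # likelihood of the *current* model
        history.append(mean_ll)
        theta = m_step(stats)
    return theta, history

def evaluate(data, theta):
    """A single Expectation step with a fixed model (no parameter update)."""
    mean_ll, _ = e_step(data, theta)
    return mean_ll

initial = (0.6, 0.5)  # arbitrary starting model, playing the role of a default model
theta_5, history_5 = infer(DATA, initial, 5)
_, history_6 = infer(DATA, initial, 6)

print("evaluate with initial model:      ", evaluate(DATA, initial))
print("evaluate with model after 5 iters:", evaluate(DATA, theta_5))
print("likelihood reported at iteration 6:", history_6[-1])
```

In this sketch, evaluating with the starting model returns the likelihood of the first iteration, while evaluating with the model obtained after 5 iterations returns the value that a sixth iteration would report. That mirrors the difference between running -evaluate alone and running it with -set_custom_model pointing at the inferred final_parms.txt and final_marginals.txt.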