jgx65 / quantinemo

a swiss knife to simulate complex demographic and genetic scenarios
https://www2.unil.ch/popgen/softwares/quantinemo/
8 stars 0 forks source link

Finding allele to mutate to in an efficient way #9

Open frederic-michaud opened 6 years ago

frederic-michaud commented 6 years ago

The way commit 4e87c16301634de4db6394276c26ad4051a073cb handles the situation about how to check if a mutation is possible is not satisfactory. If only two alleles exist, it has anyways to switch to the other one, expect is the probability to switch to it is zero. Today, Elisa point to me some input file (too complicated to be reproduced here) where the simulation was very slow. Tracking things down, it turns out that the automatically build array for mutation frequency had the values 0.99999999... and 0.000000001... This might be seen as a bug though it was actually following a known path. Yet, when an allele was stuck in the first position, it had a hard time finding a way to go to the second allele.

It would probably make much more sense to just check if only two alleles exist, and switch to the other one if p > 0. Since this is a sensible part of the code, I will see if I have time to change it and do some intensive testing, or if I leave it like this. Results are anyways correct, it is just much much slower (a factor of about 100 in the case of Elisa).

sneuensc commented 6 years ago

Hi Fred,

First, did you get the reply about the command line help?

Concerning the mutation mentioned below. Would be good to add the check. Especially for the case of Elisa it makes a lot of sense.

Cheers, Sam

On 2 Feb 2018, at 16:42, frederic-michaud notifications@github.com wrote:

The way commit 4e87c16 https://github.com/jgx65/qnemo_dev/commit/4e87c16301634de4db6394276c26ad4051a073cb handles the situation about how to check if a mutation is possible is not satisfactory. If only two alleles exist, it has anyways to switch to the other one, expect is the probability to switch to it is zero. Today, Elisa point to me some input file (too complicated to be reproduced here) where the simulation was very slow. Tracking things down, it turns out that the automatically build array for mutation frequency had the values 0.99999999... and 0.000000001... This might be seen as a bug though it was actually following a known path. Yet, when an allele was stuck in the first position, it had a hard time finding a way to go to the second allele.

It would probably make much more sense to just check if only two alleles exist, and switch to the other one if p > 0. Since this is a sensible part of the code, I will see if I have time to change it and do some intensive testing, or if I leave it like this. Results are anyways correct, it is just much much slower (a factor of about 100 in the case of Elisa).

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/jgx65/qnemo_dev/issues/9, or mute the thread https://github.com/notifications/unsubscribe-auth/ALGyCSQq_NnRlg8_Pmi7DN4lv7Rezmatks5tQyz3gaJpZM4R3Xao.

{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/jgx65/qnemo_dev","title":"jgx65/qnemo_dev","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/jgx65/qnemo_dev"}},"updates":{"snippets":[{"icon":"DESCRIPTION","message":"Finding allele to mutate too in an efficient way (#9)"}],"action":{"name":"View Issue","url":"https://github.com/jgx65/qnemo_dev/issues/9"}}}

frederic-michaud commented 6 years ago

Actually, I tried this on Friday, and it shows no improvement in the rapidity of the execution. I just added an if-else statement in the mutation function, making the code more cumbersome. I didn't commit it because adding complexity to the code without significative time-saving is not really the direction I want to take.

A cleaner way would probably be to write two different functions for the mutation and to adapt the pointer _mut_model_func_ptr depending on the value of _nb_allele.

In any case, since 4e87c16, mutation is not anymore a major bottleneck in Elisa's simulation, only about 30% time in spend doing it, and first step to improve it is either to do it at the genome level or at the population level, which would probably make it much faster but would need major refactoring.

frederic-michaud commented 6 years ago

After rechecking my result from Friday, I see that I was wrong. Actually, the new version is about twice faster than the old one. Yet, another bottleneck is present just before, which is to test whether mutation occurs of not. More precisely, the entire mutation step is about 30% of the time, but looking for a new allele is only about 0.4% of the time and 0.2% of the time with the new version.

If we include the new version, we would gain about 0.2% speed. I don't think it's worthwhile the cost. I let the issue open to have it in mind if I have to refactor this one day.