stschiff / msmc

Implementation of the multiple sequential markovian coalescent
GNU General Public License v3.0
87 stars 20 forks source link

powell exceeding maximum iterations #46

Closed dieunelderilus closed 1 year ago

dieunelderilus commented 5 years ago

Dear msmc2 developers

I have 22 chromosomes as input for msmc2. The same msmc2 command run in 19 chromosomes with no errors. However I am getting this following error for 3 other chromosomes : [12/20] Baumwelch iteration

Please could you help me ?

stschiff commented 5 years ago

What do you mean “run in 19 chromosomes with no errors”? Are you running MSMC on each chromosome separately? That’s not the right way. You need to feed all chromosomes in one single run.

On 12. May 2019, at 04:10, dieunelderilus notifications@github.com wrote:

Dear msmc2 developers

I have 22 chromosomes as input for msmc2. The same msmc2 command run in 19 chromosomes with no errors. However I am getting this following error for 3 other chromosomes : [12/20] Baumwelch iteration

[1/1] Expectation Step, log likelihood: -656.508 [200/200(max)] Maximization Stepobject.Exception@powell.d mailto:Stepobject.Exception@powell.d(119): powell exceeding maximum iterations. ??:? double[] powell.Powell!(maximization_step.MinFunc).Powell.minimize(double[], double[][]) [0x567fb2] ??:? double[] powell.Powell!(maximization_step.MinFunc).Powell.minimize(double[]) [0x567bd0] ??:? model.psmc_model.PSMCmodel maximization_step.getMaximization(double[][], double[][2], model.psmc_model.PSMCmodel, const(ulong[]), bool) [0x569537] ??:? void msmc2.run() [0x56ca91] ??:? _Dmain [0x56abeb]

Please could you help me ?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/stschiff/msmc/issues/46, or mute the thread https://github.com/notifications/unsubscribe-auth/AADNBGQYP2OYZCDHSU2Y6OTPU54C7ANCNFSM4HMJLZOQ.

dieunelderilus commented 5 years ago

Thank you very much for your response, you are right I was running it in each chromosome separately

stella-huynh commented 1 year ago

Hi msmc2 developers,

This issue is closed, but I ran into the exact same error message even though I feed all my sequences in one single run.

My reference genome is actually at the scaffold level, with about ~3000 scaffolds. My reads were first mapped onto my reference, and then I called my SNPs with samtools mpileup + bcftools call for each scaffold separately. I then used bamCaller.py and generate_multihetsep.py for each scaffold, and ran msmc2 as follows: msmc2 -p "4*1+30*2+4*1+6*1+10*1" -t 30 -o msmc2_result/ *.multihetsep.txt

I initially ran msmc2 with the default time segment, and it was all fine (no error message). As I need to compare msmc2 result with my PSMC analysis, for which I used the atomic intervals "4+302+4+6+10", I tried to run msmc2 with these intervals as time segment. However it was not accepted by msmc2 (I got "illegal time segment pattern" errore message). I figured out that I needed to tell for each segment whether their coalescence rates were joined or not, so I added 1 to the corresponding segments (as in the command line above) and mscm2 then started running fine. But after reaching 200 maximization iterations, I get the "powell exceeding maximum iterations" error.

Is it something due to the time segments used or to the high number of scaffolds ? Is it possible to increase the number of maximization iterations ?

Thank you for your help !

Stella

stschiff commented 1 year ago

I don't think you're using the syntax of the time patterning correctly. The pattern you propose above suggests a fit of 54 parameters (independently fitted coalescence rates) on 85 time segments. That is a lot of parameters, and an unnecessarily dense time patterning. The default patterning in PSMC would be this: -p 1*4+25*2+1*4+1*6, which means 28 free parameters on 64 time segments.

The default setting in MSMC2 has almost the same as in PSMC, but with only half the number of time segments: (12+251+12+13).

So of course, with 54 parameters as in your input, you will simply encounter massive overfitting.

stella-huynh commented 1 year ago

Hi Stephan,

Thank you very much for your fast answer ! Sorry for not coming back sooner. It was indeed most likely an overfitting issue. I tiried other time segments patterning with decreased the number of segments and free parameters and all msmc2 ran fine until the end. I still get a weird msmc2 graph output though, but it's probably something to do with my dataset (I will open another issue if I don't manage to solve it). For the time patterning, it's all good now. Thanks again !

Stella

stschiff commented 1 year ago

Good to hear. Yeah, sometimes the results are hard to interpret.