stschiff / msmc2

GNU General Public License v3.0
53 stars 9 forks source link

fixing mutation rate with -m flag #2

Closed stsmall closed 6 years ago

stsmall commented 9 years ago

Hi Stephan, I am attempting to use MSCM2 to estimate the cross-coalescent rate. I ran MSMC2 three independent times with a fixed mutation rate "-m 0.000678"; once for pop1, once for pop2, once for pop1-pop2. I then ran the CombineCrossCoal.py. Some of the resulting numbers were greater than 1.

I checked the results file and the left_boundary as well as the right_boundary were not the same time values among runs.

Am I misunderstanding how to run the cross-coalescent with MSMC2 or am I using the "-m" flag incorrectly? thanks!

stschiff commented 9 years ago

I wouldn't use the -m flag, at least I wouldn't see why. The time segments are indeed determined by -m, but also by the number of haplotype pairs analysed. The CombineCrossCoal-script takes care of the different segmentations by interpolation, so that's all taken care of.

It can happen with msmc2 that the relative cross coalescence rate can exceed one, the optimisation is not constrained at all, so if the coalescence rate across population exceeds the coalescence rate within populations, you get a value above 1. This can perhaps be caused by insufficient phasing or other technical artifacts, I can't right now think of a population genetic reason. How bad is it? Does it go much above 1, and for multiple segments?

stsmall commented 9 years ago

Hi Stephan, I decided to use the -m flag after reading the help documentation on msmc2. "For a cross-population analysis, you need three independent msmc runs, and you should use this option to ensure the same time boundaries in each run"

Phasing is not an issue since my organisms is haploid-diploid. Diploid, recombination, for sexual reproduction. We have collected the haploid stage for genome sequences. Our pseudo-diploids return the same history when using either PSMC and MSMC2. I don't know how this will affect CrossCoal estimation...

If I understand correctly, then for 4 haps: msmc2 -o out1.file Pop1.msmc2.in msmc2 -o out2.file Pop2.msmc2.in msmc2 -P 0,0,1,1 -o out1-2.file *Pop1-2.msmc2.in python combineCrossCoal.py out1-2.file out1.file out2.file

The results from combineCrossCoal.py scaled as: 2_lambda01/(lambda00+lambda11) *_it is likely that the 1st and last estimates are stochastic since I have not run bootstraps on the data.

2.248119011 0.51849322 0.066830905 0.059488314 0.254932836 0.545325356 0.64256677 0.593430625 0.617207772 0.727567454 0.821851149 0.88896758 1.204403793 0.976395543 0.984059642 0.915196162 0.615046544 0.41396457 1.810651528 0.06715499

Thank you for a great popgen program!

stschiff commented 9 years ago

Hi, I see, I should change that text, that isn’t anymore the case.

Yes, your command lines look correct. Note that you can also prepare your input file for both individuals, and then select the first two haps with “-I 0,1”, and the second haps with “-I 2,3”, and then the fourth run as you did with “-P 0,0,1,1”. The -I flag just selects haplotypes from your input files.

The results don’t look great, admittedly. Looks like the both boundaries are random. I would probably join the first and last segments. So for example, with 32 time segments: -p=1_4+23_1+1*5

Stephan

On 01 Jul 2015, at 14:59, stsmall notifications@github.com wrote:

Hi Stephan, I decided to use the -m flag after reading the help documentation on msmc2. "For a cross-population analysis, you need three independent msmc runs, and you should use this option to ensure the same time boundaries in each run"

Phasing is not an issue since my organisms is haploid-diploid. Diploid, recombination, for sexual reproduction. We have collected the haploid stage for genome sequences. Our pseudo-diploids return the same history when using either PSMC and MSMC2. I don't know how this will affect CrossCoal estimation...

If I understand correctly, then for 4 haps: msmc2 -o out1.file Pop1.msmc2.in msmc2 -o out2.file Pop2.msmc2.in msmc2 -P 0,0,1,1 -o out1-2.file *Pop1-2.msmc2.in python combineCrossCoal.py out1-2.file out1.file out2.file

The results from combineCrossCoal.py scaled as: 2lambda01/(lambda00+lambda11) *it is likely that the 1st and last estimates are stochastic since I have not run bootstraps on the data.

2.248119011 0.51849322 0.066830905 0.059488314 0.254932836 0.545325356 0.64256677 0.593430625 0.617207772 0.727567454 0.821851149 0.88896758 1.204403793 0.976395543 0.984059642 0.915196162 0.615046544 0.41396457 1.810651528 0.06715499

Thank you for a great popgen program!

— Reply to this email directly or view it on GitHub https://github.com/stschiff/msmc2/issues/2#issuecomment-117684478.

darshanredij commented 7 years ago

Hi,

How to use the option "-t " in msmc2.0.0. I am currently using t=32 but this overloads the CPU. I am currently running it on a 64 CPU machine with 252 GB of RAM. Is there any way to optimize ?

Thank you...!!!

Regards Darshan

stschiff commented 7 years ago

Just use fewer threads. The more threads the more memory! Also note that it doesn’t make much sense to use more threads than chromosomes. For humans, I found that 11 threads is good, which is half the number of chromosomes, so it can run the first 11 chromosomes in parallel, then the second 11 chromosomes in each EM-cycle.

Stephan

On 27 Feb 2017, at 09:44, darshanredij notifications@github.com wrote:

Hi,

How to use the option /apps/msmc/msmc2.0.0 -t 32, it overloads the CPU. I am currently running it on a 64 CPU machine with 252 GB of RAM. How can I optimize my run ?

Regards Darshan

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/stschiff/msmc2/issues/2#issuecomment-282659925, or mute the thread https://github.com/notifications/unsubscribe-auth/AAbQmqrzgVBeAOfnrBcoeCTrrIyp2sTIks5rgozugaJpZM4FPVoN.