statgenetics / cstatgen

C++ statgen library for SEQPower, SEQLinkage and RV-NPL

The limitation of the Lander-Green approach to multipoint linkage (<30 "bits") #3

Open changebio opened 2 years ago

changebio commented 2 years ago

I tried to test cstatgen on some big pedigrees and got the following errors:

[Screenshot 2022-01-31 at 12:28:06 PM: error output]

The errors come from merlin/MerlinFamily.h, so I searched for MERLIN-related material and found a PDF stating that MERLIN "Uses the Lander-Green approach to multipoint linkage, so not suitable for large pedigrees (>30 “bits”)" (https://genepi.qimr.edu.au/staff/davidD/Course/Slides/merlin.pdf).
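To see why "bits" is the limiting quantity, here is a toy sketch of the hidden Markov model behind the Lander-Green approach. It is an illustration only, not MERLIN's or cstatgen's code. The function names, the uniform prior over inheritance vectors, and the model where each meiosis bit recombines independently with probability theta are assumptions made for the example. The hidden states are all 2**n_bits inheritance vectors, so the work grows linearly with the number of markers but exponentially with the number of bits.

```python
import itertools

def transition_prob(v, w, theta):
    """P(inheritance vector w at the next marker | v at this marker).
    Assumes each meiosis bit recombines independently with probability theta."""
    d = sum(x != y for x, y in zip(v, w))  # Hamming distance = number of recombinations
    return theta ** d * (1 - theta) ** (len(v) - d)

def forward_likelihood(n_bits, emissions, thetas):
    """Toy Lander-Green forward pass (hypothetical helper, for illustration).
    n_bits:    meiosis bits; the state space has 2**n_bits inheritance vectors
    emissions: one function per marker, v -> P(genotypes at that marker | v)
    thetas:    recombination fractions between consecutive markers
    """
    states = list(itertools.product((0, 1), repeat=n_bits))
    # Uniform prior over inheritance vectors at the first marker
    alpha = {v: emissions[0](v) / len(states) for v in states}
    for m in range(1, len(emissions)):
        theta = thetas[m - 1]
        # Naive step: sums over all (v, w) pairs, i.e. 4**n_bits terms per marker
        alpha = {
            w: emissions[m](w) * sum(alpha[v] * transition_prob(v, w, theta) for v in states)
            for w in states
        }
    return sum(alpha.values())
```

The naive transition step here costs 4**n_bits per marker. Published Lander-Green implementations use faster transforms and, in MERLIN's case, sparse tree representations, but the state space itself still doubles with every additional bit.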

gaow commented 2 years ago

@changebio Indeed, Lander-Green is good for many markers but not for huge pedigrees. This family has 8 founders and 20 descendants? Then yes, it is a large family indeed. But this is a different issue from #2, right?
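As a rough check on how large this family is, the sketch below uses the pedigree-complexity measure commonly quoted for Lander-Green software, bits = 2n - f, where n is the number of non-founders and f the number of founders. Treating the 20 descendants as non-founders is an assumption, and MERLIN's internal count after its own symmetry reductions may differ; `pedigree_bits` is a hypothetical helper, not a cstatgen function.

```python
def pedigree_bits(n_founders, n_nonfounders):
    """Approximate Lander-Green complexity, assuming the 2n - f convention:
    each non-founder contributes two meioses (2n bits), and the arbitrary
    phase of each founder removes one bit (-f)."""
    return 2 * n_nonfounders - n_founders

bits = pedigree_bits(n_founders=8, n_nonfounders=20)
print(bits)  # 32
```

Under that assumption the family lands around 32 bits, which is above the ~30-bit range the MERLIN slides describe as the practical limit.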

changebio commented 2 years ago

I closed issue #2 and am bringing your last comment over here ("bit 24 is hard coded. I wonder what bit you should set it to ... perhaps large enough to make it work?"). Maybe 30 bits is a good option. I tried setting maxBits to 36 with 64G of memory, but it still failed to phase haplotypes for these big families, which would probably need 100G of memory. The computation and time required grow exponentially with the number of bits.
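A back-of-the-envelope way to see the exponential growth: if one likelihood vector stored a double for every one of the 2**bits inheritance vectors, its size would double with each extra bit. This dense layout is an assumption for illustration; MERLIN stores these vectors as sparse trees, so real usage is usually much lower and these figures do not predict the actual memory needed. `dense_vector_bytes` is a hypothetical helper.

```python
def dense_vector_bytes(bits, bytes_per_value=8):
    """Bytes for one dense vector with one double per inheritance vector (2**bits entries)."""
    return (2 ** bits) * bytes_per_value

GiB = 2 ** 30
for bits in (24, 30, 32, 36):
    print(bits, dense_vector_bytes(bits) / GiB, "GiB")
# 24 0.125 GiB
# 30 8.0 GiB
# 32 32.0 GiB
# 36 512.0 GiB
```

Each extra bit doubles the worst case, so raising maxBits from 24 to 36 multiplies it by 2**12 = 4096.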

gaow commented 2 years ago

I see. We are stuck with Lander-Green. But this is a multi-marker, multi-sample issue ... Initially I picked Lander-Green because I thought we would have more trouble with the multi-marker situation (a gene has many markers). I did not think much about the big-family situation ... @changebio you might be correct after all that we need some additional trimming ... I wonder if I should talk to Jurg again, or seriously study the Pseudomarker program. What do you think?