The same estimate time for each bootstrap

thyme0425 commented 1 year ago

Hi, I'm using a single pulse admixture model to estimate the admixture time from one species to the admixed species. I got only one sample per species. When I was running Ancestry_HMM using the following command: ancestry_hmm -i input_file -s ahmm.ploidy -a 2 0.95 0.05 -p 0 3450000 0.95 -p 1 -100 0.05 -g -e 1e-4 -b 100 1000

The estimate time was 10000 for every bootstrap as is shown below:

optimum:
        type    time    proportion
        0       3.45e+06        0.95
        1       10000   0.05
bootstrap: 0
        type    time    proportion
        0       3.45e+06        0.95
        1       10000   0.05
bootstrap: 1
        type    time    proportion
        0       3.45e+06        0.95
        1       10000   0.05
bootstrap: 2
        type    time    proportion
        0       3.45e+06        0.95
        1       10000   0.05
...

So I was wondering if Ancestry_HMM has a upper limit of 10000 generations. I tried to set the --tmax parameter to 100000/500000, but I got the following result with the same estimate time for each bootstrap:

optimum:
        type    time    proportion
        0       3.45e+06        0.95
        1       404509  0.05
bootstrap: 0
        type    time    proportion
        0       3.45e+06        0.95
        1       404509  0.05
bootstrap: 1
        type    time    proportion
        0       3.45e+06        0.95
        1       404509  0.05
bootstrap: 2
        type    time    proportion
        0       3.45e+06        0.95
        1       404509  0.05
bootstrap: 3
        type    time    proportion
        0       3.45e+06        0.95
        1       404509  0.05
...

The output on the screen looks like:

reading command line                            0 ms
reading sample ids and ploidy                   0 ms
creating states matrix                          0 ms
reading data and creating emissions matrices    120000 ms
computing transition routes                     0 ms
creating initial admixture model(s)             0 ms
estimating 1 parameters
starting golden section search
        iteration       low bound       low test        high test       high bound      lnl low lnl high
        0       1       190984  309017  500000  -49640.4        -49640.4
        1       190984  309017  381966  500000  -49640.4        -49640.4

                                        SEARCH TIME: 50000 ms

optimal model found:

        type    time    proportion
        0       3.45e+06        0.95
        1       404509  0.05
computing 100 bootstrap models
starting golden section search
        bootstrap no:   0

        iteration       low bound       low test        high test       high bound      lnl low lnl high
        0       1       190984  309017  500000  -50176.2        -50176.2
        1       190984  309017  381966  500000  -50176.2        -50176.2
starting golden section search
        bootstrap no:   1

        iteration       low bound       low test        high test       high bound      lnl low lnl high
        0       1       190984  309017  500000  -50106.4        -50106.4
        1       190984  309017  381966  500000  -50106.4        -50106.4
starting golden section search
        bootstrap no:   2

        iteration       low bound       low test        high test       high bound      lnl low lnl high
        0       1       190984  309017  500000  -50196.3        -50196.3
        1       190984  309017  381966  500000  -50196.3        -50196.3
starting golden section search
        bootstrap no:   3

        iteration       low bound       low test        high test       high bound      lnl low lnl high
        0       1       190984  309017  500000  -50112.5        -50112.5
        1       190984  309017  381966  500000  -50112.5        -50112.5
starting golden section search
        bootstrap no:   4
...

I can not figure out why I got this result, and I'm wondering if Ancestry_HMM is applicable to my dataset. Many thanks!

russcd commented 1 year ago

Hi There,

It looks like the likelihood for each test point is the same. So the optimization terminates at the default initial value.

Could you send me your input files so I can try to figure out why that's happening?

Thanks,

Russ

thyme0425 commented 1 year ago

Here attached are my input file and ploidy file. Thanks! input_file.txt ahmm.ploidy.txt

russcd commented 1 year ago

Good news: I can reproduce your results. Thanks for providing the files!

Bad news: I don't think ancestry_hmm is the ideal program for your analysis. Your reference panels are very small (2 chromosomes in each). This combined with an ancient admixture time means it is likely to perform poorly. The reason the program arrives at the same outcome in each bootstrap is that the likelihood surface is very flat, and the two test points are sufficiently close that optimization terminates.

thyme0425 commented 1 year ago

Thank you very much for your reply! I will try to increase my sample size to see if ancestry_hmm can perform better. For now I wonder if you could recommend some softwares if I want to estimate introgression time based on very small sample size and/or ancient admixture time.

russcd commented 1 year ago

Ancient admixture with few samples is a hard problem

Treemix might be helpful?

If you have an outgroup species or two, maybe you could use d-statistics or d-foil for part of your analysis?

russcd / Ancestry_HMM

The same estimate time for each bootstrap #16