stevemussmann / BayesAss3-SNPs

Modification of BayesAss 3.0.4 to allow handling of large SNP datasets
GNU General Public License v3.0

individual migrant ancestries never changes from 0 #5

Closed by wpearman1996 5 years ago

wpearman1996 commented 5 years ago

Hi there, thanks for developing this - it's really helpful.

I've been trying to run this on Ubuntu on a dataset of 1000 loci using the following command:

./BayesAss3-SNPs/BA3-SNPS-Ubuntu64 -t -s 321 -i 16000000 -b 8000000 -l 1000 -o ba3_seed321_9aug.out -F bayesassout9aug1000loci.inp -v -a0.25 -f0.25 -m0.25

I've then been trying to optimize the mixing by modifying the last three parameters. However, the value reported for individual migrant ancestries never increases from 0 (unless I use the -p parameter).

logP(M): -703.83 logL(G): -97057.71 logL: -97761.54 % done: [0.03] % accepted: (0.16, 0.00 0.38, 0.14, 0.62)

I'm not sure how to fix this. I have attached my input file in case it helps.
bayesassout9aug1000loci.txt

Any suggestions on how to fix this? I have tried many different variations of the last 3 parameters in my command and nothing seems to change it.

stevemussmann commented 5 years ago

Perhaps I'm misunderstanding the issue, but I don't see anything wrong with the outputs. For example, I did a short run using the following command (after calculating optimal mixing parameters with my autotune program):

BA3-SNPS -t -s 321 -i 100000 -b 10000 -l 1000 -o ba3_seed321_9aug.out -F bayesassout9aug1000loci.txt -v -m 0.075 -a 0.2125 -f 0.075

I opened the resulting indiv file and the migrant ancestries look normal enough to me. Most of your samples are being called non-migrants. For example, individual SBMD4 has a value of 1.000 for the [2,0] category:

[0,0]:0.000 [1,0]:0.000 [2,0]:1.000 [3,0]:0.000 [4,0]:0.000 [5,0]:0.000 [6,0]:0.000 [7,0]:0.000 [8,0]:0.000
[0,1]:0.000 [1,1]:0.000 [2,1]:0.000 [3,1]:0.000 [4,1]:0.000 [5,1]:0.000 [6,1]:0.000 [7,1]:0.000 [8,1]:0.000
[0,2]:0.000 [1,2]:0.000 [2,2]:0.000 [3,2]:0.000 [4,2]:0.000 [5,2]:0.000 [6,2]:0.000 [7,2]:0.000 [8,2]:0.000

Very few individuals have their migrant ancestry spread across different categories, such as individual BBFC4:

[0,0]:0.000 [1,0]:0.000 [2,0]:0.000 [3,0]:0.589 [4,0]:0.000 [5,0]:0.000 [6,0]:0.000 [7,0]:0.000 [8,0]:0.000
[0,1]:0.000 [1,1]:0.000 [2,1]:0.000 [3,1]:0.000 [4,1]:0.000 [5,1]:0.000 [6,1]:0.000 [7,1]:0.000 [8,1]:0.000
[0,2]:0.033 [1,2]:0.144 [2,2]:0.189 [3,2]:0.000 [4,2]:0.044 [5,2]:0.000 [6,2]:0.000 [7,2]:0.000 [8,2]:0.000

Overall I do not see anything out of the ordinary... it just looks like very few individuals of your study species are moving among populations.
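
For anyone who wants to scan an indiv file rather than read it by eye, here is a minimal sketch that parses one individual's block of `[i,j]:value` cells, as printed above, and reports the cell with the highest value. The function names are hypothetical, the layout is inferred only from the excerpts in this thread, and the meaning of the two indices is left uninterpreted.

```python
import re

# Matches one cell such as "[3,0]:0.589". The cell layout is inferred from the
# excerpts quoted in this thread, not from a formal file specification.
CELL = re.compile(r"\[(\d+),(\d+)\]:([0-9.]+)")


def parse_ancestry(block):
    """Return {(i, j): value} for every "[i,j]:value" cell in the text."""
    return {(int(i), int(j)): float(v) for i, j, v in CELL.findall(block)}


def best_category(block):
    """Return the ((i, j), value) pair with the largest value."""
    cells = parse_ancestry(block)
    return max(cells.items(), key=lambda kv: kv[1])


# The BBFC4 block quoted above, copied verbatim.
BBFC4 = """
[0,0]:0.000 [1,0]:0.000 [2,0]:0.000 [3,0]:0.589 [4,0]:0.000 [5,0]:0.000 [6,0]:0.000 [7,0]:0.000 [8,0]:0.000
[0,1]:0.000 [1,1]:0.000 [2,1]:0.000 [3,1]:0.000 [4,1]:0.000 [5,1]:0.000 [6,1]:0.000 [7,1]:0.000 [8,1]:0.000
[0,2]:0.033 [1,2]:0.144 [2,2]:0.189 [3,2]:0.000 [4,2]:0.044 [5,2]:0.000 [6,2]:0.000 [7,2]:0.000 [8,2]:0.000
"""

if __name__ == "__main__":
    print(best_category(BBFC4))
```

Run on the BBFC4 block, this picks out the [3,0] cell, while the rest of the mass sits in the [i,2] row.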

wpearman1996 commented 5 years ago

Hi, thanks for responding so quickly. Perhaps I'm confused. I was following the instructions in the BA3 manual for adjusting the mixing parameters. I've been trying to adjust the acceptance rate for individual migrant ancestries, as the manual suggests an acceptance rate of between 0.2 and 0.6. I thought that perhaps this was why I was struggling to get the chains to converge.

For my species I wouldn't expect huge amounts of migration between populations, so perhaps that may also be why.

Any advice is greatly appreciated though!

stevemussmann commented 5 years ago

It sounds to me like the program is working appropriately for you, but I think you may have some confusion over how the mixing parameters work (apologies in advance if I have misinterpreted your statements).

I think I see now that you were referring to the acceptance rate for the "individual migrant ancestries" in what you posted, not the individual migrant ancestries calculated in the output files. In the following line, the 1st, 3rd, and 4th values after "% accepted" (0.16, 0.38, and 0.14) are the only ones you have any control over, using the -m, -a, and -f options.

logP(M): -703.83 logL(G): -97057.71 logL: -97761.54 % done: [0.03] % accepted: (0.16, 0.00 0.38, 0.14, 0.62)

Unfortunately, it's not possible to adjust the acceptance rate for individual migrant ancestries (the second value) or for missing genotypes (the fifth value), because both are discrete parameters.
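
To make the five positions easier to read, here is a minimal sketch that pulls the "% accepted" values out of a log line like the one above and labels them. The helper name and label strings are hypothetical. The order and the pairing of flags with positions follow the description in this thread, and the parameter names (migration rates, allele frequencies, inbreeding coefficients) are an assumption that should be checked against the BA3 manual.

```python
import re

# Labels in the order described in this thread. The parameter names attached
# to -m, -a and -f are an assumption to verify against the BA3 manual.
LABELS = (
    "migration rates (-m)",
    "individual migrant ancestries (not tunable)",
    "allele frequencies (-a)",
    "inbreeding coefficients (-f)",
    "missing genotypes (not tunable)",
)


def parse_accepted(line):
    """Return {label: acceptance rate} from one BA3-SNPS progress line."""
    match = re.search(r"% accepted:\s*\(([^)]*)\)", line)
    if match is None:
        raise ValueError("no '% accepted' field found")
    # Pull out the numbers directly so a missing comma does not matter.
    values = [float(x) for x in re.findall(r"\d+(?:\.\d+)?", match.group(1))]
    if len(values) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} values, got {len(values)}")
    return dict(zip(LABELS, values))


# The progress line quoted above, copied verbatim.
LINE = ("logP(M): -703.83 logL(G): -97057.71 logL: -97761.54 "
        "% done: [0.03] % accepted: (0.16, 0.00 0.38, 0.14, 0.62)")

if __name__ == "__main__":
    for label, rate in parse_accepted(LINE).items():
        print(f"{label}: {rate:.2f}")
```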

The numbers you specify with the -m, -a, and -f options do not represent the acceptance rates themselves, but rather influence the acceptance rates. Sometimes they need to be increased from the default values (0.1) while other times they need to be decreased. If you have a bunch of unique input files to run, I suggest using my autotune program (https://github.com/stevemussmann/BA3-SNPS-autotune).
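
As a rough illustration of "influence rather than set", the sketch below nudges one mixing parameter after a short trial run. It is not the algorithm used by BA3-SNPS-autotune. It assumes the usual behaviour of this kind of proposal (a larger mixing parameter means bolder proposals and a lower acceptance rate), the 0.2 to 0.6 target range mentioned earlier in the thread, and that mixing parameters must stay in (0, 1]. The function name and the step factor of 1.25 are hypothetical.

```python
def suggest_mixing(current, acceptance, low=0.2, high=0.6, step=1.25):
    """Suggest a new mixing parameter from one observed acceptance rate.

    Assumption: raising the parameter lowers the acceptance rate, so a rate
    below `low` calls for a smaller value and a rate above `high` for a
    larger one. `step` is an arbitrary illustrative factor.
    """
    if not 0.0 < current <= 1.0:
        raise ValueError("mixing parameter assumed to lie in (0, 1]")
    if acceptance < low:
        return current / step            # proposals too bold: shrink them
    if acceptance > high:
        return min(1.0, current * step)  # proposals too timid: widen them
    return current                       # already inside the target range


if __name__ == "__main__":
    # Tunable rates from the log line in this thread, all run with 0.25.
    observed = {"-m": 0.16, "-a": 0.38, "-f": 0.14}
    for flag, rate in observed.items():
        print(flag, suggest_mixing(0.25, rate))
```

With the rates from the log line above, this leaves -a unchanged and lowers -m and -f, which is the same direction as the values used in the short test run earlier in the thread. In practice the adjustment has to be repeated over several short runs until all three rates settle inside the target range.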

wpearman1996 commented 5 years ago

Ah right okay - thanks that makes sense, definitely the result of confusion on my part. Thanks for helping me out and clearing the confusion up!