stevemussmann / BayesAss3-SNPs

Modification of BayesAss 3.0.4 to allow handling of large SNP datasets
GNU General Public License v3.0

Exactly same trace results on multiple independent runs and low ESS #18

githubgig closed this issue 2 weeks ago

githubgig commented 2 months ago

Hello,

I'm using BayesAss3-SNPs with about 17k loci and 4 populations. I'm having a couple of issues.

  1. The first issue is that the trace plots and statistics from 5 independent runs are exactly the same (see attached image). The 5 runs were run in parallel, each set up in a different folder (see command below). I thought the shared input file name might be the issue, so I re-ran after giving each run a differently named input file, but the traces look exactly the same again. I have not experienced this before, and it seems very odd that every run gives exactly the same statistics with no variation between them. (Attachment: All traces.pdf)

BA3-SNPS -F ./BayesAss_Input.immanc -l 17129 -i 50000000 -n 5000 -b 10000000 -o BA3outRun1.txt -m 0.3 -a 0.9 -f 0.1 -u -g -t -v 2>&1 | tee Run1.log

  2. The second issue is that the LogProb ESS value for each run is only 18. How do I deal with this?

Thank you.

stevemussmann commented 2 months ago

Hello,

  1. Judging from your command, you did not specify a unique random number seed (-s option) for each of your runs. Every run therefore started from the same seed, which produced the exact same result 5 times.
  2. It appears you have already tuned the -m, -a, and -f parameters. Sometimes the only other solution for low ESS is to run longer until you achieve an acceptable ESS.
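As a rough sketch of point 1, the snippet below builds one command line per run, each with its own -s value and output name. The other options are copied from the command in the original post. The seed values are arbitrary placeholders, and `build_commands` is a hypothetical helper, not part of BA3-SNPS.

```python
import shlex

# Options copied from the command in the original post.
# Only -s (seed) and -o (output file) vary between runs.
BASE_ARGS = [
    "BA3-SNPS", "-F", "./BayesAss_Input.immanc", "-l", "17129",
    "-i", "50000000", "-n", "5000", "-b", "10000000",
    "-m", "0.3", "-a", "0.9", "-f", "0.1", "-u", "-g", "-t", "-v",
]


def build_commands(seeds):
    """Return one shell command line per seed (hypothetical helper)."""
    if len(set(seeds)) != len(seeds):
        raise ValueError("seeds must be unique, or runs will repeat each other")
    commands = []
    for run, seed in enumerate(seeds, start=1):
        args = BASE_ARGS + ["-s", str(seed), "-o", f"BA3outRun{run}.txt"]
        # Keep the same logging pattern as the original command.
        commands.append(f"{shlex.join(args)} 2>&1 | tee Run{run}.log")
    return commands


# Placeholder seeds: the only requirement assumed here is that they differ.
for cmd in build_commands([101, 202, 303, 404, 505]):
    print(cmd)
```

Each printed line can then be launched from its own run folder, as in the original setup.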

-Steve

githubgig commented 2 months ago

Thank you Steve.

I will rerun with -s option.

Regarding my ESS question: yes, I have tuned the -m, -a, and -f parameters. I'm already at 50M iterations, which is taking a while. Any suggestions on how much further I should increase this? Alternatively, will sampling more frequently help increase ESS? My first run used -n 5000; should I try lowering this to -n 1000? Or is it acceptable to combine results from 5 independent runs to increase ESS?

Thanks again.

stevemussmann commented 2 months ago

Hello,

Sorry for the delayed response - GitHub apparently no longer notifies me of responses to issues after they are initially opened.

You could try increasing the sampling interval to see if this helps. I doubt it will make much of a difference, but it's worth a shot.
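For intuition on why the sampling interval may matter little, here is a minimal sketch of one common autocorrelation-based ESS estimate. It is not necessarily the estimator used by whatever tool reported the ESS of 18. `effective_sample_size` is a hypothetical helper that takes the sampled LogProb values as a plain list.

```python
def effective_sample_size(samples):
    """Estimate ESS as n / (1 + 2 * sum of leading positive autocorrelations)."""
    n = len(samples)
    mean = sum(samples) / n
    dev = [v - mean for v in samples]
    var = sum(d * d for d in dev) / n
    if var == 0:
        raise ValueError("chain is constant; ESS is undefined")
    tau = 1.0  # integrated autocorrelation time
    for lag in range(1, n):
        rho = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0:
            break  # truncate at the first non-positive autocorrelation
        tau += 2.0 * rho
    return n / tau


# A chain that flips every sample has no positive autocorrelation ...
print(effective_sample_size([0, 1] * 50))
# ... while one that sits at 0 and then at 1 is strongly autocorrelated.
print(effective_sample_size([0] * 50 + [1] * 50))
```

Under this estimate, neighboring samples that are highly correlated contribute little independent information, so writing such samples more often adds rows to the trace without adding much ESS. Running longer adds iterations, which is what helps.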

Yes, it's acceptable to combine results from multiple independent runs as long as they converge on a similar region of parameter space. If two runs happen to find wildly different results, then you wouldn't want to combine them - but from what you have described of your dataset so far, and based on my experience with this program, I doubt that will happen.
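One way to put a number on "similar parameter space" is the Gelman-Rubin potential scale reduction factor, sketched below. This is a widely used diagnostic swapped in for illustration, not something BA3-SNPS reports. `gelman_rubin` is a hypothetical helper that takes one parameter's post-burn-in samples from each run, with all runs the same length.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains of one parameter."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [mean(c) for c in chains]
    w = mean([variance(c) for c in chains])  # mean within-chain variance
    if w == 0:
        raise ValueError("chains are constant; the diagnostic is undefined")
    b = n * variance(chain_means)            # between-chain variance
    var_hat = (n - 1) / n * w + b / n        # pooled variance estimate
    return (var_hat / w) ** 0.5


# Two runs exploring the same values give a factor close to 1 ...
print(gelman_rubin([[0, 1] * 50, [1, 0] * 50]))
# ... while runs stuck in different regions give a much larger value.
print(gelman_rubin([[0, 1] * 50, [10, 11] * 50]))
```

Values near 1 suggest the runs agree. A cutoff such as 1.1 is a common rule of thumb rather than a hard limit, and the check would be repeated for each parameter of interest before pooling.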

-Steve

githubgig commented 2 months ago

Hi Steve.

Ok, thanks for your reply. I'll try these suggestions and see what works best.