Open uqmzhou8 opened 1 year ago
Hi Charles,
Sorry to hear you're running into issues. I don't have a good solution to your problem, but here are some thoughts in the hopes that they're helpful:

Building the lookup table has two main parts: computing the stationary distribution and then integrating it forward through the demography. The --store_stationary and --load_stationary flags allow you to split these two components up. In practice --store_stationary will run both bits but then save the stationary distribution, while --load_stationary will take a stored stationary distribution and use that instead of computing it. Depending on how much time is spent on each of the two steps in your case, you may or may not get meaningful savings out of this. To avoid performing the second part of the likelihood computation (i.e., integrating the stationary distribution forward) when using --store_stationary, you would need to input a constant-size demography whose size matches the most ancient size in the demography that you want to compute. You would then need to throw away the lookup table produced by that run, but you could keep the stored stationary distribution. This might help, but the best-case scenario is that computing the stationary distribution and integrating it forward are currently roughly equal in runtime, in which case you could cut your whole job into two jobs of roughly half the time each. Even then, I suspect this only takes you from ~12 days to ~6 days per job, which is probably still too long for a shared cluster.
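If you wanted to try this, the two jobs could look very roughly like the sketch below. Only --store_stationary and --load_stationary are taken directly from the help text; the other flag names and every number (sample size, Moran population size, mutation rate, demography, thread count) are illustrative placeholders that should be checked against pyrho make_table --help and replaced with your own values.

```bash
# Job 1: a constant-size demography whose size matches the most ancient size
# (here assumed to be 30000) in the demography you actually want. Keep the
# stored stationary distribution, but throw away the lookup table this run makes.
pyrho make_table -n 450 -N 600 --approx --numthreads 16 \
    --mu 1.25e-8 \
    --popsizes 30000 \
    --store_stationary stationary_n450_N600.pkl \
    --outfile throwaway_table.hdf

# Job 2: the demography you actually want, reusing the stored stationary
# distribution instead of recomputing it.
pyrho make_table -n 450 -N 600 --approx --numthreads 16 \
    --mu 1.25e-8 \
    --popsizes 20000,3000,30000 --epochtimes 1000,5000 \
    --load_stationary stationary_n450_N600.pkl \
    --outfile n450_N600_lookuptable.hdf
```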
pyrho is quite fast for smaller sample sizes, so you could take a bunch of random subsets of your 450 individuals, compute a recombination map for each, and then average the results. None of the subsampling or averaging is implemented in pyrho, so it would require a bit of scripting on your end (a rough sketch follows).
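Something like the following could serve as a starting point. It is only meant to illustrate the idea: it assumes diploid data, uses bcftools (not part of pyrho) to extract each random subset, reuses a single lookup table built for the subset size, and leaves the final position-by-position averaging of the per-subset maps to you. The pyrho optimize flag names and all of the numbers are from memory/placeholders, so check them against pyrho optimize --help.

```bash
# One lookup table for the subset size is enough, since every subset has the
# same number of haplotypes (here 50 diploids = 100 haplotypes).
pyrho make_table -n 100 -N 120 --approx --mu 1.25e-8 \
    --popsizes 20000,3000,30000 --epochtimes 1000,5000 \
    --outfile subset_lookuptable.hdf

# Draw several random subsets of the 450 individuals and estimate a map for each.
bcftools query -l all_450.vcf.gz > all_samples.txt
for i in $(seq 1 10); do
    shuf -n 50 all_samples.txt > subset_${i}.txt
    bcftools view -S subset_${i}.txt -Oz -o subset_${i}.vcf.gz all_450.vcf.gz
    pyrho optimize --vcffile subset_${i}.vcf.gz \
        --tablefile subset_lookuptable.hdf \
        --ploidy 2 \
        --outfile subset_${i}.rmap
done

# subset_1.rmap ... subset_10.rmap then need to be averaged position by
# position; that averaging is the bit of scripting mentioned above.
```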
As for adding the kind of checkpointing you asked about to pyrho itself: I don't have the bandwidth to do this, but I think it is doable in principle, and I would be happy to accept a PR.

I hope this helps!
Jeff
Hi,
Apologies, as I am relatively new to population genetics work. I am trying to compute a lookup table for n=450 and N=600, but it is taking too long to run on a high-performance computing cluster, exceeding the time limit even when using multiple threads. I am therefore trying to figure out how to construct multiple smaller tables and join them into a complete lookup table, or whether there is a way to pause, save, and continue from predefined checkpoints.
Based on pyrho make_table --help, there are two arguments that seem relevant to my case:

  -S STORE_STATIONARY, --store_stationary STORE_STATIONARY
        Name of file to save stationary distributions -- useful for
        computing many lookup tables sequentially.
  -L LOAD_STATIONARY, --load_stationary LOAD_STATIONARY
        Name of file to load stationary distributions -- useful for
        computing many lookup tables sequentially.
Are these the correct arguments for storing intermediate results at checkpoints, and would you have any examples of how I can use them to build my way up to n=450 or more? Please also suggest any other methods I could use to build a lookup table for n=450 or more while splitting the work into smaller files/jobs.
Thank you in advance!
Regards, Charles