uqrmaie1 / admixtools

https://uqrmaie1.github.io/admixtools

running many replicates of find_graphs? #72

Open laufran opened 2 months ago

laufran commented 2 months ago

Hi there,

In the older version of find_graphs, find_graphs_old, I see there's an argument numrep that runs the same search several times, with the same arguments, in one function call. Just to confirm: there's no such option in the current version of find_graphs / qpgraph? If that's right, how would you all recommend running many replicates? And how many replicates would you recommend, given the findings in Maier et al. 2023 that "models fitting the data as well as or better than the true one are common, and their topological diversity is in most cases so high that it precludes consensus inference of topology by analysis of multiple topologies"?

Best, Lauren

uqrmaie1 commented 2 months ago

Yes, that is correct, there is no equivalent of numrep in find_graphs(). It seems a bit clearer to me to explicitly call the function multiple times than to use an argument for that, and it's not difficult to do. Here are two ways to do that:

```r
numrep = 3
reslist = list()
for(i in seq_len(numrep)) {
  # '...' stands for your usual find_graphs() arguments
  reslist[[i]] = find_graphs(f2_blocks, ...)
}
```

The code above uses only standard R syntax and functions, and reslist will be a list of data frames, one per replicate. Alternatively, you could do it like this:

```r
library(purrr)  # map()
library(dplyr)  # %>%, bind_rows(), slice_min()

numrep = 3
res = map(seq_len(numrep), ~find_graphs(f2_blocks, ...)) %>% bind_rows(.id='rep')
```

This will give you a single data frame where the replicate number is indicated by the column rep. You could then get the graph with the lowest score in each replicate like this:

```r
# the 'by' argument of slice_min() requires dplyr 1.1.0 or later
res %>% slice_min(score, by = rep)
```
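If you went with the first, base-R approach, the same per-replicate selection can be done on reslist without dplyr. This is a minimal sketch: so that it runs without data, fake_find_graphs() is a hypothetical stand-in for find_graphs(f2_blocks, ...) that only mimics the score column of the real output. One difference from slice_min(): which.min() keeps only the first row when scores tie, while slice_min() keeps all tied rows by default.

```r
# Hypothetical stand-in for find_graphs(f2_blocks, ...); it only returns
# a data frame with a 'score' column so this sketch runs without data.
fake_find_graphs = function() {
  data.frame(graph_id = 1:5, score = runif(5, min = 10, max = 100))
}

numrep = 3
set.seed(42)
reslist = list()
for (i in seq_len(numrep)) {
  reslist[[i]] = fake_find_graphs()
}

# For each replicate, keep the row with the lowest score
# (which.min() returns the first minimum if there are ties)
best_per_rep = lapply(reslist, function(d) d[which.min(d$score), ])

# Stack into one data frame and record which replicate each row came from
best = do.call(rbind, best_per_rep)
best$rep = seq_len(numrep)

# Sort replicates by their best score to compare them at a glance
best[order(best$score), ]
```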

If you model a complex graph or let it run for many generations, each replicate could take a while to run. In that case it might be better to parallelize across replicates, for example by submitting one job per replicate on a compute cluster, or using the furrr or doParallel R packages.
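A minimal sketch of the furrr route, assuming the furrr and dplyr packages are installed. So that it runs without data, it uses a hypothetical stand-in, fake_find_graphs(), in place of find_graphs(f2_blocks, ...); in a real analysis you would swap in your own call. The choice of 2 workers is arbitrary. Because each replicate of the search is random, furrr_options(seed = TRUE) is used so that every replicate gets its own parallel-safe random number stream, which also makes the run reproducible after set.seed().

```r
library(dplyr)  # %>%, bind_rows(), slice_min()
library(furrr)  # future_map(), furrr_options(); also attaches 'future' for plan()

# Hypothetical stand-in for find_graphs(f2_blocks, ...); it only returns
# a data frame with a 'score' column so this sketch runs without data.
fake_find_graphs = function() {
  data.frame(graph_id = 1:5, score = runif(5, min = 10, max = 100))
}

numrep = 4

# Run replicates in separate background R sessions (2 workers, arbitrary)
plan(multisession, workers = 2)

set.seed(1)
res = future_map(seq_len(numrep), ~fake_find_graphs(),
                 .options = furrr_options(seed = TRUE)) %>%
  bind_rows(.id = 'rep')

# Shut down the background workers again
plan(sequential)

# Lowest-scoring graph in each replicate, as in the sequential version
best = res %>% slice_min(score, by = rep)
```

The rest of the workflow is unchanged: future_map() is a drop-in replacement for map(), so res has the same shape as in the sequential example. On a cluster, submitting one job per replicate and combining the saved results afterwards achieves the same thing without furrr.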

> And how many replicates would you recommend?

It depends on a few factors. Initially I would start with a small handful of replicates that you can inspect manually to get a feel for what the results look like. Later on, you might want to increase the number of replicates, depending on what the initial results look like:

Hope this helps!