Methods to improve the running time when -m is large

ZheCai commented 2 years ago

Hi,

Thank you for your work on such a great tool.

I'm running orientagraph on my data (npop:17 nsnp:1351621), but the running time becomes very long when the number of migration edges is greater than five.

Do you know of any methods that could improve the running time (e.g., multithreading or anything else)?

Here is my command: orientagraph -i input_af01_an90_LD_500_100_2.tmix.imp.gz -m 5 -o input_af01_an90_LD_500_100_2.tmix.imp.group.k100.m5 -root Outgroup -k 100 -global -mlno -allmigs

ekmolloy commented 2 years ago

Hello,

Thank you for your message! We currently do not have any implementations that speed up running time for large numbers of gene flow edges, although this may be available in the future. To save on compute time, you can do two things.

First, you can use -mlno 2, which will perform the mlno search only after the addition of the first 2 gene flow edges. This will save compute time, although you will search less of the network space.

Second, you can run OrientAGraph using the -freq2stat flag to estimate the summary statistics and exit. The resulting files can be given as input to OrientAGraph using the -givenmat flag, so that you can estimate a network using various parameter settings without re-estimating the summary statistics each time (this preprocessing phase can be time consuming).
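
As a rough sketch of how the two suggestions could fit together, the script below only prints the commands (a dry run) rather than executing them. The file names, the output prefixes, and the STATS_FILES_FROM_PHASE_1 placeholder are hypothetical; which files -freq2stat writes and exactly what -givenmat expects should be checked against the OrientAGraph README. Carrying -k, -root, and -allmigs over from the command above is also an assumption.

```shell
#!/bin/sh
# Dry-run sketch: prints candidate commands instead of running orientagraph.

INPUT="input.tmix.gz"   # hypothetical allele-frequency input file
STATS_PREFIX="stats"    # hypothetical output prefix for the statistics run

# Placeholder only: replace with the file(s) produced in phase 1, passed the
# way the README describes for -givenmat (not spelled out in this thread).
GIVENMAT_ARGS="-givenmat STATS_FILES_FROM_PHASE_1"

# Phase 1: estimate the summary statistics once and exit.
freq2stat_cmd() {
    echo "orientagraph -i $INPUT -k 100 -freq2stat -o $STATS_PREFIX"
}

# Phase 2: reuse the precomputed statistics for one number of gene flow edges.
# -mlno 2 limits the mlno search as described in the first suggestion.
search_cmd() {
    n_edges="$1"
    echo "orientagraph $GIVENMAT_ARGS -m $n_edges -root Outgroup -mlno 2 -allmigs -o run.m$n_edges"
}

freq2stat_cmd
for m in 1 2 3 4 5; do
    search_cmd "$m"
done
```

Once the placeholders are filled in and the printed commands look right, they can be run one at a time, so the statistics are estimated once and reused for every value of -m.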

-Erin

ZheCai commented 2 years ago

Hi Erin,

Thank you very much for your suggestions! I have now run into another problem and will open a new issue for it.

sriramlab / OrientAGraph, issue #6