kgori / sigfit

Flexible Bayesian inference of mutational signatures
GNU General Public License v3.0

Parallelization of fit_signatures #60

Closed (jasonptm closed this issue 2 years ago)

jasonptm commented 2 years ago

Greetings,

We would like to run a reasonable number of samples through fit_signatures to get exposures for the COSMIC signatures, and I am trying to determine the best way to parallelize this. Our cluster's prioritization system makes it much easier to submit many small jobs than one large job. Therefore, my question is:

Is there a practical difference between fitting signatures to a set of samples at once (by passing them all to a single call of fit_signatures) versus fitting the signatures separately (by passing them individually to fit_signatures in separate runs)? The latter appears to be vastly faster, but I wanted to make sure we weren't going to produce incorrect results via this strategy.

Please let me know if there is any further information or clarification I can provide, and thanks in advance for any help or advice you might be able to provide.

-Jason Turner-Maier

baezortega commented 2 years ago

Hi Jason,

Independently fitting a set of signatures to many samples should give the same results as fitting them collectively. For large numbers of samples it may be more efficient to run them in parallel, as you suggest. The results should be correct.
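
The bookkeeping for the many-small-jobs approach can be sketched roughly as below. This is a minimal illustration in Python, not part of sigfit: split_into_batches and merge_exposures are hypothetical helper names, and the fit itself (the fit_signatures call in R) would run inside each job and is not shown. The sketch only covers partitioning the sample IDs into jobs and recombining the per-job exposure tables afterwards.

```python
from typing import Dict, List, Sequence


def split_into_batches(sample_ids: Sequence[str], batch_size: int) -> List[List[str]]:
    """Partition sample IDs into consecutive batches of at most batch_size.

    Hypothetical helper: each batch would become one cluster job.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        list(sample_ids[start:start + batch_size])
        for start in range(0, len(sample_ids), batch_size)
    ]


def merge_exposures(per_job: Sequence[Dict[str, List[float]]]) -> Dict[str, List[float]]:
    """Combine per-job exposure tables (sample ID -> exposures) into one table.

    Hypothetical helper: raises if a sample appears in more than one job,
    which would indicate a mistake in how the jobs were partitioned.
    """
    merged: Dict[str, List[float]] = {}
    for job_result in per_job:
        for sample_id, exposures in job_result.items():
            if sample_id in merged:
                raise ValueError(f"sample {sample_id!r} was fitted in more than one job")
            merged[sample_id] = exposures
    return merged
```

With batch_size set to 1 this corresponds to one sample per job; a larger batch size trades fewer jobs for longer run time per job.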

Best, Adrian