uclahs-cds / package-moPepGen

Multi-Omics Peptide Generator
https://uclahs-cds.github.io/package-moPepGen/
GNU General Public License v2.0

Add --timeout-second and retry to callVariant #844

Closed zhuchcn closed 9 months ago

zhuchcn commented 9 months ago

Description

I have a transcript with 375 SNVs/indels that cannot finish within hours. The limiting step is calling miscleaved peptides from the PCG after cleavage. Setting a global max_variants_per_node doesn't make a lot of sense to me, so I implemented a timeout function: for each transcript, if it can't finish within a certain time, it stops and retries with a lower max_variants_per_node (and additional_variants_per_misc).

The --timeout-seconds argument is added to callVariant and defaults to 30 minutes.
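
One way to picture a per-transcript time budget is a cooperative deadline that the long-running loop checks periodically. This is only a minimal sketch: the `Deadline` class and `call_with_timeout` function are hypothetical names for illustration, and the mechanism this PR actually uses to interrupt a transcript is not shown in this thread.

```python
import time


class Deadline:
    """Cooperative timeout: long-running code calls check() periodically."""

    def __init__(self, timeout_seconds: float):
        # time.monotonic() is unaffected by system clock adjustments.
        self.expires_at = time.monotonic() + timeout_seconds

    def check(self) -> None:
        # Raise once the time budget for this transcript is used up.
        if time.monotonic() >= self.expires_at:
            raise TimeoutError("transcript exceeded its time budget")


def call_with_timeout(work_items, timeout_seconds: float) -> int:
    """Hypothetical stand-in for processing one transcript."""
    deadline = Deadline(timeout_seconds)
    processed = 0
    for _ in work_items:
        deadline.check()  # abort between work items if over budget
        processed += 1
    return processed
```

A caller would catch `TimeoutError` and retry the same transcript with stricter parameters.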

The --max-variants-per-node and --additional-variants-per-misc arguments now accept multiple values, which are used as the "retry strategy". If we run out of --max-variants-per-node values, it keeps retrying with the previous value minus 1 until it reaches 0, at which point it raises an error. For example, with --max-variants-per-node 7 5, the retries will be 7 -> 5 -> 4 -> 3 -> 2 -> 1 -> error (which should never happen).

--additional-variants-per-misc is slightly different: if we run out of values, 0 is used. So by default it will be 2 -> 0.
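
The retry rules described above can be sketched as a small generator. The function name `retry_schedule` is hypothetical, and pairing the two lists by attempt index is an assumption; the sketch is not taken from the moPepGen code.

```python
from typing import Iterator, List, Tuple


def retry_schedule(
    max_variants_per_node: List[int],
    additional_variants_per_misc: List[int],
) -> Iterator[Tuple[int, int]]:
    """Yield (max_variants_per_node, additional_variants_per_misc) per attempt."""
    if not max_variants_per_node:
        raise ValueError("at least one max_variants_per_node value is needed")
    attempt = 0
    max_variants = max_variants_per_node[0]
    while True:
        if attempt < len(max_variants_per_node):
            # Use the user-supplied values first.
            max_variants = max_variants_per_node[attempt]
        else:
            # Out of supplied values: previous value minus 1.
            max_variants -= 1
        if max_variants <= 0:
            raise ValueError("ran out of max_variants_per_node values to retry")
        if attempt < len(additional_variants_per_misc):
            additional = additional_variants_per_misc[attempt]
        else:
            # Out of supplied values: fall back to 0.
            additional = 0
        yield max_variants, additional
        attempt += 1
```

With `retry_schedule([7, 5], [2])`, the first value of each pair follows 7 -> 5 -> 4 -> 3 -> 2 -> 1 before the error, and the second follows 2 -> 0 -> 0 and so on.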

Closes #...

Checklist

lydiayliu commented 9 months ago

Interesting, can you input something like --max-variants-per-node 7,7,7,5,5,5,3,3,3,2,2,2,1,1,1 and --additional-variants-per-misc 2,1,0,2,1,0,2,1,0,2,1,0,2,1,0 to get a grid search effect?

zhuchcn commented 9 months ago

You can, but this retry mechanism is only triggered when a transcript times out. Setting too many retry cycles will probably make the total run time very long. For example, if the transcript only finishes with 5-0, it has to time out 5 times before getting there. If we set --timeout-seconds to 900, it will take up to 90 minutes to finish this one (which isn't super bad).
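
The 90-minute figure can be checked with a short calculation. This sketch assumes the two lists from the question are paired by position, and that the successful attempt itself may take nearly the full timeout; `worst_case_minutes` is a hypothetical helper.

```python
def worst_case_minutes(schedule, first_success, timeout_seconds):
    """Upper bound on run time: every failed attempt plus the successful one
    may each use the full timeout."""
    failed_attempts = schedule.index(first_success)
    return (failed_attempts + 1) * timeout_seconds / 60


max_variants = [7, 7, 7, 5, 5, 5, 3, 3, 3, 2, 2, 2, 1, 1, 1]
additional = [2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0]
schedule = list(zip(max_variants, additional))

# (5, 0) is the sixth attempt, so five attempts time out before it.
minutes = worst_case_minutes(schedule, (5, 0), 900)
print(minutes)  # 90.0
```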

Btw, I made these two arguments accept multiple values, so you don't need commas. Like this:

--max-variants-per-node 7 7 7 5 5 5 --additional-variants-per-misc 2 1 0 2 1 0
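
For reference, space-separated multi-value options like these can be declared with `nargs="+"` in Python's `argparse`. This is a standalone sketch, not the moPepGen parser: `build_parser` is a hypothetical name, and `--max-variants-per-node` is marked required only because its default is not stated in this thread.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="callVariant-sketch")
    # nargs="+" collects one or more space-separated values into a list.
    parser.add_argument(
        "--max-variants-per-node", type=int, nargs="+", required=True
    )
    # Default of 2 follows the "2 -> 0" default described in the PR.
    parser.add_argument(
        "--additional-variants-per-misc", type=int, nargs="+", default=[2]
    )
    # 30 minutes, expressed in seconds.
    parser.add_argument("--timeout-seconds", type=int, default=1800)
    return parser


args = build_parser().parse_args(
    "--max-variants-per-node 7 7 7 5 5 5 "
    "--additional-variants-per-misc 2 1 0 2 1 0".split()
)
print(args.max_variants_per_node)         # [7, 7, 7, 5, 5, 5]
print(args.additional_variants_per_misc)  # [2, 1, 0, 2, 1, 0]
```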