morrislab / pairtree

Pairtree is a method for reconstructing cancer evolutionary history in individual patients, and analyzing intratumor genetic heterogeneity. Pairtree focuses on scaling to many more cancer samples and cancer cell subpopulations than other algorithms, and on producing concise and informative interactive characterizations of posterior uncertainty.
MIT License

remove_high_vaf.py #19

Closed brucemoran closed 3 years ago

brucemoran commented 3 years ago

Hi,

This script has been removed. I was using it (and still have it in a container), but I'm wondering what replaces its functionality?

Can you give a command-line replacement for:

python remove_high_vaf.py \
   ${params.runID}.pairtree.ssm \
   ${params.runID}.out_params.json \
   ${params.runID}.rmvaf_params.json

NB: for legacy code, it would be cool to keep it and flag its deprecation!

Thanks,

Bruce.

jwintersinger commented 3 years ago

Hi Bruce,

We've moved the script to util/fix_bad_var_read_prob.py. It now takes an additional positional argument, out_ssm_fn, naming the path where it will (potentially) write a new .ssm file.

usage: fix_bad_var_read_prob.py [-h] [--logbf-threshold LOGBF_THRESHOLD] [--verbose] [--ignore-existing-garbage]
                                [--action {add_to_garbage,modify_var_read_prob}] [--var-read-prob-alt VAR_READ_PROB_ALT]
                                in_ssm_fn in_params_fn out_ssm_fn out_params_fn

Find variants with likely incorrect var_read_prob by comparing model with provided var_read_prob to haploid (LOH) model using Bayes factors

positional arguments:
  in_ssm_fn             Input SSM file with mutations
  in_params_fn          Input params file listing sample names and any existing garbage mutations
  out_ssm_fn            Output SSM file with modified list of garbage mutations
  out_params_fn         Output params file with modified list of garbage mutations

optional arguments:
  -h, --help            show this help message and exit
  --logbf-threshold LOGBF_THRESHOLD
                        Logarithm of Bayes factor threshold at which the haploid model is accepted as more likely model than the model using
                        the provided var_read_prob (default: 10.0)
  --verbose             Print debugging messages (default: False)
  --ignore-existing-garbage
                        Ignore any existing garbage variants listed in in_params_fn and test all variants. If not specified, any existing
                        garbage variants will be kept as garbage and not tested again. (default: False)
  --action {add_to_garbage,modify_var_read_prob}
  --var-read-prob-alt VAR_READ_PROB_ALT

The option --action will default to add_to_garbage, which gives the same behaviour you had before. In this case, the .ssm file written should be unmodified relative to the input, and the "bad" SSMs will be specified as garbage in the new out_params_fn. If you pass --action modify_var_read_prob, a new .ssm file will be written where the problematic variants have their var_read_prob set to --var-read-prob-alt (default 1) in every sample, and the new .params.json file should be left unmodified.
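For the command in the original question, a minimal sketch of the equivalent call might look like the following. It assumes the script is run from the root of a Pairtree checkout. The `run_id` value and the `.rmvaf.ssm` output name are hypothetical placeholders, with `run_id` standing in for the workflow's `${params.runID}`. The snippet only prints the command so it can be checked before it is run.

```shell
#!/bin/sh
# Hypothetical run ID standing in for the workflow's ${params.runID}.
run_id="example_run"

# Positional arguments, in the order given by the usage text:
#   in_ssm_fn in_params_fn out_ssm_fn out_params_fn
in_ssm="${run_id}.pairtree.ssm"
in_params="${run_id}.out_params.json"
out_ssm="${run_id}.rmvaf.ssm"              # new output argument; this file name is an assumption
out_params="${run_id}.rmvaf_params.json"

# --action add_to_garbage is the default; it is spelled out here for clarity.
cmd="python util/fix_bad_var_read_prob.py --action add_to_garbage ${in_ssm} ${in_params} ${out_ssm} ${out_params}"

# Dry run: print the command instead of executing it.
echo "$cmd"
```

Once the printed command looks right, it can be pasted into the pipeline in place of the old remove_high_vaf.py call. Downstream steps would keep reading the original .ssm file together with the new params file, since add_to_garbage is described above as leaving the .ssm contents unchanged.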

Please let me know if you have any other questions!

brucemoran commented 3 years ago

Cool thank you, only question: which --action will you be using/recommending?

I'd lean towards add_to_garbage.

Thanks,

Bruce.