ElizabethBorden closed 1 month ago
Hi Elizabeth,
The ssm_base_converter.py file provides a nice framework which can be subclassed for your particular use case. If you have a VCF file and another file containing copy number calls, you could rewrite part of the vcf_to_ssm.py script to take in that second file and process its contents in the p_var_read_prob function to generate the correct values for the var_read_prob column in the ssm file. This might be the easiest thing to do.
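As a rough illustration of the calculation such a function might perform, here is a minimal sketch. It is not the code from vcf_to_ssm.py: the Segment type, find_total_cn, and var_read_prob_from_cn are hypothetical names, the segment fields are an assumed layout for already-parsed copy number calls, and the formula (variant-bearing copies divided by total copies at the locus, in cells carrying the mutation) is an assumption you should check against the project's definition of var_read_prob.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Segment:
    """Hypothetical copy number segment parsed from a CN caller's output."""
    chrom: str
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    total_cn: int   # total copies of the locus in the segment


def find_total_cn(segments: Iterable[Segment], chrom: str, pos: int,
                  default: int = 2) -> int:
    """Return the total copy number covering (chrom, pos).

    Falls back to `default` (diploid) when no segment covers the position.
    """
    for seg in segments:
        if seg.chrom == chrom and seg.start <= pos <= seg.end:
            return seg.total_cn
    return default


def var_read_prob_from_cn(total_cn: int, variant_copies: int = 1) -> float:
    """Assumed formula: variant-bearing copies / total copies at the locus."""
    if total_cn <= 0:
        raise ValueError("total_cn must be positive")
    if not 0 < variant_copies <= total_cn:
        raise ValueError("variant_copies must be in 1..total_cn")
    return variant_copies / total_cn


# Usage: one amplified segment on chr1, everything else treated as diploid.
segments = [Segment("chr1", 1_000, 5_000, total_cn=3)]
amplified = var_read_prob_from_cn(find_total_cn(segments, "chr1", 2_500))
diploid = var_read_prob_from_cn(find_total_cn(segments, "chr2", 2_500))
```

With these inputs, a variant on one of three copies gets 1/3, and a variant outside any segment falls back to the diploid value of 0.5. How many copies actually carry the variant depends on whether the mutation came before or after the copy number change, which the sketch leaves to the caller through variant_copies.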
Hello,
Sorry for the slow response. I was trying a few more things to get this working on my end, but I am still struggling. What software do you typically use to get allele-specific copy number calls? I cannot seem to find a format that integrates with ssm_base_converter.py. Would you be willing to share the set of software you used to create the VCF file that you input into ssm_base_converter.py?
Thanks!
Hi Elizabeth,
Additional VCF format fields were added to the example VCF file for compatibility with our example script; it wasn't generated by another tool. There are many different tools for calling allele-specific copy number, and the right one will depend on your data. Two popular tools are FACETS (https://github.com/mskcc/facets) and CNVkit (https://github.com/etal/cnvkit). The outputs from these tools can be used to generate the variant read probabilities. If your data consists only of diploid regions that you do not believe are impacted by CNAs, you can just set var_read_prob to 0.5 for all mutations in each sample.
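For the all-diploid case, a small sketch of filling in that value might look like the following. The function name is hypothetical, and the assumption that the var_read_prob field holds one comma-separated value per sample should be checked against the ssm files shipped with the project.

```python
def diploid_var_read_prob(n_samples: int, value: float = 0.5) -> str:
    """Build a var_read_prob field with the same value for every sample.

    Assumes (not confirmed here) that the field is a comma-separated
    list with one entry per sample.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return ",".join([str(value)] * n_samples)


# Usage: a mutation observed across three samples, all assumed diploid.
field = diploid_var_read_prob(3)
```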
Hello,
Can you suggest the best method to create a VCF file that incorporates copy number data? I see that this is required to accurately make the .ssm file, but I cannot tell how this annotation was added. Can you suggest software that works well upstream of the ssm_base_converter.py script?
Thank you!
-Elizabeth