mcmero / SVclone

A computational method for inferring the cancer cell fraction of tumour structural variation from whole-genome sequencing data.
BSD 3-Clause "New" or "Revised" License

Output file for SV VAF #37

Closed by poojachandra 2 months ago

poojachandra commented 3 months ago

Hi,

I am looking for the SV VAF and would like to confirm which file I should reference. Is it the 'ccube_sv_input.txt' file, with columns 'vaf1' and 'vaf2'? Thank you.

mcmero commented 3 months ago

The vaf1/vaf2 values are the same as the adjusted_vaf1/2 found in the filter output. These adjusted support/norm values are the ones used by ccube for clustering. The filter output also contains the mean VAF per SV and the count output contains the raw VAFs.

poojachandra commented 3 months ago

Thanks for the clarification, @mcmero. Also, could you please confirm which file contains the CCF?

mcmero commented 3 months ago

The ccube_sv_input.txt file does not contain CCFs; it is the input to the clustering algorithm that infers CCF (as well as clusters, multiplicity etc.). Please see the cluster output documentation.

poojachandra commented 3 months ago

Thanks, @mcmero. I am looking at the link above, which mentions '_subclonal_structure: clusters found, the number of variants per cluster, the proportion and CCF.' However, my 'subclonal_structure.txt' file has only three columns: cluster, n_ssms and proportion.

Also, I am looking for an output file that has the number of SVs per cluster along with the CCF per cluster. Where can I find this information?

mcmero commented 3 months ago

You are right, the subclonal_structure.txt file only contains the proportion and not the CCF (documentation corrected in cb4659f). This file does contain the cluster-level information you are looking for (SVs per cluster). You can calculate the CCF of each cluster by dividing its proportion by the tumour purity.
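As a rough illustration of that calculation, here is a minimal Python sketch. It assumes the subclonal_structure.txt file is tab-delimited with the three columns mentioned above (cluster, n_ssms, proportion). The helper names `read_subclonal_structure` and `add_ccf`, the example rows, and the purity value of 0.6 are all made up for illustration and are not part of SVclone; substitute your own file and purity estimate.

```python
import csv
import io


def read_subclonal_structure(handle):
    """Parse a subclonal structure table (assumed tab-delimited with a header)."""
    return list(csv.DictReader(handle, delimiter="\t"))


def add_ccf(rows, purity):
    """Return copies of the cluster rows with ccf = proportion / purity."""
    if not 0 < purity <= 1:
        raise ValueError("purity must be in the interval (0, 1]")
    result = []
    for row in rows:
        new_row = dict(row)
        new_row["ccf"] = float(row["proportion"]) / purity
        result.append(new_row)
    return result


# Illustrative data only; in practice, open your own file instead, e.g.
# with open("subclonal_structure.txt") as handle: ...
example = io.StringIO(
    "cluster\tn_ssms\tproportion\n"
    "0\t120\t0.6\n"
    "1\t35\t0.3\n"
)

clusters = add_ccf(read_subclonal_structure(example), purity=0.6)
for c in clusters:
    # n_ssms is the per-cluster variant count reported in the file
    print(c["cluster"], c["n_ssms"], round(c["ccf"], 3))
```

With the made-up values above, the first cluster's proportion equals the purity, so its CCF comes out at 1 (clonal), and the second cluster comes out at half of that.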