mskcc / facets

Algorithm to implement Fraction and Copy number Estimate from Tumor/normal Sequencing.

Is there a way to create outputs for Pyclone #57

Open trptyrphe11 opened 7 years ago

trptyrphe11 commented 7 years ago

I have been using FACETS and love it for its speed and ease of use. Is there a function to generate input files for PyClone (like the Sequenza function sequenza2pyclone)? Thanks.

veseshan commented 7 years ago

Not having used either Sequenza or PyClone, I am not sure what the format is. It would be helpful if you could provide a sample file so I can see whether it can be generated easily.

trptyrphe11 commented 7 years ago

Sorry for not being clear. In general, PyClone takes a tab-delimited file with a header as input.

The required fields are:

mutation_id - A unique ID to identify the mutation. Good names are things such as the genomic coordinates of the mutation, e.g. chr22:12345. Gene names are not good IDs because one gene may have multiple mutations, in which case the ID is not unique and PyClone will fail to run or, worse, give unexpected results. If you want to include the gene name, I suggest adding the genomic coordinates, e.g. TP53_chr17:753342.

ref_counts - The number of reads covering the mutation which contain the reference (genome) allele.

var_counts - The number of reads covering the mutation which contain the variant allele.

normal_cn - The copy number of the cells in the normal population. For autosomal chromosomes this will be 2 and for sex chromosomes it could be either 1 or 2. For species besides human other values are possible.

minor_cn - The minor copy number of the cancer cells. Usually this value will be predicted from WGSS or array data.

major_cn - The major copy number of the cancer cells. Usually this value will be predicted from WGSS or array data.

An example TSV looks like this:

    mutation_id	ref_counts	var_counts	normal_cn	minor_cn	major_cn	variant_case	variant_freq	genotype
    NA12156:BB:chr2:175263063	3812	14	2	0	2	NA12156	0.0036591740721380033	BB
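A minimal sketch of writing this format, in Python since the thread has no code of its own. The helper names `make_mutation_id` and `to_pyclone_tsv` are hypothetical, not part of FACETS or PyClone. It writes only the six required columns and enforces the uniqueness of `mutation_id` described above; optional columns such as `variant_freq` are ignored.

```python
import csv
import io
from typing import Iterable, Mapping, Optional

# Required PyClone columns, as listed above. Optional columns such as
# variant_freq or genotype are not written by this sketch.
REQUIRED_COLUMNS = ["mutation_id", "ref_counts", "var_counts",
                    "normal_cn", "minor_cn", "major_cn"]


def make_mutation_id(chrom: str, pos: int, gene: Optional[str] = None) -> str:
    """Build a coordinate-based ID, optionally prefixed with a gene name."""
    coord = f"{chrom}:{pos}"
    return f"{gene}_{coord}" if gene else coord


def to_pyclone_tsv(rows: Iterable[Mapping[str, object]]) -> str:
    """Render rows as tab-delimited text with a header line."""
    rows = list(rows)
    ids = [row["mutation_id"] for row in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("mutation_id values must be unique")
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=REQUIRED_COLUMNS, delimiter="\t",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # keys not in REQUIRED_COLUMNS are ignored
    return buf.getvalue()
```

For the example row above, `to_pyclone_tsv` called with `mutation_id=make_mutation_id("chr2", 175263063)` and the same counts produces a header line followed by `chr2:175263063	3812	14	2	0	2`.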

veseshan commented 7 years ago

FACETS doesn't call mutations, so you need to merge a file of mutations (called with your favorite mutation caller) with the copy number calls from FACETS to generate this file. Extracting the copy number information for a given position from the segmentation table in the output should be easy.
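The merge described above amounts to an interval lookup: for each mutation, find the segment on the same chromosome that contains its position. A minimal sketch, assuming non-overlapping segments with 1-based inclusive coordinates, given as hypothetical `(chrom, start, end, tcn, lcn)` tuples, with `lcn` as `None` where the segmentation reports NA. `index_segments` and `find_segment` are illustrative names, not FACETS functions.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# A segment: (start, end, tcn, lcn), 1-based inclusive; lcn may be None (NA).
Segment = Tuple[int, int, int, Optional[int]]


def index_segments(segments: List[Tuple[str, int, int, int, Optional[int]]]):
    """Group segments by chromosome and sort them by start for binary search."""
    by_chrom: Dict[str, List[Segment]] = defaultdict(list)
    for chrom, start, end, tcn, lcn in segments:
        by_chrom[chrom].append((start, end, tcn, lcn))
    starts: Dict[str, List[int]] = {}
    for chrom, segs in by_chrom.items():
        segs.sort(key=lambda s: s[0])
        starts[chrom] = [s[0] for s in segs]
    return by_chrom, starts


def find_segment(index, chrom: str, pos: int) -> Optional[Segment]:
    """Return the segment with start <= pos <= end, or None if none covers pos."""
    by_chrom, starts = index
    if chrom not in by_chrom:
        return None
    # Rightmost segment whose start is <= pos; it is the only candidate.
    i = bisect.bisect_right(starts[chrom], pos) - 1
    if i >= 0:
        seg = by_chrom[chrom][i]
        if seg[0] <= pos <= seg[1]:
            return seg
    return None
```

Binary search keeps each lookup logarithmic in the number of segments. Positions not covered by any segment (including gaps between segments) return `None` and would need to be dropped or handled separately.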

trptyrphe11 commented 7 years ago

To make sure I extract the right information: do you mean I should merge the jointseg data frame from the procSample output with the fit$cncf data frame, where the lcn.em column represents the minor copy number and (tcn.em - lcn.em) represents the major copy number? Thanks.
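The mapping proposed here can be written as a small function. This is a minimal sketch of that interpretation (minor = lcn.em, major = tcn.em - lcn.em). The name `pyclone_cn` is hypothetical, and returning `None` for a missing lcn.em is a choice made here so that NA segments can be handled explicitly later.

```python
from typing import Optional, Tuple


def pyclone_cn(tcn_em: int, lcn_em: Optional[int]) -> Optional[Tuple[int, int]]:
    """Map total (tcn.em) and lesser (lcn.em) copy number to PyClone's
    (minor_cn, major_cn). Returns None when lcn.em is missing (NA)."""
    if lcn_em is None:
        return None
    if not 0 <= lcn_em <= tcn_em:
        raise ValueError(f"expected 0 <= lcn.em <= tcn.em, got {lcn_em}, {tcn_em}")
    minor_cn = lcn_em
    major_cn = tcn_em - lcn_em
    return minor_cn, major_cn
```

For example, a segment with tcn.em = 3 and lcn.em = 1 gives minor_cn = 1 and major_cn = 2.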

veseshan commented 7 years ago

If you are using the current version of FACETS, you only need fit$cncf. In that data frame, the columns "start" and "end" give the genomic positions where each segment starts and ends.
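To use fit$cncf outside R, one option is to export it as a tab-delimited file and read it back. A minimal sketch, assuming it was written with something like `write.table(fit$cncf, "cncf.tsv", sep = "\t", quote = FALSE, row.names = FALSE)` and that the exported columns include `chrom`, `start`, `end`, `tcn.em`, and `lcn.em`. `read_cncf` is a hypothetical helper. R writes missing values as the literal `NA`, and may write round numbers in scientific notation, so values are parsed through `float` first.

```python
import csv
from typing import Dict, Iterable, List, Optional


def _num(value: str) -> Optional[int]:
    """Parse an integer written by R; 'NA' becomes None. float() first because
    R may write round numbers in scientific notation (e.g. 1.2e+07)."""
    if value == "NA":
        return None
    return int(float(value))


def read_cncf(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Read a tab-delimited export of fit$cncf, keeping only the needed columns."""
    rows = []
    for rec in csv.DictReader(lines, delimiter="\t"):
        rows.append({
            "chrom": rec["chrom"],
            "start": _num(rec["start"]),   # 1-based, inclusive
            "end": _num(rec["end"]),       # 1-based, inclusive
            "tcn.em": _num(rec["tcn.em"]),
            "lcn.em": _num(rec["lcn.em"]),
        })
    return rows
```

In practice this would be called as `read_cncf(open("cncf.tsv", newline=""))`.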

trptyrphe11 commented 7 years ago

I see. Are the coordinates a 0-based start and 1-based end, as in BED format, or both 1-based, as in VCF format? Thanks.

veseshan commented 7 years ago

1-based
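Because both start and end are 1-based, a position is in a segment when start <= pos <= end, with both endpoints included. VCF POS is already 1-based. Coordinates from a BED file (0-based start, exclusive end) need shifting first. A minimal sketch with hypothetical helper names:

```python
from typing import Tuple


def bed_to_one_based(bed_start: int, bed_end: int) -> Tuple[int, int]:
    """Convert a BED interval (0-based start, exclusive end) to 1-based inclusive."""
    return bed_start + 1, bed_end


def in_segment(pos: int, seg_start: int, seg_end: int) -> bool:
    """1-based, inclusive containment: both segment endpoints count."""
    return seg_start <= pos <= seg_end
```

For example, the BED interval 99-100 is the single base at 1-based position 100.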

trptyrphe11 commented 6 years ago

One more question after examining the output more closely: several segments (~15%) have an lcn.em of NA. When I integrate the result with my variant file and prepare the input for PyClone to estimate mutation clonality, should I replace those minor copy numbers with 0, or should I filter out those mutations? Thanks.

veseshan commented 6 years ago

Filtering may be a better idea; these are typically focal changes. If tcn is large, you can check whether including them with lcn=1 gives sensible answers.
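The suggestion above can be sketched as a filter with an optional override. By default it drops mutations whose segment has lcn.em = NA. If a cutoff is supplied, it keeps those with tcn.em at or above the cutoff and sets lcn to 1. `handle_missing_lcn` and `min_tcn_for_lcn1` are hypothetical names, and the thread gives no value for the cutoff, so any choice must be checked against the downstream results.

```python
from typing import Iterable, List, Mapping, Optional


def handle_missing_lcn(rows: Iterable[Mapping[str, object]],
                       min_tcn_for_lcn1: Optional[int] = None) -> List[dict]:
    """Drop rows whose lcn.em is None (NA), unless min_tcn_for_lcn1 is given
    and tcn.em >= min_tcn_for_lcn1, in which case lcn.em is set to 1.
    min_tcn_for_lcn1 is a user-chosen, hypothetical cutoff with no default."""
    kept = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is not modified
        if row["lcn.em"] is None:
            tcn = row["tcn.em"]
            if min_tcn_for_lcn1 is None or tcn is None or tcn < min_tcn_for_lcn1:
                continue  # default: filter the mutation out
            row["lcn.em"] = 1  # tentative; check that results look sensible
        kept.append(row)
    return kept
```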

kobejamescurry commented 5 years ago

> So if tcn is large you can see if including them with lcn=1 will give you sensible answers.

So how large is large? Thanks a lot.