mskcc / tempo

CCS research pipeline to process WES and WGS TN pairs
https://cmotempo.netlify.app/

In sampledata.txt - non-zero TMB but blank in the `number of mutations` column #757

Closed ShwetaCh closed 4 years ago

ShwetaCh commented 4 years ago

I found two samples that have a non-zero TMB but a blank in the "Number of mutations" column, even though mut_somatic.maf has 6 mutations for each of them. Depending on what this column is meant to report, this may not necessarily be a bug.

    s_C_F624M0_P001_d__s_C_F624M0_N001_d TMB - 0.03
    s_C_001553_P001_d__s_C_001553_N001_d TMB - 0.08

    [chavans@juno ~]$ grep "s_C_F624M0_P001_d" /juno/work/ccs/chavans/tempo_megatron/Result/cohort_level/mut_somatic.maf | wc -l
    6
    [chavans@juno ~]$ grep "s_C_001553_P001_d" /juno/work/ccs/chavans/tempo_megatron/Result/cohort_level/mut_somatic.maf | wc -l
    6

gongyixiao commented 4 years ago

Good catch and thank you for reporting this.

The reason for the error is the following:

  1. The Number_of_Mutations column is generated by these lines in process MetaDataParser: https://github.com/mskcc/tempo/blob/94e134accffc65dd5b8de834c59d24d40438342e/containers/metadataparser/create_metadata_file.py#L121-L126 This means it does not actually count the mutations in the maf file; it uses the Number of Mutations column from the mutational signature output *.mutsig.txt of process RunMutationSignatures.

  2. When the input maf for process RunMutationSignatures has fewer than 5 mutations, the program does not generate output and reports a warning like the one below:

    [gongy@terra 757]$ cat megatron_Jan5th/work/ac/2c2c7ff67387ffc9585cb0a12e5e0d/.command.out
    Loading known signatures from /mutation-signatures/Stratton_signatures30.txt
    Making sample signatures from maf s_C_F624M0_P001_d__s_C_F624M0_N001_d.somatic.maf
    6 lines read; 4 SNPS counted, 0 SNPs skipped, 2 non-SNPs skipped
    Decomposing signatures and writing to s_C_F624M0_P001_d__s_C_F624M0_N001_d.mutsig.txt
    0/1 decomposed
    Warning: sample has less than 5 mutations; cancelling decomposition
    Sample s_C_F624M0_P001_d not decomposed

So the solution is to stop using the info above and instead count the number of mutations based on https://github.com/mskcc/tempo/blob/94e134accffc65dd5b8de834c59d24d40438342e/containers/metadataparser/create_metadata_file.py#L63

For now, I suggest that analysts count the number of mutations in the final maf instead of using this info in sample_data.txt.
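As a rough illustration of that manual count, here is a minimal Python sketch. It assumes the final maf is tab-delimited with the standard `Tumor_Sample_Barcode` column and optional `#` comment lines; the function name and the demo rows are hypothetical and are not taken from the pipeline code.

```python
import csv
import io
from collections import Counter


def count_mutations_per_sample(maf_handle):
    """Count MAF rows per Tumor_Sample_Barcode, skipping '#' comment lines."""
    rows = (line for line in maf_handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return Counter(row["Tumor_Sample_Barcode"] for row in reader)


# Tiny made-up MAF, for illustration only (not real pipeline output).
demo_maf = io.StringIO(
    "#version 2.4\n"
    "Hugo_Symbol\tTumor_Sample_Barcode\tVariant_Type\n"
    "GENE1\tsample_A\tSNP\n"
    "GENE2\tsample_A\tDEL\n"
    "GENE3\tsample_B\tSNP\n"
)
counts = count_mutations_per_sample(demo_maf)
print(counts["sample_A"], counts["sample_B"])
```

Unlike `grep ... | wc -l`, which counts any line containing the sample ID as a substring, this matches on the exact value of the barcode column, so the two approaches can differ if IDs overlap.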

To be clear:

  1. The missing Number_of_Mutations value in sample_data.txt affects only the small number of samples that have fewer than 5 mutations. This needs to be fixed by counting the final maf instead of using the info from RunMutationSignatures.
  2. TMB calculation uses the final maf, so it is correct for all samples. @arichards2564 @ShwetaCh

ShwetaCh commented 4 years ago

Thanks for tracking it @gongyixiao. But then, my two samples had 6 mutations each. Is the cutoff for mutational signatures set to 6?

gongyixiao commented 4 years ago

I guess the reason is in the error log above:

    6 lines read; 4 SNPS counted, 0 SNPs skipped, 2 non-SNPs skipped

6 mutations total, 2 non-SNPs skipped.
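The arithmetic can be sketched in a few lines of Python. This is only an illustration of the behaviour suggested by the log, assuming the "less than 5 mutations" cutoff is applied to SNPs only; the constant, the function name, and the specific non-SNP types in the demo list are hypothetical, and `Variant_Type` values follow the standard MAF convention.

```python
# Cutoff taken from the warning text in the log; assumed here to apply to SNPs only.
MIN_SNPS_FOR_DECOMPOSITION = 5


def summarize_variant_types(variant_types):
    """Split MAF Variant_Type values into SNPs and non-SNPs and apply the cutoff."""
    snps = sum(1 for v in variant_types if v == "SNP")
    non_snps = len(variant_types) - snps
    return {
        "total": len(variant_types),
        "snps": snps,
        "non_snps": non_snps,
        "decomposed": snps >= MIN_SNPS_FOR_DECOMPOSITION,
    }


# Mirrors the log line: 6 lines read, 4 SNPs counted, 2 non-SNPs skipped.
# Which non-SNP types were involved is not stated in the log; INS/DEL are placeholders.
summary = summarize_variant_types(["SNP", "SNP", "SNP", "SNP", "INS", "DEL"])
print(summary)
```

With 4 SNPs out of 6 mutations the sample falls below the cutoff, so no mutsig output would be written and the column derived from it stays empty.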

ShwetaCh commented 4 years ago

@gongyixiao Got it, so the cutoff is fewer than 5 SNPs specifically. In my case SNPs + indels is >=6. FYI @arichards2564

gongyixiao commented 4 years ago

Fixed here: https://github.com/mskcc/tempo/blob/0c37bc0f5bf38dca97aa70812c99b09dd9736a0f/containers/metadataparser/create_metadata_file.py#L205

Please verify: @ShwetaCh @arichards2564