Good catch and thank you for reporting this.
The reason for the error is the following: the Number_of_Mutations column is generated by these lines in process MetaDataParser:
https://github.com/mskcc/tempo/blob/94e134accffc65dd5b8de834c59d24d40438342e/containers/metadataparser/create_metadata_file.py#L121-L126
This means it does not actually count the number of mutations in the maf file. Instead, it uses the Number of Mutations column in the mutational signature output (*.mutsig.txt) from process RunMutationSignatures. When there are fewer than 5 mutations in the input maf for process RunMutationSignatures, the program does not generate output and gives an error like the one below:
[gongy@terra 757]$ cat megatron_Jan5th/work/ac/2c2c7ff67387ffc9585cb0a12e5e0d/.command.out
Loading known signatures from /mutation-signatures/Stratton_signatures30.txt
Making sample signatures from maf s_C_F624M0_P001_d__s_C_F624M0_N001_d.somatic.maf
6 lines read; 4 SNPS counted, 0 SNPs skipped, 2 non-SNPs skipped
Decomposing signatures and writing to s_C_F624M0_P001_d__s_C_F624M0_N001_d.mutsig.txt
0/1 decomposed
Warning: sample has less than 5 mutations; cancelling decomposition
Sample s_C_F624M0_P001_d not decomposed
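To make the log above concrete, here is a minimal sketch of the kind of guard it describes. This is not the mutation-signatures tool's actual code. It assumes the tool classifies rows by the MAF Variant_Type column (value "SNP" for single-nucleotide variants) and that the threshold is 5, taken from the "less than 5 mutations" warning. The function name should_decompose is hypothetical.

```python
# Hypothetical sketch of the decomposition guard seen in the log; NOT the
# real mutation-signatures implementation. Assumes rows are classified by the
# MAF Variant_Type column and that only "SNP" rows are counted.
MIN_SNPS = 5  # assumed threshold, from the "less than 5 mutations" warning


def should_decompose(variant_types):
    """Return (snp_count, non_snp_count, decompose?) for one sample."""
    snps = sum(1 for vt in variant_types if vt == "SNP")
    non_snps = len(variant_types) - snps
    return snps, non_snps, snps >= MIN_SNPS


# Mirrors s_C_F624M0_P001_d in the log: 6 rows, 4 SNPs and 2 non-SNPs.
# The specific non-SNP types (DEL, INS) are illustrative only.
sample = ["SNP", "SNP", "SNP", "SNP", "DEL", "INS"]
snps, non_snps, ok = should_decompose(sample)
print(f"{len(sample)} lines read; {snps} SNPs counted, {non_snps} non-SNPs skipped")
print("decomposing" if ok else "cancelling decomposition")
```

Under these assumptions, a sample with 6 mutations can still be skipped, because only the 4 SNPs count toward the threshold.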
So the solution is to stop using the above info and instead count the number of mutations based on https://github.com/mskcc/tempo/blob/94e134accffc65dd5b8de834c59d24d40438342e/containers/metadataparser/create_metadata_file.py#L63
For now, I suggest that analysts count the number of mutations in the final maf instead of using this info in sample_data.txt.
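A minimal sketch of counting mutations per sample directly from a maf, as suggested above. It is not the code at line 63 of create_metadata_file.py. It assumes a standard tab-separated MAF with a header row, optional "#" comment lines, and a Tumor_Sample_Barcode column. The function name count_mutations_per_sample and the tiny inline maf are hypothetical.

```python
import csv
import io
from collections import Counter


def count_mutations_per_sample(maf_handle):
    """Count MAF rows per Tumor_Sample_Barcode, skipping '#' comment lines."""
    rows = (line for line in maf_handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    # Exact match on the barcode column, unlike grep, which matches substrings anywhere on a line.
    return Counter(row["Tumor_Sample_Barcode"] for row in reader)


# Hypothetical three-row maf containing only the columns this sketch needs.
maf_text = (
    "#version 2.4\n"
    "Hugo_Symbol\tVariant_Type\tTumor_Sample_Barcode\n"
    "TP53\tSNP\ts_C_F624M0_P001_d\n"
    "KRAS\tDEL\ts_C_F624M0_P001_d\n"
    "EGFR\tSNP\ts_C_001553_P001_d\n"
)
counts = count_mutations_per_sample(io.StringIO(maf_text))
print(dict(counts))
```

Because a Counter returns 0 for a missing key, a sample with no rows in the maf gets 0, while a sample with a handful of indels still gets its true count, whether or not signature decomposition ran.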
To be clear: the Number_of_Mutations column in sample_data.txt being 0 only affects a small number of samples, namely those with fewer than 5 mutations. This needs to be fixed by counting the final maf instead of using info from RunMutationSignatures. The TMB calculation uses the final maf, so it is correct for all samples.
@arichards2564 @ShwetaCh
Thanks for tracking it, @gongyixiao. But my two samples had 6 mutations each. Is the cutoff for mutational signatures set to 6?
I guess the reason is in the error log above:
6 lines read; 4 SNPS counted, 0 SNPs skipped, 2 non-SNPs skipped
That is 6 mutations in total, but 2 non-SNPs were skipped, leaving only 4 SNPs.
@gongyixiao Got it, less than 5 SNPs specifically. In my case SNP+Indels is >=6. FYI @arichards2564
Please verify: @ShwetaCh @arichards2564
I found two samples that have a non-zero TMB but a "blank" in the column "Number of mutations", even though mut_somatic.maf does have 6 mutations for each. So this may not necessarily be a bug, depending on what is being reported.
s_C_F624M0_P001_d__s_C_F624M0_N001_d TMB - 0.03
s_C_001553_P001_d__s_C_001553_N001_d TMB - 0.08
[chavans@juno ~]$ grep "s_C_F624M0_P001_d" /juno/work/ccs/chavans/tempo_megatron/Result/cohort_level/mut_somatic.maf | wc -l
6
[chavans@juno ~]$ grep "s_C_001553_P001_d" /juno/work/ccs/chavans/tempo_megatron/Result/cohort_level/mut_somatic.maf | wc -l
6