Normalisation to specific TF context?

mortunco commented 7 years ago

Hi,

Thank you for contributing to science with this package.

~~Is there a way to tweak the randomisation for my case, in which I want to normalise the tri-nucleotide context with respect to the base context of TF binding sites?~~

~~Take CTCF, for example. The sequence context of its binding regions might not be the same as that of the whole genome, so the resulting signature will be biased towards the enriched bases.~~

~~Previously, I attempted to simulate random bed files with a similar context and then extract their signatures. If the random signatures are not the same as the CTCF one, then my CTCF signature is real.~~

Any thoughts will be helpful.

Best regards,

Tunc.

EDIT: Sorry for asking before reading the article in more detail. Please disregard my previous comment (I have struck it through). Let me rephrase my question. I found that I can supply a custom tri-nucleotide context to the whichSignatures() function. For each set of transcription factor regions, I will calculate the tri-nucleotide context from their fasta and try to normalise based on it. This is going to be my method for normalising the sequence context of TF regions. Is this right? I think it is better than no normalisation.

However, I have a question about the "genome" normalisation. I get different mutation signature results when I use the "genome" and "default" normalisation types, even though my data is whole-genome breast cancer data (a single whole-genome patient sample). Could you give me some information about the basis of this normalisation? Why is there a "genome" option, and in which situations should I use it? The article states that no normalisation is needed for whole-genome tumor samples, but shouldn't I get the same result even if I normalise to the genome?

To sum up, I want to find out whether the mutation signature changes after a TF binding event occurs, and I want to remove the effect of the sequence context of the TF binding regions.

Sorry for the confusion.

jherrero commented 7 years ago

Hi Tunc

Please refer to Issue #2 for a discussion on the normalisation methods.

If you are interested in looking at the signatures of mutations on CTCF binding sites, you could indeed build a null of tri-nucleotide contexts based on all predicted CTCF binding sites. I would, however, warn you about the possibility of some contexts being hugely underrepresented in the CTCF binding sites, which might return odd results. We haven't really tested deconstructSigs on very small genomic regions and we cannot guarantee that it would work fine, but I think it is a good idea and it would be great if you were willing to test it and report back.
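The null of tri-nucleotide contexts described above could be tabulated along these lines. This is a minimal base-R sketch, not part of deconstructSigs: `rev_comp`, `contexts32`, `count_contexts` and the `min_count` threshold are hypothetical names, the sequences are toy strings, and it assumes that contexts are collapsed onto the 32 pyrimidine-centred trinucleotides. The exact table shape expected by `tri.counts.method` should be checked against `?whichSignatures`.

```r
# Reverse-complement a single DNA string.
rev_comp <- function(x) {
  paste(rev(strsplit(chartr("ACGT", "TGCA", x), "")[[1]]), collapse = "")
}

# The 32 trinucleotide contexts whose central base is C or T.
bases <- c("A", "C", "G", "T")
grid <- expand.grid(first = bases, mid = c("C", "T"), last = bases,
                    stringsAsFactors = FALSE)
contexts32 <- sort(paste0(grid$first, grid$mid, grid$last))

# Count pyrimidine-centred trinucleotides over a character vector of sequences.
count_contexts <- function(seqs) {
  counts <- setNames(integer(32), contexts32)
  for (s in toupper(seqs)) {
    n <- nchar(s)
    if (n < 3) next
    tri <- substring(s, 1:(n - 2), 3:n)        # all overlapping 3-mers
    tri <- tri[grepl("^[ACGT]{3}$", tri)]      # drop 3-mers containing N etc.
    purine <- substr(tri, 2, 2) %in% c("A", "G")
    # Purine-centred 3-mers are counted as their reverse complement.
    tri[purine] <- vapply(tri[purine], rev_comp, character(1))
    tab <- table(tri)
    counts[names(tab)] <- counts[names(tab)] + as.integer(tab)
  }
  counts
}

region_seqs <- c("ACGTAC", "TTTNCCC")  # toy stand-ins for binding-site sequences
context_counts <- count_contexts(region_seqs)

min_count <- 1                         # arbitrary illustrative threshold
rare_contexts <- names(context_counts)[context_counts < min_count]
```

Listing `rare_contexts` before running any normalisation is one way to spot the underrepresented contexts mentioned above; what counts as "too rare" is a judgment call that this sketch does not settle.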

Best wishes

Javier

mortunco commented 7 years ago

Dear @jherrero

Thank you for your rapid response. I have seen that you have been answering all the questions extensively, so I am grateful for that.

I will be more than happy to share the results of the TF proximity mutation signature changes, but I am having a problem understanding the normalisation method.

I understand that I should use the default method for WGS mutation signature data.

1) Before I ask anything else, could you point me to a source or briefly explain how you "normalise" mutation signature values to a sequence context? For example, do you simply divide the trinucleotide proportions by the context? I couldn't find an explicit calculation for this.

2) I created the fasta of a TF bed file, then merged it into a single string and used the following code to extract the trinucleotide counts.

library(Biostrings)  # provides readDNAStringSet() and trinucleotideFrequency()
a <- readDNAStringSet('CTCF.fasta', format = 'fasta')
sequence_context <- t(trinucleotideFrequency(a))

Then I used the following code to calculate the contributions.

library(deconstructSigs)  # provides whichSignatures(), plotSignatures(), signatures.cosmic
plot_example <- whichSignatures(tumor.ref = sigs.input, signatures.ref = signatures.cosmic,
                                sample.id = "CTCF.vcf", contexts.needed = TRUE,
                                tri.counts.method = sequence_context)
plotSignatures(plot_example, sub = 'CTCF_normalised2sequencecontext')

Is this method right? As with exome2genome (or vice versa), should I make my normalisation relative to another object, such as the genome? If so, how?
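For readers following the question in 1), one plausible reading of "normalising to a sequence context" is sketched below: divide each mutation-type count by the number of times its trinucleotide context occurs in the region, then rescale to fractions. This is only an illustration of the general idea with toy numbers (6 channels rather than 96), and it is an assumption rather than a confirmed description of what deconstructSigs does internally. In particular, whether a user-supplied table is applied as a divisor or as a multiplicative scaling factor should be checked against `?whichSignatures`. `channel_context` and `normalise_by_context` are hypothetical helper names.

```r
# Toy mutation counts: two contexts, three substitutions each.
mut_counts <- c("A[C>A]A" = 10, "A[C>G]A" = 5, "A[C>T]A" = 5,
                "T[C>A]G" = 10, "T[C>G]G" = 5, "T[C>T]G" = 5)

# Toy number of occurrences of each context in the region of interest.
context_counts <- c(ACA = 400, TCG = 100)

# Trinucleotide context of a channel label, e.g. "A[C>A]A" -> "ACA".
channel_context <- function(ch) {
  paste0(substr(ch, 1, 1), substr(ch, 3, 3), substr(ch, 7, 7))
}

# Assumed normalisation: counts per available context, rescaled to sum to 1.
normalise_by_context <- function(mut_counts, context_counts) {
  ctx <- channel_context(names(mut_counts))
  if (any(!ctx %in% names(context_counts)) || any(context_counts[ctx] == 0)) {
    stop("every mutated context needs a non-zero context count")
  }
  rate <- mut_counts / context_counts[ctx]
  rate / sum(rate)
}

raw_fraction <- mut_counts / sum(mut_counts)
normalised <- normalise_by_context(mut_counts, context_counts)
```

In this toy case both contexts carry the same raw counts, but TCG is four times rarer than ACA, so after normalisation its channels carry four times the weight. The same arithmetic shows why a context that is very rare in the binding sites can dominate the normalised profile.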

If everything above is right, I want to report that with the normalised method I obtained an error of .113, whereas with the default method the error is .095.

This step is critical for my graduate thesis, so I am willing to provide any information related to the context. I just don't know what to give, so please ask me :)

Thank you for your help,

Best regards.

Tunc.

raerose01 / deconstructSigs

Normalisation to specific TF context? #17