kgori / sigfit

Flexible Bayesian inference of mutational signatures
GNU General Public License v3.0

Similar signatures - inconsistent extraction #65

Closed: gevro closed this issue 2 years ago

gevro commented 2 years ago

Hi, I'm getting inconsistent results from de novo signature extraction. Some sets of samples yield SBS3, others SBS5, and others SBS40, even though all of them have fairly similar full spectra.

I computed the cosine similarity of COSMIC v3.2 against itself (Rplot.pdf), and there appear to be many clusters of signatures that are highly similar to each other. Is there a smart way for sigfit to handle this? It suggests to me that these clusters are not truly separate signatures, or that they are all 'contaminated' by some other signature. Thanks.
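For reference, the self-comparison described above can be sketched in a few lines. This is a minimal Python sketch (sigfit itself is an R package), assuming each signature is a list of per-channel probabilities; the helper names, the 0.8 cut-off and the 4-channel profiles sigA/sigB/sigC are hypothetical toy choices, not COSMIC data.

```python
import itertools
import math


def cosine_similarity(p, q):
    """Cosine similarity of two non-zero mutation profiles."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q)))


def similar_pairs(catalogue, threshold=0.8):
    """Hypothetical helper: every pair of catalogue signatures whose cosine
    similarity is >= threshold, most similar first."""
    pairs = []
    for (n1, p1), (n2, p2) in itertools.combinations(catalogue.items(), 2):
        sim = cosine_similarity(p1, p2)
        if sim >= threshold:
            pairs.append((n1, n2, round(sim, 3)))
    return sorted(pairs, key=lambda t: t[2], reverse=True)


# Hypothetical toy catalogue: 4 channels instead of COSMIC's 96 SBS contexts.
catalogue = {
    "sigA": [0.40, 0.30, 0.20, 0.10],
    "sigB": [0.35, 0.35, 0.20, 0.10],
    "sigC": [0.05, 0.05, 0.10, 0.80],
}
print(similar_pairs(catalogue))  # → [('sigA', 'sigB', 0.992)]
```

Run on the full 96-channel COSMIC matrix, the same loop would list the highly similar pairs that make up the clusters in the plot; the cut-off is arbitrary and only for illustration.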

kgori commented 2 years ago

Hi,

Let me check I understand. It sounds like you are running a de novo signature analysis, with sigfit::extract_signatures, and comparing the resulting signatures with COSMIC to see which are the closest matches to your results. Is this right? Or are you fitting all of the COSMIC signatures against your data, with sigfit::fit_signatures, and you find that each run emphasises a different set of signatures?

Either way, you are right that several of the COSMIC signatures are quite similar to each other. This is something of a drawback of the COSMIC signatures. As far as I know, the COSMIC extraction method doesn't try to generate maximally distinct signatures, as a PCA would, for example, so there is some overlap, which leads to cross-fitting. Our fitting models in sigfit also don't currently have any way to prioritise fitting distinct signatures, so I'm afraid I can't offer you a smart solution. Our usual approach is to fit a restricted set of signatures instead of the entire COSMIC set, either by excluding the others entirely or by down-weighting them through the signatures prior. Which signatures to use depends on what you expect to find in your data.
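To illustrate fitting a restricted set rather than the whole catalogue, here is a minimal Python sketch. It is not sigfit's Bayesian model: it swaps in plain non-negative least squares, solved with Lee and Seung's multiplicative updates, and it does not cover down-weighting through a prior. The function fit_exposures and the toy profiles are hypothetical.

```python
def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def fit_exposures(sample, signatures, names, n_iter=500):
    """Hypothetical helper (not sigfit): non-negative least-squares exposures of
    `sample` to the signatures listed in `names` only; all other catalogue
    entries are simply left out of the fit."""
    profiles = [signatures[n] for n in names]
    # Start from equal exposures that together account for all mutations.
    h = [sum(sample) / len(profiles)] * len(profiles)
    for _ in range(n_iter):
        # Current reconstruction of the sample: sum_j h_j * profile_j.
        recon = [sum(hj * p[i] for hj, p in zip(h, profiles)) for i in range(len(sample))]
        # Multiplicative update h_j <- h_j * <p_j, sample> / <p_j, recon>,
        # which keeps every exposure non-negative.
        h = [hj * dot(p, sample) / max(dot(p, recon), 1e-12) for hj, p in zip(h, profiles)]
    return dict(zip(names, h))


# Hypothetical toy catalogue (4 channels instead of COSMIC's 96).
catalogue = {
    "sigA": [0.40, 0.30, 0.20, 0.10],
    "sigB": [0.35, 0.35, 0.20, 0.10],
    "sigC": [0.05, 0.05, 0.10, 0.80],
}
# A toy sample built from 100 sigA mutations and 50 sigC mutations.
sample = [100 * a + 50 * c for a, c in zip(catalogue["sigA"], catalogue["sigC"])]
print(fit_exposures(sample, catalogue, ["sigA", "sigC"]))
```

Adding the near-duplicate sigB to `names` would let exposure shift between sigA and sigB, which is the cross-fitting problem in miniature.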

Cheers, Kevin

gevro commented 2 years ago

Thank you very much for the quick reply. I am doing de novo extraction with extract_signatures and then comparing the results to COSMIC. I'm finding it difficult to get consistent results in samples that should be consistent, specifically because some of the signatures are very similar to each other.
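One way to make this kind of inconsistency visible, rather than resolve it, is to record the runner-up COSMIC match alongside the best one. A minimal Python sketch, assuming extracted signatures and the reference catalogue are dicts of per-channel profiles; match_to_catalogue, the 0.05 margin and all profiles are hypothetical toy choices.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity of two non-zero mutation profiles."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q)))


def match_to_catalogue(extracted, catalogue, margin=0.05):
    """Hypothetical helper: for each extracted signature, report its closest
    catalogue signature, the runner-up, and whether they are within `margin`."""
    report = {}
    for name, profile in extracted.items():
        ranked = sorted(
            ((cosine_similarity(profile, ref), ref_name) for ref_name, ref in catalogue.items()),
            reverse=True,
        )
        (best_sim, best), (second_sim, second) = ranked[0], ranked[1]
        report[name] = {
            "best": best,
            "best_sim": round(best_sim, 3),
            "runner_up": second,
            "ambiguous": best_sim - second_sim < margin,
        }
    return report


# Hypothetical toy catalogue and extracted signatures (4 channels, not 96).
catalogue = {
    "sigA": [0.40, 0.30, 0.20, 0.10],
    "sigB": [0.35, 0.35, 0.20, 0.10],
    "sigC": [0.05, 0.05, 0.10, 0.80],
}
extracted = {"ext1": [0.38, 0.32, 0.20, 0.10], "ext2": [0.06, 0.04, 0.10, 0.80]}
for name, row in match_to_catalogue(extracted, catalogue).items():
    print(name, row)
```

An extracted signature flagged as ambiguous is one whose catalogue label could plausibly flip between runs or sample sets.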

Is there a different catalog you recommend? Or is there a specific, accepted set of priors for COSMIC that helps adjust for this? Or an established way to collapse similar signatures? I could try to write this myself, but I don't want to reinvent the wheel.
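As one hypothetical way to collapse similar signatures (not an established or sigfit-provided method), signatures can be grouped by single-linkage clustering on cosine similarity, with each group replaced by its renormalised mean profile. A minimal Python sketch; collapse_similar, the 0.9 threshold and the toy profiles are assumptions.

```python
import itertools
import math


def cosine_similarity(p, q):
    """Cosine similarity of two non-zero mutation profiles."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q)))


def collapse_similar(catalogue, threshold=0.9):
    """Hypothetical helper: link signatures with cosine >= threshold (single
    linkage via union-find) and replace each group by its renormalised mean."""
    names = list(catalogue)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in itertools.combinations(names, 2):
        if cosine_similarity(catalogue[a], catalogue[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)

    collapsed = {}
    for members in groups.values():
        profiles = [catalogue[m] for m in members]
        mean = [sum(col) / len(profiles) for col in zip(*profiles)]
        total = sum(mean)
        collapsed["+".join(members)] = [x / total for x in mean]
    return collapsed


# Hypothetical toy catalogue (4 channels instead of COSMIC's 96).
catalogue = {
    "sigA": [0.40, 0.30, 0.20, 0.10],
    "sigB": [0.35, 0.35, 0.20, 0.10],
    "sigC": [0.05, 0.05, 0.10, 0.80],
}
print(collapse_similar(catalogue))
```

Single linkage can chain (A similar to B and B similar to C merges A with C even if they differ), and merged profiles are no longer COSMIC signatures, so each merged group would need manual review.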

kgori commented 2 years ago

Hi,

To the best of my knowledge there's no other catalogue of signatures of comparable ambition; COSMIC is pretty much the gold standard. Validation of mutational signatures remains an active research area. I remember that Serena Nik-Zainal and her group were looking into matching up signatures with biochemical processes. There's a review of the current thinking here (I expect there's other research available).

As far as sigfit goes, I think it's a good tool for separating the signals present in data, but we are not affiliated with COSMIC, so I can't really give you a definitive answer regarding interpretation of the results in that context.

Kevin

gevro commented 2 years ago

Thank you very much for the feedback!