A question about the number of mutations in the results

Elpalet commented 5 months ago

I was honored to be able to use mSigHdp, which I ran successfully on both the test data and my own data. In the results file inferred.exposures.csv, I noticed that the number of mutations assigned to each mutational signature is not an integer, and that their sum differs from the number of mutations in my input file. Is this expected? Or should the mutations that are not assigned to any signature be classified as an unknown mutational signature? Thank you again for creating this tool.

Elpalet commented 5 months ago

However, I don't see this with the sample data, so I suspect something is wrong with how I ran it, even though there was no warning or error.

liumoLM commented 5 months ago

Hi,

This is possible. As you may have noticed, the result folder contains both 'extracted.signatures.csv' and 'low.confidence.signatures.csv'. During extraction, some low-confidence signatures (i.e. signatures found in only a few posterior samples) were found in the tumors. We removed these low-confidence signatures from the tumors and simply re-normalized the exposures of the remaining ones, which is why the values contain decimals.

We don't recommend using 'inferred.exposures.csv' as the final exposure result. Instead, I'd suggest trying other NMF-based signature assignment tools, e.g. mSigAct or SigProfilerAssignment.

Hope this helps. Feel free to ask if you have any further questions.

Elpalet commented 5 months ago

Thank you very much for answering my question. I had suspected that normalization was the reason. mSigHdp gives me signatures similar to those from other methods, so I think it is reliable software. Is there a way to skip this normalization step?

liumoLM commented 5 months ago

A simple way is to redo the normalization manually. For example, suppose a tumor has a total mutation count of N, and mSigHdp gives exposures Ea and Eb for two signatures. Compute the factor N/(Ea+Eb), multiply both Ea and Eb by it, and remove the decimals. This gives a new exposure result whose sum is very close to the mutation count.
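The manual re-normalization described above can be sketched in Python as follows. This is a minimal illustration, not part of mSigHdp: the function name `rescale_exposures`, the dict-based input, and the signature names and values in the usage line are hypothetical. It generalizes N/(Ea+Eb) to any number of signatures and rounds to the nearest integer (simply truncating with `int()` is another way to "remove the decimals", but it tends to undercount).

```python
def rescale_exposures(exposures: dict[str, float], total_mutations: int) -> dict[str, int]:
    """Rescale one tumor's exposures so they sum to roughly total_mutations.

    exposures: hypothetical mapping of signature name -> exposure for one tumor.
    total_mutations: N, the tumor's mutation count in the input catalog.
    """
    current_sum = sum(exposures.values())
    if current_sum <= 0:
        raise ValueError("exposures must have a positive sum")
    # Scaling factor N / (Ea + Eb + ...)
    factor = total_mutations / current_sum
    # Multiply each exposure by the factor, then round to whole mutations.
    return {sig: round(value * factor) for sig, value in exposures.items()}


# Hypothetical tumor with 1000 mutations and two signatures.
rescaled = rescale_exposures({"SBS1": 312.4, "SBS5": 598.7}, 1000)
print(rescaled, sum(rescaled.values()))
```

Because each value is rounded independently, the rescaled counts can still differ from N by a few mutations when there are several signatures.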

Meanwhile, I'll look into the code to see how to fix this issue, but this may take some time.

Feel free to email me at liumolm@gmail.com if you'd like to discuss the details.

Thank you.

Elpalet commented 5 months ago

Thank you, I managed to get the mutational signature distribution I wanted as a reference. The sum of the re-normalized mutation counts differs from the input data by 1-2, but I think that's OK. My problem has been solved; I wish you a happy life!
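The 1-2 mutation discrepancy comes from rounding each scaled exposure independently. If the counts must sum exactly to the input total, one option (not something mSigHdp does, and not suggested in this thread) is the largest-remainder method: floor every scaled value, then give the leftover mutations to the signatures with the largest fractional parts. A minimal sketch, with the hypothetical function name `rescale_exact` and dict-based input:

```python
import math


def rescale_exact(exposures: dict[str, float], total_mutations: int) -> dict[str, int]:
    """Rescale exposures to integers that sum exactly to total_mutations.

    Largest-remainder (Hamilton) method: floor each scaled value, then add
    one mutation to the signatures with the largest fractional parts until
    the total matches.
    """
    current_sum = sum(exposures.values())
    if current_sum <= 0:
        raise ValueError("exposures must have a positive sum")
    # Scale each exposure by N / (sum of exposures).
    scaled = {sig: value * total_mutations / current_sum for sig, value in exposures.items()}
    counts = {sig: math.floor(v) for sig, v in scaled.items()}
    # Mutations lost by flooring; guard against float noise making it negative.
    leftover = max(total_mutations - sum(counts.values()), 0)
    # Signatures ordered by fractional part, largest first.
    by_remainder = sorted(scaled, key=lambda sig: scaled[sig] - counts[sig], reverse=True)
    for sig in by_remainder[:leftover]:
        counts[sig] += 1
    return counts


# Hypothetical tumor with 1000 mutations and three signatures.
print(rescale_exact({"SBS1": 100.5, "SBS5": 200.5, "SBS40": 300.5}, 1000))
```

This only changes how the decimals are removed; it does not make the underlying exposure estimates any more accurate.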

steverozen commented 5 months ago

Thank you, Mo. Yes, mSigHdp does not provide good estimates of exposures. See also Nanhai's preprint: https://www.biorxiv.org/content/10.1101/2024.05.20.594967v1.abstract

(Sent by email in reply to Mo Liu's comment of Tuesday, June 4th, 2024 at 9:39 PM.)

steverozen / mSigHdp

A question about the number of mutations in the results #4