jimmymathews / tda-pam50-data


TCGA PanCancer TDA Signatures #1

Open yasharw opened 2 years ago

yasharw commented 2 years ago

Hi all,

I'm trying to extract the TDA signatures that you generated for the TCGA PanCancer BRCA cohort. It seems like the tcga_recurrence_pam50_vs_tdasignatures.csv file should contain this information. However, this file contains signatures "_1" through "_10" and your manuscript only describes seven signatures. Did you fold three of these signatures into others? Is there another file that I should evaluate instead?

Any clarification that you could provide would be appreciated.

Thank you!
Will

jimmymathews commented 2 years ago

Hi Will, thanks for pointing out this discrepancy!

The short answer is: I need to update the identifiers, as they appear in this repository, to align with the final identifiers of the publication. I'll do this ASAP.

Longer answer: After a little investigation, I found that these signature identifiers are described in our November 2018 bioRxiv version. Somehow they dropped out completely in the final version published in September 2019 (probably through an overzealous attempt to simplify the final figures).

The explanation is that we analyzed the datasets one at a time, accumulating signatures as we went. First, in the TCGA-BRCA cohort, we found seven signatures, numbered 1 through 7. Then, in the higher-resolution METABRIC cohort, we found some of the same signatures (so we did not assign them new numbers), but also a few new ones, which we numbered 8, 9, and 10. Moreover, three of the TCGA-BRCA signatures (5, 6, and 7) did not appear in METABRIC, so we considered them invalidated. The final list of interpretable signatures, in the old terminology, is therefore 1, 2, 3, 4, 8, 9, 10, and these we simply renumbered 1 through 7.

I found this table in my records (sadly, it does not seem to be in the supplementary data of any manuscript version):

[Table image: signatures_tcga_metabric_gtex]
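For anyone applying this correction to the CSV, the renumbering described above can be expressed as a small lookup table. This is an unofficial sketch, not code from this repository. It assumes the renumbering preserves order (old 8, 9, 10 become new 5, 6, 7) and that signature columns end in `_<number>` with numbers 1 to 10. The helpers `published_id` and `rename_columns` are hypothetical names.

```python
import re
from typing import Dict, List, Optional

# Old identifiers (as used in this repository) that survived METABRIC validation.
# ASSUMPTION: the renumbering to 1..7 preserves this order.
SURVIVING_OLD_IDS = [1, 2, 3, 4, 8, 9, 10]
OLD_TO_NEW: Dict[int, int] = {old: new for new, old in enumerate(SURVIVING_OLD_IDS, start=1)}

# TCGA-BRCA signatures that did not reappear in METABRIC and were dropped.
INVALIDATED = {5, 6, 7}


def published_id(old_id: int) -> Optional[int]:
    """Return the published signature number, or None if the old signature was invalidated."""
    if old_id in INVALIDATED:
        return None
    if old_id not in OLD_TO_NEW:
        raise ValueError(f"unknown signature identifier: {old_id}")
    return OLD_TO_NEW[old_id]


def rename_columns(columns: List[str]) -> Dict[str, str]:
    """Build an old-name -> new-name mapping for signature columns.

    HYPOTHETICAL naming: signature columns are assumed to end in '_<old id>' with id 1..10.
    Invalidated signature columns are left out of the mapping; other columns map to themselves.
    """
    mapping: Dict[str, str] = {}
    for col in columns:
        match = re.fullmatch(r"(.*_)(\d+)", col)
        old_id = int(match.group(2)) if match else None
        if old_id is None or not 1 <= old_id <= 10:
            mapping[col] = col  # not a signature column: keep as-is
            continue
        new_id = published_id(old_id)
        if new_id is not None:  # invalidated signatures 5, 6, 7 are dropped
            mapping[col] = match.group(1) + str(new_id)
    return mapping


if __name__ == "__main__":
    print(rename_columns(["sample", "sig_1", "sig_5", "sig_8", "sig_10"]))
```

If the CSV is loaded with pandas as `df`, one way to apply the mapping is `mapping = rename_columns(list(df.columns))` followed by `df = df[list(mapping)].rename(columns=mapping)`, which selects the surviving columns before renaming so that old `_5` and new `_5` never coexist.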

jimmymathews commented 2 years ago

More discussion of this appears in the "Refinement and validation with METABRIC dataset" section of the November 2018 bioRxiv version.

yasharw commented 2 years ago

Thank you for the quick reply, especially 3 years after publication!

Your explanation makes sense and is easy enough to fix in my analysis.