Open yasharw opened 2 years ago
Hi Will, thanks for pointing out this discrepancy!
The short answer is: I need to update the identifiers, as they appear in this repository, to align with the final identifiers of the publication. I'll do this ASAP.
Longer answer: After a little investigation, I found that these signature identifiers are described in our November 2018 bioRxiv version. Somehow they dropped out completely in the final published version in September 2019 (probably in an overzealous attempt to simplify the final figures).
The explanation is that we analyzed the datasets one at a time, accumulating signatures as we went. First, in the TCGA-BRCA cohort, we found 7 signatures: 1, 2, 3, 4, 5, 6, 7. Then, in the higher-resolution METABRIC cohort, we found some of the same signatures (and hence did not use new numbers for these), but also a few new ones, which we numbered 8, 9, and 10. Moreover, some of the TCGA-BRCA signatures did not appear in METABRIC: 5, 6, 7. So we considered 5, 6, 7 to be invalidated. The final list of interpretable signatures is, in the old terminology: 1, 2, 3, 4, 8, 9, 10. These we just renumbered 1 through 7. I found this table in my records (sadly it does not seem to be in any manuscript version's supplementary data):
More discussion of this appears in the "Refinement and validation with METABRIC dataset" section in the November 2018 bioRxiv version.
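Until the identifiers in the repository are updated, the renumbering described above can be sketched in Python as follows. This is only an illustration under one assumption that the thread does not confirm: that "renumbered 1 through 7" preserves the original order, so old 8, 9, 10 become 5, 6, 7. The names `RETAINED_OLD_IDS`, `OLD_TO_NEW`, `translate_signature_id`, and `rename_label` are hypothetical, and the "_<number>" suffix handling is a guess based on the "_1" through "_10" labels mentioned in the question.

```python
# Old (repository / November 2018 preprint) signature numbers that were kept
# after validation on METABRIC, in the order listed in the explanation.
RETAINED_OLD_IDS = [1, 2, 3, 4, 8, 9, 10]

# Old numbers found in TCGA-BRCA but not in METABRIC, hence invalidated.
INVALIDATED_OLD_IDS = [5, 6, 7]

# ASSUMPTION: the renumbering is order-preserving (old 8 -> new 5, and so on).
OLD_TO_NEW = {old: new for new, old in enumerate(RETAINED_OLD_IDS, start=1)}


def translate_signature_id(old_id):
    """Return the published ID for an old ID, or None if it was invalidated."""
    if old_id in OLD_TO_NEW:
        return OLD_TO_NEW[old_id]
    if old_id in INVALIDATED_OLD_IDS:
        return None
    raise ValueError(f"unknown signature id: {old_id}")


def rename_label(label):
    """Rewrite a label ending in '_<old id>' to use the published ID.

    Returns None for labels that refer to an invalidated signature.
    The '_<number>' suffix format is an assumption about the CSV labels.
    """
    prefix, _, suffix = label.rpartition("_")
    new_id = translate_signature_id(int(suffix))
    return None if new_id is None else f"{prefix}_{new_id}"


if __name__ == "__main__":
    for old in range(1, 11):
        print(old, "->", translate_signature_id(old))
```

Labels that map to `None` correspond to the invalidated signatures 5, 6, 7 and would be dropped rather than renamed.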
Thank you for the quick reply - particularly 3 years post publication!
Your explanation makes sense and is easy enough to fix in my analysis.
Hi all,
I'm trying to extract the TDA signatures that you generated for the TCGA PanCancer BRCA cohort. It seems like the tcga_recurrence_pam50_vs_tdasignatures.csv file should contain this information. However, this file contains signatures "_1" through "_10" and your manuscript only describes seven signatures. Did you fold three of these signatures into others? Is there another file that I should evaluate instead? Any clarification that you could provide would be appreciated.
Thank you! Will