havakv / pycox

Survival analysis with PyTorch
BSD 2-Clause "Simplified" License

METABRIC Covariates Subset #161

Open sourcesync opened 1 year ago

sourcesync commented 1 year ago

Apologies for the newbie question. It seems that the original METABRIC dataset has many more "factors" than the 8 covariates in your dataset.

Which "factors" did you choose for the version of the dataset available in this package?

Thanks.

sourcesync commented 1 year ago

I found it in this paper (see the quote below, in case anyone else is interested). You can close this issue.

METABRIC: The Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) is a clinical dataset which consists of gene expressions used to determine different subgroups of breast cancer. We consider the data for 1,904 patients with each patient having 9 covariates - 4 gene indicators (MKI67, EGFR, PGR, and ERBB2) and 5 clinical features (hormone treatment indicator, radiotherapy indicator, chemotherapy indicator, ER-positive indicator, age at diagnosis). Furthermore, out of the total 1,904 patients, 801 (42.06%) are right-censored, and the rest are deceased (event). We obtained the DAG as depicted in Fig. 3 using a modified DAG-GNN algorithm.
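
For anyone who wants to sanity-check the figures in the quote, here is a minimal sketch that only redoes the arithmetic from the quoted passage. It does not load the pycox dataset, and the variable and function names (`GENE_INDICATORS`, `CLINICAL_FEATURES`, `summarize`) are made up for illustration. How these nine names map onto the columns in this package is not stated in the thread, so the sketch makes no assumption about it.

```python
# Numbers and covariate names are copied from the quoted paper passage.
# All identifiers below are hypothetical and exist only for this sketch.
GENE_INDICATORS = ["MKI67", "EGFR", "PGR", "ERBB2"]
CLINICAL_FEATURES = [
    "hormone treatment indicator",
    "radiotherapy indicator",
    "chemotherapy indicator",
    "ER-positive indicator",
    "age at diagnosis",
]

N_PATIENTS = 1904
N_CENSORED = 801


def summarize(n_patients: int, n_censored: int) -> dict:
    """Return the event count and the censored/event shares in percent."""
    n_events = n_patients - n_censored
    return {
        "n_events": n_events,
        "censored_pct": 100.0 * n_censored / n_patients,
        "event_pct": 100.0 * n_events / n_patients,
    }


summary = summarize(N_PATIENTS, N_CENSORED)
n_covariates = len(GENE_INDICATORS) + len(CLINICAL_FEATURES)

print(f"covariates: {n_covariates}")
print(f"events: {summary['n_events']}")
print(f"censored: {summary['censored_pct']:.2f}%")
```

This gives 9 covariates and 1,103 events. The censored share comes out at about 42.07% when rounded to two decimals, so the 42.06% in the quote appears to be truncated rather than rounded.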