WinterPan2017 closed this 4 months ago
OK, the data has been uploaded to WSI_FiVE/gpt_preprocess/luad_tcga_pub_clinical_data.tsv
Thanks for the reply. I noticed that 28 cases are not included in DSMIL's pre-computed features (listed below), which might result in a mismatch in the test set. As illustrated in the table below, we counted the number of samples and slides for each subtype (each sample may have multiple slides). Is there any issue with this?
Sample ID
TCGA-05-4384-01
TCGA-05-4389-01
TCGA-05-4410-01
TCGA-05-4425-01
TCGA-05-5420-01
TCGA-05-5423-01
TCGA-05-5715-01
TCGA-49-4486-01
TCGA-50-5049-01
TCGA-50-5051-01
TCGA-50-5072-01
TCGA-50-5932-01
TCGA-50-5933-01
TCGA-50-5935-01
TCGA-50-5936-01
TCGA-50-5941-01
TCGA-50-5944-01
TCGA-50-6595-01
TCGA-78-7143-01
TCGA-78-7162-01
TCGA-91-6828-01
TCGA-91-6829-01
TCGA-91-6835-01
TCGA-91-6840-01
TCGA-91-6847-01
TCGA-91-6849-01
TCGA-91-7771-01
Subtype name | Samples (paper) | Samples (with DSMIL features) | Slides (with DSMIL features) |
---|---|---|---|
Adenocarcinoma Mixed Subtype | 54 | 47 | 51 |
Bronchioloalveolar Carcinoma Nonmucinous | 10 | 8 | 8 |
Acinar Adenocarcinoma | 5 | 4 | 4 |
Mucinous Adenocarcinoma | 1 | 1 | 1 |
Micropapillary (colloid) Adenocarcinoma | 4 | 3 | 7 |
Bronchioloalveolar Carcinoma Mucinous | 3 | 3 | 3 |
Micropapillary Adenocarcinoma | 3 | 2 | 2 |
Papillary Adenocarcinoma | 8 | 8 | 17 |
Not Otherwise Specified (NOS) | 136 | 120 | 158 |
NA | 7 | 7 | 7 |
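For anyone who wants to reproduce a count like the one in the table, here is a minimal sketch. It assumes that each pre-computed feature file is named after its slide barcode and that the first 15 characters of a TCGA slide barcode are the sample ID (e.g. `TCGA-05-4384-01`). The function names and both input structures are hypothetical and are not taken from the WSI_FiVE or DSMIL code.

```python
from collections import defaultdict


def sample_id(slide_name: str) -> str:
    """Return the 15-character sample ID prefix of a TCGA slide barcode."""
    return slide_name[:15]


def count_by_subtype(subtype_of, slide_names):
    """Count covered samples and slides per subtype.

    subtype_of:  dict mapping sample ID -> subtype name (assumed to be
                 read from the clinical file beforehand).
    slide_names: iterable of slide names that have pre-computed features.

    Returns (counts, missing), where counts maps
    subtype -> (n_samples, n_slides) and missing is the sorted list of
    labelled samples that have no slide at all.
    """
    # Number of available slides for every sample that has features.
    slides_per_sample = defaultdict(int)
    for name in slide_names:
        slides_per_sample[sample_id(name)] += 1

    samples = defaultdict(int)
    slides = defaultdict(int)
    for sid, subtype in subtype_of.items():
        n = slides_per_sample.get(sid, 0)
        if n:  # only samples with at least one slide are counted
            samples[subtype] += 1
            slides[subtype] += n

    counts = {s: (samples[s], slides[s]) for s in samples}
    missing = sorted(sid for sid in subtype_of if sid not in slides_per_sample)
    return counts, missing
```

Calling `count_by_subtype(subtype_of, slide_names)` with the labels from the clinical file and the list of available feature files should give the per-subtype sample and slide counts, plus the list of labelled samples that have no features.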
The data is obtained from a public database; the link is provided in the paper. As of now, all data with subtype labels are contained here. Features cannot be computed from DSMIL's release for samples that its pre-computed features do not include, so you need to remove those samples. Alternatively, you can follow the DSMIL method to train a model and extract the features yourself.
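Removing the uncovered samples could look like the sketch below. It assumes the clinical TSV has a `Sample ID` column, as the list header above suggests. The function name and the set of available sample IDs are hypothetical.

```python
import csv
import io


def filter_labelled_rows(tsv_text, available_samples, id_column="Sample ID"):
    """Split clinical rows into those with and without available features.

    tsv_text:          contents of the clinical TSV file as a string.
    available_samples: set of sample IDs that have pre-computed features.
    id_column:         name of the sample ID column (an assumption).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept, dropped = [], []
    for row in reader:
        if row[id_column] in available_samples:
            kept.append(row)
        else:
            dropped.append(row)
    return kept, dropped
```

Keeping the dropped rows makes it easy to check that the excluded samples are exactly the ones listed earlier in the thread.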
Appreciate your patience. So, are the reported results based on your own pre-computed features, which contain all samples in the provided file, rather than on DSMIL's pre-computed features?
The other experiments used DSMIL features. Because DSMIL's features lack the data for this experiment, we trained these features ourselves.
I noticed that the missing samples are not included in the pretraining-related CSV files, e.g., LUAD_LUSC_data_desc_reid_delval.csv and TCGA_Reports.csv. Is it correct to say that only the Zero-Shot Histological Subtype Classification experiment is based on your own features, while the pretraining and other downstream tasks use DSMIL features?
Yes.
Thanks a lot.
Hello, could you please share the id-subtype pairs used in the Zero-Shot Histological Subtype Classification?