Open OMIC-coding opened 11 months ago
Hi liziyu, indeed, I extracted the features with the pipeline of Jakob Kather's lab. You can access it here: https://github.com/KatherLab/marugoto/. Feature extraction with CTransPath is implemented in the branch feature_extraction.
Alternatively, you can use the branch feature_extraction from this repository :)
Let me know if you encounter any issues!
Sophia
@sophiajw and @ValentinKoch, thank you for your great work and this detailed repo. I have one conceptual and one practical question regarding the combination of tile augmentation and the CTransPath feature extractor. In the paper referred to in this repo, you describe these steps:
...To reduce the impact of the staining color on the model generalization, the tiles are stain-color augmented using a structure-preserving GAN trained on TCGA [35]. We extract feature representations of dimension 768 for every tile using the CTransPath model [29]...
Did you notice an improvement in performance when performing stain augmentation before feeding the tiles into CTransPath? The CTransPath authors aimed to design an SSL training scheme whose augmentations encourage the model to learn features from the relevant content of tiles rather than from color attributes and similar cues. They also trained their model on an extremely large and diverse dataset. As such, I would expect it to already handle stain variations between sites.
I'm aware that the code in the feature_extraction branch might not exactly reflect the preprocessing pipeline described in the paper. Still, I couldn't find the stain augmentation part in the code. You mentioned your previous HistAuGAN work, and I was curious to see how it is implemented here. Could you please point me to the part of the code that corresponds to stain augmentation? In the torch transformations for CTransPath in your code, I see only the Resize and Normalize ones: https://github.com/peng-lab/HistoBistro/blob/17fd799f058bcf02d79d031a87bde9006cf615a3/models/model.py#L148-L152
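To make the point about Resize and Normalize concrete, here is a minimal sketch in plain Python. It assumes the Normalize step uses the common ImageNet channel statistics, which may not match the values in the repository, and `shift_stain` is a hypothetical stand-in for a stain difference between sites, not HistAuGAN. It only illustrates that normalization with fixed statistics is a fixed per-channel affine map, so two differently stained versions of the same pixel stay different afterwards.

```python
# ImageNet channel statistics, commonly used with pretrained backbones.
# Assumption: the repository's Normalize step may use different values.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple((value / 255.0 - m) / s for value, m, s in zip(rgb, mean, std))


def shift_stain(rgb, offset):
    """Hypothetical stain change: add a fixed per-channel offset, clipped to 0..255."""
    return tuple(min(255, max(0, value + o)) for value, o in zip(rgb, offset))


pixel = (180, 90, 160)                        # a pinkish, H&E-like colour
shifted = shift_stain(pixel, (20, -10, 15))   # same tissue, different "stain"

original_norm = normalize_pixel(pixel)
shifted_norm = normalize_pixel(shifted)

# The two normalized pixels still differ: Normalize does not remove the shift.
print(original_norm)
print(shifted_norm)
```

So with only these two transforms in place, any robustness to stain variation would have to come from the feature extractor itself or from a separate augmentation step.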
hey @yuvfried, thank you!
Hope this helps!
2. https://github.com/KatherLab/marugoto/blob/feature_extraction/marugoto/extract/extract.py#L128
The extract.py file attached to this link appears to be invalid or has been removed. Could you please provide an updated version for it?
Hi, sophiajw! Could you help me with that issue I mentioned before? Thanks~
hey @liziyu-000, thanks for following up!
I have now added the feature extraction with HistAuGAN to this repo. You can find it in the branch feature_extraction. Just enable the augmented features by setting the flag --histaugan. You can download the checkpoint of the trained model here. It was trained on patches from the 7 largest submission sites of the TCGA cohorts COAD and READ.
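As a rough illustration of how an on/off flag like this typically behaves, here is a minimal argparse sketch. Only the flag name --histaugan comes from this thread. The parser, the `--slide-dir` argument, and the assumption that the flag is a boolean switch are hypothetical and are not taken from the actual script in the feature_extraction branch.

```python
import argparse


def build_parser():
    """Hypothetical CLI; only the --histaugan flag name comes from the thread."""
    parser = argparse.ArgumentParser(description="Sketch of a feature-extraction CLI")
    # Hypothetical argument, added so the example has something to parse.
    parser.add_argument("--slide-dir", required=True, help="directory with slides")
    # Assumed to be a boolean switch: absent means no augmentation.
    parser.add_argument(
        "--histaugan",
        action="store_true",
        help="also extract features from stain-augmented tiles",
    )
    return parser


# Parse an example command line instead of sys.argv so the sketch runs as-is.
args = build_parser().parse_args(["--slide-dir", "slides/", "--histaugan"])
print(args.histaugan)
```

Leaving the flag out would give `args.histaugan == False` in this sketch, i.e. plain feature extraction without augmented features.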
Hi, sophiajw! I would like to know whether you conducted self-supervised learning on your own dataset before training your transformer model, or whether you just used CTransPath with fixed weights to extract features from individual tiles. Thanks~
Seminal work and detailed code for step 3 of your pipeline. However, the code for feature extraction and image data preprocessing is missing. For example, there is no description of how the h5 feature file was generated for each cohort. The results cannot be reproduced without this code, even though the random seed is given. Could you please upload it? Looking forward to your reply!