mahmoodlab / HIPT

Hierarchical Image Pyramid Transformer - CVPR 2022 (Oral)

Duration of Feature Extraction Using hipt4k Pretrained Model #66

Closed XinyangHan closed 11 months ago

XinyangHan commented 1 year ago

Thanks for your excellent work. I’m about to replicate your weakly-supervised experiment and assume I need to start with feature extraction using hipt4k pretrained model. In my initial tests, extracting ViT features seems lengthy, possibly taking over a week. Could you share how long this step took in your work, and the setup you used?

Richarizardd commented 11 months ago

Hi @XinyangHan - sharing my answer publicly as well for others to see.

Regarding this point, we do not extract HIPT features with batch sizes greater than 1 (which you may have been doing). Rather, we patch and extract features from $4096^2$ px images with a batch size of 1; each $4096^2$ px image is then reshaped into $256 \times 256 \times 256 \times 3$ (effectively a minibatch of 256 images of size $256^2$ px). From my experimentation in extracting features from HIPT in this manner, this should take slightly longer than, but be comparable to, feature extraction on $256^2$ px patches. As a sanity check, TCGA-3C-AALK-01Z-00-DX1.4E6EB156-BB19-410F-878F-FC0EA7BD0B53 with 4K patching yields ~117 4K patches and takes around 190 seconds, i.e. roughly 1.6 seconds per 4K patch (see the provided segmentation below).
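For readers who want to see the reshaping step concretely, here is a minimal NumPy sketch of splitting one $4096^2$ px region into a minibatch of 256 patches of $256^2$ px. The function name `region_to_patches` is hypothetical and this is not the HIPT code itself (which operates on torch tensors); it only illustrates the shape arithmetic described above.

```python
import numpy as np

def region_to_patches(region: np.ndarray, patch_size: int = 256) -> np.ndarray:
    """Split an (H, W, C) region into (N, patch_size, patch_size, C) patches.

    Patches are ordered row-major: left to right, then top to bottom.
    Hypothetical helper for illustration only.
    """
    h, w, c = region.shape
    if h % patch_size or w % patch_size:
        raise ValueError("region height and width must be multiples of patch_size")
    rows, cols = h // patch_size, w // patch_size
    # Split each spatial axis into (number of patches, pixels within a patch).
    patches = region.reshape(rows, patch_size, cols, patch_size, c)
    # Bring the two patch-index axes to the front, then flatten them into one batch axis.
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size, patch_size, c)

# One 4K region (batch size 1) becomes a minibatch of 256 patches.
region = np.zeros((4096, 4096, 3), dtype=np.uint8)
patches = region_to_patches(region)
print(patches.shape)  # (256, 256, 256, 3)
```

So a "batch size of 1" at the 4K level still means 256 images of $256^2$ px go through the patch-level ViT per forward pass.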

[Image: TCGA-3C-AALI-01Z-00-DX1 F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291_maskstitch]

One of the limitations of this work is that 4K patching can be difficult for some slides when using the four_pt contour function in CLAM, which was developed for $256^2$ px images. Thus, certain cohorts (as described in the README) exclude some WSIs. To make HIPT comparable with splits used by other papers, I would suggest modifying the four_pt contour function to be less conservative so that it includes at least one 4K region per slide. For example, a biopsy slide (like the one shown below) may not be patched correctly and may end up with no detected 4K patches under the current four_pt contour function (many slides in TCGA are just biopsy fragments with very little tissue content). Using the exact splits that I evaluate on is also OK, but if you are also evaluating on your own splits or a custom dataset, I would try to relax the contour function.
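As a rough illustration of what "relaxing" the check could look like, here is a self-contained sketch. It is not CLAM's implementation: CLAM tests points against OpenCV contours, whereas this stand-in uses a binary tissue mask, and the names `four_point_hits`, `select_regions`, `min_hits` and `shift` are all hypothetical. The idea is to accept a region when at least `min_hits` of four points around its centre fall on tissue, and to fall back to the single region with the most tissue if nothing passes.

```python
import numpy as np

def four_point_hits(mask, x, y, size, shift=0.5):
    """Count how many of four points around the region centre lie on tissue."""
    cx, cy = x + size // 2, y + size // 2
    d = int(size // 2 * shift)
    points = [(cx - d, cy - d), (cx + d, cy + d), (cx + d, cy - d), (cx - d, cy + d)]
    h, w = mask.shape
    return sum(1 for px, py in points if 0 <= px < w and 0 <= py < h and mask[py, px])

def select_regions(mask, size, min_hits=1):
    """Return top-left (x, y) corners of grid regions that pass the point check.

    If no region passes but some region contains tissue, keep the one with the
    largest tissue fraction so that the slide yields at least one region.
    """
    h, w = mask.shape
    candidates = [(x, y)
                  for y in range(0, h - size + 1, size)
                  for x in range(0, w - size + 1, size)]
    kept = [(x, y) for x, y in candidates
            if four_point_hits(mask, x, y, size) >= min_hits]
    if not kept and candidates:
        def tissue_fraction(xy):
            return mask[xy[1]:xy[1] + size, xy[0]:xy[0] + size].mean()
        best = max(candidates, key=tissue_fraction)
        if tissue_fraction(best) > 0:
            kept = [best]
    return kept

# Toy example: an 8x8 mask with one tissue pixel that all four points miss.
mask = np.zeros((8, 8), dtype=bool)
mask[0, 0] = True
regions = select_regions(mask, size=4)
```

In the toy example the point check alone finds nothing, and the fallback keeps the one region that contains the tissue pixel. Lowering `min_hits` or `shift` makes the check less conservative; whether that is appropriate for a given cohort is a judgment call.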

[Image: example biopsy slide]

vildesboe commented 11 months ago

Hi! I am also reproducing this experiment. Would you want to discuss some results/progress?

XinyangHan commented 11 months ago

Yeah! Sure! Glad to chat!

vildesboe commented 11 months ago

Cool! Do you have an email address, maybe? (My email is listed on my profile.)