hms-dbmi / CHIEF

Clinical Histopathology Imaging Evaluation Foundation Model
GNU Affero General Public License v3.0

How to get slide-level embedding using several tile-embeddings? #30

Open lmxmercy opened 6 days ago

lmxmercy commented 6 days ago

Hello, Xiyue! First, congratulations on your series of amazing works in recent years (I am actually a big fan of yours). Here is my problem: because of the challenges of my custom data, I need to divide the original WSI into thousands of patches first and then perform feature extraction on these histology patches. I found that CHIEF can extract feature embeddings from tiles, but how can I extract features from the tiles and then combine these tile-level embeddings into a single slide-level feature embedding (a .pt file)? Your codebase provides the related scripts, but some steps seem to be missing for this operation. I sincerely look forward to your response. Thank you! Mingxin

Dadatata-JZ commented 5 days ago

@lmxmercy Hi there. To answer your question: once you have extracted the tile-level features using CTransPath and saved them in a .pt file (e.g., with shape [n, 768], where n is the number of tiles), you can use the script we provide at https://github.com/hms-dbmi/CHIEF/blob/main/Get_CHIEF_WSI_level_feature.py to generate one WSI representation (a feature vector). You should find a field called "WSI_feature" in the returned result. Note that this applies when you freeze (not fine-tune) the model.
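For readers following along, here is a minimal sketch of the frozen-inference pattern described above: an [n, 768] feature matrix goes in and a single slide-level vector comes out under the "WSI_feature" key. `StandInAggregator` is a hypothetical placeholder that simply mean-pools the tiles. It is not the CHIEF model or its API, and the random matrix stands in for a real .pt file loaded with `torch.load`.

```python
import torch
import torch.nn as nn


class StandInAggregator(nn.Module):
    """Hypothetical placeholder, NOT the CHIEF architecture: mean-pools tile features."""

    def forward(self, x):
        # x: [n, 768] tile-feature matrix -> one [1, 768] slide-level vector
        return {"WSI_feature": x.mean(dim=0, keepdim=True)}


# Placeholder for: features = torch.load("slide.pt", map_location="cpu")
features = torch.randn(1200, 768)

# Sanity-check the layout the comment above describes: [n, 768].
if features.ndim != 2 or features.shape[1] != 768:
    raise ValueError(f"expected an [n, 768] matrix, got {tuple(features.shape)}")

model = StandInAggregator()
model.eval()              # frozen use: disable dropout and similar training behavior
with torch.no_grad():     # frozen use: no gradients are tracked
    result = model(features)

wsi_feature = result["WSI_feature"]
print(wsi_feature.shape)
```

With the real model, the equivalent steps live in the linked Get_CHIEF_WSI_level_feature.py; check that script for the actual constructor arguments and forward signature.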

Besides the docker image, the model weights can be downloaded from https://drive.google.com/drive/u/1/folders/1uRv9A1HuTW5m_pJoyMzdN31bE1i-tDaV

Let me know if you have any questions. We look forward to hearing how it performs on your custom data. It is exciting.

PS: I just pinged @Xiyue-Wang to make sure your kind words are shared. ;)

lmxmercy commented 5 days ago

Thank you so much for your timely reply! But I still cannot figure it out. In the usual WSI feature-extraction pipelines such as CLAM, the patches are created first by saving the patches and their corresponding coordinate information into a .h5 file, and the features are then extracted into a .pt file. Does CHIEF use the same or a similar approach? Currently, I don't know how to generate the .h5 file (features with coords), and I couldn't find the related code in the CHIEF codebase. Alternatively, can I generate the tile embeddings and then combine and save them directly into a .pt file that represents the WSI-level embedding, without any coordinate information? I'm confused :(

Dadatata-JZ commented 5 days ago

@lmxmercy Mingxin, no worries at all. I see your confusion.

You only need to feed the feature matrices (creating them via CLAM is perfectly fine) into the forward function. The accompanying coordinates can be used to trace a given feature vector in the matrix back to its location on the slide by matching indices, but they do not affect the inference process.
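To make the index matching concrete, here is a small sketch, assuming coordinates are kept as an [n, 2] array parallel to the feature matrix. The values and the quality-control mask are invented for illustration. Applying the same mask to both arrays keeps row i of the features aligned with row i of the coordinates.

```python
import torch

# Placeholder data: 6 tiles, each with a 768-d feature and an (x, y) top-left corner.
features = torch.randn(6, 768)
coords = torch.tensor(
    [[0, 0], [0, 256], [256, 0], [256, 256], [512, 0], [512, 256]]
)

# Hypothetical quality-control result: True means the tile is kept.
keep = torch.tensor([True, True, False, True, False, True])

# Filter both arrays with the same mask so rows stay aligned.
features_kept = features[keep]   # [4, 768], this is what goes into the model
coords_kept = coords[keep]       # [4, 2], only used for tracing back

# Trace row 2 of the filtered feature matrix back to its slide location.
row = 2
x, y = coords_kept[row].tolist()
print(f"feature row {row} came from the tile at ({x}, {y})")  # (256, 256)
```

Only `features_kept` is needed for inference; `coords_kept` matters only if you later want to map results back onto the slide.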

lmxmercy commented 5 days ago

Actually, I don't understand what the "coords" information in the .h5 file means. Does it really matter for slide-level tasks? So far, I have divided each WSI into patches using third-party tools (histolab, slideflow, etc.), because I need to perform quality control on these patches (the quality of some of them is really bad). I then use CHIEF's get_patch_feature.py to extract the tile embedding of a single patch. I plan to write a script that gets the tile embeddings for all patches of a WSI (after quality control, some patches have been excluded) and then combines and saves them into a .pt file, without any coordinate information at any step. I don't know whether my idea is correct :( Thank you very much for your help, and sorry for taking up your time.

Dadatata-JZ commented 5 days ago

@lmxmercy Mingxin, no need to apologize at all. We try our best to answer questions from credible communities. It is all about sharing and learning together.

IMO, tile processing and selection can be flexible. I myself am also trying different tiling methods for different projects (with or without overlap, etc.). Once you have your tiles, you can feed them into CTransPath to obtain feature vectors, then stack the outputs into a feature matrix before sending it to our WSI feature extractor. I know some models may require coordinates for extracting WSI-level features, but they are not required for CHIEF.
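The workflow described here (filter tiles, extract one vector per tile, stack, save) might look roughly like the sketch below. Both helper functions are hypothetical stand-ins: `extract_tile_feature` returns random numbers in place of a real CTransPath forward pass, and `passes_quality_control` uses an invented variance threshold. The file name is also just an example.

```python
import os
import tempfile

import torch


def extract_tile_feature(tile):
    """Hypothetical stand-in for a CTransPath forward pass; returns a [1, 768] tensor."""
    return torch.randn(1, 768)


def passes_quality_control(tile):
    """Hypothetical QC rule: reject near-blank tiles with very low pixel variation."""
    return tile.float().std().item() > 0.05


def build_feature_matrix(tiles):
    """Extract features for tiles that pass QC and stack them into [n_kept, 768]."""
    feats = [extract_tile_feature(t) for t in tiles if passes_quality_control(t)]
    if not feats:
        raise ValueError("no tiles passed quality control")
    return torch.cat(feats, dim=0)


# Placeholder tiles: four textured tiles plus one blank tile that QC should drop.
tiles = [torch.rand(3, 224, 224) for _ in range(4)] + [torch.ones(3, 224, 224)]

matrix = build_feature_matrix(tiles)

# Save the slide's feature matrix as a .pt file; no coordinates are stored.
out_path = os.path.join(tempfile.mkdtemp(), "slide_001.pt")
torch.save(matrix, out_path)

reloaded = torch.load(out_path, map_location="cpu")
print(reloaded.shape)
```

The saved matrix is the kind of [n, 768] input mentioned earlier in the thread. `torch.cat` is used because the stand-in extractor already returns a leading batch dimension; if each tile yields a flat 768-d vector, `torch.stack` would be the natural choice instead.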

lmxmercy commented 5 days ago

Thank you so much! I will work on it and write a script for this procedure. If it works well, I can share the code to make CHIEF more flexible to use.