hms-dbmi / CHIEF

Clinical Histopathology Imaging Evaluation Foundation Model
GNU Affero General Public License v3.0

Change in forward pass for CHIEF #29

Open andrewsong90 opened 2 weeks ago

andrewsong90 commented 2 weeks ago

Dear authors,

Thank you for the great work! We were benchmarking CHIEF and noticed that the CHIEF forward pass was changed a few weeks ago. Below is the change log.

[screenshot: commit diff of the forward-pass change]

While it does lead to a performance improvement, we are quite confused about what is going on here. How does the original implementation differ from this new one? Which of these implementations is the one introduced in the main manuscript? Why did you have to make this change?

Let us know,

Xiyue-Wang commented 2 weeks ago

Hi, Andrew Song, thank you so much for including CHIEF in your benchmarking work. Looking forward to seeing the outcomes. We’d like to clarify that this change highlights a simpler way of using CHIEF as a WSI feature extractor and does not affect the inference process or results presented in our CHIEF paper.

As the demand for using our WSI module continues to grow, several users have reached out to us regarding simpler ways of extracting generalized and less saturated frozen features from CHIEF for other creative downstream tasks beyond our investigations (such as unsupervised clustering). Therefore, we provided a more straightforward approach for users to extract and use frozen WSI features directly.

CHIEF generates attention scores for WSI patches, and slide-level features can be obtained by multiplying these scores with either the original patch features from CTransPath (OPTION 1) or the CTransPath features further projected through a fully connected layer, fc = [nn.Linear(size[0], size[1]), nn.ReLU()] (OPTION 2).

Since the weakly supervised pretraining process is designed for cancer detection (cancer vs. non-cancer), passing the original patch-level features (h_ori) through the frozen fully connected layer to obtain features (h) may make the features more specific to cancer detection in downstream applications. If using frozen features, OPTION 2 yields WSI features tailored more toward task 1 (cancer detection) presented in our paper. In the CHIEF manuscript, since the cancer detection task aligns with CHIEF's pretraining, the WSI features used during inference are obtained by multiplying the attention scores with the projected CTransPath features (i.e., CTransPath features passed through the fully connected layer).

If users intend to apply our features across different tasks and minimize computation, the frozen features from OPTION 1 can be used directly. Please let us know if you have any questions.
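For concreteness, here is a minimal sketch of the two pooling variants described above. The class name, the simple linear attention scorer, and the dimensions are illustrative assumptions, not the repository's actual implementation (CHIEF uses a gated-attention network over CTransPath patch features):

```python
import torch
import torch.nn as nn

class CHIEFPoolingSketch(nn.Module):
    """Toy stand-in for CHIEF's attention pooling; names and sizes are
    illustrative, not the repository's actual code."""
    def __init__(self, in_dim=768, hid_dim=512):
        super().__init__()
        # OPTION 2's frozen projection: fc = [nn.Linear(...), nn.ReLU()]
        self.fc = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        # simple attention scorer (CHIEF itself uses gated attention)
        self.attn = nn.Linear(hid_dim, 1)

    def forward(self, h_ori):
        # h_ori: (N_patches, in_dim) CTransPath patch features
        h = self.fc(h_ori)                      # projected, post-ReLU features
        a = torch.softmax(self.attn(h), dim=0)  # (N_patches, 1) attention scores
        wsi_opt1 = (a * h_ori).sum(dim=0)       # OPTION 1: scores x original features
        wsi_opt2 = (a * h).sum(dim=0)           # OPTION 2: scores x projected features
        return wsi_opt1, wsi_opt2
```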

guillaumejaume commented 1 week ago

Hi @Xiyue-Wang, thanks for providing more information. OPTION 2 uses embeddings post-ReLU, where, by definition, a significant part of the signal is lost. I'm not sure I understand why this would be a good option. I believe this was the original implementation before the introduction of OPTION 1 about 3 weeks ago. Would be happy to get your input on this. Thanks!
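For intuition, a toy numerical illustration of the post-ReLU signal loss being described (assuming roughly zero-mean embedding coordinates, about half of them get zeroed out):

```python
import torch

x = torch.randn(10_000)                     # zero-mean stand-in for embedding coordinates
frac = (torch.relu(x) == 0).float().mean()  # fraction of coordinates ReLU zeroes out
print(f"{frac:.2f}")                        # ~0.50: about half the coordinates are discarded
```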

Dadatata-JZ commented 1 week ago

Hi @guillaumejaume, thanks for the follow-up. Option 2 is aligned with the pretraining task (i.e., cancer detection), which was presented as a quick demonstration in our initial code push. That is why we say it is tailored to this task. If one's application is cancer detection or close to it, Option 2 (with the layers frozen for extracting feature representations) is preferred for its specificity.

andrewsong90 commented 1 week ago

Hi @Xiyue-Wang and @Dadatata-JZ ,

Thank you for your quick response. We did observe that Option 1 indeed gives better downstream results than Option 2 off-the-shelf. However, when we tried finetuning Option 1 for each specific downstream task, the performance was quite low (even lower than the linear probe). We tried a few different hyperparameter settings but weren't successful.

The provided finetuning scripts are for Option 2. Can you verify this and provide finetuning recipes for Option 1 as well? Thanks!

Dadatata-JZ commented 1 week ago

@andrewsong90 No problem at all. Anytime! Thank you indeed for the great questions and discussions.

Quick question: are you primarily tuning these models for binary classification tasks? When validating the backbones, does your experimental design for benchmarking allow for 'full fine-tuning' or 'repurposing'? If it's repurposing, how many layers are typically allowed to update? My personal understanding is that some backbones are slow and expensive to tune due to their complexity, but for lightweight models it may be different, since each layer may have to contribute and "capture" more task-specific information. Thanks!

andrewsong90 commented 1 week ago

Hi @Dadatata-JZ,

Apologies for the late reply. We are thinking of full finetuning of the slide encoder, just as is done in the finetuning script provided for Option 2. We are testing it on multi-class classification problems (C > 10).

Since the forward pass changed, I was wondering if you had tried finetuning with the new forward pass and whether the finetuning recipe changes as a result.

Thanks!

Dadatata-JZ commented 6 days ago

Hi @andrewsong90

Yeah, we performed full fine-tuning for the WSI module (MIL) as well. Most of our ongoing or archived downstream tasks are actually classification with fewer than ten classes and enough samples (not few-shot or zero-shot, etc.). We replaced the projection head, going from two output nodes to multiple ones (or keeping two for binary), and fine-tuned the entire block; a sketch follows below. Since the neurons in the projection layer were already aligned with our target classes, we used the outputs directly without going back to earlier layers for high-dimensional representations.
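A minimal sketch of that head swap under assumed names (`WSIModuleSketch` and the `classifier` attribute are hypothetical stand-ins, not the repository's actual code):

```python
import torch.nn as nn

# Hypothetical stand-in for the pretrained WSI module (MIL block).
class WSIModuleSketch(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(768, feat_dim), nn.ReLU())
        self.classifier = nn.Linear(feat_dim, 2)  # binary head from pretraining

NUM_CLASSES = 12                                  # e.g., a multi-class task with C > 10
model = WSIModuleSketch()
# Replace the two-node projection head with a C-node head.
model.classifier = nn.Linear(model.classifier.in_features, NUM_CLASSES)

for p in model.parameters():                      # full fine-tuning: all params trainable
    p.requires_grad = True
```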

Lmk if anything else comes up. Cheers.

tranh215980 commented 2 days ago

Dear @Dadatata-JZ ,

I have been using CHIEF and other models for the past month and am now more confused. Is the new forward pass how CHIEF was evaluated in the paper? Can you clarify how CHIEF was evaluated and what is going on? All of my results were obtained with the old code.

Dadatata-JZ commented 2 days ago

Hi @tranh215980, if your evaluations focus on cancer detection, you should stick with the final layer for features. However, if you plan to utilize CHIEF pre-trained weights without fine-tuning, you could explore using the highlighted layer in our recent push. Please note that the model weights have never changed.

As discussed above, the last layer is more tailored to the task of cancer detection. We emphasize this point because we recently realized that many fellows, like yourself, are interested in freezing CHIEF for feature extraction at the WSI level, and we don't want you to miss out on this option. Lmk? Cheers,
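As an illustration of that frozen-feature route, reusing the hypothetical `CHIEFPoolingSketch` stand-in from the earlier comment (all names and shapes are assumptions, and no pretrained weights are modified):

```python
import torch

pool = CHIEFPoolingSketch().eval()   # sketch class from above; layers stay frozen
patches = torch.randn(1000, 768)     # illustrative CTransPath patch features for one WSI
with torch.no_grad():
    wsi_generic, wsi_cancer = pool(patches)
# wsi_generic: OPTION 1 frozen features for general downstream tasks
# wsi_cancer:  OPTION 2 features, tailored to cancer detection
```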