szc19990412 / TransMIL

TransMIL: Transformer based Correlated Multiple Instance Learning for Whole Slide Image Classification

How was the performance of the baseline methods obtained? #8

Closed: Lafite-Yu closed this issue 1 year ago

Lafite-Yu commented 2 years ago

As the dataset used in this paper is different from those used by the compared baseline methods, how were their performance results obtained? Did you train and evaluate each baseline model yourself? Thanks.

ZacharyWang-007 commented 2 years ago

> As the dataset used in this paper is different from those used by the compared baseline methods, how were their performance results obtained? Did you train and evaluate each baseline model yourself? Thanks.

I have the same question. I emailed him and he didn't reply.

Lafite-Yu commented 2 years ago

> > As the dataset used in this paper is different from those used by the compared baseline methods, how were their performance results obtained? Did you train and evaluate each baseline model yourself? Thanks.
>
> I have the same question. I emailed him and he didn't reply.

The performance of the baseline methods reported in this paper is lower than the metrics in their original papers, so I wonder how the authors obtained these results. That said, the reported performance of TransMIL is still higher than the baselines' performance as reported in their own papers.

marvinyan080 commented 1 year ago

> > As the dataset used in this paper is different from those used by the compared baseline methods, how were their performance results obtained? Did you train and evaluate each baseline model yourself? Thanks.
>
> I have the same question. I emailed him and he didn't reply.

Hi, did you get a response from them? I cannot reproduce their results, but I can reproduce the methods they compared against, and those methods achieve higher performance than what is reported in the paper.

szc19990412 commented 1 year ago

For the comparison methods, except MIL-RNN, all methods use 1024-dimensional features extracted by an ImageNet-pretrained ResNet50, and all comparison schemes use the same training framework (PyTorch Lightning) and the same training parameters, such as the learning rate. The results reported in the comparison methods' papers depend on the specific parameters used in those experiments, and unfortunately, under the same experimental conditions as ours, better results were not achieved.
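
For context, below is a minimal sketch of how 1024-dimensional patch features are commonly extracted in WSI pipelines: an ImageNet-pretrained ResNet50 truncated after its third residual stage (`layer3`), whose output has 1024 channels. The truncation point and pooling here are assumptions based on common practice (e.g. CLAM-style pipelines), not code confirmed in this thread.

```python
# Hypothetical sketch (not the authors' confirmed code): 1024-d patch
# features from an ImageNet-pretrained ResNet50 truncated after layer3.
import torch
import torch.nn as nn
from torchvision import models

resnet = models.resnet50(pretrained=True)
backbone = nn.Sequential(
    resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool,
    resnet.layer1, resnet.layer2, resnet.layer3,  # stop before layer4 (2048-d)
    nn.AdaptiveAvgPool2d(1),                      # -> (B, 1024, 1, 1)
    nn.Flatten(1),                                # -> (B, 1024)
).eval()

with torch.no_grad():
    patches = torch.randn(8, 3, 224, 224)  # a dummy batch of tissue patches
    feats = backbone(patches)
print(feats.shape)  # torch.Size([8, 1024])
```

Under the setup described above, these per-patch features would then be fed to each MIL model under the same PyTorch Lightning training configuration.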

Lafite-Yu commented 1 year ago

> For the comparison methods, except MIL-RNN, all methods use 1024-dimensional features extracted by an ImageNet-pretrained ResNet50, and all comparison schemes use the same training framework (PyTorch Lightning) and the same training parameters, such as the learning rate. The results reported in the comparison methods' papers depend on the specific parameters used in those experiments, and unfortunately, under the same experimental conditions as ours, better results were not achieved.

I think this may not be a fair comparison for the other models. May I understand it in the following way: you carefully searched for the best hyperparameter settings for your model, while omitting the best settings for the other models when reproducing their results?