Slide prediction in Camelyon16

szc19990412 commented 2 years ago

Hello, thanks for your amazing work! While reproducing the results of the paper, I ran into some problems that I hope you can help with. For slide-level prediction on Camelyon16, I couldn't find the code that goes from the heatmap to a slide-level prediction. Following the paper, I referred to the code here: https://github.com/3dimaging/DeepLearningCamelyon/tree/master/4%20-%20Prediction%20and%20Evaluation/Evaluation For each slide, I extracted 28 features from the heatmap and fed them into a random forest for training, but did not get good results. Are there any tricks for training the RandomForestClassifier? If you could open-source the code for this part, I believe it would be a great help! Looking forward to your reply!

szc19990412 commented 2 years ago

I used a basic RandomForestClassifier and found that its performance on the test set was poor:

from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier

# X, y: training features and labels; X_test, y_test: held-out test set
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
clf.fit(X, y)  # train

y_train_pred = clf.predict(X)
y_test_pred = clf.predict(X_test)

# Model accuracy: how often is the classifier correct?
print("Train Accuracy:", metrics.accuracy_score(y, y_train_pred))
print("Test Accuracy:", metrics.accuracy_score(y_test, y_test_pred))

Train Accuracy: 1.0
Test Accuracy: 0.6363636363636364
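
A train accuracy of 1.0 against a test accuracy of about 0.64 suggests the forest is memorising the training slides. One way to check this before touching the test set is a stratified cross-validated AUC with a more constrained forest. The sketch below is only an illustration: the data is synthetic (`make_classification` stands in for the real feature table, and the sample count is a hypothetical choice), and the hyperparameter values are assumptions, not settings from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the slide-level feature table (assumption):
# a few hundred "slides" with 28 features each.
X_demo, y_demo = make_classification(
    n_samples=270, n_features=28, n_informative=6, random_state=0
)

# Shallower trees and larger leaves limit how closely the forest can
# fit individual training slides (values are illustrative only).
clf_cv = RandomForestClassifier(
    n_estimators=200, max_depth=5, min_samples_leaf=5, random_state=42
)

# Stratified folds keep the tumour/normal ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
auc_scores = cross_val_score(clf_cv, X_demo, y_demo, cv=cv, scoring="roc_auc")
print("CV AUC: %.3f +/- %.3f" % (auc_scores.mean(), auc_scores.std()))
```

If the cross-validated AUC on the real features is also low, the problem is more likely in the heatmaps or the extracted features than in the classifier settings.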

srinidhiPY commented 2 years ago

Hello, thanks for reaching out.

In this work, we adopted the same post-processing steps as Wang et al. (2016) to obtain the slide-level predictions. Instead of a random forest, we used an SVM classifier trained on the geometrical features reported in Wang et al. (2016).

In your experiments, how are you training this RF classifier? What backbone patch-level classifier do you use to generate the tumour probability heatmaps?
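
For readers following along, here is a minimal sketch of what "geometrical features from a heatmap" can look like: threshold the tumour probability map, label the connected regions, and summarise them. This is not the 28-feature set from Wang et al. (2016) and not the code used in this repository; the function name `heatmap_features`, the 0.5 threshold, and the five features are assumptions chosen for illustration.

```python
import numpy as np
from scipy import ndimage

def heatmap_features(heatmap, threshold=0.5):
    """Return a few geometric features of a tumour probability heatmap."""
    mask = heatmap >= threshold
    # Connected components of the thresholded map (4-connectivity by default).
    labels, n_regions = ndimage.label(mask)
    if n_regions == 0:
        return {"n_regions": 0, "tumor_fraction": 0.0, "largest_area": 0,
                "largest_mean_prob": 0.0, "max_prob": float(heatmap.max())}
    # Pixel count per region; index 0 of bincount is the background.
    areas = np.bincount(labels.ravel())[1:]
    largest = int(np.argmax(areas)) + 1
    return {
        "n_regions": int(n_regions),
        "tumor_fraction": float(mask.mean()),
        "largest_area": int(areas[largest - 1]),
        "largest_mean_prob": float(heatmap[labels == largest].mean()),
        "max_prob": float(heatmap.max()),
    }

# Toy heatmap: one 4-pixel region and one isolated pixel.
toy = np.zeros((6, 6))
toy[0:2, 0:2] = 0.9
toy[4, 4] = 0.6
feats = heatmap_features(toy)
print(feats)
```

Computing such features at several thresholds and stacking them into one vector per slide gives the kind of input the slide-level classifier is trained on.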

szc19990412 commented 2 years ago

Hi, thanks for your reply! For the patch-level classifier, I used a ViT-Tiny model initialised with ImageNet-pretrained weights. I also tried the SVM classifier, but again I didn't get good results. Maybe that is because I'm not using the self-supervised weights? Here is the SVM training code:

import pandas as pd
from sklearn import metrics
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = pd.read_csv('/data111/shaozc/Camelyon16/data_sheet_for_random_forest.csv', index_col=0)

# ----> split train/test: rows whose name prefix (before the first '_') is 'test' form the test set
data.index = data['name']
data['name'] = data['name'].map(lambda x: x.split('_')[0])
mask = data['name'] == 'test'
df_test = data.loc[mask].reset_index(drop=True)
df_train = data.loc[~mask].reset_index(drop=True)

# Column 0 holds the name prefix, column 1 the label, and the remaining columns the features
X_train = df_train.iloc[:, 2:].values
y_train = df_train.iloc[:, 1].values
X_test = df_test.iloc[:, 2:].values
y_test = df_test.iloc[:, 1].values

# Standardise the features, then fit an SVM with probability outputs for the AUC
clf = make_pipeline(StandardScaler(), SVC(gamma='auto', probability=True, random_state=42))
clf.fit(X_train, y_train)

print("Train Accuracy:", metrics.accuracy_score(y_train, clf.predict(X_train)))
print("Test Accuracy:", metrics.accuracy_score(y_test, clf.predict(X_test)))
print("Train AUC:", metrics.roc_auc_score(y_train, clf.predict_proba(X_train)[:, 1]))
print("Test AUC:", metrics.roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))

Here is the CSV file: data_sheet_for_random_forest.csv
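
The SVM above runs with default `C` and `gamma='auto'`, so one thing worth trying is a small cross-validated grid search over the kernel and regularisation settings. The sketch below runs on synthetic data (an assumption, since the real CSV is not reproduced here), and the grid values are illustrative rather than taken from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 28 slide-level features (assumption).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=28, n_informative=6, random_state=0
)

# make_pipeline names the SVC step "svc", hence the "svc__" prefixes below.
pipe = make_pipeline(StandardScaler(), SVC(random_state=42))
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.1],
    "svc__kernel": ["rbf", "linear"],
}

# AUC is computed from decision_function, so probability=True is not needed here.
search = GridSearchCV(
    pipe,
    param_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
)
search.fit(X_demo, y_demo)
print(search.best_params_, search.best_score_)
```

Because the scaler sits inside the pipeline, it is refit on each training fold, which avoids leaking statistics from the validation fold.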

szc19990412 commented 2 years ago

By the way, for the third step of semi-supervised training, I think there is a problem here (screenshot: https://user-images.githubusercontent.com/58326452/158058190-e8bd57a8-fe81-4786-8989-6babb2470eb3.png).

Because train_tumor_idx includes all training indices, the unlabeled indices should exclude tumor_labeled_train_idx, and likewise for the normal class:

tumor_unlabeled_train_sampler = SubsetRandomSampler(list(set(train_tumor_idx) - set(tumor_labeled_train_idx)))
normal_unlabeled_train_sampler = SubsetRandomSampler(list(set(train_normal_idx) - set(normal_labeled_train_idx)))
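
If disjoint labeled and unlabeled subsets are what is wanted, the index arithmetic can be checked on toy data before it is wired into the samplers. The helper name `unlabeled_indices` and the toy index values below are hypothetical; only the variable names mirror the snippet above.

```python
def unlabeled_indices(all_idx, labeled_idx):
    """Indices in all_idx that are not in labeled_idx, in sorted order."""
    labeled = set(labeled_idx)
    return sorted(i for i in set(all_idx) if i not in labeled)

# Toy values standing in for the real index lists.
train_tumor_idx = list(range(10))
tumor_labeled_train_idx = [1, 4, 7]

tumor_unlabeled_train_idx = unlabeled_indices(train_tumor_idx, tumor_labeled_train_idx)
print(tumor_unlabeled_train_idx)  # [0, 2, 3, 5, 6, 8, 9]

# The two subsets share no index and together cover the full training set.
assert not set(tumor_unlabeled_train_idx) & set(tumor_labeled_train_idx)
assert sorted(tumor_unlabeled_train_idx + tumor_labeled_train_idx) == train_tumor_idx
```

The resulting list can be passed to `SubsetRandomSampler` in place of the inline set difference; sorting first keeps the index order reproducible, since the iteration order of a Python set is not guaranteed.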

srinidhiPY commented 2 years ago

Hi there, this has already been clarified in our paper in Section 3.3, which I quote here

hust-linyi commented 2 years ago

Have you solved this issue?

srinidhiPY / SSL_CR_Histo

Slide prediction in Camelyon16 #4