Closed szc19990412 closed 2 years ago
I used the basic RandomForestClassifier model and found that its performance on the test set was poor:
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
clf.fit(X, y)  # train on the training features X and labels y
y_train_pred = clf.predict(X)
y_test_pred = clf.predict(X_test)
# Model accuracy: how often is the classifier correct?
print("Train Accuracy:", metrics.accuracy_score(y, y_train_pred))
print("Test Accuracy:", metrics.accuracy_score(y_test, y_test_pred))
Train Accuracy: 1.0
Test Accuracy: 0.6363636363636364
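A training accuracy of 1.0 next to a test accuracy of about 0.64 suggests the forest is memorizing the training set, which fully grown trees usually do, so the training score says little about generalization. A minimal sketch of estimating held-out accuracy with stratified cross-validation follows. The data here is synthetic (`make_classification` stands in for the real feature table, and the sample count of 270 is an assumption), and the `max_depth` and `min_samples_leaf` values are illustrative, not settings from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the slide-level feature table (assumption: 28 features
# per slide); replace X_demo and y_demo with the real X and y.
X_demo, y_demo = make_classification(
    n_samples=270, n_features=28, n_informative=6, random_state=42
)

# Limiting tree depth and leaf size is one common way to curb overfitting.
rf = RandomForestClassifier(
    n_estimators=100, max_depth=5, min_samples_leaf=5, random_state=42
)

# Stratified folds keep the tumour/normal ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(rf, X_demo, y_demo, cv=cv, scoring="accuracy")

print("CV accuracy per fold:", np.round(cv_scores, 3))
print("CV accuracy mean:", cv_scores.mean())
```

If the cross-validated score on the training slides is already low, the features or the patch-level heatmaps are the more likely bottleneck than the choice of classifier.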
Hello, thanks for reaching out.
In this particular work, we adopted the same post-processing steps as Wang et al. (2016) to obtain the slide-level predictions. Instead of a Random Forest, we used an SVM classifier trained on the geometrical features reported in Wang et al. (2016).
In your experiments, how are you training this RF classifier? What is the backbone patch-level classifier that you use for tumour probability heat-map generation?
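As a rough illustration of what "geometrical features from a heat-map" can look like, here is a minimal sketch that thresholds a probability map and measures its connected regions. It is not the feature set of Wang et al. (2016): the function name `heatmap_features`, the threshold of 0.5, and the five feature names are all assumptions made for this example.

```python
import numpy as np
from scipy import ndimage


def heatmap_features(heatmap, threshold=0.5):
    """Compute a few illustrative slide-level features from a probability heat-map."""
    heatmap = np.asarray(heatmap, dtype=float)
    mask = heatmap >= threshold
    # Label connected tumour regions (default structure: 4-connectivity in 2D).
    labels, n_regions = ndimage.label(mask)
    if n_regions == 0:
        largest_area = 0
        largest_mean_prob = 0.0
    else:
        # bincount index 0 is the background, so skip it.
        areas = np.bincount(labels.ravel())[1:]
        largest_id = int(np.argmax(areas)) + 1
        largest_area = int(areas.max())
        largest_mean_prob = float(heatmap[labels == largest_id].mean())
    return {
        "tumor_area_fraction": float(mask.mean()),
        "n_regions": int(n_regions),
        "largest_region_area": largest_area,
        "largest_region_mean_prob": largest_mean_prob,
        "max_prob": float(heatmap.max()),
    }


# Toy heat-map: one 2x2 region and one isolated pixel above the threshold.
toy = np.zeros((6, 6))
toy[1:3, 1:3] = 0.9
toy[4, 4] = 0.6
features = heatmap_features(toy)
print(features)
```

One such feature vector per slide would then form a row of the table passed to the slide-level classifier.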
On Sat, Mar 12, 2022 at 9:40 AM szc19990412 @.***> wrote:
Hello, thanks for your amazing work! While reproducing the results of the paper, I ran into some problems that I hope you can help with. For slide-level prediction on Camelyon16, I couldn't find code for going from the heatmap to a slide-level prediction. Following the paper, I referred to the code here: https://github.com/3dimaging/DeepLearningCamelyon/tree/master/4%20-%20Prediction%20and%20Evaluation/Evaluation For each slide, I extracted 28 features from the heatmap and fed them into a random forest for training, but did not get good results. Are there any tricks for training the RandomForestClassifier? If you could open-source the code for this part, I believe it would be of great help! Looking forward to your reply!
Hi, thanks for your reply! For the patch-level classifier, I used a ViT-Tiny model initialized from ImageNet-pretrained weights. I also tried the SVM classifier, but again I didn't get good results. Maybe it's because I'm not using the self-supervised weights? Here is the SVM training code:
import pandas as pd
from sklearn import metrics
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
data = pd.read_csv('/data111/shaozc/Camelyon16/data_sheet_for_random_forest.csv', index_col=0)
# Split train/test: rows whose name prefix (the part before '_') is 'test' form the test set.
data.index = data['name']
data['name'] = data['name'].map(lambda x: x.split('_')[0])
mask = data['name'] == 'test'
df_test = data.loc[mask].reset_index(drop=True)
df_train = data.loc[~mask].reset_index(drop=True)
# Column 1 holds the labels; columns 2 onward hold the features.
X_train = df_train.iloc[:, 2:].values
y_train = df_train.iloc[:, 1].values
X_test = df_test.iloc[:, 2:].values
y_test = df_test.iloc[:, 1].values
# Standardize the features, then fit an SVM with probability estimates enabled for AUC.
clf = make_pipeline(StandardScaler(), SVC(gamma='auto', probability=True, random_state=42))
clf.fit(X_train, y_train)
print("Train Accuracy:", metrics.accuracy_score(y_train, clf.predict(X_train)))
print("Test Accuracy:", metrics.accuracy_score(y_test, clf.predict(X_test)))
print("Train Auc:", metrics.roc_auc_score(y_train, clf.predict_proba(X_train)[:, 1]))
print("Test Auc:", metrics.roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
Here is the CSV file: data_sheet_for_random_forest.csv
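The SVM above runs with default `C` and `gamma='auto'`. One thing that may be worth trying before changing the backbone is a small cross-validated grid search over those two hyperparameters. The sketch below uses synthetic data in place of the CSV, and the grid values are arbitrary starting points, not values from the paper or the repository.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 28-feature training table (assumption);
# replace with X_train and y_train from the CSV.
X_demo, y_demo = make_classification(
    n_samples=270, n_features=28, n_informative=6, random_state=42
)

# make_pipeline names the SVC step "svc", hence the "svc__" parameter prefix.
pipe = make_pipeline(StandardScaler(), SVC(random_state=42))
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.1],
}

# The "roc_auc" scorer uses SVC.decision_function, so probability=True is not needed here.
search = GridSearchCV(
    pipe,
    param_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
)
search.fit(X_demo, y_demo)

print("Best params:", search.best_params_)
print("Best CV AUC:", search.best_score_)
```

Because the search only sees the training folds, the test slides stay untouched until the final evaluation.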
By the way, for the third step of semi-supervised training, I think there is a problem here (screenshot: https://user-images.githubusercontent.com/58326452/158058190-e8bd57a8-fe81-4786-8989-6babb2470eb3.png).
Because train_tumor_idx includes all training indices, unlabeled_train_idx should be separated from tumor_labeled_train_idx:
tumor_unlabeled_train_sampler = SubsetRandomSampler(list(set(train_tumor_idx)-set(tumor_labeled_train_idx)))
normal_unlabeled_train_sampler = SubsetRandomSampler(list(set(train_normal_idx)-set(normal_labeled_train_idx)))
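To make the effect of the proposed set difference concrete, here is a small pure-Python sketch with made-up indices. The index values are hypothetical; only the variable names come from the snippet above. Whether the overlap between labeled and unlabeled indices is intended is a separate question for the authors.

```python
# Hypothetical toy indices; in the repository these come from the dataset split.
train_tumor_idx = list(range(10))      # all tumour training indices
tumor_labeled_train_idx = [0, 3, 7]    # the subset that carries labels

# The set difference keeps only indices that are NOT in the labeled subset.
# sorted() is used here only to make the printed order deterministic.
tumor_unlabeled_train_idx = sorted(
    set(train_tumor_idx) - set(tumor_labeled_train_idx)
)

# The two index lists are now disjoint and together cover the full set.
assert set(tumor_unlabeled_train_idx).isdisjoint(tumor_labeled_train_idx)
assert set(tumor_unlabeled_train_idx) | set(tumor_labeled_train_idx) == set(train_tumor_idx)

print(tumor_unlabeled_train_idx)  # [1, 2, 4, 5, 6, 8, 9]
```

The resulting list can be passed to `SubsetRandomSampler`, which draws from it in random order regardless of how the list is sorted.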
Hi there, this has already been clarified in our paper in Section 3.3, which I quote here
Have you solved this issue?