Big gap in experimental results

tikboaHIT commented 3 years ago

Hello,

When using your pre-trained model, I could not reproduce your reported accuracy and F1 score. The results of the model using hact are as follows:

The results of the model using cggnn are as follows:

The results of these experiments are quite different from yours. I'm confused and don't know which phase went wrong.

guillaumejaume commented 3 years ago

Hi,

What version of the BRACS dataset are you using?

guillaumejaume commented 3 years ago

I assume the 566 samples come from the new version, which we haven't used for this work. Are you able to reproduce the results with the previous version?

llvy21 commented 3 years ago

I ran the pre-trained model on the previous_version data with bracs_hact_7_classes_pna.yml, but got similar results. The results of the model using hact are as follows: [screenshot: 5024FF15-CCA2-4853-BCF4-8000222A7A20]. I don't know which phase went wrong.

guillaumejaume commented 3 years ago

Maybe something to do with the preprocessing. You can download the preprocessed cell, tissue and hact graphs for the BRACS dataset here:

https://ibm.box.com/s/6v4sasavltjzi91gmohswz2ek6i8i3lp

Or by downloading this zip file that includes the test cell graphs:

https://ibm.box.com/shared/static/412lfz992djt8u6bgu13y9cj9qsurwui.zip

Let me know if you can reproduce the results with these.

llvy21 commented 3 years ago

I downloaded the preprocessed cell, tissue and hact graphs, and it works. Thank you. I will check my own preprocessed files later. The results of the model using hact are as follows:

By the way, I see you divided the BRACS dataset into four test folds. How did you divide them, and why? Are they partitioned randomly?

llvy21 commented 3 years ago

I can't get correct results with the cell graph files from the second link.

[screenshot: 393EA090-CB92-4519-8E76-2BCD37C6E225]

.../python3.7/site-packages/sklearn/metrics/_classification.py:1248: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use zero_division parameter to control this behavior.

But I can get the correct cggnn results using the pre-processed files you provided in the first link: [screenshot: 3C7287F3-76F6-4839-8ECE-8A460E9B5DEC]
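A note on the UndefinedMetricWarning quoted above: scikit-learn emits it when some label appears among the predictions (or in the requested label list) but never in the ground truth, so recall for that label is 0/0. The sketch below reproduces that situation in plain Python; the label arrays and the helper `per_class_recall` are hypothetical and are not taken from the hact-net code. It only illustrates the mechanism and is not a diagnosis of the files in the second link.

```python
from collections import Counter


def per_class_recall(y_true, y_pred, zero_division=0.0):
    """Recall per label, computed over the union of true and predicted labels.

    Hypothetical helper for illustration; `zero_division` plays the same role
    as the scikit-learn parameter named in the warning message.
    """
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true samples per label
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    recall = {}
    for label in labels:
        if support[label] == 0:
            # Label was predicted but has no true samples: recall is 0/0,
            # so a substitute value is used instead.
            recall[label] = zero_division
        else:
            recall[label] = hits[label] / support[label]
    return recall


# Made-up labels: class 2 is predicted once but never occurs in y_true.
y_true = [0, 0, 1, 1]
y_pred = [0, 2, 1, 1]

recall = per_class_recall(y_true, y_pred)
ill_defined = [label for label in recall if label not in set(y_true)]
print("per-class recall:", recall)
print("labels with no true samples:", ill_defined)
```

If the list of labels with no true samples is non-empty on real data, it may be worth checking that the label encoding of the graph files matches what the evaluation script expects.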

llvy21 commented 3 years ago

When pre-processing images, some trigger a warning like this while others don't: [screenshot: B9479E5E-02DE-4A9C-B7D6-6A0C0C301EB7]. I checked the original images and they're not empty pictures like the ones mentioned in #2. Will this interfere with the preprocessing? Thank you for your help! :)

guillaumejaume commented 3 years ago

The warnings should not be an issue when running the preprocessing.

huihhui commented 1 year ago

Hello, may I ask where I can download the pre-trained models?

CarlinLiao commented 4 months ago

@guillaumejaume Would you know why a model trained on graphs preprocessed and created locally doesn't reach the same performance as one trained on the graphs you provide in the download link?

CarlinLiao commented 4 months ago

I trained my own HACTNet, cell graph, and tissue graph models to see whether it's the cell or the tissue graphs that are causing the performance gap. Based on my results, it appears that it's mainly the tissue graphs generated by this repo under stock settings that aren't as good for model performance as those uploaded by the histocartography team. Maybe something about ColorMergedSuperpixelExtractor or RAGGraphBuilder differs between the paper code and what I ran? I used histocartography version 0.2.0 for this.

Here are my weighted F1 scores on the test set for models trained on the IBM Box graph sets compared to those I created locally from the BRACS ROI previous version using generate_hact_graphs.py and the pretrained checkpoint scores provided in the README:

Model	README	Trained on uploaded	Trained on locally generated
CG Model	56.7	57.5	56.7
TG Model	57.8	57.1	53.8
HACTNet	61.5	61.4	56.5

EDIT: Noticed that my script failed to create the last two dozen training graphs because of a corrupted RoI download. I finished creating my graphs, retrained the models, and updated my findings.
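To make the comparison in the table above explicit, here is a small sketch that computes, per model, the difference in weighted F1 between training on locally generated graphs and training on the uploaded graphs. The numbers are copied from the table; the dictionary layout and the names `scores` and `gaps` are only illustrative.

```python
# Weighted F1 on the test set, copied from the table in this comment.
scores = {
    "CG Model": {"readme": 56.7, "uploaded": 57.5, "local": 56.7},
    "TG Model": {"readme": 57.8, "uploaded": 57.1, "local": 53.8},
    "HACTNet": {"readme": 61.5, "uploaded": 61.4, "local": 56.5},
}

# Negative values mean the locally generated graphs score lower.
gaps = {
    model: round(row["local"] - row["uploaded"], 1)
    for model, row in scores.items()
}

for model, gap in gaps.items():
    print(f"{model}: {gap:+.1f} F1 points (local vs. uploaded)")
```

The cell graph model loses less than one point, while the tissue graph model and HACTNet lose several, which is consistent with the reading that the locally generated tissue graphs account for most of the gap.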

JingnaQiu commented 1 month ago

> I trained my own HACTNet, cell graph, and tissue graph models to see whether it's the cell or the tissue graphs that are causing the performance gap. Based on my results, it appears that it's mainly the tissue graphs generated by this repo under stock settings that aren't as good for model performance as those uploaded by the histocartography team. Maybe something about ColorMergedSuperpixelExtractor or RAGGraphBuilder differs between the paper code and what I ran? I used histocartography version 0.2.0 for this.
>
> Here are my weighted F1 scores on the test set for models trained on the IBM Box graph sets compared to those I created locally from the BRACS ROI previous version using generate_hact_graphs.py and the pretrained checkpoint scores provided in the README:
>
> Model	README	Trained on uploaded	Trained on locally generated
> CG Model	56.7	57.5	56.7
> TG Model	57.8	57.1	53.8
> HACTNet	61.5	61.4	56.5
>
> EDIT: Noticed that my script failed to create the last two dozen training graphs because of a corrupted RoI download. I finished creating my graphs, retrained the models, and updated my findings.

Hi, I'm not able to reproduce the "Trained on uploaded" results; my results remain around 55. Did you do any hyperparameter tuning, or did you simply follow the settings (learning rate, epochs, batch size) provided in the README?

CarlinLiao commented 1 month ago

I didn't do any hyperparameter tuning; I just used the config and settings as shown. I've noticed that test set accuracy tends to fluctuate up and down by a few percentage points even with the same settings, so I think 55 is close enough.
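One way to judge whether a single result is "close enough" is to repeat the training a few times and look at the spread. The sketch below assumes you have collected the weighted F1 of several runs with identical settings; the values in `runs` are placeholders for illustration, not measurements from this thread, and the helpers `summarize_runs` and `within_spread` are hypothetical.

```python
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of repeated-run scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def within_spread(value, scores, k=2.0):
    """True if `value` lies within k standard deviations of the run mean."""
    mean, std = summarize_runs(scores)
    return abs(value - mean) <= k * std


# Placeholder scores from three hypothetical runs with the same config.
runs = [55.0, 57.5, 56.0]
mean, std = summarize_runs(runs)
print(f"mean={mean:.2f}, std={std:.2f}")

# Check whether a reference score is compatible with the observed spread.
print(within_spread(57.5, runs))
```

With only a handful of runs the standard deviation is itself a rough estimate, so this is a sanity check rather than a statistical test.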

histocartography / hact-net
