mahmoodlab / HEST

HEST: Bringing Spatial Transcriptomics and Histopathology together - NeurIPS 2024

Reproduce benchmark results #50

Closed judyueshen closed 2 months ago

judyueshen commented 2 months ago

Hi, I used the tutorial notebook and was not able to reproduce the benchmark results "HEST-Benchmark results (08.30.24)" posted on the main page. For example, my ridge regression results are consistently lower (see attached image). As it stands, I can't directly use the table you provided to benchmark my own model. Did you use any specific settings to generate the results, such as a seed? Thank you!

guillaumejaume commented 2 months ago

Did you do PCA+Ridge? If not, you can enable it by uncommenting it here: https://github.com/mahmoodlab/HEST/blob/main/bench_config/bench_config.yaml

guillaumejaume commented 2 months ago

Ridge alone does pretty badly, probably because of the curse of dimensionality. As a consequence, models with different embedding sizes are not directly comparable when Ridge is fit on the raw embeddings.

If you don't like PCA+Ridge, we also implemented random-forest and xgboost regression.
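To illustrate the point about embedding sizes, here is a minimal scikit-learn sketch, not the HEST benchmark code. It assumes synthetic random embeddings, a hypothetical `n_components=256`, and a placeholder `alpha=1.0`; none of these values come from the HEST configuration. It only shows that with PCA in front, Ridge is fit on the same number of features regardless of the encoder's embedding width.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_regressor(use_pca, n_components=256, alpha=1.0):
    """Build Ridge, optionally preceded by PCA (hyperparameters are placeholders)."""
    steps = [StandardScaler()]
    if use_pca:
        steps.append(PCA(n_components=n_components, random_state=0))
    steps.append(Ridge(alpha=alpha))
    return make_pipeline(*steps)


def ridge_input_dim(fitted_pipeline):
    """Number of features the final Ridge step was fitted on."""
    return fitted_pipeline[-1].n_features_in_


# Synthetic stand-ins for patch embeddings from two encoders of different width.
rng = np.random.default_rng(0)
n_train, n_test, n_genes = 400, 100, 50
ridge_dims = {}
for embed_dim in (512, 1536):
    X = rng.standard_normal((n_train + n_test, embed_dim))
    y = rng.standard_normal((n_train + n_test, n_genes))
    for use_pca in (False, True):
        model = make_regressor(use_pca).fit(X[:n_train], y[:n_train])
        preds = model.predict(X[n_train:])  # shape: (n_test, n_genes)
        ridge_dims[(embed_dim, use_pca)] = ridge_input_dim(model)

# Maps (embedding width, PCA on/off) to the feature count Ridge actually sees.
print(ridge_dims)
```

Without PCA, the Ridge step sees the full embedding width, so wider encoders are regularized differently for the same `alpha`; with PCA, every encoder is reduced to the same dimensionality before regression.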

judyueshen commented 2 months ago

Ah, using PCA fixed it. I did not notice it was commented out! Thank you! The curse of dimensionality makes sense. Do you happen to have the number of embedding dimensions for all the models in the benchmark results?

guillaumejaume commented 2 months ago

Not as part of the results. Off the top of my head:

judyueshen commented 2 months ago

Awesome! Thx!!