mahmoodlab / HEST

HEST: Bringing Spatial Transcriptomics and Histopathology together - NeurIPS 2024

HEST-Library: Bringing Spatial Transcriptomics and Histopathology together

Designed for querying and assembling the HEST-1k dataset

[ arXiv | Data | Documentation | Tutorials | Cite ]

Welcome to the official GitHub repository of the HEST-Library introduced in "HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis", NeurIPS Spotlight, 2024. This project was developed by the Mahmood Lab at Harvard Medical School and Brigham and Women's Hospital.


What does this repository provide?

HEST-1k, HEST-Library, and HEST-Benchmark are released under the Attribution-NonCommercial-ShareAlike 4.0 International license.


Updates

Download/Query HEST-1k (>1TB)

To download/query HEST-1k, follow the tutorial 1-Downloading-HEST-1k.ipynb or the instructions on Hugging Face.

NOTE: The entire dataset is larger than 1TB, but you can download a subset by querying per id, organ, species...
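A minimal sketch of a per-id subset download, under two assumptions that should be checked against the Hugging Face instructions: that the dataset lives under the repo id `MahmoodLab/hest`, and that every file belonging to a sample contains the sample id followed by `_` or `.` in its name. `build_patterns` and `download_subset` are hypothetical helper names; `snapshot_download` and its `allow_patterns` argument come from `huggingface_hub`. Access may also require logging in to Hugging Face first.

```python
from fnmatch import fnmatch


def build_patterns(sample_ids):
    """Return one glob per sample id: the id followed by '_' or '.' anywhere in the path."""
    return [f"*{sample_id}[_.]*" for sample_id in sample_ids]


def download_subset(sample_ids, local_dir="hest_data", repo_id="MahmoodLab/hest"):
    """Download only the files matching the given ids (needs `pip install huggingface_hub`)."""
    # Imported lazily so build_patterns can be used without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,        # assumed repo id, verify before use
        repo_type="dataset",
        local_dir=local_dir,
        allow_patterns=build_patterns(sample_ids),
    )


patterns = build_patterns(["TENX95", "TENX99"])

# Dry run: check the patterns against made-up file names before downloading anything.
example_files = ["wsis/TENX95.tif", "st/TENX95.h5ad", "st/TENX950.h5ad", "st/NCBI1.h5ad"]
selected = [f for f in example_files if any(fnmatch(f, p) for p in patterns)]
print(selected)
```

The trailing `[_.]` keeps an id such as `TENX95` from also matching `TENX950`. Calling `download_subset(["TENX95"])` would then fetch only that sample into `hest_data`.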

HEST-Library installation

git clone https://github.com/mahmoodlab/HEST.git
cd HEST
conda create -n "hest" python=3.9
conda activate hest
pip install -e .

Additional dependencies (for WSI manipulation):

sudo apt install libvips libvips-dev openslide-tools

Additional dependencies (GPU acceleration):

If a GPU is available on your machine, we recommend installing cucim in your conda environment. (hest was tested with cucim-cu12==24.4.0 and CUDA 12.1)

pip install \
    --extra-index-url=https://pypi.nvidia.com \
    cudf-cu12==24.6.* dask-cudf-cu12==24.6.* cucim-cu12==24.6.* \
    raft-dask-cu12==24.6.*
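After installation, a quick check that the optional GPU packages are visible to the active environment can save debugging time later. The sketch below only looks the modules up without importing them; the import names (`cucim`, `cudf`, `dask_cudf`, `raft_dask`) are assumed to correspond to the pip packages above, and `is_installed` is a hypothetical helper.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return find_spec(module_name) is not None


# Assumed import names for the optional GPU packages.
gpu_modules = ["cucim", "cudf", "dask_cudf", "raft_dask"]
status = {name: is_installed(name) for name in gpu_modules}

for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```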

NOTE: HEST-Library has only been tested on Linux/macOS machines. Please report any bugs in the GitHub issues.

Inspect HEST-1k with HEST-Library

You can then inspect the dataset as follows:

from hest import iter_hest

for st in iter_hest('../hest_data', id_list=['TENX95']):
    print(st)

HEST-Library API

The HEST-Library allows assembling new samples using HEST format and interacting with HEST-1k. We provide two tutorials:

In addition, we provide complete documentation.

HEST-Benchmark

The HEST-Benchmark was designed to assess 11 foundation models for pathology under a new, diverse, and challenging benchmark. HEST-Benchmark includes nine tasks for gene expression prediction (50 highly variable genes) from morphology (112 × 112 µm regions at 0.5 µm/px) in nine different organs and eight cancer types. We provide a step-by-step tutorial to run HEST-Benchmark and reproduce our results in 4-Running-HEST-Benchmark.ipynb.
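The patch geometry stated above (112 µm regions at 0.5 µm/px) fixes the pixel size of each model input. The conversion is a single division, sketched here with a hypothetical helper `patch_size_px`:

```python
def patch_size_px(region_um, um_per_px):
    """Convert a physical side length (µm) to pixels at a given resolution (µm/px)."""
    return round(region_um / um_per_px)


side = patch_size_px(112, 0.5)  # 112 µm at 0.5 µm/px
print(f"{side} x {side} px")    # prints "224 x 224 px"
```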

HEST-Benchmark results (08.30.24)

HEST-Benchmark was used to assess 11 publicly available models. Reported results are based on Ridge regression applied after PCA (256 factors). Ridge regression alone unfairly penalizes models with larger embedding dimensions, so we opted for PCA reduction to ensure a fair and objective comparison between models. Model performance is measured with Pearson correlation. Best is bold, second best is underlined. Additional results based on Random Forest and XGBoost regression are provided in the paper.

| Model | IDC | PRAD | PAAD | SKCM | COAD | READ | ccRCC | LUAD | LYMPH IDC | Average |
|---|---|---|---|---|---|---|---|---|---|---|
| Resnet50 | 0.4741 | 0.3075 | 0.3889 | 0.4822 | 0.2528 | 0.0812 | 0.2231 | 0.4917 | 0.2322 | 0.326 |
| CTransPath | 0.511 | 0.3427 | 0.4378 | 0.5106 | 0.2285 | 0.11 | 0.2279 | 0.4985 | 0.2353 | 0.3447 |
| Phikon | 0.5327 | 0.342 | 0.4432 | 0.5355 | 0.2585 | 0.1517 | 0.2423 | 0.5468 | 0.2373 | 0.3656 |
| CONCH | 0.5363 | 0.3548 | 0.4475 | 0.5791 | 0.2533 | 0.1674 | 0.2179 | 0.5312 | 0.2507 | 0.3709 |
| Remedis | 0.529 | 0.3471 | 0.4644 | 0.5818 | 0.2856 | 0.1145 | 0.2647 | 0.5336 | 0.2473 | 0.3742 |
| Gigapath | 0.5508 | 0.3708 | 0.4768 | 0.5538 | 0.301 | 0.186 | 0.2391 | 0.5399 | 0.2493 | 0.3853 |
| UNI | 0.5702 | 0.314 | 0.4764 | 0.6254 | 0.263 | 0.1762 | 0.2427 | 0.5511 | 0.2565 | 0.3862 |
| Virchow | 0.5702 | 0.3309 | 0.4875 | 0.6088 | 0.311 | 0.2019 | 0.2637 | 0.5459 | 0.2594 | 0.3977 |
| Virchow2 | 0.5922 | 0.3465 | 0.4661 | 0.6174 | 0.2578 | 0.2084 | 0.2788 | 0.5605 | 0.2582 | 0.3984 |
| UNIv1.5 | 0.5989 | 0.3645 | 0.4902 | 0.6401 | 0.2925 | 0.2240 | 0.2522 | 0.5586 | 0.2597 | 0.4090 |
| Hoptimus0 | 0.5982 | 0.385 | 0.4932 | 0.6432 | 0.2991 | 0.2292 | 0.2654 | 0.5582 | 0.2595 | 0.4146 |
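The evaluation protocol described above (PCA to 256 factors, ridge regression, Pearson correlation per gene) can be illustrated with a small NumPy sketch. This is not the repository's implementation: the data are synthetic, the regularization strength `alpha=1.0` is an arbitrary assumption, and all function names are hypothetical. The benchmark tutorial remains the reference for reproducing the reported numbers.

```python
import numpy as np


def pca_fit_transform(x_train, x_test, n_components):
    """Project both sets onto the top principal components of the training set."""
    mean = x_train.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(x_train - mean, full_matrices=False)
    components = vt[:n_components].T
    return (x_train - mean) @ components, (x_test - mean) @ components


def ridge_fit_predict(z_train, y_train, z_test, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    z_mean, y_mean = z_train.mean(axis=0), y_train.mean(axis=0)
    zc, yc = z_train - z_mean, y_train - y_mean
    gram = zc.T @ zc + alpha * np.eye(zc.shape[1])
    weights = np.linalg.solve(gram, zc.T @ yc)
    return (z_test - z_mean) @ weights + y_mean


def mean_pearson(y_true, y_pred):
    """Pearson correlation per gene (column), averaged over genes."""
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    r = (yt * yp).sum(axis=0) / np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return float(r.mean())


# Synthetic stand-ins: 1024-d patch embeddings and 50 gene targets
# that share a 16-d latent signal.
rng = np.random.default_rng(0)
latent = rng.normal(size=(600, 16))
embeddings = latent @ rng.normal(size=(16, 1024)) + 0.1 * rng.normal(size=(600, 1024))
expression = latent @ rng.normal(size=(16, 50)) + 0.1 * rng.normal(size=(600, 50))

x_tr, x_te = embeddings[:400], embeddings[400:]
y_tr, y_te = expression[:400], expression[400:]

z_tr, z_te = pca_fit_transform(x_tr, x_te, n_components=256)
y_hat = ridge_fit_predict(z_tr, y_tr, z_te, alpha=1.0)
score = mean_pearson(y_te, y_hat)
print(f"mean Pearson r over 50 genes: {score:.3f}")
```

Because every model is reduced to the same 256 factors before regression, the ridge penalty acts on the same number of coefficients regardless of the original embedding dimension, which is the motivation given for the PCA step.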

Benchmarking your own model

Our tutorial in 4-Running-HEST-Benchmark.ipynb will guide users interested in benchmarking their own model on HEST-Benchmark.

Note: Spontaneous contributions are encouraged if researchers from the community want to include new models. To do so, simply create a Pull Request.

Issues

Citation

If you find our work useful in your research, please consider citing:

Jaume, G., Doucet, P., Song, A. H., Lu, M. Y., Almagro-Perez, C., Wagner, S. J., Vaidya, A. J., Chen, R. J., Williamson, D. F. K., Kim, A., & Mahmood, F. HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis. Advances in Neural Information Processing Systems, December 2024.

@inproceedings{jaume2024hest,
    author = {Guillaume Jaume and Paul Doucet and Andrew H. Song and Ming Y. Lu and Cristina Almagro-Perez and Sophia J. Wagner and Anurag J. Vaidya and Richard J. Chen and Drew F. K. Williamson and Ahrong Kim and Faisal Mahmood},
    title = {HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis},
    booktitle = {Advances in Neural Information Processing Systems},
    year = {2024},
    month = dec,
}