mahmoodlab / HEST

HEST: Bringing Spatial Transcriptomics and Histopathology together

Pre-processed data for the multimodal representation learning task #34

Closed hathawayxxh closed 2 weeks ago

hathawayxxh commented 3 weeks ago

Hi Authors,

Thanks for your excellent work. I am very interested in developing algorithms based on the HEST-1k database. Could you tell me how to access the pre-processed data for the multimodal representation learning task, which corresponds to the experimental results in Table 2? I look forward to your reply.

Best, Xiaohan

guillaumejaume commented 3 weeks ago

Hi Xiaohan, the experiments presented in Table 2 are based on all human Xenium breast samples (see the HEST-1k metadata). You can query those samples using our download pipeline (see Tutorial 1). We only applied log1p normalization. The code for contrastive alignment is not public yet, but it is quite standard.
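For reference, the log1p normalization mentioned above is just the elementwise transform x → log(1 + x) on the raw count matrix. A minimal sketch with NumPy (the toy matrix and function name here are illustrative, not part of the HEST codebase):

```python
import numpy as np

def log1p_normalize(counts: np.ndarray) -> np.ndarray:
    """Elementwise x -> log(1 + x) on a raw spot-by-gene count matrix."""
    return np.log1p(counts)

# Toy spot-by-gene count matrix; real data would come from the
# downloaded HEST-1k expression files.
counts = np.array([[0.0, 1.0, 3.0],
                   [7.0, 0.0, 1.0]])
normalized = log1p_normalize(counts)
```

On real HEST-1k samples the same transform would be applied to the stored expression matrices after downloading them with the pipeline from Tutorial 1.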

hathawayxxh commented 3 weeks ago

Hi Guillaume, thanks for your prompt reply. I checked the metadata and noticed there are six Xenium invasive breast cancer samples (i.e., TENX94-99). However, you indicated that you used five samples to fine-tune the CONCH model. Could you please indicate which five samples you used for fine-tuning? Thanks.

guillaumejaume commented 3 weeks ago

We used NCBI783 (IDC), NCBI785 (IDC), TENX95 (IDC), TENX99 (IDC) and TENX96 (ILC). The others are duplicates using different gene panels. You can still use them (3 additional samples), but there is redundancy.
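A small sketch of how the five sample IDs listed above could be turned into filename patterns for a selective download. The `"*{id}*"` pattern shape and the commented `snapshot_download` call are assumptions about how the HEST-1k files are laid out on the Hub, not verified paths; Tutorial 1 in the repo is the authoritative reference:

```python
# The five non-duplicate Xenium breast samples listed above.
sample_ids = ["NCBI783", "NCBI785", "TENX95", "TENX99", "TENX96"]

# Hypothetical glob patterns matching each sample's files by ID; the exact
# file naming inside the dataset repo may differ.
allow_patterns = [f"*{sid}*" for sid in sample_ids]

# These patterns could then be passed to huggingface_hub.snapshot_download,
# which accepts an allow_patterns filter (requires network access and the
# dataset's access terms to be accepted):
# from huggingface_hub import snapshot_download
# snapshot_download(repo_id="MahmoodLab/hest", repo_type="dataset",
#                   allow_patterns=allow_patterns)
```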

guillaumejaume commented 3 weeks ago

You can refer to the patient entry in the metadata.

hathawayxxh commented 2 weeks ago

Thanks, I will try.