mahmoodlab / HEST

HEST: Bringing Spatial Transcriptomics and Histopathology together - NeurIPS 2024 (Spotlight)

Pre-processed data for the multimodal representation learning task #34

Closed: hathawayxxh closed this issue 3 months ago

hathawayxxh commented 3 months ago

Hi Authors,

Thanks for your excellent work. I am very interested in developing algorithms based on the HEST-1k database. I would like to know how to get access to the pre-processed data for the multimodal representation learning task, which corresponds to the experimental results in Table 2. I look forward to your reply.

Best, Xiaohan

guillaumejaume commented 3 months ago

Hi Xiaohan, the experiments presented in Table 2 are based on all human Xenium breast samples (see the HEST-1k metadata). You can query those samples with our download pipeline (see tutorial 1). The only preprocessing applied to the expression data was log1p normalization. The code for contrastive alignment is not public yet, but the setup is quite standard.
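A minimal sketch of such a query, following the HuggingFace download pattern from tutorial 1 and finishing with log1p normalization in scanpy. The metadata file name, the column names (`st_technology`, `organ`, `species`), and the local layout of the downloaded `.h5ad` files are assumptions to verify against the dataset card:

```python
import datasets
import pandas as pd
import scanpy as sc

local_dir = 'hest_data'  # samples are downloaded into this folder

# Load the HEST-1k metadata from the HuggingFace hub
# (file name assumed; check the dataset card for the current version).
meta_df = pd.read_csv("hf://datasets/MahmoodLab/hest/HEST_v1_1_0.csv")

# Keep all human Xenium breast samples (column names assumed).
mask = (
    (meta_df['st_technology'] == 'Xenium')
    & (meta_df['organ'] == 'Breast')
    & (meta_df['species'] == 'Homo sapiens')
)
ids_to_query = meta_df.loc[mask, 'id'].tolist()

# Download only the matching samples, as in tutorial 1.
list_patterns = [f"*{sample_id}[_.]**" for sample_id in ids_to_query]
datasets.load_dataset(
    'MahmoodLab/hest',
    cache_dir=local_dir,
    patterns=list_patterns,
)

# The only preprocessing reported above: log1p normalization of the counts.
adata = sc.read_h5ad(f"{local_dir}/st/{ids_to_query[0]}.h5ad")  # path layout assumed
sc.pp.log1p(adata)
```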

hathawayxxh commented 3 months ago

Hi Guillaume, thanks for your prompt reply. I checked the metadata and noticed there are six Xenium invasive breast cancer samples (i.e., TENX94-99). However, you indicated that you used five samples to fine-tune the CONCH model. Could you please indicate which five samples you used for fine-tuning? Thanks.

guillaumejaume commented 3 months ago

We used NCBI783 (IDC), NCBI785 (IDC), TENX95 (IDC), TENX99 (IDC), and TENX96 (ILC). The others are duplicates of the same tissue acquired with different gene panels. You can still use them (3 additional samples), but be aware of the redundancy.
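If you only want those five samples, a sketch along the same lines (the ID-based query follows tutorial 1; `iter_hest` and its `id_list` argument are assumed from the HEST readme):

```python
import datasets
from hest import iter_hest

local_dir = 'hest_data'

# The five fine-tuning samples listed above.
finetune_ids = ['NCBI783', 'NCBI785', 'TENX95', 'TENX99', 'TENX96']

# Download only these samples.
list_patterns = [f"*{sample_id}[_.]**" for sample_id in finetune_ids]
datasets.load_dataset(
    'MahmoodLab/hest',
    cache_dir=local_dir,
    patterns=list_patterns,
)

# Iterate over the downloaded HEST objects (API assumed from the readme).
for st in iter_hest(local_dir, id_list=finetune_ids):
    print(st)
```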

guillaumejaume commented 3 months ago

You can refer to the patient entry in the metadata to see which samples come from the same patient.
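For example, a quick pandas check along these lines should group sample IDs by patient (the `patient` column name comes from the comment above; the metadata file name and the other column names are assumptions):

```python
import pandas as pd

# Metadata file name assumed; check the dataset card.
meta_df = pd.read_csv("hf://datasets/MahmoodLab/hest/HEST_v1_1_0.csv")

# Restrict to Xenium breast samples (column names assumed)
# and list which sample IDs share a patient.
xenium_breast = meta_df[
    (meta_df['st_technology'] == 'Xenium') & (meta_df['organ'] == 'Breast')
]
print(xenium_breast.groupby('patient')['id'].apply(list))
```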

hathawayxxh commented 3 months ago

Thanks, I will try.