256 x 256 patches for each spot instead of 224 x 224?

kennethahah commented 2 months ago

Thanks for collecting and aligning all the data. I have two questions about the patches in each dataset.

The HEST-1k paper says that patches are 224 x 224. However, when I downloaded the TENX99 sample, the patches in the patches/ folder are 256 x 256. Is there a particular reason for the slightly larger patch size?
The file TENX99.h5 contains not only the images but also the spot coordinates. Is the image at 0.5um/px, with the coordinates in pixels? If so, the WSI seems implausibly large: the x coordinates exceed 50,000 and the y coordinates exceed 100,000, which at 0.5um/px corresponds to a 25 mm x 50 mm slide. That would not fit in a Visium machine. If not, I guess the coordinates are in pixels of a higher-resolution WSI.
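The size estimate in the question is plain arithmetic: extent in mm = pixels x (um/px) / 1000. A minimal sketch (the helper `physical_extent_mm` is hypothetical, not part of HEST) that reproduces the 25 mm x 50 mm figure at 0.5 um/px and, for comparison, applies the 0.2125 um/px that the reply below gives for TENX99.tif:

```python
def physical_extent_mm(extent_px: float, pixel_size_um: float) -> float:
    """Physical length in mm of extent_px pixels at pixel_size_um microns per pixel."""
    return extent_px * pixel_size_um / 1000.0


# Coordinate magnitudes quoted in the question.
for axis, px in [("x", 50_000), ("y", 100_000)]:
    at_half_micron = physical_extent_mm(px, 0.5)    # if the WSI were 0.5 um/px
    at_tenx99_res = physical_extent_mm(px, 0.2125)  # TENX99.tif's stated 0.2125 um/px
    print(f"{axis}: {px} px = {at_half_micron:.1f} mm at 0.5 um/px, "
          f"{at_tenx99_res:.3f} mm at 0.2125 um/px")
```

At 0.2125 um/px the same coordinates span roughly 10.6 mm x 21.25 mm, so the pixel size assumed when converting coordinates changes the physical estimate by a factor of about 2.35.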

pauldoucet commented 2 months ago

Hi @kennethahah,

1 - We apologize for the inconsistency. Originally, we extracted 224x224 patches at 0.5um/px and rescaled them to 256x256 to mimic the way foundation models like CTransPath were trained. Following your question, however, we re-uploaded the 224x224 patches to Hugging Face (see 2).

2 - The spot coordinates (in pixels on the high-resolution image) are stored in st.adata.obsm['spatial'] of each sample, as described here. If you are interested in the coordinates of each extracted patch, please redownload the latest patches directory from Hugging Face; we added the following .h5 assets in the latest version:

coords: the (x, y) pixel coordinates of the top-left corner of each patch in the full-resolution image (note that TENX99.tif is 0.2125um/px, not 0.5um/px)
patch_size_src: the width/height of each patch before rescaling (i.e. at 0.2125um/px for TENX99)
patch_size_target: the width/height of each patch after rescaling (at 0.5um/px in our case)
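The relationship between these assets can be sketched as follows, assuming patch_size_src is simply patch_size_target scaled by the ratio of target to source pixel size (HEST's exact rounding may differ). The helper names are hypothetical, and the corner (1000, 2000) is a made-up value. The last line shows, under the same assumption, the effective pixel size of the earlier 256 x 256 patches: they cover the same 112 um field of view as a 224 x 224 patch at 0.5 um/px, spread over more pixels.

```python
def source_patch_size(target_patch_size: int, target_pixel_size: float,
                      src_pixel_size: float) -> int:
    """Full-resolution width/height that becomes target_patch_size pixels of
    target_pixel_size um each after rescaling (assumed relationship)."""
    return round(target_patch_size * target_pixel_size / src_pixel_size)


def patch_bbox(top_left_xy, size_src):
    """(x0, y0, x1, y1) of a patch in full-resolution pixels, from its top-left corner."""
    x, y = top_left_xy
    return x, y, x + size_src, y + size_src


def rescaled_pixel_size(pixel_size_um: float, size_before: int, size_after: int) -> float:
    """Pixel size after resizing a square patch; the field of view is unchanged."""
    return size_before * pixel_size_um / size_after


size_src = source_patch_size(224, 0.5, 0.2125)  # 224 * 0.5 / 0.2125 = 527.06
print(size_src)                                  # 527
print(patch_bbox((1000, 2000), size_src))        # (1000, 2000, 1527, 2527)
print(rescaled_pixel_size(0.5, 224, 256))        # 0.4375
```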

Also, feel free to extract the patches at your preferred patch size and resolution with dump_patches (see the documentation here):

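```python
from hest import load_hest
# Re-extract 224 x 224 patches at 0.5 um/px for both samples into patch_dir
sts = load_hest('hest_data', id_list=['TENX95', 'TENX99'])
for st in sts:
    st.dump_patches('patch_dir', target_patch_size=224, target_pixel_size=0.5)
```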

Let us know if anything is missing.

kennethahah commented 2 months ago

Hi @pauldoucet

Thanks for uploading the 224 x 224 patches.

In each of the .h5 files in the patches directory, I can see three keys: coords, barcode, and img. However, I don't see the other two keys, patch_size_src and patch_size_target.

pauldoucet commented 2 months ago

The other keys are stored as attributes of the img dataset in the .h5 rather than as separate keys, because they are common to all the patches:

```python
>>> f['img'].attrs.keys()
<KeysViewHDF5 ['downsample', 'patch_size_src', 'patch_size_target', 'pixel_size']>
```
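For reference, a minimal sketch of reading one of these patch files with h5py, assuming the layout described in this thread: datasets img, coords and barcode, with the shared metadata as attributes on img. The array shapes in the comments and the helper name `read_patch_file` are assumptions, and the demo writes a tiny synthetic file with made-up values rather than a real HEST sample.

```python
import os
import tempfile

import h5py
import numpy as np


def read_patch_file(path):
    """Return (images, coords, barcodes, attrs) from a patch .h5 laid out as described above."""
    with h5py.File(path, "r") as f:
        imgs = f["img"][:]          # assumed (N, H, W, 3) patch images
        coords = f["coords"][:]     # (N, 2) top-left (x, y) in full-resolution pixels
        barcodes = f["barcode"][:]  # spot barcodes, stored as bytes
        attrs = dict(f["img"].attrs)  # shared metadata lives on the img dataset
    return imgs, coords, barcodes, attrs


# Demo on a synthetic file (made-up values, same layout).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.h5")
    with h5py.File(path, "w") as f:
        img = f.create_dataset("img", data=np.zeros((2, 224, 224, 3), dtype=np.uint8))
        img.attrs["patch_size_src"] = 527
        img.attrs["patch_size_target"] = 224
        f.create_dataset("coords", data=np.array([[1000, 2000], [1527, 2000]]))
        f.create_dataset("barcode", data=np.array([b"AAAC-1", b"AAAG-1"]))
    imgs, coords, barcodes, attrs = read_patch_file(path)

print(imgs.shape, coords.tolist(), sorted(attrs))
```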

kennethahah commented 2 months ago

Thanks @pauldoucet. That answers all my questions. I'll close this issue.

mahmoodlab / HEST

256 x 256 patches for each spot instead of 224 x 224? #36