mahmoodlab / HEST

HEST: Bringing Spatial Transcriptomics and Histopathology together - NeurIPS 2024

About patches in WSI #47

Closed Yoyiming closed 2 months ago

Yoyiming commented 2 months ago

I would like to know what 'in_tissue' in the Visium data, such as 'INT10', represents in adata. If I want to segment the WSI into patches based on adata's 'pxl_row_in_fullres' and 'pxl_col_in_fullres', should I select the positions where 'in_tissue=0' before performing the segmentation?

pauldoucet commented 2 months ago

Hi, in_tissue is a default Spaceranger flag. It doesn't represent anything at the moment, as it was just left out during data acquisition.

The actual tissue segmentation is available in tissue_seg/ on HuggingFace. You can automatically load the tissue segmentation when using load_hest, and then patch such that you only keep the patches under tissue as follows:

from hest import load_hest

hest_list = load_hest('hest_data', id_list=['INT10'])

patch_save_dir = 'patches'

st = hest_list[0]

st.dump_patches(
    patch_save_dir,
    name='demo',
    target_patch_size=224, # side length of each output patch, in pixels
    target_pixel_size=0.5 # pixel size of the patches in um/px after rescaling
)
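
The two parameters above interact: a 224 px patch at 0.5 um/px covers 224 × 0.5 = 112 um of tissue. The sketch below works through that arithmetic to estimate how many full-resolution pixels such a patch spans. It assumes a patcher that crops a square region and rescales it; `source_patch_size` and the 0.25 um/px source resolution are hypothetical illustrations, not part of the HEST API.

```python
def source_patch_size(target_patch_size: int,
                      target_pixel_size: float,
                      source_pixel_size: float) -> int:
    """Side length (in full-resolution pixels) of the region that becomes a
    target_patch_size patch once rescaled to target_pixel_size um/px."""
    if target_pixel_size <= 0 or source_pixel_size <= 0:
        raise ValueError("pixel sizes must be positive")
    # Physical extent of the output patch in micrometres.
    physical_um = target_patch_size * target_pixel_size
    # Same extent expressed in pixels of the source image.
    return round(physical_um / source_pixel_size)


# Hypothetical slide scanned at 0.25 um/px: the crop is twice as wide.
print(source_patch_size(224, 0.5, 0.25))  # 448
# Slide already at the target resolution: no rescaling needed.
print(source_patch_size(224, 0.5, 0.5))   # 224
```

So the same `target_patch_size` and `target_pixel_size` yield patches covering the same physical area across slides scanned at different resolutions.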

Yoyiming commented 2 months ago

Thank you very much for your response! It is really helpful.

Yoyiming commented 2 months ago

@pauldoucet Hello, I have one more question that I am a bit confused about. In the adata, do 'pxl_row_in_fullres' and 'pxl_col_in_fullres' correspond to the width and height in the WSI, respectively? Is the following code correct for obtaining patches?

whole_image = cv2.imread(image_path)
patch = self.whole_image[x-112:x+112, y-112:y+112]

Where x represents 'pxl_row_in_fullres' and y represents 'pxl_col_in_fullres'.

pauldoucet commented 2 months ago

Hello, we strongly encourage using st.adata.obsm['spatial'] instead of 'pxl_row_in_fullres' and 'pxl_col_in_fullres', which might not be available for every technology.

You can then use the following:

xy = st.adata.obsm['spatial'] # Nx2 array, first column is x, second is y
coords = xy[0] # get coordinates of first spot
x, y = coords[0], coords[1]
patch = self.wsi.read_region((x-112, y-112), 0, (224, 224))
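
On the axis question: image arrays returned by NumPy or OpenCV are indexed [row, column], that is [y, x], whereas OpenSlide-style `read_region` calls take the top-left corner as (x, y). A minimal sketch of the bookkeeping follows, assuming (x, y) is a spot centre as stored in `obsm['spatial']`; `patch_box` is a hypothetical helper, not a HEST function, and it does not handle spots closer than half a patch to the image border.

```python
def patch_box(x: float, y: float, patch_size: int = 224):
    """Return (left, top, right, bottom) of a patch_size square centred on (x, y).

    Coordinates are rounded to the nearest pixel first, so float inputs work.
    """
    half = patch_size // 2
    left = int(round(x)) - half
    top = int(round(y)) - half
    return left, top, left + patch_size, top + patch_size


left, top, right, bottom = patch_box(1500.6, 820.2)
print(left, top, right, bottom)  # 1389 708 1613 932

# With an in-memory array (rows first, then columns):
#   patch = image[top:bottom, left:right]
# With an OpenSlide-style reader (x first, then y):
#   patch = wsi.read_region((left, top), 0, (224, 224))
```

Under these conventions 'pxl_row_in_fullres' would play the role of y and 'pxl_col_in_fullres' the role of x.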

Yoyiming commented 2 months ago

Thank you for your suggestion! I’ll give it a try.

pauldoucet commented 2 months ago

I'd also suggest having a look at 2-Interacting-with-HEST-1k.ipynb

Yoyiming commented 2 months ago

Thank you! I'll have a look.

HelloWorldLTY commented 1 month ago

Hi, I found that .obsm['spatial'] sometimes contains float data, which causes an error in the patching code. Do you think it is OK to use int(x) and int(y) to round the location? Thanks.

Furthermore, may I know why the selected patch has the following shape?

(224, 224, 4)

If we intend to read it with an RGB-based pathology foundation model, should we only consider the first three channels? I find that the last channel is always 255.
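
For reference, OpenSlide-style readers return RGBA regions, and an alpha channel that is 255 everywhere carries no information. A minimal sketch of dropping it with NumPy is shown below, assuming the patch is an (H, W, 4) array in RGBA channel order; `to_rgb` is a hypothetical helper, not part of HEST.

```python
import numpy as np


def to_rgb(patch: np.ndarray) -> np.ndarray:
    """Keep the first three channels of an (H, W, 4) patch.

    An (H, W, 3) patch is returned unchanged.
    """
    if patch.ndim != 3 or patch.shape[2] not in (3, 4):
        raise ValueError(f"expected (H, W, 3) or (H, W, 4), got {patch.shape}")
    return patch[..., :3]


# Synthetic fully opaque patch standing in for a region read from a slide.
rgba = np.full((224, 224, 4), 255, dtype=np.uint8)
rgb = to_rgb(rgba)
print(rgb.shape)  # (224, 224, 3)
```

The slice returns a view rather than a copy; call `.copy()` on the result if the original array will be modified later.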