How large is the spot in Xenium data and ST data?

kennethahah commented 3 months ago

Sorry to ask again about the .h5ad files in the st folder.

The indices of the AnnData object, I assume, are barcodes. The AnnData object records the gene counts of each gene in a spot with the corresponding barcode.

For Visium data, I understand that the spot is a circle 55um in diameter.

How should I understand the Xenium and ST data? Real Xenium data has no spots. Do you just bin the gene counts into a spot? If so, what radius do you use for the spot? The metadata in "hf://datasets/MahmoodLab/hest/HEST_v1_0_2.csv" says the spot diameter is NaN.

For the ST data, the real spot has radius 100um. But since you have 224 x 224 patches at a resolution of 0.5um/px (so 112um x 112um), can I understand that the expression data in, for example, SPA0.h5ad covers a spot larger than the 224 x 224 patches in the patch file SPA0.h5?

pauldoucet commented 3 months ago

Hi @kennethahah,

Correct, the indices of the AnnData are barcodes for Visium and ST. However, in Xenium and Visium-HD the indices of the AnnData are just arbitrary strings assigned to each pseudo-spot.
In Xenium, we tessellated the ST data into a grid of 100um x 100um spots called pseudo-spots (100um because real Visium spots have a center-to-center distance of 100um). Transcripts within the same 100um x 100um region are then summed. You can find the pseudo-spot center positions in st.adata.obsm['spatial'] (first column is x, second column is y, in pixels on the highest-resolution image). So to summarize, compared to a normal Visium spot, a Xenium pseudo-spot is square and the pseudo-spots tile the tissue without any gaps.
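The pooling described above can be illustrated with a minimal sketch. This is not the HEST implementation: the `(x_um, y_um, gene)` input layout and the function names `bin_transcripts` and `spot_center_um` are hypothetical, and coordinates are assumed to already be in micrometers. Only the 100um edge length comes from the description above.

```python
from collections import Counter, defaultdict

SPOT_SIZE_UM = 100.0  # pseudo-spot edge length stated in this thread


def bin_transcripts(transcripts, spot_size_um=SPOT_SIZE_UM):
    """Sum transcripts that fall into the same square pseudo-spot.

    `transcripts` is an iterable of (x_um, y_um, gene) tuples. This layout
    is a hypothetical stand-in, not the format of any HEST or Xenium file.
    Returns a dict mapping (col, row) grid indices to per-gene counts.
    """
    spots = defaultdict(Counter)
    for x_um, y_um, gene in transcripts:
        # Floor division assigns each transcript to exactly one grid cell,
        # so the cells tile the plane without gaps or overlaps.
        col = int(x_um // spot_size_um)
        row = int(y_um // spot_size_um)
        spots[(col, row)][gene] += 1
    return dict(spots)


def spot_center_um(col, row, spot_size_um=SPOT_SIZE_UM):
    """Center of grid cell (col, row), in micrometers."""
    return ((col + 0.5) * spot_size_um, (row + 0.5) * spot_size_um)


# Toy input: two transcripts share the first cell, the others land elsewhere.
demo = [
    (10.0, 20.0, "EPCAM"),
    (99.9, 0.0, "EPCAM"),
    (100.0, 20.0, "CD3E"),
    (250.0, 130.0, "EPCAM"),
]
pseudo_spots = bin_transcripts(demo)
```

Converting the centers from micrometers to pixels on the full-resolution image would additionally require that image's pixel size, which this sketch leaves out.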
For the ST data, spots might indeed cover a larger region. SPA0.h5ad, though, has a spot size (diameter) of 100um, which is slightly smaller than a patch (112um x 112um).
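The size comparison above is simple arithmetic, sketched here for convenience. The helper names are hypothetical and not part of the HEST API; the numbers (224 px patches, 0.5um/px, 100um spot) are the ones quoted in this thread.

```python
import math


def patch_side_um(patch_size_px, pixel_size_um):
    """Physical side length of a square patch, in micrometers."""
    return patch_size_px * pixel_size_um


def spot_fits_in_patch(spot_diameter_um, patch_size_px, pixel_size_um):
    """True if a spot of the given diameter fits inside the patch."""
    return spot_diameter_um <= patch_side_um(patch_size_px, pixel_size_um)


def min_patch_size_px(spot_diameter_um, pixel_size_um):
    """Smallest patch side (in pixels) that still covers the spot diameter."""
    return math.ceil(spot_diameter_um / pixel_size_um)


side = patch_side_um(224, 0.5)                    # 112.0 um
fits = spot_fits_in_patch(100.0, 224, 0.5)        # True: 100 um <= 112 um
needed = min_patch_size_px(100.0, 0.5)            # 200 px
```

If a sample's spots turned out to be larger than 112um, the same calculation would give the patch size in pixels needed to cover them at a given pixel size.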

For ST samples feel free to adjust the size of patches by generating your own patches:

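# Assumes the HEST package is installed and exposes load_hest at top level.
from hest import load_hest

# Load only the listed samples from the local 'hest_data' directory.
sts = load_hest('hest_data', id_list=['TENX95', 'TENX99'])
for st in sts:
    # Write 224 x 224 px patches at 0.5um/px (112um x 112um) to 'patch_dir'.
    st.dump_patches('patch_dir', target_patch_size=224, target_pixel_size=0.5)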

Note that we are currently working on providing the position of each transcript for Xenium (which will allow custom pooling), as well as an additional per-cell pooling of transcripts, on HuggingFace.

kennethahah commented 3 months ago

Thanks @pauldoucet.

So in summary, Visium and ST data have real spots. Xenium and Visium-HD have pseudo-spots.

The size of the real spot depends on the technology (55um for Visium and 100um for ST). The size of the pseudo-spot is always 100um x 100um.

Is this correct?

guillaumejaume commented 3 months ago

That's correct

kennethahah commented 3 months ago

I'll close the issue. Thanks for clarifying the details.

mahmoodlab / HEST

How large is the spot in Xenium data and ST data? #39