use of 'iter_comb' and 'margin' parameters in train_STAligner

zhoux85 / STAligner

STAligner is a tool for alignment and integration of spatially resolved transcriptomics data.

MIT License


use of 'iter_comb' and 'margin' parameters in train_STAligner #24

Open owenwilkins opened 2 months ago

owenwilkins commented 2 months ago

I was hoping someone could provide more information on how to use these parameters and what values are reasonable (mainly for margin) in different contexts. For example, several of the tutorials use different values for margin, but I haven't been able to find a discussion of why each value was selected.

Regarding iter_comb, more detail on how it works in general would be very useful, as it was not clear to me how it should be modified for larger datasets. For example, do all pairwise combinations between samples need to be specified, or just each sample against the reference? If the choice of reference sample is important, it would also be good to have some details on how to make it.

Thanks in advance.

zhoux85 commented 1 month ago

Hi. Thanks for your comments.

margin is used to control the intensity/weight of batch correction. Larger values are suggested when large batch differences exist.

iter_comb is used to specify the slice order of integration. By default, iter_comb covers all pairwise combinations, which is computationally intensive for large datasets.

To reduce computational cost, we suggest aligning adjacent slices in sequence, as in Tutorial 5. For example, iter_comb = [(0, 1), (1, 2), (2, 3), (3, 4), ...]. For slices from different platforms, or embryo slices from different time stages, as in Tutorial 4, we suggest selecting one slice as the reference to align against.
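For illustration, the three pairing schemes mentioned above can be built with plain Python. This is only a sketch: the helper names are hypothetical and not part of STAligner, and the order of indices within each reference pair is an assumption, so check it against Tutorial 4 before use.

```python
from itertools import combinations


def sequential_pairs(n_slices):
    """Pair each slice with the next one: (0, 1), (1, 2), ..."""
    return [(i, i + 1) for i in range(n_slices - 1)]


def reference_pairs(n_slices, ref=0):
    """Pair one reference slice with every other slice.

    The (ref, other) ordering is an assumption made for this sketch.
    """
    return [(ref, i) for i in range(n_slices) if i != ref]


def all_pairs(n_slices):
    """Every unordered pair of slices; grows quadratically with n_slices."""
    return list(combinations(range(n_slices), 2))


n = 5
print("sequential:", sequential_pairs(n))
print("reference: ", reference_pairs(n, ref=0))
print("all pairs: ", len(all_pairs(n)), "combinations")
```

With n slices, the sequential and reference schemes each give n - 1 pairs, while all pairwise combinations give n(n - 1)/2, which is where the cost for large datasets comes from. The resulting list would then be passed as the iter_comb argument.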

owenwilkins commented 1 month ago

Great, thanks for the follow-up. Do you have any suggestions for sensible ranges of values for the 'margin' argument, and how to select them?

owenwilkins commented 1 month ago

It would also be helpful if you could clarify how to select values for 'rad_cutoff' and what exactly this value pertains to. I am using Xenium data, which has a smaller resolution than the other technologies used in the vignettes, so I assumed a smaller value than was used for those technologies would be needed. I have tried 1.3, as used in Tutorial 4, but this doesn't seem to result in many neighbors being identified.

zhoux85 commented 1 month ago

Usually, margin is in the range of [1, 10].
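To give some intuition for how a margin can change the strength of correction, here is a toy sketch. It assumes that margin plays the role of the margin in a standard triplet loss, max(d(anchor, positive) - d(anchor, negative) + margin, 0), which this thread does not confirm. The function and the example points are hypothetical.

```python
import math


def triplet_margin_loss(anchor, positive, negative, margin):
    """Standard triplet margin loss for a single triplet of points."""
    d_pos = math.dist(anchor, positive)  # distance to the matched point
    d_neg = math.dist(anchor, negative)  # distance to the unmatched point
    return max(d_pos - d_neg + margin, 0.0)


anchor = (0.0, 0.0)
positive = (1.0, 0.0)  # distance 1 from the anchor
negative = (3.0, 0.0)  # distance 3 from the anchor

for margin in (1.0, 5.0, 10.0):
    loss = triplet_margin_loss(anchor, positive, negative, margin)
    print(f"margin={margin}: loss={loss}")
```

Under this assumption, a small margin leaves an already well-separated triplet with zero loss, while a larger margin keeps penalizing it, which pulls matched points together more aggressively. That is consistent with the advice to use larger values when batch differences are large.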

There is no relation between the spatial resolution and the value of rad_cutoff. It depends on the range of the spatial coordinates X and Y. You need to try different values until you obtain ~10 neighbors. We find that rad_cutoff=20 is enough for the human breast Xenium data: https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast
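The trial-and-error search described above can be sketched as a simple scan over candidate values. This assumes rad_cutoff acts as a Euclidean distance threshold in the same units as the spatial coordinates, which the thread does not state explicitly. The helper `mean_neighbors` and the toy grid are hypothetical and not part of STAligner.

```python
import math


def mean_neighbors(coords, rad_cutoff):
    """Average number of other spots within rad_cutoff of each spot.

    Brute-force O(n^2) scan; fine for a toy example, too slow for real data.
    """
    n = len(coords)
    total = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= rad_cutoff:
                total += 1
    return total / n


# Toy data: a 10 x 10 grid of spots with unit spacing.
coords = [(float(x), float(y)) for x in range(10) for y in range(10)]

# Scan candidate radii and report the average neighbor count for each.
for rad_cutoff in (0.5, 1.0, 1.5, 2.0):
    print(f"rad_cutoff={rad_cutoff}: "
          f"{mean_neighbors(coords, rad_cutoff):.2f} neighbors on average")
```

The radius that gives roughly 10 neighbors depends entirely on the scale of the coordinates, so the same scan on data stored in microns versus pixels would land on very different values. For a full dataset, subsample the spots or use a spatial index instead of the brute-force loop.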

owenwilkins commented 1 month ago

Great, thanks. Would it be possible to clarify what rad_cutoff is exactly?