Closed lambdamoses closed 9 months ago
once PR #12 for readVizgen() is fine, I can make readXenium() in a similar way, this time a base R version. Unless you are already working on it?
cheers
I haven't started working on it yet. Still working on the paper. If you would like to work on readXenium(), then that would be great.
ok, great! I can get my hands on that; we have a few Xenium datasets to test that function as well.
I'm playing around with some Xenium data from the 10X website. Looks like the z coordinates of the transcript spots are continuous so I can't simply select a z-plane. Now I'm debating what to do with the z coordinates.
BTW, have you started working on readCosMX? If not then I'll start writing it. Some wet lab colleagues said that Xenium and CosMX data quality and their curated gene panels are pretty bad. I wonder what you think of the data quality.
Yeah, Xenium output is a bit different. I can try to make it as for readVizgen (using your last commits).
I'm using the Xenium_V1_FFPE_Human_Brain_Healthy_With_Addon outs public data from 10X as test data.
For Seurat, someone from 10X implemented ReadXenium & LoadXenium. For transcripts, only 2D xy coords are read in (without z plane selection). The only filtering they do is on the phred-scaled qv col, default is qv >= 20; this you already have in formatTxSpots. Support for the Xenium format would need to be added by skipping the z arg selection, i.e. if z coords are non-integers.
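For illustration, that qv filter can be sketched in base R (the data frame is a made-up stand-in mimicking a few columns of the Xenium transcripts.csv, not real output):

```r
# Toy stand-in for a few rows of a Xenium transcripts.csv (hypothetical values)
tx <- data.frame(
  transcript_id = 1:5,
  feature_name = c("GeneA", "GeneB", "NegControlProbe_1", "GeneA", "GeneC"),
  x_location = c(10.2, 11.5, 12.1, 13.8, 14.4),
  y_location = c(20.1, 21.3, 22.7, 23.2, 24.9),
  z_location = c(1.3, 2.7, 1.9, 3.1, 2.2), # continuous, not integer z-planes
  qv = c(40, 18, 25, 35, 12)
)
# Keep only transcripts passing the Phred-scaled quality cutoff, as ReadXenium does
min_qv <- 20
tx_filt <- tx[tx$qv >= min_qv, ]
nrow(tx_filt) # 3 transcripts pass
```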
Are you still working on addTxSpots?
Cell segmentations don't have a z col, only cell_id and xy coords.
For the tissue image, the DAPI morphology_mip.ome.tif can be used (it's a maximum intensity projection of the 3D z-stack).
Also, I think for count matrices we would need to use altExp for each "Feature type" (Gene Expression, Negative Control Codeword, Negative Control Probe, Unassigned Codeword, Deprecated Codeword), since each of them has a different number of genes/probes. Now the question is, what format should altExp be? An SFE object with "centroids" in colGeometry and optionally transcripts in rowGeometry for each altExp?
This is also relevant for the SFE <-> Seurat converter (which I will try to finalize this week).
I haven't touched readCosMX, go ahead 👍
As for data quality, we recently started providing Xenium service. As far as I know, a number of research groups here already use and prefer that technology; maybe quality depends on which tissue is used as well. And Xenium will have a 5K gene panel soon. We don't use Nanostring tech at all.
So far I'm not dropping z (df2sf supports 3D geometries when 3 columns are specified for the coordinates), but the user can easily do so with st_zm. I don't have to match what Seurat is doing. I feel like the z coordinates could be interesting to plot at some point. I'm still working on addTxSpots, which is just to add the output of formatTxSpots to the SFE object. I think feature type can simply be a column in rowData, so anything that is not gene expression can be used for QC and later dropped after QC. I'm not sure about altExp.
As for data quality, we recently started providing Xenium service.
So are you in a core facility? Also I heard about that 10X vs. Nanostring lawsuit that banned sales of CosMX in Germany.
yeah, I get it.
Support for plotting 3D coords would be interesting; I think using interactive plots like plotly would be nice. I did work before with surfaces using rgl & mesh3d objects of volumetric data (CT data). For Visium data, I was using one gene's expression at a time (as a surface topography, also using a Gaussian smoothing kernel), where the z elevation was the expression magnitude. Some time ago, I tried a 3D UMAP with ~150 single cells using plotly; it was pretty nice, but demanding on computer graphics. So probably something like that could be done.
ok, once you are ready with addTxSpots I will try the readVizgen from your last commit.
rowData sounds good, I will try to use it for all non gene expression stuff. Thanks!
yes, we are part of a core facility for spatial and single cell techs.
Can you write unit tests and examples for your readXenium and Seurat conversion pull requests? I have too many other things to write for Bioc 3.18, so it would be great if you write these unit tests. That would involve creating small toy datasets for testing and examples in the function documentation.
I think I'll write new unit tests for readVizgen since I wrote a lot of code for formatting the transcript spots.
Sounds good. However, I don't have much experience with unit tests, but I could try to do that. I always test functionality on my side (on a few different datasets) before submitting PRs.
Here's a guide to get started: https://r-pkgs.org/testing-basics.html. I think writing unit tests is worth the trouble, since it's easier to run them after writing new code, and to check whether existing code is broken by the new code or bug fix, than to run less formal tests.
as for images from the Xenium output, the problem is the ome.tif format, eg morphology_mip.ome.tif. terra::rast cannot load it: "Cannot open TIFF file due to missing codec JP2000. (GDAL error 1)"
I'm thinking of having an internal function to load ome.tif and convert it to SpatRaster. SpatialOmicsOverlay might help here.
Any suggestions?
Looks like it's related to the compression scheme. I got an error in tiff::readTIFF: pkg:tiff: Compression scheme 34712 tile decoding is not implemented. However, I can open the ome.tiff file with Bio-Formats in FIJI and the python tifffile package. I'm not sure what to do. The whole point of terra is not to load the entire thing into memory, while FIJI and tifffile do load the image into memory.
I tried RBioFormats, which is used behind the scenes in SpatialOmicsOverlay. It works. In case of limited RAM, one can read a lower resolution, which is sufficient for plotting in most cases. One can also use the subset argument of the read.image function to only read in a small subset when plotting only a few cells. Then I need to modify the SFE package so the SFE class won't always use terra.
Seems that the ome.tif-related compression is not yet implemented.
The whole point of terra is not to load the entire thing into memory, while FIJI and tifffile do load the image into memory.
yeah, exactly that. What if we implement an internal converter from ome.tif to .tif using RBioFormats or similar, and export the converted .tif (low resolution images) to the out dir of Xenium, then load the object and add images with terra as usual? Else, use RBioFormats only when ome.tif files are present.
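A sketch of that converter idea, assuming RBioFormats and terra are installed; the helper names, arguments, and the resolution default are made up for illustration and not an actual SFE API:

```r
# Hypothetical helper: derive the plain .tif path from an ome.tif path
ome_to_tif_path <- function(path) sub("\\.ome\\.tiff?$", ".tif", path)

# Hypothetical converter: read one resolution level with RBioFormats,
# write a plain .tif, then load it with terra as usual (not run here)
convert_ome_tif <- function(ome_path, resolution = 4) {
  stopifnot(requireNamespace("RBioFormats", quietly = TRUE),
            requireNamespace("terra", quietly = TRUE))
  img <- RBioFormats::read.image(ome_path, resolution = resolution,
                                 normalize = FALSE)
  out <- ome_to_tif_path(ome_path)
  RBioFormats::write.image(img, file = out, force = TRUE)
  terra::rast(out)
}

ome_to_tif_path("morphology_mip.ome.tif") # "morphology_mip.tif"
```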
readXenium is kind of ready (without adding images); I tested it locally on some datasets, works well. I can add it to the seurat_v4 branch.
The issue is with adding images; I haven't figured out how to efficiently load the ome.tif file. It doesn't work with SpatialOmicsOverlay since that's tailored for Nanostring data. It works with RBioFormats alone, but terra::rast won't load it after converting to .tif; magick can load it, but that would be another dependency then.
# eg
library(SpatialOmicsOverlay)
library(RBioFormats)
library(magick) # for image_read
library(terra)  # for rast
img_fn <- "./morphology_focus.ome.tif"
xml <- xmlExtraction(ometiff = img_fn[1]) # works!
scanMeta <- parseScanMetadata(omexml = xml) # doesn't work
# this works
img <- read.image(file = img_fn[1],
                  resolution = 6,
                  #read.metadata = FALSE,
                  #normalize = FALSE
                  )
# then export image
write.image(img, file = gsub("ome.", "", img_fn[1], fixed = TRUE), force = TRUE)
# read with R `magick` works
img_test <- image_read(path = gsub("ome.", "", img_fn[1], fixed = TRUE))
# then we need to convert to SpatRaster
img_test <- img_test |>
  as.raster() |> as.matrix() |> rast()
plot(img_test) # works
Do you have an example of reading/converting a Xenium output ome.tif file like morphology_focus.ome.tif using RBioFormats and loading it with terra?
Any suggestions on how we should deal with loading Xenium images?
Thanks
making a toy dataset for Xenium would be a bit hard, like for zarr.zip files etc. I think one can just keep the files that are used in readXenium and ignore the rest?
I'm going to add an internal function to load ome.tif with RBioFormats and save it as .tif at a resolution selected in read.image. It seems to load with terra::rast afterwards.
It's accessible now -> SpatialFeatureExperiment:::readXenium. The only issue is that when plotting cell polygons and the image with Voyager, the image doesn't align 100% with colGeometry. And I couldn't find which json file to use for the bbox as for readVizgen.
For some datasets it overlays well, for others not really, see below
# plot Segs
options(repr.plot.height = 5, repr.plot.width = 10)
pl1 <-
  plotSpatialFeature(sfe, features = rownames(sfe)[1],
                     size = 4,
                     colGeometryName = "cell",
                     dark = TRUE,
                     image_id = imgData(sfe)$image_id[1]) &
  Seurat::DarkTheme()
pl1
xenium_V1_FFPE_Human_Brain_Healthy_With_Addon -> some shift
/xenium_prerelease_sept15_hBreast/hBreast_ffpe_small/ -> overlays well
bbox_use <- c(xmin = 2000, ymin = -2500, xmax = 2500, ymax = -2000)
I tested it on an earlier dataset, Xenium_FFPE_Human_Breast_Cancer_Rep1, from the 10X Xenium paper.
It works; the image overlays well with the cell and nuclei polygons.
bbox_use <- c(xmin = 6400, ymin = -3500, xmax = 7000, ymax = -2500)
I think the mismatch between image and cell/nuclei polygons that occurs for the latest datasets has to do with some shifts that displace the segmentation masks from the image they came from. I haven't figured out when it occurs.
So, using extent <- colGeometry(sfe, 1) |> st_geometry() |> st_bbox() should be fine, since the images overlap with the cell/nuclei polygons at least for 2 public datasets:
Xenium_FFPE_Human_Breast_Cancer_Rep1
Xenium_prerelease_sept15_hBreast/hBreast_ffpe_small/
seems for some new datasets (standard XeniumRanger outs) the .ome.tif images are not well aligned with the segmentation polygons. Xenium Explorer is used to do image registration (affine) and the transformation matrix can be saved (all manual though).
I think image registration needs to be done before adding to imgData; already existing packages could be used, which would mean adding yet another dependency.
I think a manual step is acceptable; for Visium, there's also the option to manually align the spots and the image and to manually find which spots are in tissue in the Loupe browser. Or maybe use ImageJ for the registration. You don't have to stick to terra for the image here; it's very hacky to rotate the image with terra, plus terra has trouble reading the .ome.tif file anyway. If still using terra, then I'm thinking about using an engineering CRS and defining a math transform in WKT, or doing something with GDAL; in geography, rotation is more commonly performed in CRS transformations, and those CRS's are already pre-defined. Unfortunately R isn't great for image processing. The only R package on CRAN or Bioc that does affine registration I know of is RNiftyReg. It's fine to add to Suggests.
Sounds good. The purpose of the toy dataset is just to test readXenium, so we don't need the other stuff.
yeah, only those relevant to readXenium 👍
yeah I agree, I think keeping terra is ok for now.
I might have found a way around (using affine transforms in Rvcg and/or Morpho) to register the image to colGeometry and then use that transform or ext for the SpatRaster; need to test, will let you know if it works. RNiftyReg is an option.
Fortunately, colGeometry and rowGeometry always overlap. I think image registration in the Xenium pipeline doesn't always work well, which creates those shifts for some samples.
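For what it's worth, applying a saved affine transform to point coordinates is just a matrix product in homogeneous coordinates; a base R sketch (the matrix and points below are arbitrary, not from a real Xenium Explorer alignment file):

```r
# Arbitrary affine matrix: 30-degree rotation plus a translation,
# in homogeneous coordinates (not from a real alignment file)
theta <- pi / 6
A <- rbind(c(cos(theta), -sin(theta), 100),
           c(sin(theta),  cos(theta), -50),
           c(0, 0, 1))
# A couple of made-up points as columns, with a row of 1s appended
pts <- rbind(x = c(0, 10), y = c(0, 10), 1)
pts_t <- A %*% pts
pts_t[1:2, ] # transformed xy coordinates
```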
About the image alignment issue: Xenium Explorer must have the info to align the cell segmentation and the image. I think I found where that info resides: in the pixel_size field in the experiment.xenium file, which is basically json. That's the pixel size (in µm) of the full-resolution morphology.ome.tif image file, which has the same xy dimensions as the other two morphology image files. The bounding box of the cell segmentation doesn't match the dimensions of the image because much of the image doesn't have cells. So unless you want to load other images, registration isn't necessary.
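A small base R sketch of pulling pixel_size out of experiment.xenium and using it to scale pixel coordinates to microns (the json snippet is a shortened, made-up example; only the pixel_size field is taken from the discussion above):

```r
# Made-up snippet of an experiment.xenium file (it is plain json)
exp_json <- '{"run_name": "demo", "pixel_size": 0.2125, "num_cells": 1000}'
# Extract the pixel_size value without a json package
pixel_size <- as.numeric(sub('.*"pixel_size":\\s*([0-9.]+).*', "\\1", exp_json))
pixel_size # 0.2125
# Pixel -> micron: multiply full-res pixel coordinates by pixel_size
px_coords <- c(x = 1000, y = 2000)
um_coords <- px_coords * pixel_size
um_coords # x = 212.5, y = 425
```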
I played around with RBioFormats; we can read a subset of the image. I think I'll write another S4 class for BioFormats, inheriting from VirtualSpatialImage in SpatialExperiment (I'll call it BioFormatsImage), with an optional added slot for the extent, so only the subset within the extent at the desired resolution is loaded into memory when needed, such as when plotting. This matters especially when a high resolution is used, so we don't load the entire high-resolution image into memory; otherwise only the file path and the extent are in memory. This would require more changes in SFE and Voyager to accept images other than SpatRaster, but I think it's worth it so SFE and Voyager can work more smoothly with ome.tiff and BioFormats in general.
Regarding the image and segmentation mismatch, after back-and-forth with colleagues here, we found out the following:
In micron and in pixel space, the segmentation and transcript coordinates match the image well at full resolution. OME-TIFF comes with up to 8 image resolutions. When read with RBioFormats::read.image, the OME image metadata needs to be read separately.
Now, things match well (no shifts) for low resolution images when some cell/nuclei segmentations are present at the image borders (towards the 0,0 corner).
The bounding box of the cell segmentation doesn't match the dimensions of the image because much of the image doesn't have cells
Exactly that 👍
So, things get shifted for low resolution images only when no segmentations are present at the borders. Here the transform
c(0, full_res_img_width * px_size_micron,
0, full_res_img_height * px_size_micron)
is needed to set the extent of the low resolution image to match the polygon geometries in micron space. The relevant information for that transform needs to be extracted from the metadata using read.omexml.
# eg
meta_ome <-
  read.omexml(grep("_mip.ome.tif", img_fn, value = TRUE)) |>
  strsplit(x = _, split = " ") |> unlist() |>
  grep("PhysicalSize|Size[XY]", x = _, value = TRUE)
# here we get px_size_micron -> PhysicalSizeX, full_res_img_width -> SizeX, full_res_img_height -> SizeY
meta_ome
> 'PhysicalSizeX="0.2125"' 'PhysicalSizeXUnit="µm"' 'PhysicalSizeY="0.2125"' 'PhysicalSizeYUnit="µm"' 'SizeX="36955"' 'SizeY="27282"'
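As a quick sanity check, plugging the printed values into that transform gives the full-resolution extent in microns (plain base R, nothing package-specific):

```r
# Values parsed from the OME-XML metadata above
px_size_micron <- 0.2125
full_res_img_width <- 36955   # SizeX
full_res_img_height <- 27282  # SizeY
# Extent c(xmin, xmax, ymin, ymax) of the image in micron space
ext_use <- c(0, full_res_img_width * px_size_micron,
             0, full_res_img_height * px_size_micron)
ext_use
```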
I can implement that now within readXenium. Also, after the image is read with img <- read.image(...), it can be directly converted to a SpatRaster with img@.Data |> t() |> rast(), with the corrected ext added, then (optionally) saved as .tif with terra::writeRaster. That would be a solution for now; creating a new S4 class BioFormatsImage is a good idea to make things more efficient for ome.tif files.
As for the resolution in read.image, for larger/full datasets I could only load resolution = 4 or lower; it won't load higher ones.
The morphology.ome.tif is a 3D stacked image; morphology_mip.ome.tif is the MIP of that 3D image and has the corresponding full resolution as well, so we don't need to load the large 3D image unless it's really needed.
For the highest resolution, this works:
img <- read.image("morphology_mip.ome.tif", normalize = FALSE, resolution = 1,
subset = list(x = 12000:13000, y = 12000:13000))
I think the high resolution is important when plotting a small number of cells, very zoomed in. I used the human skin dataset from the 10X website. The original file is over 300 MB.
Potentially, adding support for px pixel space, when unit = "pixel", would be useful too, I think. Could work on that later.
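A sketch of what unit = "pixel" support could look like: converting micron-space coordinates back to full-resolution pixel indices by dividing by pixel_size (the bounding box values below are made up):

```r
pixel_size <- 0.2125 # µm per full-res pixel, from experiment.xenium
# Made-up bounding box of a few cells, in micron space
bbox_um <- c(xmin = 2000, ymin = 2100, xmax = 2500, ymax = 2550)
# Micron -> pixel: divide by pixel_size, round to whole pixel indices
bbox_px <- round(bbox_um / pixel_size)
bbox_px
# e.g. usable as the subset arg of RBioFormats::read.image at resolution = 1:
# read.image(..., subset = list(x = bbox_px["xmin"]:bbox_px["xmax"],
#                               y = bbox_px["ymin"]:bbox_px["ymax"]))
```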
Pretty much done in https://github.com/pachterlab/SpatialFeatureExperiment/commit/5d1282ce1024760720bfc606ab936f752bf3b28d though it needs more extensive testing
Should be super easy. The harder part is to add a Xenium toy dataset for unit testing.