Closed alikhuseynov closed 1 year ago
Thanks for checking out this package. Since I already have the script to read Vizgen data, I do have a plan to turn that into a function, hopefully for the next Bioconductor release, version 3.17. Or would you like to pull request? I agree that it would be nice to have a function to convert from Seurat object. Since we have a lot to do for version 3.17, the Seurat part might come in 3.18. Those functions will go into the SpatialFeatureExperiment package instead of Voyager since they have more to do with the SFE object than the ESDA.
ok thanks, I will see what I can do this month for PR regarding read/load Vizgen data for SpatialFeatureExperiment
repo.
A wrapper to convert from Seurat
object to SFE
would be super handy for users (will work on that too).
best
Thank you very much for the PR! Since that's for the SFE repo, I'll close this issue.
Are you still going to do the PR in the SFE repo? The next Bioconductor release is coming up. If not, then I'll add the Vizgen function myself.
sorry, I won't manage to do this on time. If you like, take a look at my modified version of loading Vizgen data using Seurat https://github.com/alikhuseynov/add-on_R/blob/develop/R/read_load_vizgen_v2.R Thanks!
None of the example datasets on Vizgen's website has the cellpose parquet files. Can you share an example of the parquet file? This is for testing and examples.
yes, but it's large ~600MB, do you have a place where I can drop it?
Can you upload it to this Box folder? https://caltech.app.box.com/f/3cc55ae6f7ad472abdd27acd660d8b90 I just need to take a look at the format so I can create a small toy example from a different dataset. Thanks!
ok, done! I found a smaller one ~37MB..
To load and view the whole file using sfarrow::st_read_parquet()
I'm preparing to submit a PR to Seurat for handling parquet
file
best
Thank you!
My own function is getting very different from the one in Seurat. I'm looking at your code, and found this when one cell has multiple polygons:
if (which(unlist(test.segs) > 1) %>% any) {
segs.art.index <- which(unlist(test.segs) > 1)
message(segs.art.index %>% length,
" Cells have > 1 segmentaion boundary artifacts", "\n",
"..removing artifacts", "\n",
"..keeping cell boundaries with maximum coords")
# usually artifacts have small boundaries/coords
# find cell boundaries with maximum coords
for (i in segs.art.index %>% seq) {
dims <- lapply(segs[[segs.art.index[i]]][[1]] %>% seq(),
function(d) { dim(segs[[segs.art.index[i]]][[1]][[d]])[1] } )
# get & keep boundaries with maximum coords
maxs.segs <- which(unlist(dims) == max(unlist(dims)))
segs[[segs.art.index[i]]][[1]] <- segs[[segs.art.index[i]]][[1]][maxs.segs]
}
I'm not sure if I entirely agree with this approach, because upon inspection, in some cases Cellpose somehow assigned two cells to one ID but both pieces are small so can both be artifact, like this:
The smaller piece doesn't quite look like artifact. But in some cases, the small bit does look like artifact:
I think I can add another argument for minimum cell size and issue a warning if some cells have multiple pieces that are all larger than the minimum size. I also check if the cell polygons are valid geometries and issue a warning if anything is invalid.
yeah :) I know that problem.. Sometimes Cellpose output has multiple polygons for same cell (it's not always the case, depends on the sample and how well the segmentation worked). In this case I thought to just get the largest polygon as single cell boundary. I think Seurat object won't allow multiple polygons for same cell ID. Your suggestion of a new argument for minimum cell size would be a good idea, plus warnings etc.. Thanks!
btw, What's is different in the output from your own function as compared to mine/Seurat?
Mine returns a SpatialFeatureExperiment
object, and I read the image with the terra
package so the image is not loaded into memory unless necessary, and when plotting a larger image, it is downsampled to make plotting faster. Haven't pushed it yet.
MULTIPOLYGONs are technically allowed but it doesn't make biological sense for whole cell segmentation. Technically allowed because there can be other cell level geometries that have multiple pieces, like endoplasmic reticulum segmentation.
Also I'm not reading in the transcript spots yet. For MERFISH datasets the csv file can be over 30 GB in size and I don't know how to deal with the size and I'm not sure how useful it is to load it. I can do spatial point process analyses on the transcript spots, but I don't know what to do with the large number of cells both computationally and statistically. I need to look into sedona
for on disk representation of Simple Features before dealing with the transcript spots.
you mean detected_transcripts.csv
from Vizgen? In our case single file was like 700MB.
We are trying out Segmentation-free approach (don't have a push yet), where only molecule coordinates are used (no image or segmentations, only x, y, gene
coordinates and molecule name)
The idea is to use some form of quantization, ie Spatial binning (borrowed from geospatial stats) on molecule coordinates, given a desired bin_size
.
This makes it:
See the attached example
OK, so there are cases where that file isn't too large and more manageable in memory. I was a bit intimidated by the example datasets on Vizgen's website. But still, I think I should consider sedona because it's only helpful to plot the transcript spots when zoomed into small areas so it would be nice to not load the whole thing into memory, because the analyses are typically performed at the cell or in your case bin level where the location of transcript spots within cells is irrelevant. Besides spatial point process analysis, plotting is the only case I can think of where the transcript spot geometry is relevant. I don't think I'm able to do the sedona thing for Bioconductor 3.17. Maybe I can do it for 3.18 or 3.19.
Ok, thanks. I agree with you, many things are mostly for visualization.
In 3.17, will you have a function to convert SpatialFeatureExperiment
object to Seurat and back?
Unfortunately, I don't think I have enough time to write the Seurat conversion function for 3.17. I still have to write new vignettes for the new functionalities before April 26.
Hi there,
Thanks for implementing this nice package, bridging geospatial data analysis with spatial omics analysis! :) Few questions, however these things are probably on your todo list already:
SFE
object? Similar to SeuratSeurat::ReadVizgen()
&Seurat::LoadVizgen()
I could only find create-sfe-object-from-scratchFOV
) toSFE
As for Vizgen/Merscope, recent version provides only Cellpose output, eg :
Something that I added using
sfarrow
for Seurat obj. for data loading for segsThanks A.