satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.24k stars 902 forks source link

Processing vizgen data #7080

Closed nzhang89 closed 2 months ago

nzhang89 commented 1 year ago

Thank you for your continuous and dedicated efforts of incorporating more and more useful features to Seurat. We recently received our first batch of vizgen data, and I would like to process and visualize them using Seurat.

But the folder structure and files we received are a little different from the public vizgen data. We do not have a cell_boundaries folder with a bunch of h5 files. Instead, we have a single cell_boundaries.parquet file. In this situation, can I still use the cell boundary/segmentation information?

Thank you for your help.

AustinHartman commented 1 year ago

@nzhang89 thanks for posting an issue about this - I heard Vizgen was planning to modify the output segmentation format, but I have not yet seen what these files look like. Any chance you could share some data with me in this format? My email is ahartman@nygenome.org. I can then go ahead and update the LoadVizgen function to support cell boundaries in parquet format.

In the meantime, the steps for creating a Segmentation object, which stores the boundary information in a Seurat object, are relatively straightforward. You'll want to construct a data.frame with x, y, and cell columns. Each row represents a vertice in a segmentation for a given cell, so you'll have many rows per cell. You can then run segs <- CreateSegmentation(df) to generate a Segmentation object. Please let me know if you have any questions!

andrewjkwok commented 1 year ago

Thanks for addressing this - having exactly the same issue here.

Looking at Vizgen's latest user guide: https://vizgen.com/resources/merscope-instrument/?_gl=1labipg_upMQ.._gaMTA1NzM5MzYyNi4xNjgwMjQ5NjM2_ga_4DCLSHRRJC*MTY4MDI0OTYzNi4xLjAuMTY4MDI0OTYzNi4wLjAuMA..

it seems that with MERSCOPE instrument software v232 or later, they've moved to parquet files for the cell boundaries instead of the hdf5 format.

AustinHartman commented 1 year ago

Hey @andrewjkwok, thank you for your patience and pointing me to the user guide. I've reached out to Vizgen for additional information, and I'll update you here once I have a solution

alikhuseynov commented 1 year ago

Vizgen now has Cellpose segmentation output stored as .parquet. I had to modify ReadVizgen() and LoadVizgen(). See this script and example to make Seurat obj of Vizgen data @AustinHartman probably I should submit PR at some point, unless you guys implementing similar stuff already?

Additionally, there were some issue using subset() #6683 and #6767 in order to solve that as intermediate solution, I implemented this script Hope this helps. best

AustinHartman commented 1 year ago

@alikhuseynov thank you so much! A PR to handle parquet files would be much appreciated

alikhuseynov commented 1 year ago

@AustinHartman sure, I like Seurat package and it's always nice to contribute. I will try to work on that.

alikhuseynov commented 1 year ago

PR submitted #7190

andrewjkwok commented 1 year ago

Thnkas @alikhuseynov for the update. I installed the branch that you updated, but unfortunately encounter an error sayign:

Error: file must be a "InputStream"

Do you have any idea what the issue might be? When I look at the traceback, this is what I get:

8: stop(assertError(attr(res, "msg")))
7: assert_that(inherits(object, class), msg = msg)
6: assert_is(file, "InputStream")
5: make_readable_file(file, mmap)
4: arrow::ParquetFileReader$create(dsn, ...)
3: sfarrow::st_read_parquet(file2read)
2: ReadVizgen(data.dir = data.dir, ...)
1: LoadVizgen(data.dir = "/mnt/z/Users/Andrew_K/Projects/spatial_transcriptome/Data/Test_MERFISH_run_16032023/202303171059_20230317_VMSC04301_firstcopy/region_0/")
alikhuseynov commented 1 year ago

How does the content look like of these directories?

Could you post your LoadVizgen() run/script? thanks

andrewjkwok commented 1 year ago

My ./region_0/ directory:

202303171059_20230317_VMSC04301_firstcopy_region_0.vzg  cell_by_gene.csv   cell_metadata_truncated.csv  images
cell_boundaries.parquet                            cell_metadata.csv  detected_transcripts.csv     summary_20230317.png

I'm not sure about the directory containing the Cellpose output - the .parquet file seems to be in the ./region_0/ directory?

My script currently involves nothing other than loading the libraries and running the command!

options(Seurat.object.assay.version = 'v5')

library(Seurat)
library(future)
library(ggplot2)
library(magrittr)
plan("multisession", workers = 10)

vizgen.obj <- LoadVizgen(data.dir = "/mnt/z/Users/Andrew_K/Projects/spatial_transcriptome/Data/Test_MERFISH_run_16032023/202303171059_20230317_VMSC04301_firstcopy/region_0/")
alikhuseynov commented 1 year ago

in our case the Vizgen directory content looks like this:

region_0/
├── xxx_region_0.vzg
├── Cellpose_polyT_CB3
├── detected_transcripts.csv
├── images
└── summary.png

region_0/Cellpose_polyT_CB3/
├── xxx_region_0_cellpose.vzg
├── cellpose_cell_by_gene.csv
├── cellpose_cell_metadata.csv
├── cellpose_micron_space.parquet
└── cellpose_mosaic_space.parquet

I would assume that your cell_boundaries.parquet is the same as cellpose_micron_space.parquet ?

Since I don't have access to the test data, could you run the following and see if the content of your cell_boundaries.parquet is more or less similar?

file2read <- "./region_0/cell_boundaries.parquet"
test_parq <- sfarrow::st_read_parquet(file2read)
# in my case the output looks like that:
test_parq %>% str
Classes ‘sf’ and 'data.frame':  559790 obs. of  10 variables:
 $ ID                : int  0 312 104 156 260 208 52 105 209 53 ...
 $ EntityID          :integer64 112833000000100001 112833000000100001 112833000000100001 112833000000100001 112833000000100001 112833000000100001 112833000000100001 112833000000100002 ... 
 $ ZIndex            : int  3 6 4 1 0 5 2 4 5 2 ...
 $ Type              : chr  "cell" "cell" "cell" "cell" ...
 $ ZLevel            : num  6 10.5 7.5 3 1.5 9 4.5 7.5 9 4.5 ...
 $ Name              : 'vctrs_unspecified' logi  NA NA NA NA NA NA ...
 $ ParentID          : 'vctrs_unspecified' logi  NA NA NA NA NA NA ...
 $ ParentType        : 'vctrs_unspecified' logi  NA NA NA NA NA NA ...
 $ X__index_level_0__: int  0 348 116 174 290 232 58 117 233 59 ...
 $ Geometry          :sfc_MULTIPOLYGON of length 559790; first list element: List of 1
  ..$ :List of 1
  .. ..$ : num [1:12, 1:2] 6805 6807 6808 6809 6809 ...
  ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
 - attr(*, "sf_column")= chr "Geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA
  ..- attr(*, "names")= chr [1:9] "ID" "EntityID" "ZIndex" "Type" ...
# then filter for mid Z plane
test_parq %>% filter(ZIndex == 3) %>% pull(Geometry) %>% str
sfc_MULTIPOLYGON of length 79970; first list element: List of 1
 $ :List of 1
  ..$ : num [1:12, 1:2] 6805 6807 6808 6809 6809 ...
 - attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
andrewjkwok commented 1 year ago

Huh this is strange. Sorry for the poor formatting earlier. This is the directory structure:

region_0/
├── 202303171059_KoLab20230317_VMSC04301_region_0.vzg
├── cell_boundaries.parquet
├── cell_by_gene.csv
├── cell_metadata.csv
├── cell_metadata_truncated.csv
├── detected_transcripts.csv
├── images
│   ├── manifest.json
│   ├── micron_to_mosaic_pixel_transform.csv
│   ├── mosaic_DAPI_z0.tif
│   ├── mosaic_DAPI_z1.tif
│   ├── mosaic_DAPI_z2.tif
│   ├── mosaic_DAPI_z3.tif
│   ├── mosaic_DAPI_z4.tif
│   ├── mosaic_DAPI_z5.tif
│   ├── mosaic_DAPI_z6.tif
│   ├── mosaic_PolyT_z0.tif
│   ├── mosaic_PolyT_z1.tif
│   ├── mosaic_PolyT_z2.tif
│   ├── mosaic_PolyT_z3.tif
│   ├── mosaic_PolyT_z4.tif
│   ├── mosaic_PolyT_z5.tif
│   └── mosaic_PolyT_z6.tif
└── summary_20230317.png

1 directory, 23 files

So it seems our directory structures are a bit different, as the parquet file is directly in my ./region_0/ directory?

The parquet file seems to be more or less similar?:

> test_parq %>% str
Classes ‘sf’ and 'data.frame':  1018750 obs. of  10 variables:
 $ ID                : int  0 1 2 3 4 5 6 7 8 9 ...
 $ EntityID          :integer64 1771792400012100003 1771792400012100004 1771792400012100006 1771792400012100008 1771792400012100009 1771792400012100015 1771792400012100016 1771792400012100018 ... 
 $ ZIndex            : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Type              : chr  "cell" "cell" "cell" "cell" ...
 $ ZLevel            : num  1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 ...
 $ Name              : 'vctrs_unspecified' logi  NA NA NA NA NA NA ...
 $ ParentID          : 'vctrs_unspecified' logi  NA NA NA NA NA NA ...
 $ ParentType        : 'vctrs_unspecified' logi  NA NA NA NA NA NA ...
 $ X__index_level_0__: int  0 1 2 3 4 5 6 7 8 9 ...
 $ Geometry          :sfc_MULTIPOLYGON of length 1018750; first list element: List of 1
  ..$ :List of 1
  .. ..$ : num [1:98, 1:2] 1501 1501 1502 1506 1507 ...
  ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
 - attr(*, "sf_column")= chr "Geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA
  ..- attr(*, "names")= chr [1:9] "ID" "EntityID" "ZIndex" "Type" ...
> test_parq %>% filter(ZIndex == 3) %>% pull(Geometry) %>% str
sfc_MULTIPOLYGON of length 152345; first list element: List of 2
 $ :List of 1
  ..$ : num [1:5, 1:2] 1539 1539 1539 1540 1539 ...
 $ :List of 1
  ..$ : num [1:89, 1:2] 1501 1502 1503 1506 1509 ...
 - attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
alikhuseynov commented 1 year ago

No worries, sound good. Vizgen did updates on Merscope output as far as we know. Also, note that Vizgen is using only one z-plane, the middle plane of the tissue for Cellpose segmentation. So, the cell boundaries in all $ZIndex are the same.

I will add support for reading any single .parquet file in the given directory. For now, if you want, just rename the cell_boundaries.parquet to cellpose_micron_space.parquet. Then run this chunk to load data using BiocParallel instead of future

### make sure all libs are installed a priori!
## load libs ----
    suppressPackageStartupMessages({
    library(ggplot2)
    library(Seurat)
    library(dplyr)
    library(magrittr)
    library(BiocParallel)    
})

# set params
mol.type <- "microns"
coord.space <- "micron"
z.stack <- 3L # z stack to use
dir_use <- "./region_0/"

# load object
obj.vz <-
  LoadVizgen(data.dir = dir_use,
             fov = "test.sample", 
             assay = "Vizgen",
             use.cellpose.out = TRUE, # if TRUE and ./Cellpose dir exists
             metadata = c("volume", "fov"), # add cell volume info
             mol.type = mol.type, # molecule coords in µm space
             coord.space = coord.space, 
             type = c("segmentations", "centroids", "boxes"), # type of cell spatial coord matrices
             z = z.stack,
             add.zIndex = TRUE, # add z slice section to the object
             update.object = TRUE,
             use.BiocParallel = TRUE, # using `BiocParallel` instead of `future`
             workers.total = 10, # cores for `BiocParallel` processing
             DTthreads.pct = NULL # percentage of total threads to use for `data.table::fread`
  )
alikhuseynov commented 1 year ago

just add the support #7190, seems to work. just re-install: remotes::install_github(repo = 'alikhuseynov/seurat', ref = 'feat/vizgen')

andrewjkwok commented 1 year ago

Thanks for such quick responses.

I've reinstalled as you've instructed. The issue coming up now is:


Error in is.empty(.) : could not find function "is.empty"
> traceback()
3: files2scan %>% grep(coord.space, ., value = TRUE) %>% {
       if (is.empty(.)) {
           files2scan %>% grep(".parquet$", ., value = TRUE)
       }
       else {
           (.)
       }
   }
2: ReadVizgen(data.dir = data.dir, ...)
1: LoadVizgen(data.dir = dir_use, fov = "test.sample", assay = "Vizgen", 
       use.cellpose.out = TRUE, metadata = c("volume", "fov"), mol.type = mol.type, 
       coord.space = coord.space, type = c("segmentations", "centroids", 
           "boxes"), z = z.stack, add.zIndex = TRUE, update.object = TRUE, 
       use.BiocParallel = TRUE, workers.total = 10, DTthreads.pct = NULL)
The current directory structure is:

region_0/
├── 202303171059_KoLab20230317_VMSC04301_region_0.vzg
├── cell_by_gene.csv
├── cell_metadata.csv
├── cell_metadata_truncated.csv
├── cellpose_micron_space.parquet
├── detected_transcripts.csv
├── images
│   ├── manifest.json
│   ├── micron_to_mosaic_pixel_transform.csv
│   ├── mosaic_DAPI_z0.tif
│   ├── mosaic_DAPI_z1.tif
│   ├── mosaic_DAPI_z2.tif
│   ├── mosaic_DAPI_z3.tif
│   ├── mosaic_DAPI_z4.tif
│   ├── mosaic_DAPI_z5.tif
│   ├── mosaic_DAPI_z6.tif
│   ├── mosaic_PolyT_z0.tif
│   ├── mosaic_PolyT_z1.tif
│   ├── mosaic_PolyT_z2.tif
│   ├── mosaic_PolyT_z3.tif
│   ├── mosaic_PolyT_z4.tif
│   ├── mosaic_PolyT_z5.tif
│   └── mosaic_PolyT_z6.tif
└── summary_20230317.png

1 directory, 23 files

i.e. I've renamed my parquet file to what you've suggested?

alikhuseynov commented 1 year ago

Ok, given the last fix, you don't need to rename the .parquet file, it can be cell_boundaries.parquet or cellpose_micron_space.parquet is.empty() is a useful function from spatstat.geom package, in my case (spatstat.geom_3.0-3) is loaded but not attached (see session info below) once I load Seurat package. Try to install spatstat.geom first, then clean your session, load libraries and run the script. Then it should work.

below is my session info, just in case.

sessionInfo()
​
R version 4.2.1 (2022-06-23)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /media/../yyy/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] SeuratObject_4.1.3 Seurat_4.3.0.9002 

loaded via a namespace (and not attached):
  [1] nlme_3.1-159           spatstat.sparse_3.0-0  matrixStats_0.62.0    
  [4] RcppAnnoy_0.0.19       RColorBrewer_1.1-3     httr_1.4.4            
  [7] repr_1.1.4             sctransform_0.3.5      tools_4.2.1           
 [10] utf8_1.2.2             R6_2.5.1               irlba_2.3.5           
 [13] KernSmooth_2.23-20     uwot_0.1.14            lazyeval_0.2.2        
 [16] colorspace_2.0-3       sp_1.5-0               gridExtra_2.3         
 [19] tidyselect_1.2.0       compiler_4.2.1         progressr_0.11.0      
 [22] cli_3.4.1              spatstat.explore_3.0-5 plotly_4.10.0         
 [25] scales_1.2.1           spatstat.data_3.0-0    lmtest_0.9-40         
 [28] ggridges_0.5.4         pbapply_1.7-0          goftest_1.2-3         
 [31] stringr_1.4.1          pbdZMQ_0.3-7           digest_0.6.29         
 [34] spatstat.utils_3.0-1   base64enc_0.1-3        pkgconfig_2.0.3       
 [37] htmltools_0.5.3        parallelly_1.32.1      fastmap_1.1.0         
 [40] htmlwidgets_1.5.4      rlang_1.1.1            shiny_1.7.2           
 [43] generics_0.1.3         zoo_1.8-10             jsonlite_1.8.0        
 [46] spatstat.random_3.0-1  ica_1.0-3              dplyr_1.1.2           
 [49] magrittr_2.0.3         patchwork_1.1.2        Matrix_1.5-1          
 [52] Rcpp_1.0.9             IRkernel_1.3           munsell_0.5.0         
 [55] fansi_1.0.3            abind_1.4-5            reticulate_1.26       
 [58] lifecycle_1.0.3        stringi_1.7.8          MASS_7.3-57           
 [61] Rtsne_0.16             plyr_1.8.7             grid_4.2.1            
 [64] parallel_4.2.1         listenv_0.8.0          promises_1.2.0.1      
 [67] ggrepel_0.9.2          crayon_1.5.2           deldir_1.0-6          
 [70] miniUI_0.1.1.1         lattice_0.20-45        IRdisplay_1.1         
 [73] cowplot_1.1.1          splines_4.2.1          tensor_1.5            
 [76] pillar_1.9.0           igraph_1.3.5           uuid_1.1-0            
 [79] spatstat.geom_3.0-3    future.apply_1.9.0     reshape2_1.4.4        
 [82] codetools_0.2-18       leiden_0.4.2           glue_1.6.2            
 [85] evaluate_0.16          data.table_1.14.6      png_0.1-7             
 [88] vctrs_0.6.2            httpuv_1.6.5           polyclip_1.10-0       
 [91] gtable_0.3.1           RANN_2.6.1             purrr_0.3.4           
 [94] tidyr_1.2.0            scattermore_0.8        future_1.28.0         
 [97] ggplot2_3.4.0          mime_0.12              xtable_1.8-4          
[100] later_1.3.0            survival_3.4-0         viridisLite_0.4.1     
[103] tibble_3.2.1           cluster_2.1.4          globals_0.16.1        
[106] fitdistrplus_1.1-8     ellipsis_0.3.2         ROCR_1.0-11 
andrewjkwok commented 1 year ago

Very sorry but it still throws an error, this time like this...

Error in segs[[i]][[1]] : subscript out of bounds
> traceback()
5: segs[[i]][[1]] %>% length
4: FUN(X[[i]], ...)
3: lapply(segs %>% seq, function(i) segs[[i]][[1]] %>% length)
2: ReadVizgen(data.dir = data.dir, ...)
1: LoadVizgen(data.dir = dir_use, fov = "test.sample", assay = "Vizgen", 
       use.cellpose.out = TRUE, metadata = c("volume", "fov"), mol.type = mol.type, 
       coord.space = coord.space, type = c("segmentations", "centroids", 
           "boxes"), z = z.stack, add.zIndex = TRUE, update.object = TRUE, 
       use.BiocParallel = TRUE, workers.total = 10, DTthreads.pct = NULL)

Very happy to share data if you would prefer to best troubleshoot?

alikhuseynov commented 1 year ago

yes, it would then be easier to debug. if you can upload & share on gdrive (alikhuseyno/at/gmail.com) or any link where I can download these files:

detected_transcripts.csv
cell_by_gene.csv
cell_metadata.csv
cell_boundaries.parquet

there will be more updates on that PR #7190 Thanks

alikhuseynov commented 1 year ago

I think the problem might come from your .parquet file, that is, it gives 2 lists in the Geometry variable, where it should be 1 list.

> test_parq %>% filter(ZIndex == 3) %>% pull(Geometry) %>% str
sfc_MULTIPOLYGON of length 152345; first list element: List of 2
 $ :List of 1
  ..$ : num [1:5, 1:2] 1539 1539 1539 1540 1539 ...
 $ :List of 1
  ..$ : num [1:89, 1:2] 1501 1502 1503 1506 1509 ...
 - attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
andrewjkwok commented 1 year ago

Shared - please let me know if there are issues accessing the data.

alikhuseynov commented 1 year ago

got it! As far as I see, there are number of segmentation artifacts and empty polygon information. This is a very useful case test, since I never saw such an output from Vizgen. I have to implement few sanity/cleaning steps before extracting segmented cell boundaries, then it should work. will let you know ;) thanks

cnk113 commented 1 year ago

I'm also getting the same issues as above Error in segs[[i]][[1]] : subscript out of bounds

alikhuseynov commented 1 year ago

yes, I'm working on the fix, which will be available asap. The problem is that .parquet file in this case has "artifacts", a number of empty and nested lists for single cell segmentation/polygon boundaries.

ideally one should have single polygon for a single segmented cell ID:

List of 1
 $ :List of 1
  ..$ : num [1:26, 1:2] 2751 2753 2754 2757 2759 ...
 - attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"

However, some have these, ie multiple polygons for single cell.

List of 4
 $ :List of 1
  ..$ : num [1:7, 1:2] 1513 1513 1512 1511 1512 ...
 $ :List of 1
  ..$ : num [1:5, 1:2] 1525 1525 1526 1526 1525 ...
 $ :List of 2
  ..$ : num [1:82, 1:2] 1495 1496 1497 1497 1499 ...
  ..$ : num [1:17, 1:2] 1507 1510 1512 1514 1516 ...
 $ :List of 1
  ..$ : num [1:6, 1:2] 1498 1497 1497 1498 1498 ...
 - attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"

#..or this
List of 2
 $ :List of 1
  ..$ : num [1:5, 1:2] 1539 1539 1539 1540 1539 ...
 $ :List of 1
  ..$ : num [1:89, 1:2] 1501 1502 1503 1506 1509 ...
 - attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"

It's hard to tell which one of those multiple polygons would be the actual segmented cell. Right now, I'm keeping the single polygon with maximum points for a single cell. Suggestions on which polygons to choose are welcome. Thanks

cnk113 commented 1 year ago

Yah I think max points would be fine, would love to try the fix!

Thanks, Chang

alikhuseynov commented 1 year ago

The fix is ready, try out. re-install: remotes::install_github(repo = 'alikhuseynov/seurat', ref = 'feat/vizgen') I will be adding support according to suggestions in PR #7190

These packages pkgs <- c("data.table", "arrow", "sfarrow", "tidyverse", "furrr", "BiocParallel") should be installed, else the function will ask you to install them.

The test Vizgen data is from @andrewjkwok, this part will generate the object:

## make sure all libs are installed a priori!
# load libs ----
suppressPackageStartupMessages({
  library(ggplot2)
  library(Seurat)
  library(dplyr)
  library(magrittr)
  library(BiocParallel)
  library(progressr)
  library(spatstat)  
})

# helper function to return Seurat object metadata
callmeta <- function (object = NULL) { 
  return(object@meta.data) 
  }

# set params
mol.type <- "microns"
coord.space <- "micron"
z.stack <- 3L # z stack to use
dir_use <- "./merfish_seurat_debug/" # or something like "./region_0/"

##
start.time <- Sys.time()
obj <- 
  LoadVizgen(data.dir = dir_use,
             fov = "merfish.test", 
             assay = "Vizgen",
             metadata = c("volume", "fov"), # add cell volume info
             mol.type = mol.type, # molecule coords in µm space
             coord.space = coord.space, 
             type = c("segmentations", "centroids"), # type of cell spatial coord matrices
             z = z.stack,
             add.zIndex = TRUE, # add z slice section to a cell
             update.object = TRUE,
             use.BiocParallel = TRUE,
             workers.MulticoreParam = 14, # for `BiocParallel` processing
             use.furrr = TRUE, # when working with list use furrr or purrr
             )
end.time <- Sys.time()
message("Time taken to load object = ", 
        round(end.time - start.time, digits = 2), " ", 
        attr(c(end.time - start.time), "units"))

Result, it took ~ 5 mins with workers.MulticoreParam = 14 (jupyter session RAM - 180GB)

obj
obj %>% callmeta %>% str
An object of class Seurat 
140 features across 232695 samples within 1 assay 
Active assay: Vizgen (140 features, 0 variable features)
 1 spatial field of view present: merfish.test
'data.frame':   232695 obs. of  6 variables:
 $ orig.ident     : Factor w/ 1 level "SeuratProject": 1 1 1 1 1 1 1 1 1 1 ...
 $ nCount_Vizgen  : num  11 69 73 9 14 1 23 3 2 26 ...
 $ nFeature_Vizgen: int  7 17 25 6 13 1 12 3 2 6 ...
 $ z              : int  3 3 3 3 3 3 3 3 3 3 ...
 $ volume         : num  9961 10654 3499 4381 2164 ...
 $ fov            : int  217 217 217 217 217 217 217 217 217 217 ...

test_merfish

cnk113 commented 1 year ago

Hmm I'm getting >>> using segmentations Error in molecules %iff% list(molecules = CreateMolecules(coords = molecules, : object 'mol.type' not found even though I set it to 'microns'

alikhuseynov commented 1 year ago

strange, could you please list the data.dir (eg "./region_0/"), to see what files are in there?

cnk113 commented 1 year ago

202304040410_gw23-121422-thalamusap0903-verymedial_VMSC01801_region_0.vzg cell_boundaries.parquet cell_by_gene.csv cell_metadata.csv detected_transcripts.csv images summary.png

alikhuseynov commented 1 year ago

if you run this:

# set params
mol.type <- "microns"
coord.space <- "micron"
z.stack <- 3L # z stack to use
dir_use <- "./merfish_seurat_debug/"

start.time <- Sys.time()
data <-
  ReadVizgen(data.dir = dir_use,  
  #ReadVizgen_opt(data.dir = dir_use,
                 metadata = c("volume", "fov"), # add cell volume info
                 mol.type = mol.type, # molecule coords in µm space
                 coord.space = coord.space, 
                 type = c("segmentations", "centroids"), # type of cell spatial coord matrices
                 z = z.stack,
                 use.BiocParallel = TRUE,
                 workers.MulticoreParam = 14, # for `BiocParallel` processing
                 use.furrr = TRUE)
data %>% str

The output of data should be similar to this one, which includes $microns

List of 6
 $ transcripts  :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. ..@ i       : int [1:3141093] 33 46 67 74 97 103 120 3 6 20 ...
  .. ..@ p       : int [1:637986] 0 0 7 24 49 55 68 68 68 69 ...
  .. ..@ Dim     : int [1:2] 140 637985
  .. ..@ Dimnames:List of 2
  .. .. ..$ : chr [1:140] "4732456N10Rik" "Ace2" "Adora2a" "Aldh1l1" ...
  .. .. ..$ : chr [1:637985] "1771792400012100003" "1771792400012100004" "1771792400012100006" "1771792400012100008" ...
  .. ..@ x       : num [1:3141093] 1 2 1 1 1 3 2 7 3 1 ...
  .. ..@ factors : list()
 $ segmentations:Classes ‘data.table’ and 'data.frame': 4214160 obs. of  3 variables:
  ..$ x   : num [1:4214160] 1501 1502 1503 1506 1509 ...
  ..$ y   : num [1:4214160] 41.3 43 43.5 41.2 39.7 ...
  ..$ cell: chr [1:4214160] "1771792400012100003" "1771792400012100003" "1771792400012100003" "1771792400012100003" ...
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ centroids    :'data.frame':  637985 obs. of  3 variables:
  ..$ x   : num [1:637985] 1519 1511 1573 1577 1536 ...
  ..$ y   : num [1:637985] 38 50.3 58.7 72.2 65.6 ...
  ..$ cell: chr [1:637985] "1771792400012100003" "1771792400012100004" "1771792400012100006" "1771792400012100008" ...
 $ microns      :'data.frame':  3058829 obs. of  3 variables:
  ..$ x   : num [1:3058829] 97.3 24.2 -23.8 28.1 -22 ...
  ..$ y   : num [1:3058829] 4312 4412 4238 4239 4254 ...
  ..$ gene: chr [1:3058829] "Ace2" "Adora2a" "Aldh1l1" "Aldh1l1" ...
 $ metadata     :'data.frame':  637985 obs. of  2 variables:
  ..$ volume: num [1:637985] 2728 9961 10654 3499 4381 ...
  ..$ fov   : int [1:637985] 217 217 217 217 217 217 217 217 217 217 ...
 $ zIndex       :'data.frame':  637985 obs. of  2 variables:
  ..$ z   : int [1:637985] 3 3 3 3 3 3 3 3 3 3 ...
  ..$ cell: chr [1:637985] "1771792400012100003" "1771792400012100004" "1771792400012100006" "1771792400012100008" ...
cnk113 commented 1 year ago

Ok that works, thanks!

alikhuseynov commented 1 year ago

Ok that works, thanks!

could you list it here?

cnk113 commented 1 year ago
List of 6
 $ transcripts  :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. ..@ i       : int [1:2121672] 3 4 8 13 21 27 33 38 45 46 ...
  .. ..@ p       : int [1:94402] 0 45 109 199 237 324 372 453 552 645 ...
  .. ..@ Dim     : int [1:2] 180 94401
  .. ..@ Dimnames:List of 2
  .. .. ..$ : chr [1:180] "TLL1" "GADD45B" "SLC32A1" "PFN1" ...
  .. .. ..$ : chr [1:94401] "1952334800040100001" "1952334800040100002" "1952334800040100003" "1952334800040100004" ...
  .. ..@ x       : num [1:2121672] 1 1 1 1 14 1 1 1 1 1 ...
  .. ..@ factors : list()
 $ segmentations:Classes 'data.table' and 'data.frame': 891093 obs. of  3 variables:
  ..$ x   : num [1:891093] 5279 5279 5280 5281 5283 ...
  ..$ y   : num [1:891093] -50.4 -49.8 -49 -48.7 -49.4 ...
  ..$ cell: chr [1:891093] "1952334800040100041" "1952334800040100041" "1952334800040100041" "1952334800040100041" ...
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ centroids    :'data.frame':  94401 obs. of  3 variables:
  ..$ x   : num [1:94401] 5279 5279 5288 5272 5295 ...
  ..$ y   : num [1:94401] -67.8 -50.7 -46.9 -32 -13.1 ...
  ..$ cell: chr [1:94401] "1952334800040100001" "1952334800040100002" "1952334800040100003" "1952334800040100004" ...
 $ microns      :'data.frame':  3286088 obs. of  3 variables:
  ..$ x   : num [1:3286088] 85.3 85.3 87.4 85.6 86.5 ...
  ..$ y   : num [1:3286088] 5609 5609 5609 5610 5611 ...
  ..$ gene: chr [1:3286088] "SLC32A1" "SLC32A1" "SLC32A1" "SLC32A1" ...
 $ metadata     :'data.frame':  94401 obs. of  2 variables:
  ..$ volume: num [1:94401] 223 335 572 167 974 ...
  ..$ fov   : int [1:94401] 816 816 816 816 816 816 816 816 816 816 ...
 $ zIndex       :'data.frame':  94401 obs. of  2 variables:
  ..$ z   : int [1:94401] 3 3 3 3 3 3 3 3 3 3 ...
  ..$ cell: chr [1:94401] "1952334800040100001" "1952334800040100002" "1952334800040100003" "1952334800040100004" ...
alikhuseynov commented 1 year ago

hmm, interesting :)

does this part work too? Given the data you already read.

assay <- "Vizgen"
mol.type <- "microns"

segs <- CreateSegmentation(data[["segmentations"]])
cents <- CreateCentroids(data[["centroids"]])
segmentations.data <- list(centroids = cents, segmentation = segs)
coords <- CreateFOV(coords = segmentations.data, 
                                     type = c("segmentation", "centroids"), 
                                     molecules = data[[mol.type]], 
                                     assay = assay)
cnk113 commented 1 year ago

Yes this works coords looks like this:

Spatial coordinates for 94401 cells and 180 molecules
 First 10 molecules: ADGRL2, ADGRV1, APOLD1, ARX, ASCL1, BCAS1
Default segmentation boundary: centroids 
 1 other segmentation boundaries present: segmentation 
Associated assay: Vizgen 
Key: Vizgen_ 
alikhuseynov commented 1 year ago

then, it looks good. could you post your previous LoadVizgen() run?

cnk113 commented 1 year ago

Hmm so it actually works fine now, IDK why it was failing earlier. Thanks for the quick support!

alikhuseynov commented 1 year ago

ok, great!

andrewjkwok commented 1 year ago

@alikhuseynov Thank you so much for the support. I can confirm that it's working now for me with the parquet file.

However, I am now running into a problem trying to load data from the old format in - I wanted to test my data against some of Vizgen's released data (https://info.vizgen.com/mouse-brain-map), and with the current LoadVizgen function, I get the following error:

Error in sfarrow::st_read_parquet(file2read) : 
  object 'file2read' not found
> traceback()
4: make_readable_file(file, mmap)
3: arrow::ParquetFileReader$create(dsn, ...)
2: sfarrow::st_read_parquet(file2read)
1: ReadVizgen(data.dir = "/mnt/z/Users/Andrew_K/Projects/NTS_projection_transcriptome_2023/Data/BrainReceptorShowcase/Slice1/Replicate1/")

Is it possible for the read function to support backward compatibility?

alikhuseynov commented 1 year ago

@alikhuseynov Thank you so much for the support. I can confirm that it's working now for me with the parquet file.

However, I am now running into a problem trying to load data from the old format in - I wanted to test my data against some of Vizgen's released data (https://info.vizgen.com/mouse-brain-map), and with the current LoadVizgen function, I get the following error:

Error in sfarrow::st_read_parquet(file2read) : 
  object 'file2read' not found
> traceback()
4: make_readable_file(file, mmap)
3: arrow::ParquetFileReader$create(dsn, ...)
2: sfarrow::st_read_parquet(file2read)
1: ReadVizgen(data.dir = "/mnt/z/Users/Andrew_K/Projects/NTS_projection_transcriptome_2023/Data/BrainReceptorShowcase/Slice1/Replicate1/")

Is it possible for the read function to support backward compatibility?

Thanks, glad it works. Your shared data was very useful for tests. I haven't yet tested the older Vizgen data (ie .hdf5 cell boundaries). The new fix should be ready next week, which among some optimization will also handle data if no .parquet file exists. For now, set use.cellpose.out = FALSE in LoadVizgen() and try to load.

andrewjkwok commented 1 year ago

Great, thanks. I tried setting use.cellpose.out = FALSE as suggested - it only works if I also set use.BiocParallel = FALSE, which unfortunately means it takes quite a while to load the dataset in, but it does work!

dfhannum commented 1 year ago

@alikhuseynov Thank you so much for the support. I can confirm that it's working now for me with the parquet file.

However, I am now running into a problem trying to load data from the old format in - I wanted to test my data against some of Vizgen's released data (https://info.vizgen.com/mouse-brain-map), and with the current LoadVizgen function, I get the following error:

Error in sfarrow::st_read_parquet(file2read) : 
  object 'file2read' not found
> traceback()
4: make_readable_file(file, mmap)
3: arrow::ParquetFileReader$create(dsn, ...)
2: sfarrow::st_read_parquet(file2read)
1: ReadVizgen(data.dir = "/mnt/z/Users/Andrew_K/Projects/NTS_projection_transcriptome_2023/Data/BrainReceptorShowcase/Slice1/Replicate1/")

Is it possible for the read function to support backward compatibility?

I also got this error when there were multiple parquet files in the folder.

dfhannum commented 1 year ago

I was able to get it to work with hdf5 files by just using use_cellpose_output = FALSE. I did get this warning: Warning: MulticoreParam() not supported on Windows, use SnowParam(), loading each file, but everything loaded fine in the end.

Hmm so it actually works fine now, IDK why it was failing earlier. Thanks for the quick support!

I had this same issue, and eventually realized I had mol.type = 'microns' as an environment variable when testing later on, which fixed the LoadVizgen() function.

alikhuseynov commented 1 year ago

I'm going to push for a fix later today, that will solve above mentioned issues. use.cellpose.out arg will be removed completely. If no .parquet file(s) are present, then .hdf5files are loaded from ./cell boundaries directory (if present) using use.BiocParallel = TRUE by default.

MulticoreParam() not supported on Windows, use SnowParam() are you running things on Windows laptop?

For multiple .parquet files, you mean something like this?

cellpose_micron_space.parquet
cellpose_mosaic_space.parquet

if yes, that will be solved too.

..will do few tests on my side, before committing Thanks

dfhannum commented 1 year ago

Yes, I was running it on a Windows laptop.

For the multiple .parquet files, I had a subset of the data in a folder of the directory for testing. This lead to multiple file2read and an error when the file wasn't located in the directory (but a folder within the directory). I only wanted to read the parquet file in the current directory.

alikhuseynov commented 1 year ago

I see, yeah.. that was done for the situation when one has .parquet file(s) in a sub directory, eg Cellpose_polyT_CB3: in ./region_0

├── xxx_region_0.vzg
├── Cellpose_polyT_CB3
├── detected_transcripts.csv
├── images
└── summary.png

I will try to add support to like: .."look for parquet file only in the current dir, if not found, look in the sub dir"

alikhuseynov commented 1 year ago

I just did the last fix, please try it out. It did work on my side for old data (ie .hdf5 files) with optimized BiocParallel, and also on newer Vizgen outputs. It should also support parallelization on windows machine now. If any bugs or something I missed, let me know ;)

Test run would be this

## test =================================================
## make sure all libs are installed a priori!
# load libs ----
suppressPackageStartupMessages({
  library(ggplot2)
  library(Seurat)
  library(dplyr)
  library(magrittr)
  library(BiocParallel)
  library(progressr)
  library(spatstat)    
})

# helper function to return Seurat object metadata
callmeta <- function (object = NULL) { 
  return(object@meta.data) 
  }

# set directory to read from
dir_use <- "./merfish_seurat_debug/" # or something like "./region_0/"

##
start.time <- Sys.time()
obj <-
  LoadVizgen(data.dir = dir_use,  
             fov = "merfish.test", 
             assay = "Vizgen",
             metadata = c("volume", "fov"), # add cell volume info
             type = c("segmentations", "centroids"), # type of cell spatial coord matrices
             z = 3L,
             add.zIndex = TRUE, # add z slice section to a cell
             update.object = TRUE,
             use.BiocParallel = TRUE,
             workers.MulticoreParam = 14, # for `BiocParallel` processing
             verbose = TRUE
             )
end.time <- Sys.time()
message("Time taken to load object = ", 
        round(end.time - start.time, digits = 2), " ", 
        attr(c(end.time - start.time), "units"))
obj
obj %>% callmeta %>% str
sessionInfo() ``` R version 4.2.1 (2022-06-23) Platform: x86_64-conda-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core) Matrix products: default BLAS/LAPACK: /xxx/envs/spatial_develop/lib/libopenblasp-r0.3.21.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] spatstat_2.3-4 spatstat.linnet_2.3-2 spatstat.core_2.4-4 [4] rpart_4.1.16 nlme_3.1-159 spatstat.random_3.0-1 [7] spatstat.geom_3.0-3 spatstat.data_3.0-0 progressr_0.11.0 [10] BiocParallel_1.30.4 magrittr_2.0.3 dplyr_1.1.2 [13] SeuratObject_4.1.3 Seurat_4.3.0.9002 ggplot2_3.4.0 loaded via a namespace (and not attached): [1] readxl_1.4.1 uuid_1.1-0 backports_1.4.1 [4] plyr_1.8.7 igraph_1.3.5 repr_1.1.4 [7] lazyeval_0.2.2 sp_1.5-0 splines_4.2.1 [10] listenv_0.8.0 scattermore_0.8 digest_0.6.29 [13] htmltools_0.5.3 fansi_1.0.3 tensor_1.5 [16] googlesheets4_1.0.1 cluster_2.1.4 ROCR_1.0-11 [19] tzdb_0.3.0 readr_2.1.3 globals_0.16.1 [22] modelr_0.1.9 matrixStats_0.62.0 spatstat.sparse_3.0-0 [25] colorspace_2.0-3 rvest_1.0.3 ggrepel_0.9.2 [28] haven_2.5.1 crayon_1.5.2 jsonlite_1.8.0 [31] survival_3.4-0 zoo_1.8-10 glue_1.6.2 [34] polyclip_1.10-0 gtable_0.3.1 gargle_1.2.0 [37] leiden_0.4.2 future.apply_1.9.0 abind_1.4-5 [40] scales_1.2.1 DBI_1.1.2 miniUI_0.1.1.1 [43] Rcpp_1.0.9 viridisLite_0.4.1 xtable_1.8-4 [46] reticulate_1.26 bit_4.0.4 htmlwidgets_1.5.4 [49] httr_1.4.4 RColorBrewer_1.1-3 ellipsis_0.3.2 [52] ica_1.0-3 pkgconfig_2.0.3 uwot_0.1.14 [55] dbplyr_2.2.1 deldir_1.0-6 utf8_1.2.2 [58] tidyselect_1.2.0 rlang_1.1.1 reshape2_1.4.4 [61] later_1.3.0 munsell_0.5.0 cellranger_1.1.0 [64] tools_4.2.1 cli_3.4.1 generics_0.1.3 [67] broom_1.0.1 ggridges_0.5.4 evaluate_0.16 [70] stringr_1.4.1 fastmap_1.1.0 goftest_1.2-3 [73] bit64_4.0.5 fs_1.5.2 fitdistrplus_1.1-8 [76] purrr_1.0.1 RANN_2.6.1 pbapply_1.7-0 [79] future_1.28.0 mime_0.12 xml2_1.3.3 [82] arrow_9.0.0 hdf5r_1.3.5 compiler_4.2.1 [85] plotly_4.10.0 png_0.1-7 spatstat.utils_3.0-1 [88] reprex_2.0.2 tibble_3.2.1 stringi_1.7.8 [91] forcats_0.5.2 lattice_0.20-45 IRdisplay_1.1 [94] Matrix_1.5-1 vctrs_0.6.2 furrr_0.3.1 [97] pillar_1.9.0 lifecycle_1.0.3 lmtest_0.9-40 [100] RcppAnnoy_0.0.19 data.table_1.14.6 cowplot_1.1.1 [103] irlba_2.3.5 httpuv_1.6.5 patchwork_1.1.2 [106] R6_2.5.1 promises_1.2.0.1 KernSmooth_2.23-20 [109] gridExtra_2.3 parallelly_1.32.1 codetools_0.2-18 [112] MASS_7.3-57 assertthat_0.2.1 withr_2.5.0 [115] sctransform_0.3.5 hms_1.1.2 mgcv_1.8-40 [118] parallel_4.2.1 sfarrow_0.4.1 grid_4.2.1 [121] tidyverse_1.3.2 IRkernel_1.3 tidyr_1.2.0 [124] googledrive_2.0.0 Cairo_1.6-0 Rtsne_0.16 [127] pbdZMQ_0.3-7 spatstat.explore_3.0-5 lubridate_1.8.0 [130] shiny_1.7.2 base64enc_0.1-3 ```
andrewjkwok commented 1 year ago

Thanks for the update.

I installed the latest version with remotes::install_github(repo = 'alikhuseynov/seurat', ref = 'feat/vizgen').

This somehow now throws up an error:

Cell segmentations are found in `.parquet` file
>>> using micron space coordinates
Error in files[["spatial"]] : subscript out of bounds
> traceback()
2: ReadVizgen(data.dir = data.dir, mol.type = mol.type, filter = filter, 
       z = z, verbose = verbose, ...)
1: LoadVizgen(data.dir = dir_use, fov = "merfish.test", assay = "Vizgen", 
       metadata = c("volume", "fov"), mol.type = mol.type, type = c("segmentations", 
           "centroids"), z = z.stack, add.zIndex = TRUE, update.object = TRUE, 
       use.BiocParallel = TRUE, workers.MulticoreParam = 14, use.furrr = TRUE, 
       verbose = TRUE)

Incidentally I keep also getting an error saying that there's no default argument for verbose, but that's minor. Thanks again in advance for the support of this!

alikhuseynov commented 1 year ago

Thanks for the update.

I installed the latest version with remotes::install_github(repo = 'alikhuseynov/seurat', ref = 'feat/vizgen').

This somehow now throws up an error:

Cell segmentations are found in `.parquet` file
>>> using micron space coordinates
Error in files[["spatial"]] : subscript out of bounds
> traceback()
2: ReadVizgen(data.dir = data.dir, mol.type = mol.type, filter = filter, 
       z = z, verbose = verbose, ...)
1: LoadVizgen(data.dir = dir_use, fov = "merfish.test", assay = "Vizgen", 
       metadata = c("volume", "fov"), mol.type = mol.type, type = c("segmentations", 
           "centroids"), z = z.stack, add.zIndex = TRUE, update.object = TRUE, 
       use.BiocParallel = TRUE, workers.MulticoreParam = 14, use.furrr = TRUE, 
       verbose = TRUE)

Incidentally I keep also getting an error saying that there's no default argument for verbose, but that's minor. Thanks again in advance for the support of this!

You are welcome, I want to make sure it's robust and always works.

Does your dir_use has these files? If it's another dataset, those files should be present there.

merfish_seurat_debug/
├── cell_boundaries.parquet
├── cell_by_gene.csv
├── cell_metadata.csv
└── detected_transcripts.csv

files[["spatial"]] is the basically the cell_metadata.csv. Error in files[["spatial"]] : subscript out of bounds would mean that it could not find cell_metadata.csv file or multiple cell_metadata.csv files exist inside dir_use. Also you don't have to provide mol.type (it's set to microns by default) and z as environmental variables or inside the LoadVizgen(), unless you want to use other than z = 3L slice. It works on my side for a number of runs/datasets and there no warning for verbose, so not sure yet why it happens. Try to run as in my comment above and let me know. Thanks

alikhuseynov commented 1 year ago

..for sanity check, could you run the following and see if all 3 files are present?

sanity check ``` data.dir <- dir_use # your directory with Vizgen data # run this files <- c(transcripts = NULL %||% list.files(data.dir, pattern = "cell_by_gene", full.names = TRUE) %>% { if (is.empty(.)) { list.files(data.dir, pattern = "cell_by_gene", full.names = TRUE, recursive = TRUE) } else { (.) }} %>% { if (length(.) > 1) { message("There are > 1 `cell_by_gene` files", "\n", "make sure only 1 file is read") stop() } else if (!length(.)) { message("No `cell_by_gene` file is available") stop() } else { (.) }}, spatial = NULL %||% list.files(data.dir, pattern = "cell_metadata", full.names = TRUE) %>% { if (is.empty(.)) { list.files(data.dir, pattern = "cell_metadata", full.names = TRUE, recursive = TRUE) } else { (.) }} %>% { if (length(.) > 1) { message("There are > 1 `cell_metadata` files", "\n", "make sure only 1 file is read") stop() } else if (!length(.)) { message("No `cell_metadata` file is available") stop() } else { (.) }}, molecules = NULL %||% list.files(data.dir, pattern = "detected_transcripts", full.names = TRUE) %>% { if (is.empty(.)) { list.files(data.dir, pattern = "detected_transcripts", full.names = TRUE, recursive = TRUE) } else { (.) }} %>% { if(length(.) > 1) { message("There are > 1 `detected_transcripts` files", "\n", "make sure only 1 file is read") stop() } else if (!length(.)) { message("No `detected_transcripts` file is available") stop() } else { (.) }} ) ``` #### output should look something like this: ``` files transcripts: ‘/xxx/vizgen_test_repo/merfish_seurat_debug//cell_by_gene.csv’ spatial: ‘/xxx/vizgen_test_repo/merfish_seurat_debug//cell_metadata.csv’ molecules: ‘/xxx/vizgen_test_repo/merfish_seurat_debug//detected_transcripts.csv’ ```
andrewjkwok commented 1 year ago

Got it - turns out exactly as you predicted there was an extra cell metadata file (had tried to edit a version) - works now after deleting that file!