Error in 'SpatialPCA_buildKernel'

Changzhanhe commented 1 year ago

Hi! I was using SpatialPCA to analyze spatial data, and I tried to create a spatial object using standard normalization method in Seurat instead of using SCTransform, I also chose HVG as 'gene.selection'. But when I ran SpatialPCA_buildKernel, it gave the error below:

Warning message in mclapply(1:dim(location)[1], fx_gaussian, mc.cores = ncores): “scheduled cores 1, 2, 3 did not deliver results, all values of the jobs will be affected” Error in vectbl_as_col_location2(): ! Can't extract columns past the end. Location 1 doesn't exist. There are only 0 columns.

I would appreciate if you know the reason. Thanks!

shangll123 commented 1 year ago

Hi, thank you for your interest in SpatialPCA! It looks the problem is in the fx_gaussian() function, which is embedded in the kernel_build_sparse() function.

First thing to check is, are you using the sparse option to build the kernel matrix? If your sample size is not that large, say, >5000, I would recommend using sparse=FALSE option, because the estimation would be more accurate.

If your data indeed has a large sample size and you want to use the sparse=TRUE option, I would recommend checking the location matrix: object@location, and also the bandwidth: object@bandwidth, to test, for example, if the function

kernelmat = kernel_build_sparse(kerneltype="gaussian",location=scale(object@location), bandwidth=object@bandwidth, tol = 1e-5, ncores=1)

is working well. Because now you are using standard normalization method in Seurat instead of using SCTransform, the estimated bandwidth from the normalized data will be different, which will affect the kernel matrix.

If these still don't work, please feel free to send me an email (shanglu [at] umich.edu) and share this part of analysis codes or example location matrix that can replicate this error with me, I am happy to help. Thank you.

Changzhanhe commented 1 year ago

Thanks for your reply! Actually I was using a very large dataset (cells number nearly 300,000) to run SpatialPCA. And this is my code:

object = SpatialPCA_buildKernel(object, kerneltype="gaussian", bandwidthtype="Silverman", sparseKernel = TRUE, sparseKernel_ncore=3, sparseKernel_tol=0)

Here to save memory I set tol = 0 and I also test the function kernel_build_sparse() , it didn't work well. The same error as I mentioned before.

I was wondering when I subset scaled location matrix to test mclapply(1:dim(location)[1], fx_gaussian, mc.cores = ncores), it can calculate the result list with $ind_i $ind_j $val_i without any error reported. But when I use the full location matrix, it cannot calculate correctly. I guess probably because of mclapply() function and I am not sure.

shangll123 commented 1 year ago

Oh I see, setting tol=0 may cause errors, because we want to keep elements in the kernel matrix with values greater than the threshold. If you set tol=0 it means any value greater than 0 will be retained, which is not what you want.

Also, there is one matrix inversion step in the SpatialPCA_SpatialPCs() function, which involves an n by n matrix inverse, if you have a sample size ~300,000, I don't know if you have enough disk space for this inverse. I am working on a fast version of SpatialPCA to work around this, will keep you posted once we add this feature to the package. Thank you!

Changzhanhe commented 1 year ago

Oh I see, setting tol=0 may cause errors, because we want to keep elements in the kernel matrix with values greater than the threshold. If you set tol=0 it means any value greater than 0 will be retained, which is not what you want.

Also, there is one matrix inversion step in the SpatialPCA_SpatialPCs() function, which involves an n by n matrix inverse, if you have a sample size ~300,000, I don't know if you have enough disk space for this inverse. I am working on a fast version of SpatialPCA to work around this, will keep you posted once we add this feature to the package. Thank you!

Great! Thank you!

shangll123 / SpatialPCA

Error in 'SpatialPCA_buildKernel' #3