quadbio / simspec

Calculation of Reference/Cluster Similarity Spectrum (RSS/CSS)

query data projection error #5

Closed: kaizen89 closed this issue 2 years ago

kaizen89 commented 2 years ago

Hi, I have a reference dataset that I integrated with simspec, and now I would like to project a small public dataset onto it to see where it lands in the UMAP space. Unfortunately, the code provided in the README throws an error:

model <- cluster_sim_spectrum(seu_object, label_tag = "orig.ident", return_seuratObj = F)
seurat_query <- css_project(obj_pub, model)
Error in as(Y, "dgCMatrix") : 
  no method or default for coercing “numeric” to “dgCMatrix”

Could you please provide a small example of how to proceed with the projection? Thanks!

daniel-spies commented 2 years ago

hi there,

Thanks for the great and super fast tool!

I encountered the same error when using a model created from the simspec-integrated object after clustering it, with the cluster IDs as the label_tag. If I instead create the model with label_tag = 'orig.ident', I get a different error:

model <- cluster_sim_spectrum(data_normal, label_tag = 'orig.ident', num_pcs_compute = ndim, num_pcs_use = ndim, k = k, cluster_resolution = 0.6, spectrum_type = 'corr_kernel', corr_method = 'pearson', threads = 4, return_seuratObj = F)
simspec_tumor <- css_project(data_tumor@assays$RNA@data, model)
Error in sim * model$args["lambda"] : 
non-numeric argument to binary operator

In both cases, the error occurs in ref_sim_spectrum.default, at the step where model$profiles are combined.

Best, Daniel

kaizen89 commented 2 years ago

Hi @zhisonghe , could you please provide some help regarding this issue? Thanks a lot!

zhisonghe commented 2 years ago

Hi @kaizen89 and @daniel-spies

Thanks for using simspec, and I'm really sorry for the long delay in responding due to some personal issues.

Could you share a bit more detail about the model, particularly the number of clusters obtained when clustering each reference sample? You can check this in model$model$profiles, which should be a list with one element per reference sample, each element being a matrix whose columns represent the clusters of that sample. I'm wondering whether any of those matrices has only one column.
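A quick way to run this check is to count the columns of every profile matrix. The sketch below builds a toy list standing in for model$model$profiles (the sample names and values are made up for illustration); with a real model you would pass model$model$profiles to vapply() instead.

```r
# Toy stand-in for model$model$profiles (hypothetical data, not a real simspec model):
# one matrix per reference sample, genes in rows, clusters in columns.
profiles <- list(
  sample_A = matrix(runif(20), nrow = 5, ncol = 4),
  sample_B = matrix(runif(5),  nrow = 5, ncol = 1),
  sample_C = matrix(runif(15), nrow = 5, ncol = 3)
)

# Number of clusters (columns) per reference sample
n_clusters <- vapply(profiles, ncol, integer(1))
print(n_clusters)

# Positions (and names, if the list is named) of samples with a single cluster
single_cluster <- which(n_clusters == 1)
print(single_cluster)
```

Using which() rather than names() keeps the check working even if the real profiles list happens to be unnamed.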

Looking forward to hearing back from you.

Best, Zhisong

kaizen89 commented 2 years ago

You are right: when I removed the sample whose profile had only one column, the error was gone. Thanks!
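The thread does not spell out why a single-column profile breaks projection. One plausible mechanism, assuming the package subsets the profile matrices somewhere during projection (for example, restricting them to genes shared with the query), is R's default dimension dropping: indexing a one-column matrix without drop = FALSE returns a plain numeric vector, which matches the "coercing “numeric” to “dgCMatrix”" message above. The gene names and the shared_genes subset below are hypothetical.

```r
# Hypothetical single-cluster profile: 5 genes x 1 cluster
profile <- matrix(c(0.1, 0.4, 0.0, 2.3, 1.1), nrow = 5, ncol = 1,
                  dimnames = list(paste0("gene", 1:5), "cluster1"))

# Hypothetical subset of genes shared with a query dataset
shared_genes <- c("gene1", "gene2", "gene4")

# Default subsetting drops the length-one column dimension ...
collapsed <- profile[shared_genes, ]
print(class(collapsed))   # "numeric": no longer a matrix

# ... whereas drop = FALSE keeps the gene x cluster shape
kept <- profile[shared_genes, , drop = FALSE]
print(dim(kept))          # 3 1
```

With two or more columns the same row subset stays a matrix, which would explain why only samples with a single cluster trigger the error.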

zhisonghe commented 2 years ago

The recent update adds a min_cluster_num parameter to the cluster_sim_spectrum function, with a default value of 3. It sets a minimum number of clusters per sample: samples with too few clusters are discarded from the reference panel. This should prevent the issue.
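The filtering idea behind min_cluster_num can be sketched on a plain list of profile matrices. This is a hypothetical re-implementation for illustration only (the helper filter_reference_profiles and the toy data are not part of simspec), and it assumes that "too few" means fewer than min_cluster_num columns.

```r
# Hypothetical helper illustrating the idea behind min_cluster_num;
# NOT simspec's actual implementation.
filter_reference_profiles <- function(profiles, min_cluster_num = 3) {
  n_clusters <- vapply(profiles, ncol, integer(1))
  keep <- n_clusters >= min_cluster_num
  if (any(!keep)) {
    message("Discarding ", sum(!keep), " reference sample(s) with fewer than ",
            min_cluster_num, " clusters")
  }
  profiles[keep]
}

# Toy profiles: genes in rows, clusters in columns (made-up data)
profiles <- list(
  sample_A = matrix(runif(20), nrow = 5, ncol = 4),
  sample_B = matrix(runif(5),  nrow = 5, ncol = 1),
  sample_C = matrix(runif(15), nrow = 5, ncol = 3)
)

filtered <- filter_reference_profiles(profiles, min_cluster_num = 3)
print(names(filtered))  # "sample_A" "sample_C"
```

With the updated package, no manual filtering should be needed; the threshold is passed directly, e.g. cluster_sim_spectrum(seu_object, label_tag = "orig.ident", min_cluster_num = 3, return_seuratObj = F).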