Unable to replicate results for SWNE Walkthrough using Seurat

JoannaTan commented 5 years ago

Hi,

I followed the example given in https://yanwu2014.github.io/swne/Examples/pbmc3k_swne_seurat.html

However, I could not get back the same clustering as shown in the example (please find attached the clusters test_swne.pdf )

The following warning messages were obtained while I ran the following command: swne.embedding <- RunSWNE(obj, k = 16, genes.embed = genes.embed)

Warning messages: 1: In system.time(out <- .Call("NNLM_nnmf", A, as.integer(k), init.mask$Wi, : Target tolerance not reached. Try a larger max.iter. 2: In if (return.format == "embedding") { : the condition has length > 1 and only the first element will be used

Is the difference in clustering due to an issue with NNLM package? I noted that it was removed from the CRAN repository.

yanwu2014 commented 5 years ago

Hi,

Sorry for the late reply. That is kinda strange. So I'm not able to reproduce your plot but I'm also not getting the target tolerance warning which may be a factor. Can you try reinstalling the latest version of NNLM and then SWNE and see if that helps? Let me know if you're still getting that same plot afterwards.

devtools::install_github("linxihui/NNLM")
devtools::install_github("yanwu2014/swne")

JoannaTan commented 5 years ago

Hi @yanwu2014 ,

Thank you for your reply.

I re-installed the latest version of swne and NNLM and I am still getting the following swne_pbmc_new_20190604.pdf

I am still getting the same warning messages.

Or should I install an older version of NNLM?

yanwu2014 commented 5 years ago

Okay I increased the default max.iter when running the NMF. Can you try reinstalling swne and seeing if that helps? Also can you run sessionInfo() and paste the output? I wonder if it has something to do with the ICA initialization

JoannaTan commented 5 years ago

Hi @yanwu2014,

I have re-installed swne and re-run the codes but I am still not getting the results and I am still getting the same warnings. Please find attached my plot 20190606_swne_plot.pdf

please find below my sessionInfo()

sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] Seurat_3.0.1                swne_0.5.4                  SingleCellExperiment_1.4.1  SummarizedExperiment_1.12.0 DelayedArray_0.8.0         
 [6] BiocParallel_1.14.2         matrixStats_0.54.0          Biobase_2.40.0              GenomicRanges_1.34.0        GenomeInfoDb_1.18.2        
[11] IRanges_2.16.0              S4Vectors_0.20.1            BiocGenerics_0.28.0         edgeR_3.24.3                limma_3.36.5               

loaded via a namespace (and not attached):
  [1] Rtsne_0.15             colorspace_1.4-0       ggridges_0.5.1         rprojroot_1.3-2        XVector_0.22.0         fs_1.2.6               proxy_0.4-22          
  [8] rstudioapi_0.9.0       liger_1.1              listenv_0.7.0          npsurv_0.4-0           remotes_2.0.2          ggrepel_0.8.1          codetools_0.2-16      
 [15] splines_3.5.2          R.methodsS3_1.7.1      lsei_1.2-0             pkgload_1.0.2          jsonlite_1.6           umap_0.2.2.0           ica_1.0-2             
 [22] cluster_2.0.7-1        png_0.1-7              R.oo_1.22.0            sctransform_0.2.0      compiler_3.5.2         httr_1.4.0             backports_1.1.3       
 [29] assertthat_0.2.1       Matrix_1.2-15          lazyeval_0.2.1         cli_1.0.1              prettyunits_1.0.2      htmltools_0.3.6        tools_3.5.2           
 [36] bindrcpp_0.2.2         rsvd_1.0.0             igraph_1.2.2           gtable_0.2.0           glue_1.3.0             GenomeInfoDbData_1.2.0 RANN_2.6.1            
 [43] reshape2_1.4.3         dplyr_0.7.8            Rcpp_1.0.1             gdata_2.18.0           ape_5.2                nlme_3.1-137           gbRd_0.4-11           
 [50] lmtest_0.9-36          stringr_1.3.1          ps_1.3.0               globals_0.12.4         irlba_2.3.2            gtools_3.8.1           devtools_2.0.2        
 [57] NNLM_0.4.2             future_1.13.0          MASS_7.3-51.1          zlibbioc_1.28.0        zoo_1.8-4              scales_1.0.0           RColorBrewer_1.1-2    
 [64] curl_3.3               memoise_1.1.0          reticulate_1.10        pbapply_1.3-4          gridExtra_2.3          ggplot2_3.1.0          reshape_0.8.8         
 [71] stringi_1.2.4          desc_1.2.0             caTools_1.17.1.1       pkgbuild_1.0.2         bibtex_0.4.2           Rdpack_0.10-1          SDMTools_1.1-221      
 [78] rlang_0.3.1            pkgconfig_2.0.2        bitops_1.0-6           lattice_0.20-38        ROCR_1.0-7             purrr_0.3.0            bindr_0.1.1           
 [85] labeling_0.3           htmlwidgets_1.3        processx_3.2.1         cowplot_0.9.4          tidyselect_0.2.5       plyr_1.8.4             magrittr_1.5          
 [92] R6_2.3.0               snow_0.4-3             gplots_3.0.1.1         mgcv_1.8-26            withr_2.1.2            pillar_1.3.1           fitdistrplus_1.0-14   
 [99] survival_2.43-3        RCurl_1.95-4.12        tibble_2.0.1           future.apply_1.2.0     tsne_0.1-3             crayon_1.3.4           KernSmooth_2.23-15    
[106] plotly_4.8.0           usethis_1.4.0          locfit_1.5-9.1         grid_3.5.2             data.table_1.12.0      FNN_1.1.3              callr_3.1.1           
[113] metap_1.0              digest_0.6.18          tidyr_0.8.2            R.utils_2.7.0          usedist_0.1.0          munsell_0.5.0          viridisLite_0.3.0     
[120] sessioninfo_1.1.1

yanwu2014 commented 5 years ago

The package versions all look the same as what I have so that's not the problem. Would it be possible to see your code (I know it should be the same as the example)?

Also, can you check if the pagoda2 example works for you?

Btw thanks for your patience! I've never seen this before so it's taking me some time to figure it out.

JoannaTan commented 5 years ago

Hi,

No worries. I would really like to use swne for my work.

Please find below the codes

obj <- readRDS("pbmc3k_final.RObj")
var.genes <- VariableFeatures(obj)
length(var.genes)
cell.clusters <- Idents(obj)
levels(cell.clusters)
genes.embed <- c("MS4A1", "GNLY", "CD3E", "CD14","FCER1A", "FCGR3A", "LYZ", "PPBP", "CD8A")
swne.embedding <- RunSWNE(obj, k = 16, genes.embed = genes.embed)
PlotSWNE(swne.embedding, alpha.plot = 0.4, sample.groups = cell.clusters,do.label = T, label.size = 3.5, pt.size = 1, show.legend = F, seed = 42)

I tried to follow the pagoda2 example and it did not work too. swne_pagoda2_20190606.pdf I am still getting this warning message

Warning message:
In system.time(out <- .Call("NNLM_nnmf", A, as.integer(k), init.mask$Wi,  :
  Target tolerance not reached. Try a larger max.iter.

Please find below the codes which I used for the pagoda2 example

library(swne)
library(pagoda2)
library(Matrix)
p2 <-readRDS("pbmc3k_pagoda2.Robj")
n.od.genes <- 1.5e3
var.info <- p2$misc$varinfo; var.info <- var.info[order(var.info$lp, decreasing = F),];
od.genes <- rownames(var.info[1:n.od.genes,])
length(od.genes)
clusters <- p2$clusters$PCA$multilevel
levels(clusters)
genes.embed <- c("MS4A1", "GNLY", "CD3E", "CD14", "FCER1A", "FCGR3A", "LYZ", "PPBP", "CD8A")
swne.embedding <- RunSWNE(p2, k = 14, var.genes = od.genes, genes.embed = genes.embed)
PlotSWNE(swne.embedding, alpha.plot = 0.4, sample.groups = clusters,
do.label = T, label.size = 3.5, pt.size = 1.5, show.legend = F,seed = 42)

yanwu2014 commented 5 years ago

Hmm it looks like the error is with the NMF. Do you mind running through the SWNE pipeline step by step with the Seurat tutorial? If you do, when you run RunNMF do you get the same warning?

JoannaTan commented 5 years ago

I ran through the Seurat tutorial step by step. When I ran through the step-by-step tutorial, I managed to get the plot. Swne_plot_stepBystep.pdf

However, I get warning messages when I run FindNumFactors, and my plot looks different from the tutorial find_num_of_factors_plot_swne.pdf

k.err <- FindNumFactors(norm.counts[var.genes,], k.range = k.range, n.cores = 8, do.plot = T)
Warning messages:
1: In system.time(out <- .Call("NNLM_nnmf", A, as.integer(k), init.mask$Wi,  :
  Target tolerance not reached. Try a larger max.iter.
2: In system.time(out <- .Call("NNLM_nnmf", A, as.integer(k), init.mask$Wi,  :
  Target tolerance not reached. Try a larger max.iter.
3: In system.time(out <- .Call("NNLM_nnmf", A, as.integer(k), init.mask$Wi,  :
  Target tolerance not reached. Try a larger max.iter.
4: In system.time(out <- .Call("NNLM_nnmf", A, as.integer(k), init.mask$Wi,  :
  Target tolerance not reached. Try a larger max.iter.
5: In system.time(out <- .Call("NNLM_nnmf", A, as.integer(k), init.mask$Wi,  :
  Target tolerance not reached. Try a larger max.iter.
6: In system.time(out <- .Call("NNLM_nnmf", A, as.integer(k), init.mask$Wi,  :
  Target tolerance not reached. Try a larger max.iter.
7: In system.time(out <- .Call("NNLM_nnmf", A, as.integer(k), init.mask$Wi,  :
  Target tolerance not reached. Try a larger max.iter.

No warning messages were seen when I ran runNMF

k <- 16
nmf.res <- RunNMF(norm.counts[var.genes,], k = k)
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|

Looks like, I can only get the plot if I run through the process step-by-step.

yanwu2014 commented 5 years ago

Ah good to hear the step by step tutorial is working.

So I pushed an update to RunSWNE with an additional parameter ica.fast. Can you try setting ica.fast = F in RunSWNE and see if that helps? If not, do you mind just using SWNE step by step?

Best, Yan

JoannaTan commented 5 years ago

Hi @yanwu2014 ,

I tried running RunSWNE with the parameter that you suggested but it still does not work.

swne.embedding <- RunSWNE(obj, k = 16, genes.embed = genes.embed, ica.fast=F)
[1] "2000 variable genes to use"
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Initial stress        : 0.19693
stress after  10 iters: 0.06764, magic = 0.461
stress after  20 iters: 0.06091, magic = 0.500
stress after  30 iters: 0.05964, magic = 0.500
stress after  40 iters: 0.05945, magic = 0.500
stress after  50 iters: 0.05944, magic = 0.500
Warning messages:
1: In NNLM::nnlm(t(H), t(newdata), alpha = alpha, loss = loss, n.threads = n.cores) :
  x does not have a full column rank. Solution may not be unique.
2: In if (return.format == "embedding") { :
  the condition has length > 1 and only the first element will be used

I looked at the plot that I got through the step by step, I noticed that although I get different clusters, but the genes that are associated with each cluster is different from the one in the tutorial.

yanwu2014 commented 5 years ago

Ah yes so the NMF is semi-random, and will result in the "factors" having different genes associated with them for different runs is fairly normal. The plot will look a little different run to run as well for the same reason.

Do you think the step by step SWNE will work for your purposes?

JoannaTan commented 5 years ago

Hi @yanwu2014

Yes, the step by step SWNE will work for my case. I want to use Swne to cluster single cells.

Glad to know that the plot will look slightly different for each run.

Thank you very much for your help.

yanwu2014 commented 5 years ago

Great let me know if you have any other questions!

yanwu2014 / swne

Unable to replicate results for SWNE Walkthrough using Seurat #21