Error in scPagwas_main()

Echoloria commented 1 year ago

Hello, thanks for developing such a powerful tool. I ran into an error when analyzing my own GWAS summary data. Here is the output:

Filtering out SNPs with MAF criterion done!

------ Thu Nov 23 15:36:57 2023 ------## *** 4th: SnpToGene start!!

Error in scPagwas_main(Pagwas = NULL, gwas_data = my_prune_gwas_data, : promise already under evaluation: recursive default argument reference or earlier problems?

Here is my code:

Pagwas_test <- scPagwas_main(Pagwas = NULL,
                             gwas_data = my_prune_gwas_data,
                             Single_data = DLBCL_Bcells,
                             output.prefix = "test", # the prefix name for output files
                             output.dirs = "scPagwastry_output",
                             assay = "RNA",
                             Pathway_list = Genes_by_pathway_kegg,
                             n.cores = 1,
                             iters_singlecell = 10,
                             chrom_ld = chrom_ld,
                             singlecell = TRUE,
                             celltype = TRUE)

My GWAS summary data look like this:

  chrom    pos REF ALT       rsid        beta        se  maf
1     1 705882   G   A rs72631875  0.00734300 0.0369251 0.06
2     1 706368   A   G rs55727773 -0.02314540 0.0195527 0.45

The Single_data is the demo data you shared in issue #17, DLBCL_Bcells (https://pan.baidu.com/s/1Z5SIQub38bGVXjeTJbQZ2g?pwd=1234).

So, is there anything wrong with my data or code?

dengchunyu commented 1 year ago

Your GWAS file appears to have a problem in its first column: the row labels were possibly written out as the first column, which would cause this error.

Echoloria commented 1 year ago

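Thanks for the prompt reply. These are screenshots of my summary_data and the demo_data. Surely the first column (the chromosome number) should not be read in as row names...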

dengchunyu commented 1 year ago

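Your code fails at the "4th: SnpToGene" step, which points to a problem with the GWAS data. Check that every column has the right format; columns such as beta must be numeric. In the data you posted, the first column is the row names, but if the file is read incorrectly it can easily end up as the chrom column. I can't see anything wrong with the GWAS data as shown above. Are you passing in a data.frame directly, or a txt file?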
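The column checks suggested in this reply can be sketched in base R. The helper name `check_gwas_columns` is hypothetical and not part of scPagwas, and the required-column list simply mirrors the column names in the example data posted earlier in this thread; treat both as assumptions.

```r
# Hypothetical helper (not part of scPagwas): sanity-check a GWAS data.frame.
# The default column lists mirror the example data in this thread (assumption).
check_gwas_columns <- function(gwas,
                               required = c("chrom", "pos", "rsid", "beta", "se", "maf"),
                               numeric_cols = c("pos", "beta", "se", "maf")) {
  problems <- character(0)

  # 1. Are all expected columns present?
  missing_cols <- setdiff(required, colnames(gwas))
  if (length(missing_cols) > 0) {
    problems <- c(problems,
                  paste("missing columns:", paste(missing_cols, collapse = ", ")))
  }

  # 2. Are the value columns numeric (not character or factor)?
  for (col in intersect(numeric_cols, colnames(gwas))) {
    if (!is.numeric(gwas[[col]])) {
      problems <- c(problems,
                    sprintf("column '%s' is %s, expected numeric",
                            col, class(gwas[[col]])[1]))
    }
  }

  # 3. Row labels saved to a file can come back as an extra leading column,
  #    for example one named "X" after read.csv().
  first <- colnames(gwas)[1]
  if (first %in% c("X", "V1", "")) {
    problems <- c(problems,
                  sprintf("first column '%s' looks like stray row labels", first))
  }

  problems
}

# Example: beta was read in as text, so one problem is reported.
gwas_bad <- data.frame(
  chrom = c(1, 1),
  pos   = c(705882, 706368),
  rsid  = c("rs72631875", "rs55727773"),
  beta  = c("0.00734300", "-0.02314540"),
  se    = c(0.0369251, 0.0195527),
  maf   = c(0.06, 0.45),
  stringsAsFactors = FALSE
)
print(check_gwas_columns(gwas_bad))
```

An empty result means none of these three checks failed; it does not guarantee that the data satisfy every requirement of scPagwas_main().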

Echoloria commented 1 year ago

This problem is solved, thanks for the quick reply. However, when I tried a very large matrix (gene expression data for over a million cells), I hit another error:

*** 6th: Link_pathway_blocks_gwas function start! ****

Start to link gwas and pathway block annotations for 304 pathways!
| | 0%
Creating directory "./testhh/temp" which didn't exist..
|==================== | 29%
Error in pa_block$ld_matrix_squared %*% x2 :
  Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 102
Calls: scPagwas_main ... Pathway_block_func -> get_Pathway_sclm -> as -> .class1 -> %*% -> %*%
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Execution halted

When running the one-step analysis with scPagwas_main(), how can I resolve this error caused by the sparse matrix being so large?

dengchunyu commented 1 year ago

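The solution is to split the data. I described the steps in detail in the vignette: https://dengchunyu.github.io/flexibleuse/2023/05/30/Strategies-for-Large-scale-Single-cell-Data-Subsetting-and-Computation.html Your error occurs at step 6, which means memory was sufficient for the earlier steps, so choose Solution 1. Solution 2 is for the case where the single-cell data cannot even be read in during the earlier steps.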
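As a rough illustration of what "splitting the data" can mean at the level of cell names, here is a minimal base R sketch. It is not the procedure from the vignette; the helper name `split_cells` and the choice of a random, near-equal partition are assumptions.

```r
# Hypothetical helper: randomly partition cell names into n_parts groups
# of nearly equal size.
split_cells <- function(cell_names, n_parts, seed = 1) {
  set.seed(seed)                       # make the partition reproducible
  shuffled <- sample(cell_names)       # random order
  group <- rep_len(seq_len(n_parts), length(shuffled))
  split(shuffled, group)               # list with one character vector per part
}

cells <- paste0("cell_", seq_len(10))
parts <- split_cells(cells, n_parts = 3)
print(lengths(parts))

# With a Seurat object, each part could then be extracted with, for example,
#   subset(Single_data, cells = parts[[1]])
# and processed separately (assumes Single_data is a Seurat object).
```

Every cell lands in exactly one part, so per-part results can later be concatenated without duplicates.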

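Note that for the later Corr_Random step you need to adjust the parameters to your cell count. Nrandom and Nselect should both be as large as your time and memory budget allows. For one million cells, I recommend an Nselect of at least 200,000, with a corresponding Nrandom of 10 or more.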

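The per-cell p-value computation, Get_CorrectBg_p, can be very time-consuming at this scale. If you have a solid bioinformatics background and are also familiar with scDRS, you can write the PCC gene-correlation results to a file, convert the single-cell data to scanpy format, and use the top PCC-ranked genetic genes as the scDRS input genes to compute each cell's TRS score and p-value. scDRS computes p-values in much the same way we do; both rely on background correction. The scDRS Python environment, however, can speed up the computation.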
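The "write the top PCC-ranked genes to a file" step might look like the sketch below. It assumes the PCC result is available as a named numeric vector (gene symbol to correlation), which may not match how your Pagwas result stores it; the helper name `top_pcc_genes`, the toy gene names, and the plain one-gene-per-line output are also assumptions. The gene-set file format that scDRS expects should be checked in its own documentation.

```r
# Assumption: `pcc` is a named numeric vector (gene symbol -> correlation).
# Hypothetical helper: return the names of the n_top highest-PCC genes.
top_pcc_genes <- function(pcc, n_top = 1000) {
  ranked <- sort(pcc, decreasing = TRUE)   # highest correlation first
  names(head(ranked, n_top))
}

# Toy values for illustration only.
pcc <- c(GENE_A = 0.12, GENE_B = 0.45, GENE_C = -0.03, GENE_D = 0.30)
top_genes <- top_pcc_genes(pcc, n_top = 2)

# Write one gene per line to a temporary file.
out_file <- file.path(tempdir(), "top_pcc_genes.txt")
writeLines(top_genes, out_file)
print(top_genes)
```

The default of 1000 genes is an arbitrary placeholder, not a recommendation from the thread.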

If you can't pick up scDRS quickly, lower the iters_singlecell parameter of Get_CorrectBg_p to 50 and n_topgenes to 100, then wait patiently for the computation to finish.

Echoloria commented 1 year ago

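Thanks for the quick reply; I'll try splitting the data next. One more question: the Pagwas produced by the step-by-step approach is a list, while the one-step approach returns a Seurat object. Visualization functions such as scPagwas_Visualization() require a Seurat object rather than a list. How can I convert the Pagwas from the step-by-step approach into a Seurat object?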

dengchunyu commented 1 year ago

Assign the scores to the Seurat object you used as input. Plotting with Seurat's built-in functions gives the same result: FeaturePlot(scdata, features = 'scPagwas.TRS.Score1')
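Before assigning scores to the Seurat object, it helps to make sure they are in the same order as the object's cells. The base R sketch below shows one way to do that; the helper name `align_scores` is hypothetical, and it assumes the scores are held as a numeric vector named by cell, which may differ from how the step-by-step result stores them.

```r
# Hypothetical helper: reorder a named score vector to match a given cell order.
# Cells without a score get NA instead of silently shifting the other values.
align_scores <- function(scores, cell_names) {
  aligned <- scores[cell_names]    # look up each cell by name
  names(aligned) <- cell_names
  aligned
}

scores <- c(c3 = 0.3, c1 = 0.1)    # toy scores, in a different order
cells  <- c("c1", "c2", "c3")      # order of cells in the object
aligned <- align_scores(scores, cells)
print(aligned)

# With a Seurat object `scdata`, the aligned vector could then be stored as
# metadata and plotted as in the reply above:
#   scdata$scPagwas.TRS.Score1 <- align_scores(scores, colnames(scdata))
#   FeaturePlot(scdata, features = "scPagwas.TRS.Score1")
```

Matching by name rather than by position avoids mislabeled cells when the score vector was computed on a subset or in a different order.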

Echoloria commented 1 year ago

One of the visualizations is heritability_cor_scatterplot(gene_heri_cor=Pagwas@misc$PCC). How do I add the second-level @misc slot to a Seurat object and copy the data from the step-by-step list into it?

dengchunyu commented 1 year ago

No assignment is needed. You can plot directly from the PCC result.

Echoloria commented 1 year ago

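OK, thanks. I am now rerunning the one-step analysis on a more powerful server with Seurat 5, but two new errors appeared. The first: the run suddenly stopped at step 9. The log is as follows: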

Get scPgwas score for each single cell done!
------ Thu Nov 30 21:49:33 2023 ------## *** 9th: scGet_PCC function start! ****

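In the end only one file was generated, celltypes_bootstrap_results.csv, without the other files shown in the figure. When I reran the same code and data on the same machine, it failed at step 2 instead of reproducing the step 9 stop above. The error is as follows: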

dengchunyu commented 1 year ago

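Once the cell count exceeds one million, limitations of R itself cause all sorts of problems, especially because the later PCC analysis has to convert the sparse matrix into a regular (dense) matrix. That is why large-scale data always need to be split, with customized analysis done on the output result files. The second error may be a problem with the R environment itself; try reinstalling Rcpp or look for a tutorial online.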
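A quick back-of-the-envelope calculation shows why the sparse-to-dense conversion becomes a problem at this scale. The sketch below assumes 8 bytes per numeric entry (R doubles); the gene count of 20,000 is an illustrative assumption, and the helper name `dense_matrix_footprint` is hypothetical. The comparison against `.Machine$integer.max` is included because 32-bit indexing is one plausible reason for "problem too large" style errors, not a confirmed diagnosis of the error in this thread.

```r
# Hypothetical helper: estimate the size of a dense numeric matrix.
dense_matrix_footprint <- function(n_rows, n_cols) {
  n_elements <- as.numeric(n_rows) * as.numeric(n_cols)  # avoid integer overflow
  list(
    n_elements        = n_elements,
    gib               = n_elements * 8 / 1024^3,         # 8 bytes per double
    exceeds_int_index = n_elements > .Machine$integer.max
  )
}

# Illustrative assumption: 20,000 genes by 1,000,000 cells.
fp <- dense_matrix_footprint(20000, 1e6)
print(fp)   # roughly 149 GiB, with more entries than a 32-bit index can address

# The same genes with 100,000 cells per split:
fp_split <- dense_matrix_footprint(20000, 1e5)
print(fp_split$gib)
```

Splitting the cells into ten parts reduces both the memory per part and the entry count by a factor of ten, which is the motivation for the subsetting strategy described in this reply.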

dengchunyu commented 6 months ago

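First option: downsample the single-cell data to reduce the number of cells, which avoids the memory problem. Second option: randomly split the data into several parts, run the computation on each, and merge the results at the end. This is more cumbersome; for detailed steps see https://dengchunyu.github.io/flexibleuse/2023/05/30/Strategies-for-Large-scale-Single-cell-Data-Subsetting-and-Computation.html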
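For the first option, one simple way to downsample is to cap the number of cells kept per cell type. The reply does not say how to downsample, so the stratification by cell type, the helper name `downsample_cells`, and the cap value are all assumptions in this base R sketch.

```r
# Hypothetical helper: keep at most max_per_type randomly chosen cells
# from each cell type.
downsample_cells <- function(cell_names, cell_types, max_per_type, seed = 1) {
  stopifnot(length(cell_names) == length(cell_types))
  set.seed(seed)                                  # reproducible sampling
  by_type <- split(cell_names, cell_types)        # cell names grouped by type
  kept <- lapply(by_type, function(cells) {
    if (length(cells) <= max_per_type) cells else sample(cells, max_per_type)
  })
  unlist(kept, use.names = FALSE)
}

cells <- paste0("cell_", seq_len(10))
types <- rep(c("B", "T"), times = c(7, 3))        # 7 B cells, 3 T cells
kept  <- downsample_cells(cells, types, max_per_type = 4)
print(length(kept))   # 4 B cells plus all 3 T cells

# With a Seurat object, the reduced data could then be taken with, for example,
#   subset(Single_data, cells = kept)
```

Capping per cell type keeps rare populations intact while shrinking the abundant ones; plain uniform sampling with sample(cells, n) is a simpler alternative if cell-type balance does not matter.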

baibing0211 wrote on Wednesday, May 15, 2024, at 10:40:

Hello, I ran into a problem similar to yours. May I ask how you solved it? Screenshot: https://github.com/sulab-wmu/scPagwas/assets/141980428/113a7269-8ef4-4cf7-bcf8-72936079e887 My GWAS summary statistics look like this (screenshot): https://github.com/sulab-wmu/scPagwas/assets/141980428/654f2509-d5e2-4d61-b0a8-402e20a9380d

sulab-wmu / scPagwas