yuabrahamliu / CWGCNA

CWGCNA is an R package to perform causal inference from the WGCNA framework.
10 stars 1 forks source link

Error in { : task 1 failed - "argument is of length zero" #4

Closed AcetylCoALab closed 1 month ago

AcetylCoALab commented 1 month ago

Error reported when running diffwgcna when using bulk transcriptome sequencing: cwgcnares <- diffwgcna(dat = betas, #表达矩阵

Start limma analysis on WGCNA modules NULL Start limma analysis within WGCNA modules and mediation analysis Error in { : task 1 failed - "argument is of length zero" 此外: Warning messages: 1: executing %dopar% sequentially: no parallel backend registered 2: The labeller API has been updated. Labellers taking variable and value arguments are now deprecated. ℹ See labellers documentation.

head(betas)[1:6,1:6] PRHG101 PRHG102 PRHG103 PRHG104 PRHG105 PRHG106 MT-CO1 13.48731 13.60386 14.34240 13.70454 13.17338 13.33154 MT-CO3 13.29708 13.19779 14.08867 13.18033 12.50957 12.53829 MT-CO2 12.65164 12.92307 13.44230 12.61492 12.06816 12.18130 MT-ATP6 12.54527 12.48942 13.16574 12.22180 11.60672 11.76723 MT-ND3 12.91606 12.42010 13.32140 12.30124 11.53982 11.47828 MT-ND4 12.04265 12.23196 12.91033 11.70367 11.36492 11.30693

head(pds) sampleid Group Gender Age 1 PRHG101 PRHG M 11482.770 2 PRHG102 PRHG M 1994.401 3 PRHG103 PRHG M 10064.173 4 PRHG104 PRHG M 4217.889 5 PRHG105 PRHG M 3927.761 6 PRHG106 PRHG F 6433.615

yuabrahamliu commented 1 month ago

Maybe you can try tuning several parameters:

balanceadj: Please change it to FALSE.

threads: Please change it to 1 to see whether the error will disappear without parallelization.

If the issue persists, would it be possible for you to share a small dataset with the errors? Rest assured, any sensitive information about your project should be concealed.

AcetylCoALab commented 1 month ago

I have retried as you suggested and unfortunately our problem persists, I will upload my data and run reports in the hope that you will be able to reproduce and fix the problem. Thank you very much! I think this software is very interesting. Thank you for your efforts!

exampledata.zip

VitCCC commented 1 month ago

Error reported when running diffwgcna when using bulk transcriptome sequencing: “task 1 failed - 'subscript out of bounds'”

VitCCC commented 1 month ago

I think your work is excellent! However, could it be made more understandable? For instance, many people conduct transcriptome analysis, which typically involves analyzing the relationship between genes and phenotypes. I hope you can continue to release more detailed and practical versions. Thanks!

VitCCC commented 1 month ago

This command will take ~ 25 min

cwgcnares <- diffwgcna(dat = exp_meta,

  • pddat = clinical,
  • responsevarname = "gleason_score",
  • confoundings = c("preop_psa") ,
  • featuretype = "gene", #由于已经将 DNAm 探针名称转换为基因名称,所以将参数featuretype设为“gene”
  • topvaricancetype = "sd",
  • topvaricance = 1000,#意味着将在数据中最具变异性的前5000个基因上进行分析(标准偏差最高的那些)
  • powers = seq(1, 20, 1), #这样可以通过网格搜索从1到20选择WGCNA的最佳软阈值
  • rsqcutline = 0.8, #目标最小无标度拓扑拟合指数是0.8。看不懂
  • mediation = TRUE, #决定在WGCNA流程中是否需要执行因果推断的核心——中介分析
  • topn = NULL, #因果推断仅针对最具差异性的1个WGCNA模块及其特征执行 默认情况下,此参数为 NULL,在这种情况下,所有差异模块都会进行因果推断,但分析所有模块会耗费时间。
  • plot = TRUE, #将生成若干图表来展示结果
  • titleprefix = "Placenta",
  • labelnum = NULL, #设置为 NULL,使没有特征名称被标记
  • titlesize = 17,
  • textsize = 16,
  • annotextsize = 5,
  • pvalcolname = "adj.P.Val",
  • pvalcutoff = 0.05,
  • isbetaval = TRUE,
  • absxcutoff = 0, #定义模块内基因的 log2FC 阈值,以选择差异基因
  • diffcutoff = 0, #义模块特征基因差异的阈值,以选择差异模块
  • threads = 1,
  • balanceadj=FALSE) pickSoftThreshold: will use block size 1000. pickSoftThreshold: calculating connectivity for given powers... ..working on genes 1 through 1000 of 1000 Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k. 1 1 0.261 0.676 0.927 159.0000 1.58e+02 268.00 2 2 0.550 -0.667 0.939 44.4000 3.88e+01 118.00 3 3 0.822 -1.190 0.955 16.9000 1.19e+01 69.60 4 4 0.886 -1.400 0.955 7.8400 4.23e+00 46.80 5 5 0.919 -1.450 0.974 4.1800 1.66e+00 34.00 6 6 0.912 -1.440 0.945 2.4600 6.81e-01 25.90 7 7 0.935 -1.440 0.977 1.5500 3.03e-01 20.40 8 8 0.926 -1.410 0.946 1.0400 1.43e-01 16.40 9 9 0.947 -1.400 0.969 0.7220 6.95e-02 13.50 10 10 0.956 -1.360 0.979 0.5190 3.45e-02 11.20 11 11 0.966 -1.340 0.990 0.3830 1.74e-02 9.45 12 12 0.963 -1.320 0.975 0.2900 8.86e-03 8.05 13 13 0.959 -1.350 0.970 0.2230 4.60e-03 6.92 14 14 0.942 -1.360 0.950 0.1740 2.37e-03 5.99 15 15 0.953 -1.330 0.958 0.1380 1.28e-03 5.22 16 16 0.954 -1.310 0.954 0.1110 7.07e-04 4.57 17 17 0.968 -1.300 0.965 0.0900 4.07e-04 4.03 18 18 0.970 -1.290 0.967 0.0738 2.19e-04 3.57 19 19 0.943 -1.300 0.929 0.0610 1.19e-04 3.17 20 20 0.967 -1.300 0.961 0.0508 6.41e-05 2.83 Calculating module eigengenes block-wise from all genes Flagging genes and samples with too many missing values... ..step 1 ..Working on block 1 . TOM calculation: adjacency.. ..will not use multithreading. Fraction of slow calculations: 0.000000 ..connectivity.. ..matrix multiplication (system BLAS).. ..normalization.. ..done. ..saving TOM for block 1 into file blockwiseTOM.2024-05-14_21-54-25.216837-block.1.RData ....clustering.. ....detecting modules.. ....calculating module eigengenes.. ....checking kME in modules.. ..removing 84 genes from module 1 because their KME is too low. ..removing 91 genes from module 2 because their KME is too low. ..removing 77 genes from module 3 because their KME is too low. ..removing 6 genes from module 4 because their KME is too low. ..merging modules that are too close.. mergeCloseModules: Merging modules whose distance is less than 0.2 Calculating new MEs... $mar [1] 1 5 0 1

Start limma analysis on WGCNA modules NULL Start limma analysis within WGCNA modules and mediation analysis Error in singlemediation(j = j, mediatordat = mediatordat, predictdat = predictdat, : 找不到对象'bootnum' 此外: Warning message: The labeller API has been updated. Labellers taking variable and value arguments are now deprecated. ℹ See labellers documentation.

VitCCC commented 1 month ago

Here are three questions I would like to ask you:

Can this be used for bulk transcriptome data? I only want to use a few hundred genes from a specific pathway. Is this feasible? If I turn off the "mediation" function, it still runs. Why is that?

AcetylCoALab commented 1 month ago

This command will take ~ 25 min

cwgcnares <- diffwgcna(dat = exp_meta,

  •                  pddat = clinical, 
  •                  responsevarname = "gleason_score",
  •                  confoundings = c("preop_psa") ,
  •                  featuretype = "gene", #由于已经将 DNAm 探针名称转换为基因名称,所以将参数featuretype设为“gene”
  •                  topvaricancetype = "sd", 
  •                  topvaricance = 1000,#意味着将在数据中最具变异性的前5000个基因上进行分析(标准偏差最高的那些)
  •                  powers = seq(1, 20, 1), #这样可以通过网格搜索从1到20选择WGCNA的最佳软阈值
  •                  rsqcutline = 0.8, #目标最小无标度拓扑拟合指数是0.8。看不懂
  •                  mediation = TRUE, #决定在WGCNA流程中是否需要执行因果推断的核心——中介分析
  •                  topn = NULL, #因果推断仅针对最具差异性的1个WGCNA模块及其特征执行 默认情况下,此参数为 NULL,在这种情况下,所有差异模块都会进行因果推断,但分析所有模块会耗费时间。
  •                  plot = TRUE, #将生成若干图表来展示结果
  •                  titleprefix = "Placenta", 
  •                  labelnum = NULL, #设置为 NULL,使没有特征名称被标记
  •                  titlesize = 17, 
  •                  textsize = 16, 
  •                  annotextsize = 5, 
  •                  pvalcolname = "adj.P.Val", 
  •                  pvalcutoff = 0.05, 
  •                  isbetaval = TRUE, 
  •                  absxcutoff = 0, #定义模块内基因的 log2FC 阈值,以选择差异基因
  •                  diffcutoff = 0, #义模块特征基因差异的阈值,以选择差异模块
  •                  threads = 1,
  •                  balanceadj=FALSE)

pickSoftThreshold: will use block size 1000. pickSoftThreshold: calculating connectivity for given powers... ..working on genes 1 through 1000 of 1000 Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k. 1 1 0.261 0.676 0.927 159.0000 1.58e+02 268.00 2 2 0.550 -0.667 0.939 44.4000 3.88e+01 118.00 3 3 0.822 -1.190 0.955 16.9000 1.19e+01 69.60 4 4 0.886 -1.400 0.955 7.8400 4.23e+00 46.80 5 5 0.919 -1.450 0.974 4.1800 1.66e+00 34.00 6 6 0.912 -1.440 0.945 2.4600 6.81e-01 25.90 7 7 0.935 -1.440 0.977 1.5500 3.03e-01 20.40 8 8 0.926 -1.410 0.946 1.0400 1.43e-01 16.40 9 9 0.947 -1.400 0.969 0.7220 6.95e-02 13.50 10 10 0.956 -1.360 0.979 0.5190 3.45e-02 11.20 11 11 0.966 -1.340 0.990 0.3830 1.74e-02 9.45 12 12 0.963 -1.320 0.975 0.2900 8.86e-03 8.05 13 13 0.959 -1.350 0.970 0.2230 4.60e-03 6.92 14 14 0.942 -1.360 0.950 0.1740 2.37e-03 5.99 15 15 0.953 -1.330 0.958 0.1380 1.28e-03 5.22 16 16 0.954 -1.310 0.954 0.1110 7.07e-04 4.57 17 17 0.968 -1.300 0.965 0.0900 4.07e-04 4.03 18 18 0.970 -1.290 0.967 0.0738 2.19e-04 3.57 19 19 0.943 -1.300 0.929 0.0610 1.19e-04 3.17 20 20 0.967 -1.300 0.961 0.0508 6.41e-05 2.83 Calculating module eigengenes block-wise from all genes Flagging genes and samples with too many missing values... ..step 1 ..Working on block 1 . TOM calculation: adjacency.. ..will not use multithreading. Fraction of slow calculations: 0.000000 ..connectivity.. ..matrix multiplication (system BLAS).. ..normalization.. ..done. ..saving TOM for block 1 into file blockwiseTOM.2024-05-14_21-54-25.216837-block.1.RData ....clustering.. ....detecting modules.. ....calculating module eigengenes.. ....checking kME in modules.. ..removing 84 genes from module 1 because their KME is too low. ..removing 91 genes from module 2 because their KME is too low. ..removing 77 genes from module 3 because their KME is too low. ..removing 6 genes from module 4 because their KME is too low. ..merging modules that are too close.. mergeCloseModules: Merging modules whose distance is less than 0.2 Calculating new MEs... $mar [1] 1 5 0 1

Start limma analysis on WGCNA modules NULL Start limma analysis within WGCNA modules and mediation analysis Error in singlemediation(j = j, mediatordat = mediatordat, predictdat = predictdat, : 找不到对象'bootnum' 此外: Warning message: The labeller API has been updated. Labellers taking variable and value arguments are now deprecated. ℹ See labellers documentation.

Our error messages are consistent, but running the example methylation data did not result in an error. It is therefore unclear whether this approach can be applied to transcriptomic and other types of data.

yuabrahamliu commented 1 month ago

I have updated the package. You can reinstall it and have a try. Please begin from the settings balanceadj = FALSE and threads = 1 to see whether it works this time.

VitCCC commented 1 month ago

okay! thank you for your work

AcetylCoALab commented 1 month ago

似乎出现了新的错误> #This command will take ~ 25 min

cwgcnares <- diffwgcna(dat = betas,

  • pddat = pds,
  • responsevarname = "Subtype",
  • confoundings = c('stage'),
  • featuretype = "gene",
  • balanceadj = F,
  • topvaricancetype = "sd",
  • topvaricance = 5000,
  • powers = seq(1, 20, 1),
  • rsqcutline = 0.8,
  • mediation = TRUE,
  • topn = 1,
  • plot = TRUE,
  • titleprefix = "Placenta",
  • labelnum = 5,
  • titlesize = 17,
  • textsize = 16,
  • annotextsize = 5,
  • pvalcolname = "adj.P.Val",
  • pvalcutoff = 0.05,
  • isbetaval = TRUE,
  • absxcutoff = 0,
  • diffcutoff = 0,
  • threads = 1) pickSoftThreshold: will use block size 5000. pickSoftThreshold: calculating connectivity for given powers... ..working on genes 1 through 5000 of 5000 Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k. 1 1 0.0826 0.677 0.983 727.000 7.02e+02 1340.0 2 2 0.4880 -1.080 0.935 186.000 1.58e+02 577.0 3 3 0.8510 -1.480 0.959 67.100 4.60e+01 314.0 4 4 0.8490 -1.560 0.904 30.700 1.54e+01 198.0 5 5 0.8170 -1.480 0.851 16.600 5.81e+00 135.0 6 6 0.8410 -1.320 0.840 10.100 2.35e+00 95.8 7 7 0.8970 -1.240 0.903 6.690 1.04e+00 75.3 8 8 0.8680 -1.280 0.911 4.680 4.85e-01 65.3 9 9 0.8730 -1.300 0.935 3.410 2.41e-01 57.1 10 10 0.8750 -1.310 0.955 2.570 1.19e-01 50.3 11 11 0.8900 -1.310 0.970 1.980 6.05e-02 44.6 12 12 0.9000 -1.310 0.973 1.550 3.17e-02 39.7 13 13 0.8940 -1.330 0.970 1.240 1.70e-02 35.4 14 14 0.9020 -1.340 0.971 0.998 9.17e-03 31.8 15 15 0.9150 -1.350 0.977 0.814 4.91e-03 28.5 16 16 0.9240 -1.350 0.980 0.670 2.73e-03 25.7 17 17 0.9010 -1.380 0.965 0.557 1.54e-03 23.2 18 18 0.9120 -1.390 0.969 0.465 8.89e-04 21.0 19 19 0.9210 -1.400 0.969 0.392 5.07e-04 19.1 20 20 0.9260 -1.410 0.969 0.331 2.96e-04 17.3 Calculating module eigengenes block-wise from all genes Flagging genes and samples with too many missing values... ..step 1 ..Working on block 1 . TOM calculation: adjacency.. ..will not use multithreading. Fraction of slow calculations: 0.000000 ..connectivity.. ..matrix multiplication (system BLAS).. ..normalization.. ..done. ..saving TOM for block 1 into file blockwiseTOM.2024-05-14_22-32-53.888229-block.1.RData ....clustering.. ....detecting modules.. ....calculating module eigengenes.. ....checking kME in modules.. ..removing 364 genes from module 1 because their KME is too low. ..removing 103 genes from module 2 because their KME is too low. ..removing 123 genes from module 3 because their KME is too low. ..removing 216 genes from module 4 because their KME is too low. ..removing 10 genes from module 5 because their KME is too low. ..removing 6 genes from module 6 because their KME is too low. ..removing 19 genes from module 7 because their KME is too low. ..removing 4 genes from module 8 because their KME is too low. ..removing 3 genes from module 9 because their KME is too low. ..removing 1 genes from module 11 because their KME is too low. ..removing 8 genes from module 12 because their KME is too low. ..removing 3 genes from module 14 because their KME is too low. ..merging modules that are too close.. mergeCloseModules: Merging modules whose distance is less than 0.2 Calculating new MEs... $mar [1] 1 5 0 1

Start limma analysis on WGCNA modules NULL Start limma analysis within WGCNA modules and mediation analysis Error in mediationmod(mediatordat = mediatordat, predictdat = otherdat, : lazy-load database 'C:/Users/92336/AppData/Local/R/win-library/4.3/CWGCNA/R/CWGCNA.rdb' is corrupt In addition: Warning messages: 1: The labeller API has been updated. Labellers taking variable and value arguments are now deprecated. ℹ See labellers documentation. 2: In mediationmod(mediatordat = mediatordat, predictdat = otherdat, : internal error -3 in R_decompress1

123abclxy commented 1 month ago

This is the issue I encountered after reinstalling ![Uploading Error.png…]()

AcetylCoALab commented 1 month ago

I have good news, after I restarted Rstudio, the code is running fine. Interestingly, I put balanceadj = T and threads = 15 and still managed to complete the run without any problems. Thanks to the author for his effort and hard work!

123abclxy commented 1 month ago

I have good news, after I restarted Rstudio, the code is running fine. Interestingly, I put balanceadj = T and threads = 15 and still managed to complete the run without any problems. Thanks to the author for his effort and hard work!

It seems there's still an issue. Don't you believe you should try a different dataset? I think it's still related to the R package.

yuabrahamliu commented 1 month ago

Here are three questions I would like to ask you:

Can this be used for bulk transcriptome data? I only want to use a few hundred genes from a specific pathway. Is this feasible? If I turn off the "mediation" function, it still runs. Why is that?

Yes, this function is suitable for bulk transcriptomic data analysis.

If you only want to use a few hundred prescreened genes, you can create a subset of your original dataset only with those genes and then transfer it to the function.

The function diffwgcna will run the normal WGCNA + mediation analysis; if you turn off the parameter mediation, that analysis will not be performed, but the normal WGCNA will still be implemented.

yuabrahamliu commented 1 month ago

I have good news, after I restarted Rstudio, the code is running fine. Interestingly, I put balanceadj = T and threads = 15 and still managed to complete the run without any problems. Thanks to the author for his effort and hard work!

You can use the setting balanceadj = FALSE instead of TRUE because, from former simulation experiments, the TRUE mode took a longer time to run, but it did not show a significant advantage over the FALSE one, so FALSE is suggested.

yuabrahamliu commented 1 month ago

I have good news, after I restarted Rstudio, the code is running fine. Interestingly, I put balanceadj = T and threads = 15 and still managed to complete the run without any problems. Thanks to the author for his effort and hard work!

It seems there's still an issue. Don't you believe you should try a different dataset? I think it's still related to the R package.

I tried it on your example data from my side, and it worked. You can follow the codes below. If you encounter other problems with other new datasets, please report them.

extres <- diffwgcna(dat = extgene, 
                    pddat = extpds, 
                    responsevarname = "Subtype", 
                    confoundings = c("stage"), 
                    featuretype = "gene", 

                    topvaricancetype = "sd", 
                    topvaricance = 5000, 

                    powers = seq(1, 20, 1), 
                    rsqcutline = 0.8, 

                    mediation = TRUE, 

                    topn = 1, 

                    plot = TRUE, 
                    titleprefix = "TCGA", 
                    labelnum = 5, 
                    titlesize = 17, 
                    textsize = 16, 
                    annotextsize = 5, 

                    pvalcolname = "adj.P.Val", 
                    pvalcutoff = 0.05, 
                    isbetaval = FALSE, 
                    absxcutoff = 0, 
                    diffcutoff = 0, 

                    threads = 4, 

                    balanceadj = FALSE)
123abclxy commented 1 month ago

I have good news, after I restarted Rstudio, the code is running fine. Interestingly, I put balanceadj = T and threads = 15 and still managed to complete the run without any problems. Thanks to the author for his effort and hard work!

It seems there's still an issue. Don't you believe you should try a different dataset? I think it's still related to the R package.

I tried it on your example data from my side, and it worked. You can follow the codes below. If you encounter other problems with other new datasets, please report them.

extres <- diffwgcna(dat = extgene, 
                    pddat = extpds, 
                    responsevarname = "Subtype", 
                    confoundings = c("stage"), 
                    featuretype = "gene", 

                    topvaricancetype = "sd", 
                    topvaricance = 5000, 

                    powers = seq(1, 20, 1), 
                    rsqcutline = 0.8, 

                    mediation = TRUE, 

                    topn = 1, 

                    plot = TRUE, 
                    titleprefix = "TCGA", 
                    labelnum = 5, 
                    titlesize = 17, 
                    textsize = 16, 
                    annotextsize = 5, 

                    pvalcolname = "adj.P.Val", 
                    pvalcutoff = 0.05, 
                    isbetaval = FALSE, 
                    absxcutoff = 0, 
                    diffcutoff = 0, 

                    threads = 4, 

                    balanceadj = FALSE)

Thank you for your response, but there are no extraneous factors in my code, and my performance metric is 'Stage’. Based on the code you provided, after continuing with the execution, I encountered the same issue 'Start limma analysis within WGCNA modules and mediation analysis. Error in { : task 1 failed - "replacement has 0 rows, data has 159".' Could you please help us investigate what might be causing this? ![Uploading Error.png…]()

yuabrahamliu commented 1 month ago

Could you let me know if you're sure you re-installed the package after I updated it this morning? The new version worked on your dataset from my side, and your group member also mentioned above that the function worked. Or can you share your dataset again?

123abclxy commented 1 month ago

Are you sure you re-installed the package after I updated it this morning? The new version worked on your dataset from my side. Or can you share your dataset again?

Dear author, after considering confounding factors, I was able to run it successfully. Hence, I suggest whether it would be necessary to clarify that this package must consider confounding factors, as without doing so, the error I just encountered may occur again. Thank you once again for your contribution to the field of bioinformatics. My team and I applaud your work.

yuabrahamliu commented 1 month ago

Are you sure you re-installed the package after I updated it this morning? The new version worked on your dataset from my side. Or can you share your dataset again?

Dear author, after considering confounding factors, I was able to run it successfully. Hence, I suggest whether it would be necessary to clarify that this package must consider confounding factors, as without doing so, the error I just encountered may occur again. Thank you once again for your contribution to the field of bioinformatics. My team and I applaud your work.

I have updated the package so that you can set the parameter confoundings to NULL if you don't want to adjust confounding factors for the mediation analysis. However, it is suggested that you adjust them.

123abclxy commented 1 month ago

Are you sure you re-installed the package after I updated it this morning? The new version worked on your dataset from my side. Or can you share your dataset again?

Dear author, after considering confounding factors, I was able to run it successfully. Hence, I suggest whether it would be necessary to clarify that this package must consider confounding factors, as without doing so, the error I just encountered may occur again. Thank you once again for your contribution to the field of bioinformatics. My team and I applaud your work.

I have updated the package so that you can set the parameter confoundings to NULL if you don't want to adjust confounding factors for the mediation analysis. However, it is suggested that you adjust them.

Thank you for your attention to my previous question. Additionally, I have noticed another issue over the past few days. Specifically, when dealing with more than 5000 variant genes, an error occurs with your package. I am wondering if you are aware of this problem.

yuabrahamliu commented 1 month ago

Are you sure you re-installed the package after I updated it this morning? The new version worked on your dataset from my side. Or can you share your dataset again?

Dear author, after considering confounding factors, I was able to run it successfully. Hence, I suggest whether it would be necessary to clarify that this package must consider confounding factors, as without doing so, the error I just encountered may occur again. Thank you once again for your contribution to the field of bioinformatics. My team and I applaud your work.

I have updated the package so that you can set the parameter confoundings to NULL if you don't want to adjust confounding factors for the mediation analysis. However, it is suggested that you adjust them.

Thank you for your attention to my previous question. Additionally, I have noticed another issue over the past few days. Specifically, when dealing with more than 5000 variant genes, an error occurs with your package. I am wondering if you are aware of this problem.

How many genes did you use? Could you tell me your parameter settings?

123abclxy commented 1 month ago

Are you sure you re-installed the package after I updated it this morning? The new version worked on your dataset from my side. Or can you share your dataset again?

Dear author, after considering confounding factors, I was able to run it successfully. Hence, I suggest whether it would be necessary to clarify that this package must consider confounding factors, as without doing so, the error I just encountered may occur again. Thank you once again for your contribution to the field of bioinformatics. My team and I applaud your work.

I have updated the package so that you can set the parameter confoundings to NULL if you don't want to adjust confounding factors for the mediation analysis. However, it is suggested that you adjust them.

Thank you for your attention to my previous question. Additionally, I have noticed another issue over the past few days. Specifically, when dealing with more than 5000 variant genes, an error occurs with your package. I am wondering if you are aware of this problem.

How many genes did you use? Could you tell me your parameter settings?

About 20000.

yuabrahamliu commented 1 month ago

Are you sure you re-installed the package after I updated it this morning? The new version worked on your dataset from my side. Or can you share your dataset again?

Dear author, after considering confounding factors, I was able to run it successfully. Hence, I suggest whether it would be necessary to clarify that this package must consider confounding factors, as without doing so, the error I just encountered may occur again. Thank you once again for your contribution to the field of bioinformatics. My team and I applaud your work.

I have updated the package so that you can set the parameter confoundings to NULL if you don't want to adjust confounding factors for the mediation analysis. However, it is suggested that you adjust them.

Thank you for your attention to my previous question. Additionally, I have noticed another issue over the past few days. Specifically, when dealing with more than 5000 variant genes, an error occurs with your package. I am wondering if you are aware of this problem.

How many genes did you use? Could you tell me your parameter settings?

About 20000.

I have fixed the bug. You can run the package on your ~20000 genes now.

123abclxy commented 1 month ago

Are you sure you re-installed the package after I updated it this morning? The new version worked on your dataset from my side. Or can you share your dataset again?

Dear author, after considering confounding factors, I was able to run it successfully. Hence, I suggest whether it would be necessary to clarify that this package must consider confounding factors, as without doing so, the error I just encountered may occur again. Thank you once again for your contribution to the field of bioinformatics. My team and I applaud your work.

I have updated the package so that you can set the parameter confoundings to NULL if you don't want to adjust confounding factors for the mediation analysis. However, it is suggested that you adjust them.

Thank you for your attention to my previous question. Additionally, I have noticed another issue over the past few days. Specifically, when dealing with more than 5000 variant genes, an error occurs with your package. I am wondering if you are aware of this problem.

How many genes did you use? Could you tell me your parameter settings?

About 20000.

I have fixed the bug. You can run the package on your ~20000 genes now.

Thank you! I have noticed that this step runs slower compared to your R package before the recent update.

yuabrahamliu commented 1 month ago

Are you sure you re-installed the package after I updated it this morning? The new version worked on your dataset from my side. Or can you share your dataset again?

Dear author, after considering confounding factors, I was able to run it successfully. Hence, I suggest whether it would be necessary to clarify that this package must consider confounding factors, as without doing so, the error I just encountered may occur again. Thank you once again for your contribution to the field of bioinformatics. My team and I applaud your work.

I have updated the package so that you can set the parameter confoundings to NULL if you don't want to adjust confounding factors for the mediation analysis. However, it is suggested that you adjust them.

Thank you for your attention to my previous question. Additionally, I have noticed another issue over the past few days. Specifically, when dealing with more than 5000 variant genes, an error occurs with your package. I am wondering if you are aware of this problem.

How many genes did you use? Could you tell me your parameter settings?

About 20000.

I have fixed the bug. You can run the package on your ~20000 genes now.

Thank you! I have noticed that this step runs slower compared to your R package before the recent update.

The steps for the large gene number ~20000 will differ from the smaller 5000 one. The steps divide the genes into smaller blocks to calculate separately and then merge their results. This is to save the RAM to be consumed by the large gene number. There will be no division steps for a smaller gene number, such as 5000, so it will be quicker in that case.

123abclxy commented 1 month ago

Are you sure you re-installed the package after I updated it this morning? The new version worked on your dataset from my side. Or can you share your dataset again?

Dear author, after considering confounding factors, I was able to run it successfully. Hence, I suggest whether it would be necessary to clarify that this package must consider confounding factors, as without doing so, the error I just encountered may occur again. Thank you once again for your contribution to the field of bioinformatics. My team and I applaud your work.

I have updated the package so that you can set the parameter confoundings to NULL if you don't want to adjust confounding factors for the mediation analysis. However, it is suggested that you adjust them.

Thank you for your attention to my previous question. Additionally, I have noticed another issue over the past few days. Specifically, when dealing with more than 5000 variant genes, an error occurs with your package. I am wondering if you are aware of this problem.

How many genes did you use? Could you tell me your parameter settings?

About 20000.

I have fixed the bug. You can run the package on your ~20000 genes now.

Thank you! I have noticed that this step runs slower compared to your R package before the recent update.

The steps for the large gene number ~20000 will differ from the smaller 5000 one. The steps divide the genes into smaller blocks to calculate separately and then merge their results. This is to save the RAM to be consumed by the large gene number. There will be no division steps for a smaller gene number, such as 5000, so it will be quicker in that case.

Oh, I see. Thank you again. I'll try this later.

VitCCC commented 1 month ago

使用另一个函数diffwgcna进行因果WGCNA分析

Warning message: In doTryCatch(return(expr), name, parentenv, handler) : 显示串列没有完全被刷新

This command will take ~ 25 min

cwgcnares <- diffwgcna(dat = exp_meta, #参数dat接受 beta 值矩阵

  • pddat = clinical, #pddat 接受 pds 数据框
  • responsevarname = "gleason_score", #指定pds 中的哪一列是目标变量
  • confoundings = c("preop_psa","clinical_t_stage") ,#指定需要调整的混杂因素
  • featuretype = "gene", #由于已经将 DNAm 探针名称转换为基因名称,所以将参数featuretype设为“gene”
  • topvaricancetype = "sd",
  • topvaricance = 1000,#意味着将在数据中最具变异性的前5000个基因上进行分析(标准偏差最高的那些)
  • powers = seq(1, 20, 1), #这样可以通过网格搜索从1到20选择WGCNA的最佳软阈值
  • rsqcutline = 0.8, #目标最小无标度拓扑拟合指数是0.8。看不懂
  • mediation = TRUE, #决定在WGCNA流程中是否需要执行因果推断的核心——中介分析
  • topn = NULL, #因果推断仅针对最具差异性的1个WGCNA模块及其特征执行 默认情况下,此参数为 NULL,在这种情况下,所有差异模块都会进行因果推断,但分析所有模块会耗费时间。
  • plot = TRUE, #将生成若干图表来展示结果
  • titleprefix = "Placenta",
  • labelnum = NULL, #设置为 NULL,使没有特征名称被标记
  • titlesize = 17,
  • textsize = 16,
  • annotextsize = 5,
  • pvalcolname = "adj.P.Val",
  • pvalcutoff = 0.05,
  • isbetaval = TRUE,
  • absxcutoff = 0, #定义模块内基因的 log2FC 阈值,以选择差异基因
  • diffcutoff = 0, #义模块特征基因差异的阈值,以选择差异模块
  • threads = 6,
  • balanceadj=T) The ensemble mode only supports binary variable, but the current response is not. Hence, the mode has been changed to normal mode. pickSoftThreshold: will use block size 1000. pickSoftThreshold: calculating connectivity for given powers... ..working on genes 1 through 1000 of 1000 Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k. 1 1 0.261 0.676 0.927 159.0000 1.58e+02 268.00 2 2 0.550 -0.667 0.939 44.4000 3.88e+01 118.00 3 3 0.822 -1.190 0.955 16.9000 1.19e+01 69.60 4 4 0.886 -1.400 0.955 7.8400 4.23e+00 46.80 5 5 0.919 -1.450 0.974 4.1800 1.66e+00 34.00 6 6 0.912 -1.440 0.945 2.4600 6.81e-01 25.90 7 7 0.935 -1.440 0.977 1.5500 3.03e-01 20.40 8 8 0.926 -1.410 0.946 1.0400 1.43e-01 16.40 9 9 0.947 -1.400 0.969 0.7220 6.95e-02 13.50 10 10 0.956 -1.360 0.979 0.5190 3.45e-02 11.20 11 11 0.966 -1.340 0.990 0.3830 1.74e-02 9.45 12 12 0.963 -1.320 0.975 0.2900 8.86e-03 8.05 13 13 0.959 -1.350 0.970 0.2230 4.60e-03 6.92 14 14 0.942 -1.360 0.950 0.1740 2.37e-03 5.99 15 15 0.953 -1.330 0.958 0.1380 1.28e-03 5.22 16 16 0.954 -1.310 0.954 0.1110 7.07e-04 4.57 17 17 0.968 -1.300 0.965 0.0900 4.07e-04 4.03 18 18 0.970 -1.290 0.967 0.0738 2.19e-04 3.57 19 19 0.943 -1.300 0.929 0.0610 1.19e-04 3.17 20 20 0.967 -1.300 0.961 0.0508 6.41e-05 2.83 Calculating module eigengenes block-wise from all genes Flagging genes and samples with too many missing values... ..step 1 ..Working on block 1 . TOM calculation: adjacency.. ..will not use multithreading. Fraction of slow calculations: 0.000000 ..connectivity.. ..matrix multiplication (system BLAS).. ..normalization.. ..done. ..saving TOM for block 1 into file blockwiseTOM.2024-05-17_17-28-21.997864-block.1.RData ....clustering.. ....detecting modules.. ....calculating module eigengenes.. ....checking kME in modules.. ..removing 84 genes from module 1 because their KME is too low. ..removing 91 genes from module 2 because their KME is too low. ..removing 77 genes from module 3 because their KME is too low. ..removing 6 genes from module 4 because their KME is too low. ..merging modules that are too close.. mergeCloseModules: Merging modules whose distance is less than 0.2 Calculating new MEs... $mar [1] 1 5 0 1

Start limma analysis on WGCNA modules NULL Start limma analysis within WGCNA modules and mediation analysis Error in { : task 1 failed - "the condition has length > 1" 此外: Warning messages: 1: In obj$state : 关闭不再使用的链结8(<-DESKTOP-9AMSBII:11540) 2: In obj$state : 关闭不再使用的链结7(<-DESKTOP-9AMSBII:11540) 3: In obj$state : 关闭不再使用的链结6(<-DESKTOP-9AMSBII:11540) 4: In obj$state : 关闭不再使用的链结5(<-DESKTOP-9AMSBII:11540) 5: In obj$state : 关闭不再使用的链结4(<-DESKTOP-9AMSBII:11540) 6: In obj$state : 关闭不再使用的链结3(<-DESKTOP-9AMSBII:11540) 7: executing %dopar% sequentially: no parallel backend registered 8: The labeller API has been updated. Labellers taking variable and value arguments are now deprecated. ℹ See labellers documentation.

VitCCC commented 1 month ago

Is it possible that with my topvaricance = 1000, the number of highly variable genes selected is too low?

VitCCC commented 1 month ago

I also don't know what threads = 6 means.

VitCCC commented 1 month ago

Error in { : task 1 failed - "the condition has length > 1"

123abclxy commented 1 month ago

Is it possible that with my topvaricance = 1000, the number of highly variable genes selected is too low?

It seems that the newly updated package no longer supports variants with fewer than 5000 mutations.

yuabrahamliu commented 1 month ago

I also don't know what threads = 6 means.

It means you will use 6 threads to compute in parallelization.

yuabrahamliu commented 1 month ago

Is it possible that with my topvaricance = 1000, the number of highly variable genes selected is too low?

I don't know from your current message. Can you share a small dataset with this error so that I can take a look?

VitCCC commented 1 month ago

[Uploading test.zip…]()

VitCCC commented 1 month ago

Sorry, I forgot to reply to your message. I have been delayed by many tasks recently. The compressed file contains the expression matrix and clinical information.

VitCCC commented 1 month ago
2. 混杂因素的分析

featuresampling函数

top10k <- featuresampling(betas = exp_meta, # 基因表达矩阵 topfeatures = 1000, # 仅选择前10000个最具变异性的DNAm探针构成的数据矩阵 variancetype = "sd", # 定义如何计算方差 threads = 6) # 并行化的线程数量

接下来,使用这个函数的$betas进行ANOVA分析:

This command will take < 1 min

anovares <- featuresampling(betas = top10k$betas, # 使用 betas pddat = clinical, # 性状数据 anova = TRUE, # 执行 ANOVA plotannovares = TRUE, # 绘制数据的ANOVA结果 featuretype = "probe",
plottitlesuffix = "placenta", # 显示在图表标题中的字符 titlesize = 18, textsize = 16, threads = 1)

cwgcnares <- diffwgcna(dat = exp_meta, #参数dat接受 beta 值矩阵 pddat = clinical, #pddat 接受 pds 数据框 responsevarname = "gleason_score", #指定pds 中的哪一列是目标变量 confoundings = c("preop_psa","clinical_t_stage") ,#指定需要调整的混杂因素 featuretype = "gene", #由于已经将 DNAm 探针名称转换为基因名称,所以将参数featuretype设为“gene” topvaricancetype = "sd", topvaricance = 1000,#意味着将在数据中最具变异性的前5000个基因上进行分析(标准偏差最高的那些) powers = seq(1, 20, 1), #这样可以通过网格搜索从1到20选择WGCNA的最佳软阈值 rsqcutline = 0.8, #目标最小无标度拓扑拟合指数是0.8。看不懂 mediation = TRUE, #决定在WGCNA流程中是否需要执行因果推断的核心——中介分析 topn = NULL, #因果推断仅针对最具差异性的1个WGCNA模块及其特征执行 默认情况下,此参数为 NULL,在这种情况下,所有差异模块都会进行因果推断,但分析所有模块会耗费时间。 plot = TRUE, #将生成若干图表来展示结果 titleprefix = "Placenta", labelnum = NULL, #设置为 NULL,使没有特征名称被标记 titlesize = 17, textsize = 16, annotextsize = 5, pvalcolname = "adj.P.Val", pvalcutoff = 0.05, isbetaval = TRUE, absxcutoff = 0, #定义模块内基因的 log2FC 阈值,以选择差异基因 diffcutoff = 0, #义模块特征基因差异的阈值,以选择差异模块 threads = 1, balanceadj=T)

输出结果

result <- cwgcnares[["melimmares"]]

123abclxy commented 1 month ago
2. 混杂因素的分析

featuresampling函数 top10k <- featuresampling(betas = exp_meta, # 基因表达矩阵 topfeatures = 1000, # 仅选择前10000个最具变异性的DNAm探针构成的数据矩阵 variancetype = "sd", # 定义如何计算方差 threads = 6) # 并行化的线程数量

接下来,使用这个函数的$betas进行ANOVA分析: #This command will take < 1 min anovares <- featuresampling(betas = top10k$betas, # 使用 betas pddat = clinical, # 性状数据 anova = TRUE, # 执行 ANOVA plotannovares = TRUE, # 绘制数据的ANOVA结果 featuretype = "probe", plottitlesuffix = "placenta", # 显示在图表标题中的字符 titlesize = 18, textsize = 16, threads = 1)

cwgcnares <- diffwgcna(dat = exp_meta, #参数dat接受 beta 值矩阵 pddat = clinical, #pddat 接受 pds 数据框 responsevarname = "gleason_score", #指定pds 中的哪一列是目标变量 confoundings = c("preop_psa","clinical_t_stage") ,#指定需要调整的混杂因素 featuretype = "gene", #由于已经将 DNAm 探针名称转换为基因名称,所以将参数featuretype设为“gene” topvaricancetype = "sd", topvaricance = 1000,#意味着将在数据中最具变异性的前5000个基因上进行分析(标准偏差最高的那些) powers = seq(1, 20, 1), #这样可以通过网格搜索从1到20选择WGCNA的最佳软阈值 rsqcutline = 0.8, #目标最小无标度拓扑拟合指数是0.8。看不懂 mediation = TRUE, #决定在WGCNA流程中是否需要执行因果推断的核心——中介分析 topn = NULL, #因果推断仅针对最具差异性的1个WGCNA模块及其特征执行 默认情况下,此参数为 NULL,在这种情况下,所有差异模块都会进行因果推断,但分析所有模块会耗费时间。 plot = TRUE, #将生成若干图表来展示结果 titleprefix = "Placenta", labelnum = NULL, #设置为 NULL,使没有特征名称被标记 titlesize = 17, textsize = 16, annotextsize = 5, pvalcolname = "adj.P.Val", pvalcutoff = 0.05, isbetaval = TRUE, absxcutoff = 0, #定义模块内基因的 log2FC 阈值,以选择差异基因 diffcutoff = 0, #义模块特征基因差异的阈值,以选择差异模块 threads = 1, balanceadj=T)

输出结果 result <- cwgcnares[["melimmares"]]

This package has been running normally.

VitCCC commented 1 month ago

Did you use my data? Could you please tell me which parameters you modified?

yuabrahamliu commented 1 month ago

Did you use my data? Could you please tell me which parameters you modified?

Your dataset's link does not work, so I used the former users' dataset to try a feature number of 1000, and the package worked in this case. You can use the following codes as a reference for your dataset.

top10k_ext <- featuresampling(betas = extdata, 
                              topfeatures = 1000, 
                              variancetype = "sd", 
                              threads = 6)

anovares <- featuresampling(betas = top10k_ext$betas, 
                            pddat = extpds, 
                            anova = TRUE, 

                            topfeatures = nrow(top10k_ext$betas), 

                            plotannovares = TRUE, 
                            featuretype = "gene", 
                            plottitlesuffix = "Test", 
                            threads = 4)

extgene <- anovares$betas

extres <- diffwgcna(dat = extgene, 
                    pddat = extpds, 
                    responsevarname = "Subtype", 
                    confoundings = "stage", 
                    featuretype = "gene", 

                    topvaricancetype = "sd", 
                    topvaricance = nrow(extgene), 

                    powers = seq(1, 20, 1), 
                    rsqcutline = 0.8, 

                    mediation = TRUE, 

                    topn = 1, 

                    plot = TRUE, 
                    titleprefix = "TCGA", 
                    labelnum = 5, 
                    titlesize = 17, 
                    textsize = 16, 
                    annotextsize = 5, 

                    pvalcolname = "adj.P.Val", 
                    pvalcutoff = 0.05, 
                    isbetaval = FALSE, 
                    absxcutoff = 0, 
                    diffcutoff = 0, 

                    threads = 4, 

                    balanceadj = FALSE)

If you still need help figuring out your dataset, please share it with me again, but I need you to ensure the dataset link works.

VitCCC commented 1 month ago

test.zip

VitCCC commented 1 month ago

I am still encountering some issues. Above is my test data. Thank you all.

123abclxy commented 1 month ago

Did you use my data? Could you please tell me which parameters you modified?

Your dataset's link does not work, so I used the former users' dataset to try a feature number of 1000, and the package worked in this case. You can use the following codes as a reference for your dataset.

top10k_ext <- featuresampling(betas = extdata, 
                              topfeatures = 1000, 
                              variancetype = "sd", 
                              threads = 6)

anovares <- featuresampling(betas = top10k_ext$betas, 
                            pddat = extpds, 
                            anova = TRUE, 

                            topfeatures = nrow(top10k_ext$betas), 

                            plotannovares = TRUE, 
                            featuretype = "gene", 
                            plottitlesuffix = "Test", 
                            threads = 4)

extgene <- anovares$betas

extres <- diffwgcna(dat = extgene, 
                    pddat = extpds, 
                    responsevarname = "Subtype", 
                    confoundings = "stage", 
                    featuretype = "gene", 

                    topvaricancetype = "sd", 
                    topvaricance = nrow(extgene), 

                    powers = seq(1, 20, 1), 
                    rsqcutline = 0.8, 

                    mediation = TRUE, 

                    topn = 1, 

                    plot = TRUE, 
                    titleprefix = "TCGA", 
                    labelnum = 5, 
                    titlesize = 17, 
                    textsize = 16, 
                    annotextsize = 5, 

                    pvalcolname = "adj.P.Val", 
                    pvalcutoff = 0.05, 
                    isbetaval = FALSE, 
                    absxcutoff = 0, 
                    diffcutoff = 0, 

                    threads = 4, 

                    balanceadj = FALSE)

If you still need help figuring out your dataset, please share it with me again, but I need you to ensure the dataset link works.

Could the author include the signed functionality in the WGCNA analysis? This would allow for considering both positive and negative correlations simultaneously. Currently, I see that the network structure only allows for 'unsigned'.

yuabrahamliu commented 1 month ago

I am still encountering some issues. Above is my test data. Thank you all.

I have updated the package. You can run it on your dataset with the following codes.

top10k_ext <- featuresampling(betas = extdata, 
                              topfeatures = 1000, 
                              variancetype = "sd", 
                              threads = 6)

anovares <- featuresampling(betas = top10k_ext$betas, 
                            pddat = extpds, 
                            anova = TRUE, 

                            topfeatures = nrow(top10k_ext$betas), 

                            plotannovares = TRUE, 
                            featuretype = "gene", 
                            plottitlesuffix = "Test", 
                            threads = 6)

extgene <- anovares$betas

extres <- diffwgcna(dat = extgene, 
                    pddat = extpds, 
                    responsevarname = "gleason_score", 
                    confoundings = c("preop_psa","clinical_t_stage"),  #Perhaps you can include more confoundings, as shown by the 
                                                                                                    #ANOVA results on your dataset
                    featuretype = "gene", 

                    topvaricancetype = "sd", 
                    topvaricance = nrow(extgene), 

                    powers = seq(1, 20, 1), 
                    rsqcutline = 0.8, 

                    mediation = TRUE, 

                    topn = 1, 

                    plot = TRUE, 
                    titleprefix = "TCGA", 
                    labelnum = 5, 
                    titlesize = 17, 
                    textsize = 16, 
                    annotextsize = 5, 

                    pvalcolname = "adj.P.Val", 
                    pvalcutoff = 0.05, 
                    isbetaval = FALSE, 
                    absxcutoff = 0, 
                    diffcutoff = 0, 

                    threads = 4, 

                    balanceadj = FALSE)
VitCCC commented 1 month ago

Sorry, dear author. I still haven't been able to run it successfully, so I'll shelve this job for now. Your work is still great, and I'm sure more people will be able to learn this R package in the near future, and it will be easy to use when we all put our heads together. Once again, my highest respect for your work.

yuabrahamliu commented 1 month ago

Sorry, dear author. I still haven't been able to run it successfully, so I'll shelve this job for now. Your work is still great, and I'm sure more people will be able to learn this R package in the near future, and it will be easy to use when we all put our heads together. Once again, my highest respect for your work.

Could you reinstall the package following its tutorial and run the former codes I showed? For your testing data, you can get the following figure.

test