A question about data - Githubissues

409199 commented 6 months ago

Hello I would like to ask if it is possible to do CWGCNA for continuous variables? How to do?

this is my pds data

head(pds) Pyruvic_acid e_Hydroxypropanoic_acid e_Aminobutyric_acid lccZT0.sorted.bam 3610 21900 19400 lccZT4.sorted.bam 13000 21100 29600 lccZT8.sorted.bam 9290 28000 20900 lccZT12.sorted.bam 8590 31200 32000 lccZT16.sorted.bam 13800 17000 18700 lccZT20.sorted.bam 7190 26500 30700 w_Hydroxyisobutyric_acid e_Hydroxybutanoic_acid Malonic_acid lccZT0.sorted.bam 10900 59600 2330000 lccZT4.sorted.bam 4150 30600 2500000 lccZT8.sorted.bam 8920 26500 2370000 lccZT12.sorted.bam 8400 37000 2340000 lccZT16.sorted.bam 6720 27200 2860000 lccZT20.sorted.bam 11100 40000 3150000

this is my top10k

head(top10k$betas[1:3,1:3]) lccZT0.sorted.bam lccZT4.sorted.bam lccZT8.sorted.bam BAA08g14470 14.9167999 13.433832 9.142474 BAA09g49970 0.9592866 3.483733 1.613576 BAA09g46830 30.5573931 21.696657 18.087803 head(top10k$varicanceres, 3) SD BAA04g24970 7250.253 BAA09g36430 4141.541 BAA01g13800 4129.397

I have run :

anovares <- featuresampling(betas = top10k$betas, # 使用 betas

pddat = pds, # 性状数据

anova = TRUE, # 执行 ANOVA

plotannovares = TRUE, # 绘制数据的ANOVA结果

featuretype = "probe",

plottitlesuffix = "placenta", # 显示在图表标题中的字符

titlesize = 18,

textsize = 16,

threads = 6) Error in betas[, pddat[, 1], drop = FALSE] : subscript out of bounds

How to solve it?

Best wish！

yuabrahamliu commented 6 months ago

It can be used on continuous data, the codes are the same as binary data.

This error has not been reported before. Please share testing data with me so that I can examine it.

409199 commented 6 months ago

OK，this is my data. test.zip

yuabrahamliu commented 6 months ago

OK，this is my data. test.zip

Upon reviewing your data, I've identified two specific issues that need to be addressed for accurate analysis.

1) As mentioned in the help document, the pddat parameter of the function featuresampling should be a "Meta data frame. The first column should be the sample names, and the remaining should be the phenotypic variables need to be analyzed with ANOVA."

You provided a matrix without a column named "sampleid". Please follow the help document to revise it.

2) Your pd data contains 12 samples but 49 variables, violating the requirement of ANOVA for degree of freedom. Please reduce your variables or increase your sample size so that sample size > variable number. You can get more information about this from the post below.

https://stackoverflow.com/questions/42146592/issue-with-car-anova-function-in-r

409199 commented 5 months ago

First of all, thank you for your guidance, I have successfully solved all the above problems. There are no errors，but I only get a graph after running the code.

cwgcnares <- diffwgcna(dat = top10k$betas, pddat = lccpds, responsevarname = "sampleid", confoundings = c("Pyruvic_acid","e_Hydroxypropanoic_acid", "e_Aminobutyric_acid","w_Hydroxyisobutyric_acid"), featuretype = "gene", topvaricancetype = "sd", topvaricance = 5000, powers = seq(1, 30, 1), rsqcutline = 0.8, mediation = TRUE, topn = 1, plot = TRUE, titleprefix = "Placenta", labelnum = 4, titlesize = 17, textsize = 16, annotextsize = 5, pvalcolname = "adj.P.Val", pvalcutoff = 0.05, isbetaval = TRUE, absxcutoff = 0, diffcutoff = 0, balanceadj=FALSE, threads = 1) The mediation analysis will be skipped because it only supports binary or continuous variables but the current response is not. pickSoftThreshold: will use block size 5000. pickSoftThreshold: calculating connectivity for given powers... ..working on genes 1 through 5000 of 5000 Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k. 1 1 0.7550 4.590 0.974 1940.00 1980.00 2550.0 2 2 0.5180 1.670 0.914 1040.00 1070.00 1640.0 3 3 0.2320 0.710 0.829 642.00 660.00 1150.0 4 4 0.0182 0.157 0.722 432.00 441.00 857.0 5 5 0.0145 -0.128 0.701 309.00 311.00 663.0 6 6 0.1100 -0.353 0.698 230.00 228.00 529.0 7 7 0.2020 -0.482 0.712 177.00 171.00 432.0 8 8 0.3240 -0.568 0.786 139.00 132.00 360.0 9 9 0.4550 -0.706 0.869 112.00 104.00 313.0 10 10 0.5490 -0.838 0.922 92.20 83.70 279.0 11 11 0.6250 -0.996 0.954 76.70 68.10 253.0 12 12 0.6860 -1.120 0.978 64.70 55.60 233.0 13 13 0.7290 -1.220 0.987 55.10 46.00 216.0 14 14 0.7660 -1.300 0.991 47.50 38.70 201.0 15 15 0.7960 -1.380 0.992 41.20 32.60 187.0 16 16 0.8160 -1.440 0.992 36.00 27.70 176.0 17 17 0.8390 -1.490 0.989 31.70 23.70 165.0 18 18 0.8580 -1.530 0.985 28.10 20.40 156.0 19 19 0.8690 -1.560 0.982 25.00 17.60 147.0 20 20 0.8780 -1.580 0.979 22.40 15.20 139.0 21 21 0.8820 -1.620 0.970 20.20 13.30 132.0 22 22 0.8920 -1.650 0.970 18.20 11.70 125.0 23 23 0.9040 -1.660 0.974 16.50 10.30 119.0 24 24 0.9160 -1.670 0.977 15.00 9.12 114.0 25 25 0.9200 -1.680 0.973 13.70 8.08 108.0 26 26 0.9250 -1.700 0.973 12.60 7.21 104.0 27 27 0.9310 -1.700 0.972 11.50 6.48 99.0 28 28 0.9370 -1.700 0.974 10.60 5.80 94.8 29 29 0.9420 -1.710 0.975 9.79 5.23 90.8 30 30 0.9390 -1.720 0.969 9.06 4.72 87.1 Calculating module eigengenes block-wise from all genes Flagging genes and samples with too many missing values... ..step 1 ..Working on block 1 . TOM calculation: adjacency.. ..will not use multithreading. Fraction of slow calculations: 0.000000 ..connectivity.. ..matrix multiplication (system BLAS).. ..normalization.. ..done. ..saving TOM for block 1 into file blockwiseTOM.2024-06-02_17-29-55.235508-block.1.RData ....clustering.. ....detecting modules.. ....calculating module eigengenes.. ....checking kME in modules.. ..removing 1 genes from module 10 because their KME is too low. ..removing 1 genes from module 15 because their KME is too low. ..removing 1 genes from module 27 because their KME is too low. ..merging modules that are too close.. mergeCloseModules: Merging modules whose distance is less than 0.2 Calculating new MEs... $mar [1] 1 5 0 1 微信图片_20240602173704 This is my cwgcnares. 微信图片_20240602173913 Is there something missing or something is wrong?

yuabrahamliu commented 5 months ago

The parameter responsevarname was set incorrectly. As shown by the help document, it should be "The column name of the response variable in the data frame provided to pddat. It is the variable of interest."

The variable you are interested in is obviously not the "sampleid" as you set, it should be the the variable recording your data's biological traits, such as disease/control groups, donor ages, etc.

You can also change the parameter titleprefix from "Placenta" to other strings related to your data.

You can read the help document and follow it.

409199 commented 5 months ago

I used three F>1 acids for my analysis and got the following error, why is that?

cwgcnares <- diffwgcna(dat = betas,

pddat = lccpds,
responsevarname = "e_Aminobutyric_acid",
confoundings = c("w_Hydroxyisobutyric_acid","e_Hydroxybutanoic_acid"),
featuretype = "gene",
topvaricancetype = "sd",
topvaricance = 5000,
powers = seq(1, 30, 1),
rsqcutline = 0.8,
mediation = TRUE,
topn = 1,
plot = TRUE,
titleprefix = "Cabbage",
labelnum = 4,
titlesize = 17,
textsize = 16,
annotextsize = 5,
pvalcolname = "adj.P.Val",
pvalcutoff = 0.05,
isbetaval = TRUE,
absxcutoff = 0,
diffcutoff = 0,
balanceadj=FALSE,
threads = 1) pickSoftThreshold: will use block size 5000. pickSoftThreshold: calculating connectivity for given powers... ..working on genes 1 through 5000 of 5000 Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k. 1 1 0.7550 4.590 0.974 1940.00 1980.00 2550.0 2 2 0.5180 1.670 0.914 1040.00 1070.00 1640.0 3 3 0.2320 0.710 0.829 642.00 660.00 1150.0 4 4 0.0182 0.157 0.722 432.00 441.00 857.0 5 5 0.0145 -0.128 0.701 309.00 311.00 663.0 6 6 0.1100 -0.353 0.698 230.00 228.00 529.0 7 7 0.2020 -0.482 0.712 177.00 171.00 432.0 8 8 0.3240 -0.568 0.786 139.00 132.00 360.0 9 9 0.4550 -0.706 0.869 112.00 104.00 313.0 10 10 0.5490 -0.838 0.922 92.20 83.70 279.0 11 11 0.6250 -0.996 0.954 76.70 68.10 253.0 12 12 0.6860 -1.120 0.978 64.70 55.60 233.0 13 13 0.7290 -1.220 0.987 55.10 46.00 216.0 14 14 0.7660 -1.300 0.991 47.50 38.70 201.0 15 15 0.7960 -1.380 0.992 41.20 32.60 187.0 16 16 0.8160 -1.440 0.992 36.00 27.70 176.0 17 17 0.8390 -1.490 0.989 31.70 23.70 165.0 18 18 0.8580 -1.530 0.985 28.10 20.40 156.0 19 19 0.8690 -1.560 0.982 25.00 17.60 147.0 20 20 0.8780 -1.580 0.979 22.40 15.20 139.0 21 21 0.8820 -1.620 0.970 20.20 13.30 132.0 22 22 0.8920 -1.650 0.970 18.20 11.70 125.0 23 23 0.9040 -1.660 0.974 16.50 10.30 119.0 24 24 0.9160 -1.670 0.977 15.00 9.12 114.0 25 25 0.9200 -1.680 0.973 13.70 8.08 108.0 26 26 0.9250 -1.700 0.973 12.60 7.21 104.0 27 27 0.9310 -1.700 0.972 11.50 6.48 99.0 28 28 0.9370 -1.700 0.974 10.60 5.80 94.8 29 29 0.9420 -1.710 0.975 9.79 5.23 90.8 30 30 0.9390 -1.720 0.969 9.06 4.72 87.1 Calculating module eigengenes block-wise from all genes Flagging genes and samples with too many missing values... ..step 1 ..Working on block 1 . TOM calculation: adjacency.. ..will not use multithreading. Fraction of slow calculations: 0.000000 ..connectivity.. ..matrix multiplication (system BLAS).. ..normalization.. ..done. ..saving TOM for block 1 into file blockwiseTOM.2024-06-05_19-56-17.651369-block.1.RData ....clustering.. ....detecting modules.. ....calculating module eigengenes.. ....checking kME in modules.. ..removing 1 genes from module 10 because their KME is too low. ..removing 1 genes from module 15 because their KME is too low. ..removing 1 genes from module 27 because their KME is too low. ..merging modules that are too close.. mergeCloseModules: Merging modules whose distance is less than 0.2 Calculating new MEs... $mar [1] 1 5 0 1

Start limma analysis on WGCNA modules NULL Error in diffwgcna(dat = betas, pddat = lccpds, responsevarname = "e_Aminobutyric_acid", : object 'meplotdat' not found In addition: Warning messages: 1: In featuresampling(betas = dat, topfeatures = topvaricance, pddat = NULL, : The parameter betas needs a matrix, but the current one is not,

        so it has been converted to a matrix.

2: executing %dopar% sequentially: no parallel backend registered 3: The labeller API has been updated. Labellers taking variable and value arguments are now deprecated. ℹ See labellers documentation.

yuabrahamliu commented 5 months ago

This error has not been reported before. Please share an example dataset with me so that I can examine it.

409199 commented 5 months ago

OK,this is my data.test.zip Thank you very much.

yuabrahamliu commented 5 months ago

I have updated the package so that it will not raise an error on your dataset. You can re-install it.

However, from the limma analysis step, no WGCNA modules show a statistically significant relation with your response variable e_Aminobutyric_acid (their adjusted p-values are all >= 0.05). Hence, the next causal inference will not be performed.

Suppose you really want to find significant modules. In that case, you can change the parameter pvalcolname from the default "adj.P.Val" to "P.Value", so as long as a module has a p-value < 0.05, rather than an adjusted p-value < 0.05, it will be considered as significant, and the causal inference will be performed on them.

In addition, the ANOVA results from featuresampling showed that in addition to your w_Hydroxyisobutyric_acid and e_Hydroxyisobutyric_acid, Pyruvic_acid is also a potential confounding factor with an F value > 1, so you can also include it in the parameter confoundings of diffwgcna.

yuabrahamliu / CWGCNA

A question about data #7