smith-chem-wisc / MetaNetwork


Issue running MetaNetwork #44

Closed vcoyne1 closed 2 days ago

vcoyne1 commented 2 weeks ago

The script runs as shown in the log file:

2024-10-04 15:38:38
2024-10-04 15:38:38 R version 4.0.5 (2021-03-31) -- "Shake and Throw"
2024-10-04 15:38:38 Copyright (C) 2021 The R Foundation for Statistical Computing
2024-10-04 15:38:38 Platform: x86_64-pc-linux-gnu (64-bit)
2024-10-04 15:38:38
2024-10-04 15:38:38 R is free software and comes with ABSOLUTELY NO WARRANTY.
2024-10-04 15:38:38 You are welcome to redistribute it under certain conditions.
2024-10-04 15:38:38 Type 'license()' or 'licence()' for distribution details.
2024-10-04 15:38:38
2024-10-04 15:38:38 R is a collaborative project with many contributors.
2024-10-04 15:38:38 Type 'contributors()' for more information and
2024-10-04 15:38:38 'citation()' on how to cite R or R packages in publications.
2024-10-04 15:38:38
2024-10-04 15:38:38 Type 'demo()' for some demos, 'help()' for on-line help, or
2024-10-04 15:38:38 'help.start()' for an HTML browser interface to help.
2024-10-04 15:38:38 Type 'q()' to quit R.
2024-10-04 15:38:38
2024-10-04 15:38:38 > shiny::runApp('/app', port = 3838, host = '0.0.0.0')
2024-10-04 15:38:39 Loading required package: shiny
2024-10-04 15:38:39 Bioconductor version '3.12' is out-of-date; the current release version '3.19'
2024-10-04 15:38:39 is available with R version '4.4'; see https://bioconductor.org/install
2024-10-04 15:38:39 ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
2024-10-04 15:38:39 ✔ ggplot2 3.3.5 ✔ purrr 0.3.4
2024-10-04 15:38:39 ✔ tibble 3.1.2 ✔ dplyr 1.0.7
2024-10-04 15:38:39 ✔ tidyr 1.1.3 ✔ stringr 1.4.0
2024-10-04 15:38:39 ✔ readr 1.4.0 ✔ forcats 0.5.1
2024-10-04 15:38:39 ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
2024-10-04 15:38:39 ✖ dplyr::filter() masks stats::filter()
2024-10-04 15:38:39 ✖ dplyr::lag() masks stats::lag()
2024-10-04 15:38:40
2024-10-04 15:38:40 Attaching package: ‘plotly’
2024-10-04 15:38:40
2024-10-04 15:38:40 The following object is masked from ‘package:ggplot2’:
2024-10-04 15:38:40
2024-10-04 15:38:40 last_plot
2024-10-04 15:38:40
2024-10-04 15:38:40 The following object is masked from ‘package:stats’:
2024-10-04 15:38:40
2024-10-04 15:38:40 filter
2024-10-04 15:38:40
2024-10-04 15:38:40 The following object is masked from ‘package:graphics’:
2024-10-04 15:38:40
2024-10-04 15:38:40 layout
2024-10-04 15:38:40
2024-10-04 15:38:40 Loading required package: dynamicTreeCut
2024-10-04 15:38:40 Loading required package: fastcluster
2024-10-04 15:38:40
2024-10-04 15:38:40 Attaching package: ‘fastcluster’
2024-10-04 15:38:40
2024-10-04 15:38:40 The following object is masked from ‘package:stats’:
2024-10-04 15:38:40
2024-10-04 15:38:40 hclust
2024-10-04 15:38:40
2024-10-04 15:38:41
2024-10-04 15:38:42
2024-10-04 15:38:42 Attaching package: ‘WGCNA’
2024-10-04 15:38:42
2024-10-04 15:38:42 The following object is masked from ‘package:stats’:
2024-10-04 15:38:42
2024-10-04 15:38:42 cor
2024-10-04 15:38:42
2024-10-04 15:38:42
2024-10-04 15:38:42 Attaching package: ‘zip’
2024-10-04 15:38:42
2024-10-04 15:38:42 The following objects are masked from ‘package:utils’:
2024-10-04 15:38:42
2024-10-04 15:38:42 unzip, zip
2024-10-04 15:38:42
2024-10-04 15:38:42 Creating a new generic function for ‘data’ in the global environment
2024-10-04 15:38:42 Creating a new generic function for ‘plotDendroAndColors’ in the global environment
2024-10-04 15:38:42 Creating a new generic function for ‘gost’ in the global environment
2024-10-04 15:38:42 Creating a new generic function for ‘gostplot’ in the global environment
2024-10-04 15:38:43
2024-10-04 15:38:43 Listening on http://0.0.0.0:3838
2024-10-04 15:39:05
2024-10-04 15:39:05 ── Column specification ────────────────────────────────────────────────────────
2024-10-04 15:39:05 cols(
2024-10-04 15:39:05 .default = col_double(),
2024-10-04 15:39:05 Accession = col_character()
2024-10-04 15:39:05 )
2024-10-04 15:39:05 ℹ Use spec() for the full column specifications.
2024-10-04 15:39:05
2024-10-04 15:39:31
2024-10-04 15:39:31 ── Column specification ────────────────────────────────────────────────────────
2024-10-04 15:39:31 cols(
2024-10-04 15:39:31 .default = col_double(),
2024-10-04 15:39:31 Accession = col_character()
2024-10-04 15:39:31 )
2024-10-04 15:39:31 ℹ Use spec() for the full column specifications.
2024-10-04 15:39:31
2024-10-04 15:39:31
2024-10-04 15:39:31 ── Column specification ────────────────────────────────────────────────────────
2024-10-04 15:39:31 cols(
2024-10-04 15:39:31 SampleID = col_character(),
2024-10-04 15:39:31 Experiment = col_character()
2024-10-04 15:39:31 )
2024-10-04 15:39:31
2024-10-04 15:39:31
2024-10-04 15:39:31 ── Column specification ────────────────────────────────────────────────────────
2024-10-04 15:39:31 cols(
2024-10-04 15:39:31 Entry = col_character(),
2024-10-04 15:39:31 Protein names = col_character(),
2024-10-04 15:39:31 Gene Names = col_character()
2024-10-04 15:39:31 )
2024-10-04 15:39:31
2024-10-04 15:39:31 Warning: Unknown or uninitialised column: Gene names.
2024-10-04 15:39:33 Warning in WGCNA::blockwiseModules(cleaned_data@Data[, -1], power = parameters@power, :
2024-10-04 15:39:33 NAs introduced by coercion
2024-10-04 15:39:38 Warning: Error in : All columns in a tibble must be vectors.
2024-10-04 15:39:38 ✖ Column Gene is NULL.
2024-10-04 15:39:38 110:
2024-10-04 15:39:31 pickSoftThreshold: will use block size 1557.
2024-10-04 15:39:31 pickSoftThreshold: calculating connectivity for given powers...
2024-10-04 15:39:31 ..working on genes 1 through 1557 of 1557
2024-10-04 15:39:33    Power SFT.R.sq   slope truncated.R.sq mean.k. median.k. max.k.
2024-10-04 15:39:33  1     1  0.11600  0.7060         0.2190  504.00   540.000  745.0
2024-10-04 15:39:33  2     2  0.00522  0.0764        -0.0670  247.00   270.000  435.0
2024-10-04 15:39:33  3     3  0.06810 -0.1500        -0.0342  145.00   152.000  297.0
2024-10-04 15:39:33  4     4  0.42500 -0.4240         0.4970   94.10    92.800  232.0
2024-10-04 15:39:33  5     5  0.54100 -0.6720         0.6990   65.00    58.900  192.0
2024-10-04 15:39:33  6     6  0.61000 -0.8670         0.7890   47.00    38.900  163.0
2024-10-04 15:39:33  7     7  0.67400 -1.0100         0.8490   35.00    26.200  141.0
2024-10-04 15:39:33  8     8  0.70000 -1.1200         0.8760   26.80    18.000  123.0
2024-10-04 15:39:33  9     9  0.75200 -1.1700         0.9160   21.00    12.700  109.0
2024-10-04 15:39:33 10    10  0.79700 -1.2400         0.9360   16.70     9.020   96.3
2024-10-04 15:39:33 11    12  0.82600 -1.3200         0.9520   11.00     4.640   77.3
2024-10-04 15:39:33 12    14  0.83400 -1.3600         0.9590    7.61     2.490   63.2
2024-10-04 15:39:33 13    16  0.86500 -1.4100         0.9670    5.43     1.370   52.5
2024-10-04 15:39:33 14    18  0.89100 -1.4200         0.9790    4.00     0.782   44.1
2024-10-04 15:39:33 15    20  0.90300 -1.4300         0.9810    3.01     0.465   37.4
2024-10-04 15:39:33 Calculating module eigengenes block-wise from all genes
2024-10-04 15:39:33 Flagging genes and samples with too many missing values...
2024-10-04 15:39:33 ..step 1
2024-10-04 15:39:33 ..Working on block 1 .
2024-10-04 15:39:33 TOM calculation: adjacency..
2024-10-04 15:39:33 ..will not use multithreading.
2024-10-04 15:39:33 Fraction of slow calculations: 0.000000
2024-10-04 15:39:33 ..connectivity..
2024-10-04 15:39:33 ..matrix multiplication (system BLAS)..
2024-10-04 15:39:33 ..normalization..
2024-10-04 15:39:33 ..done.
2024-10-04 15:39:35 ....clustering..
2024-10-04 15:39:35 ....detecting modules..
2024-10-04 15:39:36 ..done.
2024-10-04 15:39:37 ....calculating module eigengenes..
2024-10-04 15:39:37 moduleEigengenes : Working on ME for module 1
2024-10-04 15:39:37 moduleEigengenes : Working on ME for module 2
2024-10-04 15:39:37 moduleEigengenes : Working on ME for module 3
2024-10-04 15:39:37 moduleEigengenes : Working on ME for module 4
2024-10-04 15:39:37 moduleEigengenes : Working on ME for module 5
2024-10-04 15:39:37 moduleEigengenes : Working on ME for module 6
2024-10-04 15:39:37 moduleEigengenes : Working on ME for module 7
2024-10-04 15:39:37 ....checking kME in modules..
2024-10-04 15:39:37 ..removing 40 genes from module 1 because their KME is too low.
2024-10-04 15:39:37 ..removing 13 genes from module 2 because their KME is too low.
2024-10-04 15:39:37 ..removing 16 genes from module 3 because their KME is too low.
2024-10-04 15:39:37 ..removing 14 genes from module 4 because their KME is too low.
2024-10-04 15:39:37 ..removing 1 genes from module 7 because their KME is too low.
2024-10-04 15:39:38 ..merging modules that are too close..
2024-10-04 15:39:38 mergeCloseModules: Merging modules whose distance is less than 0.25
2024-10-04 15:39:38 multiSetMEs: Calculating module MEs.
2024-10-04 15:39:38 Working on set 1 ...
2024-10-04 15:39:38 moduleEigengenes: Calculating 8 module eigengenes in given set.
2024-10-04 15:39:38 multiSetMEs: Calculating module MEs.
2024-10-04 15:39:38 Working on set 1 ...
2024-10-04 15:39:38 moduleEigengenes: Calculating 4 module eigengenes in given set.
2024-10-04 15:39:38 multiSetMEs: Calculating module MEs.
2024-10-04 15:39:38 Working on set 1 ...
2024-10-04 15:39:38 moduleEigengenes: Calculating 3 module eigengenes in given set.
2024-10-04 15:39:38 Calculating new MEs...
2024-10-04 15:39:38 multiSetMEs: Calculating module MEs.
2024-10-04 15:39:38 Working on set 1 ...
2024-10-04 15:39:38 moduleEigengenes: Calculating 3 module eigengenes in given set.

It then hangs after showing the following in PowerShell:

Warning: Error in : All columns in a tibble must be vectors.
✖ Column Gene is NULL.
110:

I have no idea how to rectify this - your assistance would be most appreciated.

Thanking you in advance,

JLane-scripps commented 5 days ago

Hi there, I'm not affiliated with the Smith lab, but I've hit this error several times when using the program. The problem is that your UniProt .tsv file has a column named "Gene Names" when MetaNetwork expects "Gene names": the "n" in "names" must be lowercase. UniProt capitalizes that column header automatically, but not "Protein names", for some reason. Changing the capitalization should fix this issue.
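For anyone who would rather fix the header programmatically than by hand, here is a minimal sketch in Python using only the standard library. The function name `rename_header` and the sample row (accession `P12345`, gene names `GENE1 GENE2`) are made up for illustration; only the column names "Entry", "Protein names", and "Gene Names" come from the log above.

```python
import csv
import io


def rename_header(tsv_text, old="Gene Names", new="Gene names"):
    """Return tsv_text with the header cell `old` renamed to `new`.

    Only the first row (the header) is touched; data rows pass through.
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        return tsv_text  # nothing to rename in an empty file
    rows[0] = [new if cell == old else cell for cell in rows[0]]
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical two-line UniProt export with the capitalized header.
sample = "Entry\tProtein names\tGene Names\nP12345\tExample protein\tGENE1 GENE2\n"
fixed = rename_header(sample)
print(fixed.splitlines()[0])
```

To apply it to a real file, read the file's text, pass it through `rename_header`, and write the result to a new file so the original export stays untouched.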

Good luck!

vcoyne1 commented 5 days ago

Hi Jeff,

Thanks for the assistance. I did have an uppercase N. I corrected the problem and all seemed to be running well. Unfortunately I have hit a new issue:

Warning: Error in WGCNA::TOMplot: ERROR: number of color labels does not equal number of nodes in dissim. nNodes != dim(dissim)[[1]]

I have no idea how to rectify this. Will do some Googling and see what I can come up with.

Cheers, Vernon

JLane-scripps commented 5 days ago

Hi Vernon,

The TOMplot error, in my experience so far, occurs because your experiment dataset has missing values. MetaNetwork doesn't handle missing values: you either need to fill each null / blank / missing value with a stand-in value, or remove the rows that have missing values from the experiment data.

I wrote a quick script for this on Streamlit you're welcome to use if it's helpful. The first function removes all blank rows from a dataset and saves the new dataset as a new file.csv. The second function is for after you've finished the analysis and want to see just significantly altered proteins. https://wgcna-file-trimmer.streamlit.app/
This was something I made for a researcher in my own lab, not sure if it'll be helpful to you too.
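The row-removal option can be sketched as follows. This is not the Streamlit tool's code, just a standard-library illustration under stated assumptions: the input is comma-separated, the first row is a header, and a cell counts as missing if it is empty or reads `NA`, `NaN`, or `null` (case-insensitive). The function name `drop_incomplete_rows` and the sample data are hypothetical.

```python
import csv
import io

# Assumed spellings of a missing value; adjust to match your export.
MISSING = {"", "na", "nan", "null"}


def drop_incomplete_rows(csv_text):
    """Drop data rows with any missing cell.

    Returns (cleaned_csv_text, number_of_rows_removed).
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text, 0
    header, body = rows[0], rows[1:]
    kept = [
        row for row in body
        if len(row) == len(header)
        and all(cell.strip().lower() not in MISSING for cell in row)
    ]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(kept)
    return out.getvalue(), len(body) - len(kept)


# Hypothetical quantification table: P2 and P3 each have one missing value.
sample = "Accession,S1,S2\nP1,1.0,2.0\nP2,,3.0\nP3,4.0,NA\nP4,5.0,6.0\n"
cleaned, removed = drop_incomplete_rows(sample)
total = sample.count("\n") - 1  # data rows, excluding the header
print(f"{removed} of {total} rows removed ({removed / total:.0%})")
```

Returning the removed-row count alongside the cleaned text makes it easy to check how much of the dataset was lost before running the analysis.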

vcoyne1 commented 3 days ago

Hi Jeff,

Once again, thank you for your assistance.

I made use of the file trimmer (removed blanks before running MetaNetwork). Unfortunately, it now has a new issue (see log below):

cols(
Entry = col_character(),
Protein names = col_character(),
Gene names = col_character()
)

pickSoftThreshold: will use block size 342.
pickSoftThreshold: calculating connectivity for given powers...
..working on genes 1 through 342 of 342
   Power SFT.R.sq   slope truncated.R.sq mean.k. median.k. max.k.
 1     1   0.7880  2.0500         0.7340  190.00    211.00  243.0
 2     2   0.6700  0.8370         0.5840  130.00    148.00  196.0
 3     3   0.3390  0.3750         0.1950   95.60    109.00  166.0
 4     4   0.0215  0.0701        -0.2550   73.50     81.80  143.0
 5     5   0.1670 -0.1830        -0.0705   58.10     62.10  124.0
 6     6   0.3560 -0.3290         0.2040   46.80     48.40  109.0
 7     7   0.5020 -0.4590         0.4130   38.30     37.90   97.1
 8     8   0.5370 -0.5800         0.5170   31.70     29.90   86.6
 9     9   0.6270 -0.6520         0.6350   26.50     23.80   77.6
10    10   0.6050 -0.7580         0.6390   22.40     19.00   69.9
11    12   0.7230 -0.8630         0.8010   16.30     12.40   57.3
12    14   0.7320 -0.9630         0.8410   12.20      8.32   47.5
13    16   0.7970 -1.0400         0.8990    9.27      5.65   39.9
14    18   0.8200 -1.0900         0.9220    7.18      3.94   33.7
15    20   0.8560 -1.1100         0.9620    5.65      2.99   28.8
Calculating module eigengenes block-wise from all genes
Flagging genes and samples with too many missing values...
..step 1
..Working on block 1 .
Warning in WGCNA::blockwiseModules(cleaned_data@Data[, -1], power = parameters@power, :
NAs introduced by coercion
TOM calculation: adjacency..
..will not use multithreading.
Fraction of slow calculations: 0.000000
..connectivity..
..matrix multiplication (system BLAS)..
..normalization..
..done.
....clustering..
....detecting modules..
..done.
....calculating module eigengenes..
moduleEigengenes : Working on ME for module 1
moduleEigengenes : Working on ME for module 2
moduleEigengenes : Working on ME for module 3
....checking kME in modules..
..merging modules that are too close..
mergeCloseModules: Merging modules whose distance is less than 0.25
multiSetMEs: Calculating module MEs.
Working on set 1 ...
moduleEigengenes: Calculating 4 module eigengenes in given set.
multiSetMEs: Calculating module MEs.
Working on set 1 ...
moduleEigengenes: Calculating 3 module eigengenes in given set.
Calculating new MEs...
multiSetMEs: Calculating module MEs.
Working on set 1 ...
moduleEigengenes: Calculating 3 module eigengenes in given set.
Warning: Error in graphics:::plotHclust: invalid dendrogram input
103: graphics:::plotHclust
102: plot.hclust
100: WGCNA::plotEigengeneNetworks
99: PlotModuleEigenproteinDiagnostics [/app/app.R#1708]
97: BuildAllPlots [/app/app.R#1055]
95: eventReactiveValueFunc [/app/app.R#2625]
51: WGCNA_workflow_results
44: [/app/app.R#2657]
1: shiny::runApp

Once again, I have no idea what needs to be changed.

JLane-scripps commented 3 days ago

Hey Vernon, no problem; glad my struggle with these errors can save someone else some hours of frustration, and glad my program was useful.

Unfortunately I don't know this specific error's cause or solution. It seems to happen as soon as the dendrogram begins plotting. My gut is telling me at first glance that 3 module eigengenes is too few, but I can't base that on anything. That could be totally off. Do you know how much of your original data was removed along with the blanks? In my researcher's file, I had to remove 23% of the rows, that is, 23% of the proteins. He said that was an acceptable loss for our data, but there is a limit to how much data can be lost before the analysis stops being useful.

vcoyne1 commented 2 days ago

Hi Jeff,

I lost 1,214 rows! Most likely too many. Interestingly, a different package is able to run a WGCNA analysis on the file generated by the Streamlit tool. It generated 2 modules, but could draw the dendrograms, which did make sense in terms of the GO BPs, etc. I think losing that many proteins is too much. I have used other methods to impute values, remove duplicates, etc., which work in other packages (iDEP); that way I end up with 5 modules, which make sense as before. I have no idea why MetaNetwork can't do the same. Because it is a GUI, there is not much one can do to change the parameters for the run.
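For comparison with row removal, filling the gaps instead might look like the sketch below. Row-median imputation is an assumption chosen for simplicity; it is not the method used by iDEP or MetaNetwork, and the thread does not say which imputation was actually applied. The function name `impute_row_median` and the missing-value markers are likewise hypothetical.

```python
from statistics import median

# Assumed spellings of a missing value; adjust to match your export.
MISSING = {"", "na", "nan", "null"}


def is_missing(cell):
    """True if the cell text is one of the assumed missing markers."""
    return cell.strip().lower() in MISSING


def impute_row_median(values):
    """Replace missing cells in one protein's row with that row's median.

    `values` holds the intensity cells as strings (no accession column).
    """
    observed = [float(v) for v in values if not is_missing(v)]
    if not observed:
        raise ValueError("row has no observed values to impute from")
    fill = median(observed)
    return [fill if is_missing(v) else float(v) for v in values]


# Hypothetical row with two missing measurements.
row = ["10.0", "", "14.0", "NA"]
print(impute_row_median(row))  # [10.0, 12.0, 14.0, 12.0]
```

Imputation keeps every protein in the dataset at the cost of introducing estimated values, so rows that are mostly missing are usually better dropped than filled.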

All the best, Vernon

vcoyne1 commented 2 days ago

Finally got MetaNetwork to run :) I used a different dataset that had fewer missing values, ran it through your script to remove rows, and then found it ran perfectly in MetaNetwork. The new attempt removed only 747 rows and resulted in 4 modules.

Thank you for your patience with this.