kdahlquist / GRNmap

Gene Regulatory Network modeling and parameter estimation
BSD 3-Clause "New" or "Revised" License

Do a multiple regression to look for relationships between MSE, P, d, b, w, graph stats, etc. #330

Closed kdahlquist closed 6 years ago

kdahlquist commented 7 years ago

Creating a new issue for this, splitting it off from #315.

bklein7 commented 7 years ago

Data was compiled to perform a multiple regression analysis for db1 in the following spreadsheet: https://github.com/kdahlquist/DahlquistLab/blob/master/data/15-gene_networks_analysis/db1_mse-multiple-regression_BK20170220.xlsx.

A brief, preliminary multiple regression analysis was performed. The results of this analysis have been uploaded so that we may discuss them next Thursday: https://github.com/kdahlquist/DahlquistLab/blob/master/documents/db1-Multiple-Regression_BK20170220.pptx.

kdahlquist commented 7 years ago

Next task for @bklein7 will be to do the multiple regression on other networks. db2 and db3 should be next because they are smaller/larger versions of each other and it will be interesting to see if they end up being similar or different in the analysis.

kdahlquist commented 7 years ago

Actually, db2 and db3 look pretty different in the histograms, so it will be really interesting to see the analysis.

kdahlquist commented 7 years ago
bklein7 commented 7 years ago

This week, I compiled data in preparation for completing multiple regression analyses for db2-db6. Unfortunately, I could not complete the analyses for db2 and db3, as only Gephi data for the old 15-gene dCIN5 network is available on the Dahlquist Lab repository. When possible, I would like an update from @maggie-oneil or @khorstmann regarding whether Gephi stats for the new dCIN5 networks (db2 & db3) are available and, if so, where I can find them.

I also began a multiple regression analysis for db4. Thus far, it appears that the only significant predictor of average MSE in this network is the B&H-corrected p value (within-strain ANOVA). However, this correlation is negative, which is contrary to our expectation and quite possibly artifactual.
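For readers unfamiliar with the correction mentioned above, here is a minimal sketch of a Benjamini-Hochberg adjustment, assuming that is what "B&H" refers to. The function name and the input p values are hypothetical and are not taken from the GRNmap workbooks or the lab spreadsheets.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the raw p values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so each adjusted value
    # never exceeds the adjusted value of a larger raw p value.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values for four genes (illustration only).
raw = [0.01, 0.04, 0.03, 0.005]
corrected = benjamini_hochberg(raw)
```

Each raw p value is scaled by the number of tests divided by its rank, so genes with small raw p values are penalized more heavily than those with large ones.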

maggie-oneil commented 7 years ago

The db2 and db3 raw data can be found in the data folder of the DahlquistLab repository.

db2: https://github.com/kdahlquist/DahlquistLab/blob/master/data/15-gene_networks_analysis/Gephi_raw_db2_14-gene_25-edges_NW_dCIN5_fam_Sigmoid_estimation_output.csv

db3: https://github.com/kdahlquist/DahlquistLab/blob/master/data/15-gene_networks_analysis/Gephi_raw_db3_17-genes_32-edges_NW-dCIN5-fam_Sigmoid_estimation_output%20.csv

bklein7 commented 7 years ago

This week, multiple regression analyses were performed for db2-db6. The Excel spreadsheets containing the data used for these analyses can be found in this folder: https://github.com/kdahlquist/DahlquistLab/tree/master/data/15-gene_networks_analysis.

The compiled results of the multiple regression analyses conducted for db1-db6 were uploaded here: https://github.com/kdahlquist/DahlquistLab/blob/master/data/15-gene_networks_analysis/db1-db6_Multiple-Regression_BK20170329.pptx.

kdahlquist commented 7 years ago

To summarize the discussion at the March 30 meeting: @bklein7 has run a multiple regression in SPSS on db1-db6 relating MSE to P, d, b, ANOVA p value, in-degree, out-degree, betweenness centrality, closeness centrality, eigenvector centrality, and eccentricity.
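The analysis itself was run in SPSS. As a rough illustration of what a multiple regression of this kind computes, the following is a minimal pure-Python sketch of an ordinary least squares fit. The function names, the choice of two predictors, and all numbers are made up for illustration and do not come from the lab spreadsheets. It does not compute the per-coefficient significance tests that SPSS reports.

```python
def fit_ols(rows, targets):
    """Fit targets ~ intercept + predictors by ordinary least squares.

    rows: one list of predictor values per gene.
    targets: one response value (e.g. MSE) per gene.
    Returns [intercept, coef_1, ..., coef_k].
    """
    # Design matrix with a leading 1.0 for the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented normal equations: (X^T X | X^T y).
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def r_squared(rows, targets, beta):
    """Fraction of the variance in targets explained by the fitted model."""
    predicted = [beta[0] + sum(b * v for b, v in zip(beta[1:], row))
                 for row in rows]
    mean_y = sum(targets) / len(targets)
    ss_res = sum((y - p) ** 2 for y, p in zip(targets, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in targets)
    return 1.0 - ss_res / ss_tot


# Synthetic example: each row is [degradation rate d, in-degree] for one gene.
rows = [[0.1, 1], [0.2, 3], [0.3, 2], [0.4, 0], [0.5, 4]]
# Synthetic MSE values built from a known linear rule so the fit can be checked.
mse = [0.5 + 2.0 * d - 0.1 * deg for d, deg in rows]

beta = fit_ols(rows, mse)
fit_quality = r_squared(rows, mse, beta)
```

Because the synthetic MSE values are generated from an exact linear rule, the fit should recover that rule's intercept and coefficients. Real data would leave residual error and a lower R-squared.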

He needs to double-check whether the Gephi stats were computed from the unweighted or the weighted networks.

@kdahlquist noted that the strongest relationship (across 3 networks) was for the degradation rate, something that we actually provided to the model and didn't estimate.

She also noted the following (same list as task issue #343):

bklein7 commented 7 years ago

After reviewing the Gephi worksheets used in the last round of multiple regression analyses, I noted that weighted statistics were present (e.g. weighted in-degree) but not used. Instead, the statistics that were not labeled as "weighted" were used. Nonetheless, this leads me to believe that the Gephi statistics were derived from the weighted networks. I will confirm this during lab meeting today.
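To make the distinction concrete, here is a small sketch of how unweighted and weighted in-degree differ for the same network. It assumes that weighted in-degree is the sum of incoming edge weights, which is my understanding of Gephi's weighted degree statistic. The node names and weights are hypothetical.

```python
from collections import defaultdict


def in_degrees(edges):
    """Compute unweighted and weighted in-degree for each target node.

    edges: iterable of (regulator, target, weight) tuples.
    Returns (unweighted, weighted) dictionaries keyed by target.
    """
    unweighted = defaultdict(int)
    weighted = defaultdict(float)
    for regulator, target, weight in edges:
        unweighted[target] += 1      # count of incoming edges
        weighted[target] += weight   # sum of incoming edge weights
    return dict(unweighted), dict(weighted)


# Hypothetical three-node network; a negative weight stands for repression.
edges = [("A", "C", 0.5), ("B", "C", -1.5), ("A", "B", 2.0)]
unweighted, weighted = in_degrees(edges)
# unweighted == {"C": 2, "B": 1}
# weighted   == {"C": -1.0, "B": 2.0}
```

Under this assumption, activating and repressing inputs can partly cancel in the weighted statistic, so the two versions of a statistic can rank the same genes quite differently.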

I have begun compiling data for the next round of multiple regression analyses and will perform these tests next week.

kdahlquist commented 7 years ago

Open tasks:

bklein7 commented 7 years ago

The protocol for performing multiple regression analyses in SPSS (in the context of GRNmap) can be found here: http://www.openwetware.org/wiki/Analyzing_GRNmap_Output_Workbooks_Using_Multiple_Regression_and_SPSS. I also added this link to the multiple regression analysis folder's ReadMe file within the Dahlquist Lab repository: https://github.com/kdahlquist/DahlquistLab/tree/master/data/Spring2017/15-gene_networks_analysis/multiple_regression_analysis.

During our final lab meeting, we agreed that the new multiple regression analyses for the three best and worst random networks would be performed next semester.

kdahlquist commented 6 years ago

We will eventually come back to this when we re-run models with cleaned-up GRNmap code. However, for now, we will close it. It will be here to refer back to when needed again.