kdahlquist / GRNmap

Gene Regulatory Network modeling and parameter estimation

Metrics for analyzing weight data #315

Closed kdahlquist closed 6 years ago

kdahlquist commented 7 years ago

Our task for the next week is to brainstorm a list of comparisons or metrics that we want to do with the model data to help us understand the results of the modeling. What we came up with in the meeting already is listed here. Please add to this list by commenting on this issue. We will prioritize and assign tasks at next week's meeting.

Note that the above MSE stuff has been moved to issue #326

Tasks below moved to #331

kdahlquist commented 7 years ago

Reposting @maggie-oneil 's comment here:

Distance-Matrix_From-Unweighted-Adjacency-15-genes_28-edges_GJ-dHAP4-fam_strains-added_Sigmoid_Estimation.xlsx

Here's the distance matrix. It might need to be looked at again to confirm that the values are correct.
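One way to double-check the values is to recompute them independently. The sketch below is only an illustration and assumes that "distance" means the shortest directed path length between two genes, and that `adjacency[i][j] == 1` means an edge from gene `i` to gene `j`. Neither assumption is confirmed by the spreadsheet; the function name and matrix orientation are hypothetical.

```python
from collections import deque


def distance_matrix(adjacency):
    """Shortest directed path lengths from an unweighted adjacency matrix.

    Assumption (not from the spreadsheet): adjacency[i][j] truthy means an
    edge i -> j. Unreachable pairs are reported as None.
    """
    n = len(adjacency)
    dist = [[None] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        # Breadth-first search visits nodes in order of increasing distance.
        while queue:
            node = queue.popleft()
            for neighbor in range(n):
                if adjacency[node][neighbor] and dist[source][neighbor] is None:
                    dist[source][neighbor] = dist[source][node] + 1
                    queue.append(neighbor)
    return dist


# Tiny made-up network: 0 -> 1 -> 2
example = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
result = distance_matrix(example)
```

If the spreadsheet uses the transposed orientation (regulators in columns), transpose the input first before comparing.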

bklein7 commented 7 years ago

In preparation for the LMU URS, I began analyzing edge weights based on the modeling results from the six database-derived networks. This work aims to satisfy the points listed above: generating descriptive statistics of the optimized weights and comparing weight values for edges that are shared between networks. The spreadsheet that I have been working with thus far can be found here: https://github.com/kdahlquist/DahlquistLab/blob/master/data/15-gene_networks_analysis/Regulatory-Relationships_Six-Networks_BK20170206.xlsx.
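For anyone who wants to reproduce this kind of analysis outside of Excel, here is a minimal sketch of the two steps (per-network descriptive statistics and a side-by-side comparison of shared edges). The network names, gene names, and weights are made up for illustration and the helper functions are hypothetical; this is not the spreadsheet's actual logic.

```python
from statistics import mean, median, stdev

# Hypothetical data: each network maps (regulator, target) -> optimized weight.
networks = {
    "net_A": {("geneA", "geneB"): 0.8, ("geneB", "geneC"): -1.2, ("geneD", "geneC"): 0.3},
    "net_B": {("geneA", "geneB"): 0.5, ("geneB", "geneC"): -0.9, ("geneE", "geneF"): 1.1},
}


def describe(weights):
    """Basic descriptive statistics for one network's edge weights."""
    values = list(weights.values())
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
        "n_positive": sum(v > 0 for v in values),
        "n_negative": sum(v < 0 for v in values),
    }


def shared_edges(networks):
    """Return {edge: {network: weight}} for edges present in every network."""
    common = set.intersection(*(set(w) for w in networks.values()))
    return {
        edge: {name: weights[edge] for name, weights in networks.items()}
        for edge in sorted(common)
    }


stats = {name: describe(weights) for name, weights in networks.items()}
comparison = shared_edges(networks)
```

Here `comparison` only keeps edges found in all networks; relaxing that to "found in at least two networks" would need a count per edge instead of a set intersection.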

@kdahlquist and I discussed the following ways to further analyze this data:

bklein7 commented 7 years ago

Based on the commentary provided during last week's meeting, I expanded the regulatory weight analysis discussed above. Further statistics assessing the properties of weights within each network have been added to this spreadsheet: https://github.com/kdahlquist/DahlquistLab/blob/master/data/15-gene_networks_analysis/Regulatory-Relationships_Six-Networks_BK20170206.xlsx.

In addition, an analysis of the distribution of normalized weight values in each network using histograms can be found here: https://github.com/kdahlquist/DahlquistLab/blob/master/documents/Weight-Distributions_Six-Networks_BK2017213.pptx.

kdahlquist commented 7 years ago

This issue has been renamed to just be about the analysis of the weights that @bklein7 has been doing and the other ideas have been moved off to separate issues: #326, #328, #329, #330, #331.

Notes from Thursday's meeting:

bklein7 commented 7 years ago

Based on the feedback provided during last week's meeting, the weights analysis spreadsheet has been updated: https://github.com/kdahlquist/DahlquistLab/blob/master/data/15-gene_networks_analysis/Regulatory-Relationships_Six-Networks_BK20170206.xlsx. The new version automates the statistical analysis and color coding of regulatory weights. However, it is currently only designed to support data from six different networks. This can be expanded moving forward. Further, once a data analysis protocol for these weights has been established, a second tab can be formatted to facilitate importing data into SPSS (if necessary).

Further, the weight distribution analyses were updated with improved histograms and cumulative plots: https://github.com/kdahlquist/DahlquistLab/blob/master/documents/Weight-Distributions_Six-Networks_BK2017213.pptx.

kdahlquist commented 7 years ago

Notes from the 2/23 meeting:

bklein7 commented 7 years ago

Based on the feedback provided during last week's meeting, the weight distributions analysis has been updated: https://github.com/kdahlquist/DahlquistLab/blob/master/documents/Weight-Distributions_Six-Networks_BK2017213.pptx. This new version includes histograms with a bin size of 20 and cumulative percentage line graphs.

There are some issues with the new method I found in SPSS to plot cumulative percentage line graphs. Moving forward, it may be best to manually generate cumulative distribution functions for each set of weights and plot those. I investigated methods for doing so in SPSS but could not find a clear answer (in a reasonable amount of time). Thus, I will talk to @bengfitzpatrick further about this during the upcoming data analysis team meeting.
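As a possible fallback while the SPSS question is open, the cumulative distribution can be generated by hand and then plotted in any tool. The sketch below computes an empirical CDF from a list of weights; the function name and sample values are hypothetical, and it is not the SPSS procedure.

```python
def empirical_cdf(weights):
    """Return sorted (value, cumulative fraction) pairs.

    The fraction at each value is the share of weights less than or
    equal to that value, so the last pair always has fraction 1.0.
    """
    ordered = sorted(weights)
    n = len(ordered)
    cdf = {}
    for count, value in enumerate(ordered, start=1):
        # Repeated values overwrite earlier entries, so each distinct
        # value ends up with the count of all weights <= that value.
        cdf[value] = count / n
    return list(cdf.items())


# Made-up weights for illustration only.
points = empirical_cdf([0.2, -0.5, 0.2, 1.0])
```

Multiplying the fractions by 100 gives the cumulative percentage, and plotting the pairs as a step line gives the cumulative plot.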

kdahlquist commented 7 years ago

To recap on this one:

kdahlquist commented 7 years ago

We also need instructions for how to do the heat map in SPSS.

bklein7 commented 7 years ago

Weighted degree distribution charts and cumulative plots for db5, RAND7, RAND12, RAND15, RAND16, RAND24, RAND31 ("parent" network + 3 best/worst random networks) have been compiled for comparison: https://github.com/kdahlquist/DahlquistLab/blob/master/documents/six-random-networks_weight-distribution-figures_BK20170405.pptx.

I will write instructions for creating the degree distribution charts, cumulative plots, and weights heat map. The latter was made in Excel using conditional formatting, so I could likely add a second tab to the generalized Excel spreadsheet (for analyzing weights) that could produce a heat map instead. What would be the best format for documenting the weighted degree distribution chart and cumulative plot protocols?

kdahlquist commented 7 years ago

Use the OWW wiki for documentation. You could either include it in your notebook or create a new protocol page under the DahlquistLab site. See one of the other protocols for how to format. http://www.openwetware.org/wiki/Dahlquist:Protocols

kdahlquist commented 7 years ago

bklein7 commented 7 years ago

An updated version of the weights analysis spreadsheet was uploaded to the Dahlquist Lab repository: https://github.com/kdahlquist/DahlquistLab/blob/master/data/Spring2017/15-gene_networks_analysis/Regulatory-Relationships_Six-Networks_BK20170313.xlsx. This version automates normalization of the input edge weights to the single maximum weight value across all networks in the "Normalized Values" tab. These normalized values are then used to generate a heat map akin to the one presented at the 2017 LMU Research Symposium, which can be viewed in the "Heat Map" tab.
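For reference, the normalization step can be sketched in a few lines. This is an illustration only: it assumes the "maximum weight value" is the largest absolute weight across all networks, so that normalized values fall between -1 and 1 and keep their sign. The spreadsheet may define the maximum differently, and the function name and sample data are hypothetical.

```python
def normalize_to_global_max(networks):
    """Divide every edge weight by the largest absolute weight in any network.

    Assumption: "maximum" means maximum absolute value across all networks.
    """
    global_max = max(
        abs(weight)
        for weights in networks.values()
        for weight in weights.values()
    )
    if global_max == 0:
        raise ValueError("all weights are zero; nothing to normalize against")
    return {
        name: {edge: weight / global_max for edge, weight in weights.items()}
        for name, weights in networks.items()
    }


# Made-up weights for illustration only.
raw = {
    "net_A": {("geneA", "geneB"): 2.0, ("geneB", "geneC"): -4.0},
    "net_B": {("geneA", "geneB"): 1.0},
}
normalized = normalize_to_global_max(raw)
```

Using one shared divisor keeps the color scale of the heat map comparable across networks, which per-network normalization would not.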

bklein7 commented 7 years ago

The protocol for creating histograms and cumulative plots in SPSS to show the distribution of regulatory weights in a network has been written: http://www.openwetware.org/wiki/Generating_Distribution_Charts_and_Cumulative_Plots_for_GRNmap_Weight_Values_in_SPSS. The link to this protocol has also been added to the 15-gene_networks_analysis folder's ReadMe file on the Dahlquist Lab repository: https://github.com/kdahlquist/DahlquistLab/tree/master/data/Spring2017/15-gene_networks_analysis.

kdahlquist commented 6 years ago

This looks like it has been completed to the state required at the end of last semester. We are going to end up re-doing some analyses with model runs on cleaned up GRNmap code. We can refer back here as needed when we need to run the analyses again.