kdahlquist / GRNmap

Gene Regulatory Network modeling and parameter estimation
BSD 3-Clause "New" or "Revised" License

Compute graph statistics for random networks with Gephi #325

Closed kdahlquist closed 7 years ago

kdahlquist commented 7 years ago

Opened this new issue and assigned it to @khorstmann as discussed at the meeting. We are going to give each person on the team his or her own issues so that we can track what is going on with more granularity. @maggie-oneil will retain the other issue, #290, which has been renamed to refer to the db-derived networks only.

Only do unweighted random networks at this point because we are still unsure how Gephi is incorporating the weight information into the graph statistics.

You can go ahead and do all the random networks that @Nwilli31 has made because they don't have to be run in GRNmap to run the unweighted versions through Gephi. However, you should go ahead and start compiling some descriptive statistics before you've done them all (the goal is a total of 30).

@khorstmann and @maggie-oneil should make an Excel spreadsheet analogous to what @bklein7 did for the weight parameters. I.e., list all the 28 genes for all the networks and then make columns for each of the graph stats for each of the random networks.

A workbook with the list of genes already exists here: https://github.com/kdahlquist/DahlquistLab/blob/master/data/GRNmap_input_workbooks/GRN_Gene_Lists.xlsx

Then you can start doing some descriptive statistics, like mean, median, max, min, standard deviation and we can more easily compare the data across networks.
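As a rough illustration of the descriptive statistics listed above, here is a minimal Python sketch using only the standard library. The function name `describe` and the sample values are hypothetical; the real numbers would come from the Gephi output for one gene (or one statistic) across the random networks. The sample standard deviation is assumed here, which may or may not match what Excel's chosen function reports.

```python
import statistics

def describe(values):
    """Return the descriptive statistics named in this issue for one graph stat."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "max": max(values),
        "min": min(values),
        # Sample standard deviation (n - 1 in the denominator); an assumption.
        "stdev": statistics.stdev(values),
    }

# Hypothetical betweenness values for one gene across five random networks.
example = [0.0, 2.0, 4.0, 6.0, 8.0]
summary = describe(example)
print(summary)
```

Running `describe` once per gene and per statistic gives one summary row per gene, which should make the networks easier to compare side by side.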

We will also think about ways to plot the data.

kdahlquist commented 7 years ago

As discussed at the 2/23 meeting, @khorstmann had to re-run Gephi (why?), so hasn't compiled this yet. @kdahlquist said to stop at 10 random networks and move on to compiling and analyzing the data.

kdahlquist commented 7 years ago

@khorstmann has posted the Gephi output in the DahlquistLab repo here:

https://github.com/kdahlquist/DahlquistLab/tree/master/data/15-gene_networks_analysis

From her e-mail message to @kdahlquist:

" So two weeks ago when I had started, Gephi wasn't working properly for all the stats at first (taking a long time to load, skipping a couple of stats, etc.). Once I was able to get it to work, I was only able to run a few, and then at the meeting, I requested the graphical layout of how the GRNsight nodes should be laid out so it stayed consistent. When I began last week, I decided to rerun the couple of families I already had so I could a) ensure that Gephi was working the same since the previous buggy week and b) get the GRNsight visualization consistent across all of them, which I now know was not a priority, but I'm glad I did since I have to run the random excel through GRNsight in order to get the GraphML anyways, so might as well visualize while I'm there in case it's needed later. "

kdahlquist commented 7 years ago

Copied @khorstmann's comment on the task list over to here for future reference:

"Compiled the centrality stats (betweenness, harmonic, closeness, and eccentricity) together into excel file with multiple sheets in GRNmap repository-> data-> 15 genes to make analysis of it easier.

Also noticed two of the random networks were the same (Networks 4 & 5) so added random network 11 to the mix to have 10 unique random networks"

As discussed in the meeting, we are still unclear on what Gephi does with weighted edges when computing the centrality measures. To help sort this out:

  1. Pick one network and run the stats both unweighted and weighted to see if they are different.
    • If they are the same, then we know that Gephi isn't doing anything different with the weights, and we can stop there.
  2. If they are different, manually convert all the negative weights to positive weights (same magnitude) and run the weighted stats again.
  3. Compare that run with the original weighted run. If they are the same, we know that Gephi is not appropriately treating negative weights. If they are different, we don't know for sure that Gephi is appropriately treating negative weights, but we are a little more sure.
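Once the Gephi results are exported, the comparison described above could be scripted along these lines. This is only a sketch: the function names, the `{node: value}` table layout, and the tolerance are assumptions, and the conclusions it returns simply restate the reasoning in this comment.

```python
import math

def same_stats(a, b, tol=1e-9):
    """True when two {node: value} tables agree on every node within tol."""
    return a.keys() == b.keys() and all(
        math.isclose(a[n], b[n], abs_tol=tol) for n in a
    )

def interpret(unweighted, weighted, positive_weighted=None):
    """Apply the decision steps to centrality tables from separate Gephi runs."""
    if same_stats(unweighted, weighted):
        return "Gephi appears to ignore the weights"
    if positive_weighted is None:
        return "weights matter; rerun with negative weights made positive"
    if same_stats(weighted, positive_weighted):
        return "Gephi is not treating negative weights appropriately"
    return "sign changes the result; not conclusive, but a little more sure"

# Hypothetical closeness values for a three-gene network.
unweighted = {"A": 0.50, "B": 0.75, "C": 1.00}
weighted = {"A": 0.40, "B": 0.75, "C": 1.00}
positive = {"A": 0.40, "B": 0.75, "C": 1.00}
print(interpret(unweighted, weighted, positive))
```

A small tolerance is used instead of exact equality because values copied through Excel may differ in the last decimal places.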

To make sense of the meaning of the centrality measures, we need to know how they are computed. I suggest that you try computing one yourself and see if you get the same answer as the program.
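For example, eccentricity and closeness can be computed by hand with a breadth-first search on a small unweighted, directed network. The sketch below is one way to do that. The definitions used are assumptions: eccentricity is taken as the longest shortest path from a node to any node it can reach, and closeness as the reciprocal of the mean shortest-path distance to reachable nodes. Gephi may normalize or define these differently, which is exactly what comparing against its output would reveal. The four-gene network is made up.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def eccentricity(adj, source):
    """Longest shortest path from source to any reachable node."""
    return max(bfs_distances(adj, source).values())

def closeness(adj, source):
    """Reciprocal of the mean distance to reachable nodes (0.0 if none)."""
    dist = bfs_distances(adj, source)
    reachable = [d for n, d in dist.items() if n != source]
    if not reachable:
        return 0.0
    return len(reachable) / sum(reachable)

# Hypothetical network: regulator -> list of target genes.
adj = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
for gene in adj:
    print(gene, eccentricity(adj, gene), closeness(adj, gene))
```

If the hand-computed numbers differ from Gephi's by a constant factor, that points to a normalization difference rather than a different algorithm.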

If there is not sufficient documentation in the program to know what the calculations are, then you should try to contact the development team and ask.

khorstmann commented 7 years ago

Ran Gephi on random networks 12-20 and uploaded the output to the repository. The next 10 will be done next week. The PowerPoint consolidation of the networks and stats can be found here: Random_Networks.pptx

kdahlquist commented 7 years ago

This issue is also related to #329 and #335.

We need to make sure that we have a PowerPoint, the raw Excel output saved, and a spreadsheet that compiles each statistic for each network.
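One possible way to build the compiled spreadsheet is sketched below: one table per statistic, with a row per gene and a column per network, written as CSV so it opens in Excel. The function name, the network labels, and the gene names are hypothetical, and the values are placeholders rather than real Gephi output.

```python
import csv
import io

def compile_stat(per_network, genes):
    """Build CSV text for one statistic: one row per gene, one column per network."""
    networks = sorted(per_network)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["gene"] + networks)
    for gene in genes:
        # Leave the cell empty when a gene has no value in a network.
        row = [gene] + [per_network[net].get(gene, "") for net in networks]
        writer.writerow(row)
    return buffer.getvalue()

# Hypothetical eccentricity values keyed by network, then by gene.
per_network = {
    "random_1": {"GENE_A": 2, "GENE_B": 1},
    "random_2": {"GENE_A": 3},
}
table = compile_stat(per_network, ["GENE_A", "GENE_B"])
print(table)
```

Keeping one table per statistic mirrors the multi-sheet Excel file mentioned earlier in this thread.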

kdahlquist commented 7 years ago

Closing this issue now.