Closed kdahlquist closed 7 years ago
Put the workbooks (in progress) in the Dahlquist Lab repository here: https://github.com/kdahlquist/DahlquistLab/tree/master/data/GRNmap_input_workbooks
Some instructions can also be found on the Microarray Analysis Workflow on OWW
We want to make a "gold standard" set of instructions here on how to make the input workbooks, so please feel free to update that page.
Note: when using the Neymotin data, use the systematic name (e.g., YKL134C) to be absolutely sure you have the right gene. Synonyms sometimes get used, and I can see that some gene names have been altered by datatype conversion in Excel.
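A quick way to spot identifiers that are not systematic names is to test them against the standard yeast ORF naming pattern. The snippet below is only a minimal sketch: the function name and the example list are hypothetical, and it assumes the identifiers have already been read out of the workbook as strings.

```python
import re

# Yeast systematic ORF names: "Y" + chromosome letter (A-P) + arm (L/R)
# + three-digit number + strand (W/C), with an optional suffix such as "-A".
SYSTEMATIC_NAME = re.compile(r"^Y[A-P][LR]\d{3}[WC](-[A-Z])?$")


def non_systematic(ids):
    """Return the identifiers that do not look like systematic ORF names."""
    return [str(i) for i in ids
            if not SYSTEMATIC_NAME.match(str(i).strip().upper())]


# Hypothetical column contents. "1-Oct" is the kind of value Excel can
# produce when it converts a standard name such as OCT1 into a date.
example_ids = ["YKL134C", "1-Oct", "HAP4", "YKL109W"]
flagged = non_systematic(example_ids)
print(flagged)
```

Anything flagged this way would still need to be mapped back to its systematic name by hand or from a trusted lookup table.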
I have begun generating the input sheets for the dHAP4, dGLN3, and dZAP1 families of networks. The current versions of these input sheets can be found in the Dahlquist Lab Repository. My progress is summarized below:
- dHAP4: all sheets except for "dhap4_log2_expression" and "optimization_parameters" have been carefully generated.
- dGLN3: the "network" sheet has been generated and approved.
- dZAP1: the "network" sheet has been generated, but approval was not received. This network requires examination for validity before proceeding.
I talked with @bklein7 about how to format the input sheets, recommending the following:
Completed input sheets for the dHAP4, dGLN3, and dZAP1 families of networks are now available in the Dahlquist Lab Repository. Upon finishing these input sheets, I did run into some lingering questions:
I'll answer number 2 from your previous comment: leave all 5 instances there; there is no need to have the same number of replicates per timepoint.
For number 3, I don't think there is a preferred font/size. I do think it looks better if it is consistent from sheet to sheet. Microsoft changed the default from Arial 10 to something else (Calibri?) with a recent version change; I usually make my workbooks Arial 10, but I think it's better to just use something consistent.
For number 1, I think it's worth reviewing in the meeting. Alpha = 0.002 should be right for these networks because that is what we determined from the L-curve analysis last semester.
See issue #119 for a screenshot of what that should look like. Will @bklein7 make sure our documentation conforms to this?
@kdahlquist needs to review the dZAP1 network.
@bklein7 and @Nwilli31 will swap when ready for cross-check.
I have worked on this issue and here are my notes:
- 16-genes_27-edges_BK-KD-dZAP1-fam_Sigmoid_estimation.xlsx: I pasted the network into the "network" and "network_weights" sheets, and pasted the list of genes into the other worksheets, but did not put in the data. This network has 16 genes and 27 edges; the next gene to delete would have been MSN4, and it seemed arbitrary to delete it and keep MSN2, so I kept both.
- GRN_Gene_Lists.xlsx: updated so that what is listed in there matches what is in the input workbooks for each strain. There was a discrepancy between this file and the wt input workbook, so I changed it to make it match the input workbook. I also copied and re-pasted from the other input workbooks (I didn't specifically check for problems). I also made a new worksheet that has all the strain gene lists next to each other for comparison.

So, there's a little more work to do to get the expression data for the rest of the strains, I'm afraid.
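The kind of discrepancy described above (a gene list file that does not match an input workbook) can be caught with a simple set comparison. This is a rough sketch only: the function name and both example lists are made up for illustration and are not the actual contents of GRN_Gene_Lists.xlsx or any input workbook.

```python
def compare_gene_lists(reference, workbook):
    """Report genes present in only one of two gene lists (case-insensitive)."""
    ref = {g.strip().upper() for g in reference}
    wb = {g.strip().upper() for g in workbook}
    return {
        "only_in_reference": sorted(ref - wb),
        "only_in_workbook": sorted(wb - ref),
    }


# Hypothetical gene lists, for illustration only.
gene_list_file = ["ACE2", "CIN5", "MSN2", "MSN4", "ZAP1"]
input_workbook = ["ACE2", "CIN5", "MSN2", "ZAP1", "HAP4"]

diff = compare_gene_lists(gene_list_file, input_workbook)
print(diff)
```

Two empty lists in the result would mean the file and the workbook agree on the gene set; it does not check gene order.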
I have completed the regenerated dZAP1 network input sheet and added all missing expression data to the dHAP4 family input sheet. These updated input sheets have been uploaded to the Dahlquist Lab Repository. I have yet to add the missing expression data to the dGLN3 family input sheet or update the input sheet creation protocol. These tasks will be completed during the week of 10/17.
While cross-checking input sheets this week, I misadvised @Nwilli31 to delete expression data for networks other than wt or the particular strain from which the network was derived. She should have the missing data available in previous versions of the input sheets for wt and dCIN5.
I've re-uploaded the input sheets with the new calculated degradation rates and the additional strain's expression data.
Besides completing these workbooks, please make the unweighted GRNsight graphs for each, laying them out on one consistent grid. You will also do this after the first model runs from #265.
I have uploaded a new version of the dGLN3 family input sheet that includes the proper expression data to the Dahlquist Lab Repository. Thus, all five input sheets now include the previously missing expression data.
I briefly looked over @Nwilli31's input sheets and noticed her wt input sheet is missing expression data for dHMO1. Also, the gene names in the individual worksheet tabs should not be capitalized in the wt input sheet.
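Both problems noted here (a missing strain sheet and capitalized tab names) can be checked from the list of worksheet names alone. The sketch below assumes that every expression sheet is named `<strain>_log2_expression` in lowercase, which is inferred from the "dhap4_log2_expression" sheet mentioned earlier; the function name and the example sheet list are hypothetical.

```python
def check_expression_sheets(sheet_names, strains, suffix="_log2_expression"):
    """Return (missing, miscapitalized) expression sheet names.

    Assumes the convention "<strain>_log2_expression", all lowercase.
    """
    expected = {s.lower() + suffix for s in strains}
    lowered = {n.lower() for n in sheet_names}
    missing = sorted(expected - lowered)
    miscapitalized = sorted(
        n for n in sheet_names if n.lower() in expected and n != n.lower()
    )
    return missing, miscapitalized


# Hypothetical tab names from a workbook, for illustration only.
sheets = ["network", "wt_log2_expression", "dCIN5_log2_expression"]
strains = ["wt", "dCIN5", "dHMO1"]

missing, miscapitalized = check_expression_sheets(sheets, strains)
print("missing:", missing)
print("miscapitalized:", miscapitalized)
```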
I finally had a chance to re-create the dCIN5-family network. I am attaching a workbook that has the entire family of networks. Of interest are the last two sheets, the 17-gene_32-edge and 14-gene_25-edge networks. It turns out that if you remove MCM1 from the 17-gene network, you also lose ACE2 and ZAP1 (the disconnected ones from before). I'm a little torn between using the 17- or 14-gene network; maybe we can run both? There are other minor differences between this and what Kayla had, so double-check everything when constructing the new network(s).
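The effect described above, where deleting one regulator leaves other genes disconnected, can be checked mechanically from an edge list. The following is a minimal sketch under stated assumptions: the edges are a toy example chosen to mimic the situation and are not the actual 17-gene network, the function name is hypothetical, and a gene counts as disconnected when it has no remaining edges to any other gene (self-regulation is ignored).

```python
def isolated_after_removal(edges, removed):
    """List genes left with no edge to another gene once `removed` is deleted.

    `edges` is a list of (regulator, target) pairs.
    """
    genes = {g for edge in edges for g in edge} - {removed}
    connected = set()
    for regulator, target in edges:
        # Skip edges that touch the removed gene, and self-loops,
        # since neither connects a gene to the rest of the network.
        if removed in (regulator, target) or regulator == target:
            continue
        connected.update((regulator, target))
    return sorted(genes - connected)


# Toy edge list, for illustration only (not the real network).
toy_edges = [
    ("MCM1", "ACE2"),
    ("MCM1", "ZAP1"),
    ("ZAP1", "ZAP1"),
    ("MCM1", "CIN5"),
    ("CIN5", "MSN2"),
    ("MSN2", "CIN5"),
]

lost = isolated_after_removal(toy_edges, "MCM1")
print(lost)
```

Running the same check for each candidate gene before deleting it would show in advance which removals shrink the network by more than one gene.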
The file with the gene lists for the various networks will also need to be updated based on this.
So @bklein7 has vetted the dCIN5 input workbooks, so this is closable.
@bklein7 and @Nwilli31 will now work on generating 5 input workbooks for the 5 database-derived networks that the team started last year. The five networks are:
Each will create 2 or 3 networks and then swap and double-check each other.
Instructions for how to format the input worksheets are found on the GRNmap wiki here.
We want to carefully check each part of the input workbook.