Data Analysis Team run L-curve analysis

kdahlquist commented 8 years ago

I wanted to let you know that my L-curve is still running. It got past the point where it crashed, so it looks like it will work with make_graphs = 0.

You should be able to proceed with the beta branch code.

kdahlquist commented 8 years ago

Update: the L-curve analysis finished overnight on the computer in my office. It took about 13 hours to run, judging by the timestamp on the files. I recommend that you run them overnight. If you use Seaver 120, make sure that you put your input file on the T: drive. That drive will not get erased if the computer is restarted. Note that the computer lab is being used on MWF mornings at 9:00, so you will need to make sure that your L-curves are completed by then.

GraceJohnson commented 8 years ago

When running the model, should we fix b and P or estimate them?

kdahlquist commented 8 years ago

Let's estimate all parameters (w, b, and P) this time. That's what I did.

kdahlquist commented 8 years ago

I forgot to mention to everyone that you have the option of using my lab computers to run the L-curve analysis.

kdahlquist commented 8 years ago

When @GraceJohnson discovered the computers locked on Sunday, they were undergoing software installation and maintenance. I will be notified in the future if there is going to be significant downtime; it was basically a coincidence that this was happening just when we needed them this weekend.

kdahlquist commented 8 years ago

I ran the L-curve with b fixed and it only took about 3.5 hours. I still want everyone to do it without fixing b, though.

GraceJohnson commented 8 years ago

@kdahlquist, my models have been running since 1 pm on Monday. The smallest network is currently on run 15 (but has been on it for about 20 hours), and the largest network is only on run 9. It looks unlikely that they will finish before Seaver 120 is used for the class tomorrow at 9 am. Do you know if any classes that use Seaver 120 need to use MATLAB? If not, then it's probably okay to keep the models running for the next couple of days, just minimizing MATLAB and writing a note instructing people not to close the program. @maggie-oneil's runs are also still in progress.

khorstmann commented 8 years ago

Dr. Berube has a class in SEA 120 at 9 AM. I had a class with him and he still knows who I am, so I sent him a quick email asking him to tell his students, if they're in here tomorrow, to please not touch Matlab and not log out of the "Student" user. We left many notes on the computers and whiteboards, but hopefully having the professor aware will help students understand that they really should not close Matlab. I will also ask the biostats students and professors coming in at 1:10 not to touch anything if programs are still running (they should hopefully be done by then, but no promises at this point). Hopefully this will leave everything undisturbed.

kdahlquist commented 8 years ago

Unfortunately, the classes will have priority over the computers; I can't guarantee that your models will be undisturbed. I think you need to come to my research lab and get them running on my lab computers. There are only two computers, but at least they can be left undisturbed for the duration. My lab computers have multiple cores, so there is a trick for running multiple instances of Matlab, one on each core, but you will have to Google how to do it because I don't remember how.

The other option is Dr. Fitzpatrick's computer lab in the math department. Just double-check the version of Matlab on them before you start.

tessaam commented 8 years ago

dGLN3 L-curve Analysis: http://openwetware.org/wiki/DGLN3_L-curve_analysis_TM

kdahlquist commented 8 years ago

I just heard from Dr. Berube that he only has six students in his class, so as long as there are six open computers for his class (like the front row), it will be OK to run stuff on the others.

khorstmann commented 8 years ago

Talked to Berube and his class did not even use SEA 120 this morning (he reserves it every MWF at that time just in case), so no students used the room this morning. Asked biostats classmates to not disturb MATLAB, and all sheets are still running, undisturbed, after class.

khorstmann commented 8 years ago

dZAP1 L-Curve status: http://openwetware.org/wiki/DZAP1_L-curve_analysis_KH

bklein7 commented 8 years ago

I created a command sequence for use in R that creates the L-curve graphs and annotates specific points with associated alpha values: http://openwetware.org/wiki/Graphing_L-Curves_in_R. Samples are present at the bottom of the page. With some adjustment, perhaps we could use this to quickly plot L-curves for future runs.

kdahlquist commented 8 years ago

@bklein7 is in the process of compiling a PowerPoint with all of the L-curve plots that we ran this week. Each slide has one L-curve with the following additional information:

which family the network comes from (dGLN3, dHAP4, dZAP1)
number of genes and edges
which strain's data was used (e.g. wt, dCIN5, dGLN3, dHAP4, dHMO1, dZAP1)

@kdahlquist will send it to @bengfitzpatrick to determine which alpha to choose. Based on our previous paper and his e-mail to me, it seems that we are choosing one that is not exactly in the bend itself, but a little to the right on the curve.

We might run another L-curve with a smaller number of total alphas, but with higher resolution in the region of interest on the curve.

We will also plot the parameter values (index vs. magnitude) to see the relationship of the values to the alpha, but I'm not sure whether to do this now or after another run of the L-curve.

Some runs have not finished yet, so we plotted the data we had so far. We probably don't need the data from the 15th or 16th alphas in the list anyway to make our determination of what to do.

@khorstmann's plots are very strange, so she and @kdahlquist will look into whether there are problems with the input sheets.

kdahlquist commented 8 years ago

@bklein7 compiled the L-curve data from last week's runs. Here it is.

LcurveAnalyses_20160205.pptx

kdahlquist commented 8 years ago

@bengfitzpatrick says "good job" on the L-curves!

He also says: This is more of an art than a science, and you might notice from the LSE axis that we're already pretty "zoomed in." [meaning we don't need to do another run with more alphas] My best judgment would be that alpha = 0.002 would work well.

He would like you to plot the parameters for w, P, and b for alphas = 0.01, 0.008, 0.005, 0.002, 0.001.

This replicates Figure 9 from our paper: http://link.springer.com/article/10.1007/s11538-015-0092-6/fulltext.html, but you can just do it as a bar chart in Excel if that's easier, like in the Spring 2015 Biomathematical Modeling class (@khorstmann knows what I am talking about).

khorstmann commented 8 years ago

Ran the updated input sheets (25 and 33 genes) that @kdahlquist sent after reviewing the originals, which had produced not-so-L-curves.

Encountered some issues with the 33-gene network. On 3 different computers, got 2 different errors: "Error using barrier, error using fmincon" and "Error using xlsread (line 247) Worksheet 'degradation_rates' not found".
While starting to copy and paste over into a new sheet to make sure the numbers were being read properly, I noticed a blank cell in a sheet that I had fixed before; I must've redownloaded the blank version. Was able to start the run with no errors, and will be checking in to make sure it runs to completion.

khorstmann commented 8 years ago

L-curve for the 10 alphas of the 33-gene network can be found here: http://openwetware.org/wiki/Image:ZAP1_Lcurve_analysis_33_genes_98_edges.pdf

Hopefully the input sheet currently running will finish all 16 alphas. In this one, the LSE values of the last 4 points hardly changed (occasionally only out at the 5th or 6th decimal place), so they all stack on top of each other in the graph.
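For anyone who wants to check for "stacked" points without eyeballing the graph, here is a minimal Python sketch (not part of GRNmap). It assumes you have already copied the (alpha, LSE) pairs out of the output files by hand; the function name, the tolerance, and the numbers below are made-up placeholders, not results from any run.

```python
from typing import List, Tuple


def stacked_groups(points: List[Tuple[float, float]], tol: float = 1e-4) -> List[List[float]]:
    """Group consecutive alphas whose LSE values differ by less than tol.

    points must be (alpha, lse) pairs in the order the alphas were run.
    Only groups of two or more alphas are returned.
    """
    groups: List[List[float]] = []
    current: List[float] = [points[0][0]] if points else []
    for (_, prev_lse), (alpha, lse) in zip(points, points[1:]):
        if abs(lse - prev_lse) < tol:
            current.append(alpha)  # same spot on the graph as the previous point
        else:
            if len(current) > 1:
                groups.append(current)
            current = [alpha]  # start a new candidate group
    if len(current) > 1:
        groups.append(current)
    return groups


# Placeholder values for illustration only (NOT real LSE output).
example_points = [
    (0.01, 0.90),
    (0.005, 0.80),
    (0.002, 0.700001),
    (0.001, 0.700002),
    (0.0005, 0.700002),
]
example_groups = stacked_groups(example_points)
print(example_groups)
```

The tolerance is a judgment call; 1e-4 just matches the "5th or 6th decimal" scale of the differences described above.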

kdahlquist commented 8 years ago

So we want to get everyone all caught up with completing the L-curves and the plots with the parameter value comparisons before we close this issue.

However, we think that for the future an alpha value of 0.002 is going to work for us. We will use this same alpha value for all future runs so that there is a fair comparison between networks.

kdahlquist commented 8 years ago

@GraceJohnson, I'm moving our conversation over to the issue here. I've passed your file (34-genes_102-edges_GJ-dHAP4-fam_strains-added_L-curve_parameter-comparison.xlsx) along to @bengfitzpatrick.

Since all of the nodes and edges in the small network should also be there in the large network, it would be interesting to do a side by side comparison of the plots. For example, put the bar chart for the smallest network in a PowerPoint presentation on the same slide with the bar chart from the largest network, where you are only showing the parameter values that are also in the small network. It should be a simple matter to create the smaller chart by deletion from the larger one.
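If doing the deletion by hand in Excel gets tedious, the same filtering can be sketched in a few lines of Python. This is only an illustration: it assumes the parameter values have been copied into dictionaries keyed by parameter name, and the names and numbers below are made-up placeholders rather than values from anyone's output.

```python
from typing import Dict, List, Tuple


def compare_shared(
    small: Dict[str, float], large: Dict[str, float]
) -> Tuple[List[Tuple[str, float, float]], List[str]]:
    """Pair up parameters that appear in both networks.

    Returns (rows, missing), where rows holds (name, small_value, large_value)
    in the small network's order, and missing lists any small-network
    parameters that are absent from the large network.
    """
    rows = [(name, small[name], large[name]) for name in small if name in large]
    missing = [name for name in small if name not in large]
    return rows, missing


# Placeholder parameter values for illustration only.
small_network = {"CIN5": 0.5, "GLN3": -0.25}
large_network = {"CIN5": 0.4, "GLN3": -0.3, "HAP4": 1.0}

rows, missing = compare_shared(small_network, large_network)
for name, small_value, large_value in rows:
    print(name, small_value, large_value)
print("missing from large network:", missing)
```

If `missing` is not empty, the small network is not a strict subset of the large one, which would be worth knowing before putting the two charts side by side.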

kdahlquist commented 8 years ago

@bengfitzpatrick says that @GraceJohnson should re-run the alpha = 0.002 case since its results are out of line with the other values.

bengfitzpatrick commented 8 years ago

Make sure the initial guesses in the alpha = 0.002 case are the final results of the alpha = 0.005 case. In fact, a good thing to do overall is to verify that the L-curve code is properly preparing the initial guesses: the initial guess sheets for each alpha should be the optimized results from the previous (larger) alpha.
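To make the intended chaining concrete, here is a minimal Python sketch of the idea. It is not the GRNmap L-curve script: `toy_optimize` is a hypothetical stand-in for the real optimizer, and the function names and starting values are assumptions for illustration only.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Optimizer = Callable[[float, List[float]], List[float]]


def run_lcurve(
    alphas: Sequence[float], first_guess: Sequence[float], optimize: Optimizer
) -> Tuple[Dict[float, List[float]], Dict[float, List[float]]]:
    """Run alphas from largest to smallest, seeding each run with the previous result.

    Returns (results, seeds): the optimized values and the initial guess
    actually used for each alpha.
    """
    results: Dict[float, List[float]] = {}
    seeds: Dict[float, List[float]] = {}
    guess = list(first_guess)
    for alpha in sorted(alphas, reverse=True):
        seeds[alpha] = list(guess)
        results[alpha] = optimize(alpha, list(guess))
        guess = list(results[alpha])  # becomes the next (smaller) alpha's initial guess
    return results, seeds


def warm_start_violations(
    results: Dict[float, List[float]], seeds: Dict[float, List[float]]
) -> List[float]:
    """List the alphas whose initial guess is NOT the previous larger alpha's result."""
    ordered = sorted(results, reverse=True)
    return [
        alpha
        for previous, alpha in zip(ordered, ordered[1:])
        if seeds[alpha] != results[previous]
    ]


def toy_optimize(alpha: float, guess: List[float]) -> List[float]:
    # Hypothetical stand-in for the real optimizer: just shrinks the guess.
    return [g / (1.0 + alpha) for g in guess]


results, seeds = run_lcurve([0.001, 0.01, 0.005, 0.002, 0.008], [1.0, 2.0], toy_optimize)
print(warm_start_violations(results, seeds))
```

The same check could be done by hand: open the initial guess sheets written for alpha = 0.002 and confirm they match the optimized sheets from the alpha = 0.005 output.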

khorstmann commented 8 years ago

Reposting the most recent "stacked" L-curves (a handful of LSE values were the same for different alphas), as the previous issue was closed. http://openwetware.org/images/5/50/ZAP1_l-curve_25-genes_fixed_2-19.pdf http://openwetware.org/images/d/d5/ZAP1_l-curve_33-genes_fullalphas.pdf

kdahlquist commented 8 years ago

We discovered that the anomalous data from the alpha = 0.002 case is due to problems with the input sheet (#185), so @GraceJohnson has fixed the input sheet and is re-running alpha = 0.002 alone.

Now that there is a known bug with the L-curve script, we are going to stop running L-curves for now and go on the assumption that alpha = 0.002 will work for all the networks.

kdahlquist commented 8 years ago

@GraceJohnson, I posted on your OpenWetWare page: http://www.openwetware.org/wiki/Katherine_Grace_Johnson_Electronic_Lab_Notebook#February_27.2C_2016 that the run completed late last night and I grabbed your files and uploaded them to OpenWetWare.

In the future, you need to run things off of the T: drive. Stuff on the T: drive will not be deleted when the machine is restarted.

GraceJohnson commented 8 years ago

I have confirmed there is zero difference between the optimized w's, p's, and b's for the forward run using alpha=0.002 (posted in the previous comment) and my previous L-curve analysis for 0.002. Additionally, in the output file, the "optimized_threshold_b" sheet still contains a few added zeros in column C. I have not identified any other issues with the output files. This is the same error that occurred with the L-curve output sheets.
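A comparison like this can also be scripted. The following is a minimal Python sketch, assuming the optimized values from the two output files have already been read into dictionaries of lists; the keys and numbers are placeholders for illustration, not the actual output.

```python
from typing import Dict, List


def max_abs_difference(run_a: Dict[str, List[float]], run_b: Dict[str, List[float]]) -> float:
    """Return the largest absolute difference between two runs' parameter values.

    Raises ValueError if the runs do not contain the same parameter sets
    or if a parameter set has a different number of values in each run.
    """
    if run_a.keys() != run_b.keys():
        raise ValueError("runs contain different parameter sets")
    worst = 0.0
    for name, values_a in run_a.items():
        values_b = run_b[name]
        if len(values_a) != len(values_b):
            raise ValueError(f"parameter set {name!r} has different lengths")
        for a, b in zip(values_a, values_b):
            worst = max(worst, abs(a - b))
    return worst


# Placeholder values for illustration only.
forward_run = {"w": [0.5, -1.0], "P": [1.0], "b": [0.0, 2.0]}
lcurve_run = {"w": [0.5, -1.0], "P": [1.0], "b": [0.0, 2.0]}
print(max_abs_difference(forward_run, lcurve_run))
```

A result of exactly 0.0 corresponds to the "zero difference" reported above; the length check would also flag extra values such as the added zeros in column C.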

kdahlquist commented 8 years ago

This has been completed for the Spring 2016 semester. Bugs associated with this are reported in issue #185.

kdahlquist / GRNmap

Data Analysis Team run L-curve analysis #172