kdahlquist closed this issue 7 years ago
Update for week:
Immediate priority is #301: remove strikeouts and make a final reading before submitting.
For the record: MSE is not the same as LSE, as discussed in the meeting.
It would be a good idea to review the Dahlquist et al. 2015 paper to understand the math behind the LSE.
For this week's session, we determined which variables need to be tested to ensure that the LSE function works as intended. From this, I've created templates for testing the lse and gLSE routines so that we only need to worry about figuring out what the expected values are for those routines.
Task list has been updated for this week; see issue for updates.
For creating the LSE tests, see the comments on #313.
TL;DR: Testing the LSE function requires assuming that the current output is correct, so that implementing the compressMissingData function does not change it. In practice, this probably means generating a new set of expected outputs for the different use cases, just like the "sixteen_tests" Excel files, which already differ from the current output of the code. @cazinge suggested implementing the new data structure in parallel.
For the other issues:
Task list remains unchanged. I noted at the meeting that if you are planning to leave early next Friday for Spring Break, you need to arrange to do your research hours earlier in the week.
@cazinge @trixr4kdz @jtorre39 please let me know when you're up-and-running in the UH lab.
@bengfitzpatrick We're set to go!
@bengfitzpatrick stopped by during our session and gave us specifics on how to approach our tests. Here is the result from our discussion.
For this week's meeting, @bengfitzpatrick helped us map out what needed to be in the gLSE tests. More specifically, two of our test cases were taken from:
Note that in both cases, L = 0.
The next steps are to build more test cases by varying the ones we already made, and to work out the expected answers by hand.
For reference, this is the equation that we will use for determining what the L would be based on our changes to the test data:
where nData = (# of flasks) x (# of timepoints) x (# genes) x (# of strains)
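As a quick sanity check on the nData definition above, here is a minimal sketch in Python (not GRNmap's MATLAB code). The function name `n_data` and the experiment sizes are hypothetical, chosen only for illustration.

```python
def n_data(flasks, timepoints, genes, strains):
    """Number of data points, following the definition quoted above:
    (# of flasks) x (# of timepoints) x (# genes) x (# of strains)."""
    return flasks * timepoints * genes * strains


# Hypothetical sizes: 3 flasks, 4 timepoints, 2 genes, 1 strain.
example = n_data(flasks=3, timepoints=4, genes=2, strains=1)
print(example)
```

Changing any one count in the test data scales nData by the same factor, which is worth keeping in mind when working out expected values by hand.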
Main work at the meeting was #313, with #310 and #311 to follow.
I wrote some comments on new tests for issue #313.
I have now started working on #310, integrating the new data structure, since the new tests for gLSE have been merged (PR #339).
Minor question: how does the SSE calculation change with the new data structure?
microData will be changed to expressionData with the fields raw, compressed, strain, avg, deletion, t, stdev
microData(index).data in https://github.com/kdahlquist/GRNmap/blob/48da7b9d78a4640a38b20a64220967f489501706/matlab/readInputSheet.m#L61 becomes expressionData(index).raw
expressionData(index).data in https://github.com/kdahlquist/GRNmap/blob/beta/matlab/compressMissingData.m#L4 becomes expressionData(index).compressed
Courtesy of @bengfitzpatrick, a breakdown of the loop within GLSE
I just wanted to note that at the meeting today, we advised this course of action:
For the points mentioned above by @kdahlquist:
I ran ALL the tests this time, even MSETest (which unfortunately still takes ~30 minutes for that one test alone), and they all pass.
One thing that I observed when I ran the code with the sixteen_tests Excel files, however, is that the precision error we had with timepoint 1.2 came back. As a result, GRNmap will think that the third replicate for timepoint 1.2 is a different timepoint (i.e., we have 3 replicates for t=0.4, 3 reps for t=0.8, 2 reps for t=1.2, 1 rep for t=1.2...0001, and 3 reps for t=1.6). In this case, GRNmap will produce a warning since it thinks that there is only 1 replicate for t=1.2...0001.
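The replicate miscount described above can be reproduced in isolation. The following is a minimal sketch in Python, not GRNmap's MATLAB code: the timepoint list is hypothetical, the value 1.2000000000000002 is only a stand-in for the drifted timepoint (the thread does not give its exact digits), and rounding to 6 decimals is an assumed tolerance rather than a fix the project adopted.

```python
from collections import Counter

# Hypothetical replicate timepoints. The ninth entry stands in for a value
# that drifted away from 1.2 by floating-point round-off.
timepoints = [0.4, 0.4, 0.4,
              0.8, 0.8, 0.8,
              1.2, 1.2, 1.2000000000000002,
              1.6, 1.6, 1.6]


def count_replicates_exact(times):
    """Group replicates by exact floating-point equality."""
    return Counter(times)


def count_replicates_rounded(times, decimals=6):
    """Group replicates after rounding, so round-off noise is ignored.

    The choice of 6 decimals is an assumption for this sketch.
    """
    return Counter(round(t, decimals) for t in times)


exact = count_replicates_exact(timepoints)
rounded = count_replicates_rounded(timepoints)

print(len(exact), "distinct timepoints with exact comparison")
print(len(rounded), "distinct timepoints after rounding")
```

With exact comparison the drifted value forms its own group of one replicate, which matches the warning described above. Normalizing the timepoints before grouping restores three replicates at t=1.2.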
Closing out the work of last semester!
Tasks for @trixr4kdz, @cazinge, and @jtorre39:
Tasks for @kdahlquist: