Coding team tasks for the rest of spring, 2017

kdahlquist commented 7 years ago

Tasks for @trixr4kdz, @cazinge, and @jtorre39:

[x] create tests for LSE function #313 (priority 0)
[x] meet up with @bengfitzpatrick to look at #310, begin converting LSE to new data structure (priority 0.5)
[x] work on #311, converting existing test cases to the new data structure (priority 0.5)

Tasks for @kdahlquist:

[ ] #323, different results on different computers
[ ] review requested for 0% release #312, waiting on resolution of #323.

azinge commented 7 years ago

Update for week:

There's a draft for the 0% release (v1.6) for the current week up on GRNmap's releases page. Approval should be given during this week's meeting before we publish.
We've revised the draft and seen @kdahlquist's comments; upon @dondi and @bengfitzpatrick's review we will submit to the symposium.
We've partially reviewed the changes that we'll need to make in MSETest, as the file should be broken into multiple smaller unit tests instead of the one large functional test that it is now. In theory, however, its existing state should be usable as one large test just to ensure that LSE is working correctly; we would then iterate on its tests and improve their quality after converting to the new structure. @bengfitzpatrick's feedback at this week's meeting would be appreciated. (Side note: runGRNstructSimulation does not seem to be in use; it appears only within some commented-out code. Should we remove the file and clean up the comments in callTests?)

dondi commented 7 years ago

Immediate priority is #301: remove strikeouts and make a final reading before submitting.

kdahlquist commented 7 years ago

I just want to record that MSE is not the same as LSE as discussed in the meeting.

It would be a good idea to review the Dahlquist et al. 2015 paper to understand the math behind the LSE.

trixr4kdz commented 7 years ago

For this week's session, we determined which variables need to be tested to ensure that the LSE function works as intended. From this, I've created templates for testing the lse and gLSE routines so that we only need to worry about figuring out what the expected values are for those routines.

kdahlquist commented 7 years ago

Task list has been updated for this week; see issue for updates.

trixr4kdz commented 7 years ago

For creating the LSE tests, see the comments on #313.

TL;DR: Testing the LSE function requires assuming that the current output is correct, so that implementing the compressMissingData function does not change that output. In practice, this probably means generating a new set of expected output for different use cases, just like the "sixteen_tests" Excel files, which already differ from the current output of the code. @cazinge suggested working in parallel on implementing the new data structure.

For the other issues:

I also put a comment on #310 for interpreting what the math does in gLSE.
I did a little bit of tidying up on my fork with regards to the test suite. I'll just go make a pull request now so you can check. I realized that the tests were getting a bit disorganized so I grouped them into separate folders depending on what I thought the tests were doing.

kdahlquist commented 7 years ago

Task list remains unchanged. I noted at the meeting that if you are planning to take off early next Friday for Spring Break, then you need to arrange to do your research hours earlier in the week.

bengfitzpatrick commented 7 years ago

@cazinge @trixr4kdz @jtorre39 please let me know when you're up-and-running in the UH lab.

im-deepfriedwater commented 7 years ago

@bengfitzpatrick We're set to go!

im-deepfriedwater commented 7 years ago

@bengfitzpatrick stopped by during our session and gave us specifics on how to approach our tests. Here are the results from our discussion.

[Image: Fitzpatrick's Whiteboard]

[Image: Young Eddie's Whiteboard]

trixr4kdz commented 7 years ago

For this week's meeting, @bengfitzpatrick helped us map out what needed to be in the gLSE tests. More specifically, two of our test cases were taken from:

Note that in both cases, L = 0.

trixr4kdz commented 7 years ago

The next steps include working on more test cases that iterate through the test cases we already made and working out the answers by hand.

trixr4kdz commented 7 years ago

For reference, this is the equation that we will use for determining what the L would be based on our changes to the test data:

[Image: equation for L]

where nData = (# of flasks) × (# of timepoints) × (# of genes) × (# of strains)
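As a quick sanity check on the nData definition above, here is a minimal Python sketch of the arithmetic. The function name and the example counts are hypothetical and are not taken from GRNmap or from any of the test workbooks.

```python
def n_data(n_flasks, n_timepoints, n_genes, n_strains):
    """Number of data points, following the nData definition in this thread."""
    return n_flasks * n_timepoints * n_genes * n_strains


# Hypothetical example: 3 flasks (replicates), 4 timepoints, 2 genes, 1 strain.
example = n_data(n_flasks=3, n_timepoints=4, n_genes=2, n_strains=1)
print(example)
```

Working out L by hand for a test case then only requires plugging this count into the equation referenced above.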

dondi commented 7 years ago

Main work at the meeting was on #313, with #310 and #311 to follow after #313.

trixr4kdz commented 7 years ago

I wrote some comments on new tests for issue #313.

I have now started working on #310 on integrating the new data structure since the new tests for gLSE have been merged (PR #339).

trixr4kdz commented 7 years ago

Minor question: how does the SSE calculation change with the new data structure?

kdahlquist commented 7 years ago

microData will be changed to expressionData with the fields raw, compressed, strain, avg, deletion, t, stdev

microData(index).data in https://github.com/kdahlquist/GRNmap/blob/48da7b9d78a4640a38b20a64220967f489501706/matlab/readInputSheet.m#L61 becomes expressionData(index).raw

expressionData(index).data in https://github.com/kdahlquist/GRNmap/blob/beta/matlab/compressMissingData.m#L4 becomes expressionData(index).compressed
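To make the renaming concrete, here is a rough Python sketch of the mapping described above. The actual code is MATLAB; the dictionary layout, the helper name `migrate_record`, and the sample values are assumptions for illustration only. The only details taken from this thread are the field names and the fact that the old `data` field becomes `raw`.

```python
# Field names of the new expressionData structure, as listed in this thread.
FIELDS = ("raw", "compressed", "strain", "avg", "deletion", "t", "stdev")


def migrate_record(old):
    """Map one old microData-style record (a dict) to the new field names.

    Hypothetical helper: fields missing from the old record are set to None.
    """
    new = {name: old.get(name) for name in FIELDS}
    # microData(index).data becomes expressionData(index).raw
    new["raw"] = old.get("data")
    return new


# Hypothetical old-style record with made-up values.
old_record = {"data": [[0.1, 0.2], [0.3, 0.4]], "strain": "wt", "t": [0.4, 0.8]}
new_record = migrate_record(old_record)
print(sorted(new_record))
```

In this sketch `compressed` stays empty until a separate step (compressMissingData in the real code) fills it in.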

im-deepfriedwater commented 7 years ago

Courtesy of @bengfitzpatrick, a breakdown of the loop within gLSE:

[Image: 2017-04-05 12 18 48]

kdahlquist commented 7 years ago

I just wanted to note that at the meeting today, we advised this course of action:

1. Make sure the current LSE tests with the current data structure are functional (I believe that @jtorre39 confirmed that this was the case).
2. Make the modifications to the LSE code.
3. Run the complete-data (no missing data) test that was used on the original LSE code and verify that you get the same results.
4. Write the additional tests needed to capture all the missing data cases that we previously listed.
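The idea behind verifying the same results on complete data, and then adding missing-data cases, can be sketched in Python as follows. Both functions are toy stand-ins written for this illustration, not the GRNmap LSE code; treating missing values as NaN and skipping them is an assumption, and the numbers are made up.

```python
import math


def sse_dense(observed, modeled):
    """Toy stand-in for the original routine: assumes no missing values."""
    return sum((o - m) ** 2 for o, m in zip(observed, modeled))


def sse_skip_missing(observed, modeled):
    """Toy stand-in for the modified routine: skips NaN (missing) observations."""
    return sum(
        (o - m) ** 2 for o, m in zip(observed, modeled) if not math.isnan(o)
    )


model = [0.0, -0.2, 0.4, 0.2]
complete = [0.1, -0.3, 0.5, 0.2]

# On complete data, both versions must agree.
assert math.isclose(sse_dense(complete, model), sse_skip_missing(complete, model))

# Missing-data case: the second observation is absent.
with_gap = [0.1, float("nan"), 0.5, 0.2]
print(sse_skip_missing(with_gap, model))
```

Keeping the complete-data comparison as its own test means that any later change to the missing-data handling is still checked against the original behavior.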

trixr4kdz commented 7 years ago

For the points mentioned above by @kdahlquist:

Finally finished modifying the gLSE function
I ran the tests and they are still passing (woot!)
We just need to make more tests for item 4

trixr4kdz commented 7 years ago

So I ran ALL the tests this time, even MSETest (which unfortunately still takes about 30 minutes for just that one test), and they all pass.

One thing that I observed when I ran the code with the sixteen_tests Excel files, however, is that the precision error we had with timepoint 1.2 came back. As a result, GRNmap will think that the third replicate for timepoint 1.2 is a different timepoint (i.e., we have 3 replicates for t=0.4, 3 reps for t=0.8, 2 reps for t=1.2, 1 rep for t=1.2...0001, and 3 reps for t=1.6). In this case, GRNmap will produce a warning since it thinks that there is only 1 replicate for t=1.2...0001.
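The effect described above can be reproduced in a few lines of Python. This is only an illustration of the floating-point issue, not the GRNmap code: the value 1.2000000000000002 stands in for "t=1.2...0001", and the helper name `group_timepoints` and the tolerance are assumptions.

```python
import math


def group_timepoints(times, tol=1e-9):
    """Group replicate timepoints, treating values within `tol` as equal.

    Returns a list of (representative_time, replicate_count) pairs.
    """
    groups = []  # each entry is [representative_time, count]
    for t in times:
        for group in groups:
            if math.isclose(t, group[0], rel_tol=0.0, abs_tol=tol):
                group[1] += 1
                break
        else:
            groups.append([t, 1])
    return [(rep, count) for rep, count in groups]


# Three replicates per timepoint, but one 1.2 is off by a rounding error.
times = [0.4] * 3 + [0.8] * 3 + [1.2, 1.2, 1.2000000000000002] + [1.6] * 3

exact_groups = len(set(times))       # exact comparison splits t=1.2 in two
tolerant = group_timepoints(times)   # tolerant comparison keeps it together
print(exact_groups, tolerant)
```

Exact comparison finds five distinct timepoints, which matches the warning about a lone replicate; comparing with a small tolerance keeps all three replicates of t=1.2 together.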

kdahlquist commented 7 years ago

Closing out the work of last semester!

kdahlquist / GRNmap

Coding team tasks for the rest of spring, 2017 #314