Closed im-deepfriedwater closed 6 years ago
The input sheet which caused this behavior. It could possibly outdated which might be causing this. If not, it is probably worth investigating within compressMissingData.m
The following is how log2FC(1).compressed SHOULD look
This was done on a GRNstruct defined in the test file, AKA without reading an input sheet of equivalent data.
@cazinge made note that the pushes he made to his fork will potentially solve this issue since he worked on stuff related to input.
Unfortunately his changes didn't seem to affect this bug. It's possibly rooted within readInputSheet or convertToNestedStructure or compressMissingData. The current plan for this is to write tests in our current tests for convertToNestedStructure test suite which call on readInputSheet and validate that it was converted properly.
Ran GRNmodel on a separate input sheet and it still produced a similar issue when checking the compressed field in Matlab's debug mode. It is not input sheet specific.
Created a new test case in the test for CompressMissingData. In the test, raw data is gotten from a new test input sheet I created for convenience as opposed to being predefined in the test script. As predicted, it is currently failing and bug investigations are still ongoing.
The indx in the t struct seems to be off. It should instead be
[1 2 3] [4 5 6] [7 8 9] [10 11 12]
The following is from my test case for illustrative purposes, actual vs expected.
The indx problem seems to imply the problem might lie in readinputsheet? I also have to investigate convertToNestedStructure to see how the compressed struct is getting set.
One immediate thing to try is to create, from scratch (i.e., not basing on a pre-existing file), an input sheet with this data. Use integers for the time points at first. Test the correctness of reading this "clean" input sheet. Then test correctness with non-integer time points. This will help us isolate whether the bug appears on clean vs. pre-existing files, and on integer vs. non-integer time points.
@johnllopez616 is going to focus on this issue first.
Will the issue be assigned to me in that case? I was just made aware of it thanks to Justin, and it is not in my list of issues at the moment.
Good and then some what bad news!
Good news is that making a clean test sheet seems to have worked. I have some more experimenting to do and I have to clean up the input sheet but the data cells were in their proper place. It also hints that the fix is relatively simple and does not require any changes to the code.
Bad news is... we have many many test sheets to fix if that is the case. But it's doable and at least a simple solution.
Will try to do more investigating before the meeting this Friday if possible!
Sorry @johnllopez616, I made a typo. This issue is for @jtorre39. Both of your names start with "j" and I mis-read the autofill. Sorry for the confusion.
Made a formalized clean test input sheet by using the same values from an input sheet that was failing the test and the test is now passing with the clean sheet. Confirms that there is some strange excel formatting that is causing this bug.
Will look into the differences between the existing sheet and the clean sheet to investigate further.
Found the bug. It is in fact related to the time point designation bug @kdahlquist mentioned before.
Tracked the issue down to a for loop within readinputsheet which had:
find(reps == expression_timepoints(jj));
(at this point the value of jj was 3"
reps and expression_timepoints(jj) would both have a value of 1.200. But Matlab said they were not equivalent. In the excel spreadsheet the timepoints were displayed as 1.2
but apparently some small decimal numbers had hidden themselves within.
Deleting the timepoints for 1.2 and rewriting them as 1.2
fixed it. It might be necessary to reopen the test file audit #361 to retype all the timepoints within the test sheet. It's also possible to make a fix to readinputsheet to tell xlsread when reading the timepoints to round the timepoints to a certain decimal point if we want to account for this issue in the future.
I'll defer to the professors @kdahlquist @bengfitzpatrick @dondi for what solution we should take moving forward.
I'd rather we fix this at the level of the input sheet. I'm a little squeamish about the program rounding a constant provided by the user.
Ideally, we would have a test that checks the input workbook and alerts the user to this potential problem and asks them to fix it before running the model.
Conclusion is that we need to fix the input workbooks in the repository to fix the problem. We will hold off on implementing the input workbook check because that is a much larger issue to address.
@jtorre39 will work on correcting the input workbooks and close this issue when he is done.
Audited all the test files (again) and currently running the test suite!
Test suite is passing, and code from fork is merged into beta! Closing as instructed by @kdahlquist.
The data cells in column 4 are incorrectly being appended to column 4 instead of column 3. This pushes the data cells that should be in column 4 to column 5. This bug might possibly be related to the inputsheet since it is a bit old or some oversight we did not catch. The pictures below are two strains within the same GRNstruct with the aforementioned erroneous behavior.