kdahlquist / GRNmap

Gene Regulatory Network modeling and parameter estimation
BSD 3-Clause "New" or "Revised" License
4 stars 3 forks source link

Incorrect Compression within the compressed Field of log2FC #380

Closed im-deepfriedwater closed 6 years ago

im-deepfriedwater commented 6 years ago

The data cells in column 4 are incorrectly being appended to column 4 instead of column 3. This pushes the data cells that should be in column 4 to column 5. This bug might possibly be related to the inputsheet since it is a bit old or some oversight we did not catch. The pictures below are two strains within the same GRNstruct with the aforementioned erroneous behavior.

im-deepfriedwater commented 6 years ago

log2fc 1 _compressed

im-deepfriedwater commented 6 years ago

log2fc 2 compressed

im-deepfriedwater commented 6 years ago

The input sheet which caused this behavior. It could possibly outdated which might be causing this. If not, it is probably worth investigating within compressMissingData.m

LSETest input sheet

im-deepfriedwater commented 6 years ago

The following is how log2FC(1).compressed SHOULD look

log2fc 1 _compressed_correct

im-deepfriedwater commented 6 years ago

This was done on a GRNstruct defined in the test file, AKA without reading an input sheet of equivalent data.

im-deepfriedwater commented 6 years ago

@cazinge made note that the pushes he made to his fork will potentially solve this issue since he worked on stuff related to input.

im-deepfriedwater commented 6 years ago

Unfortunately his changes didn't seem to affect this bug. It's possibly rooted within readInputSheet or convertToNestedStructure or compressMissingData. The current plan for this is to write tests in our current tests for convertToNestedStructure test suite which call on readInputSheet and validate that it was converted properly.

im-deepfriedwater commented 6 years ago

Ran GRNmodel on a separate input sheet and it still produced a similar issue when checking the compressed field in Matlab's debug mode. It is not input sheet specific.

im-deepfriedwater commented 6 years ago

Created a new test case in the test for CompressMissingData. In the test, raw data is gotten from a new test input sheet I created for convenience as opposed to being predefined in the test script. As predicted, it is currently failing and bug investigations are still ongoing.

im-deepfriedwater commented 6 years ago

image

The indx in the t struct seems to be off. It should instead be [1 2 3] [4 5 6] [7 8 9] [10 11 12]

im-deepfriedwater commented 6 years ago

The following is from my test case for illustrative purposes, actual vs expected.

image

im-deepfriedwater commented 6 years ago

The indx problem seems to imply the problem might lie in readinputsheet? I also have to investigate convertToNestedStructure to see how the compressed struct is getting set.

dondi commented 6 years ago

One immediate thing to try is to create, from scratch (i.e., not basing on a pre-existing file), an input sheet with this data. Use integers for the time points at first. Test the correctness of reading this "clean" input sheet. Then test correctness with non-integer time points. This will help us isolate whether the bug appears on clean vs. pre-existing files, and on integer vs. non-integer time points.

kdahlquist commented 6 years ago

@johnllopez616 is going to focus on this issue first.

jlopez616 commented 6 years ago

Will the issue be assigned to me in that case? I was just made aware of it thanks to Justin, and it is not in my list of issues at the moment.

im-deepfriedwater commented 6 years ago

Good and then some what bad news!

Good news is that making a clean test sheet seems to have worked. I have some more experimenting to do and I have to clean up the input sheet but the data cells were in their proper place. It also hints that the fix is relatively simple and does not require any changes to the code.

Bad news is... we have many many test sheets to fix if that is the case. But it's doable and at least a simple solution.

Will try to do more investigating before the meeting this Friday if possible!

kdahlquist commented 6 years ago

Sorry @johnllopez616, I made a typo. This issue is for @jtorre39. Both of your names start with "j" and I mis-read the autofill. Sorry for the confusion.

im-deepfriedwater commented 6 years ago

Made a formalized clean test input sheet by using the same values from an input sheet that was failing the test and the test is now passing with the clean sheet. Confirms that there is some strange excel formatting that is causing this bug.

Will look into the differences between the existing sheet and the clean sheet to investigate further.

im-deepfriedwater commented 6 years ago

Found the bug. It is in fact related to the time point designation bug @kdahlquist mentioned before.

Tracked the issue down to a for loop within readinputsheet which had:

find(reps == expression_timepoints(jj)); (at this point the value of jj was 3"

reps and expression_timepoints(jj) would both have a value of 1.200. But Matlab said they were not equivalent. In the excel spreadsheet the timepoints were displayed as 1.2 but apparently some small decimal numbers had hidden themselves within.

Deleting the timepoints for 1.2 and rewriting them as 1.2 fixed it. It might be necessary to reopen the test file audit #361 to retype all the timepoints within the test sheet. It's also possible to make a fix to readinputsheet to tell xlsread when reading the timepoints to round the timepoints to a certain decimal point if we want to account for this issue in the future.

I'll defer to the professors @kdahlquist @bengfitzpatrick @dondi for what solution we should take moving forward.

kdahlquist commented 6 years ago

I'd rather we fix this at the level of the input sheet. I'm a little squeamish about the program rounding a constant provided by the user.

Ideally, we would have a test that checks the input workbook and alerts the user to this potential problem and asks them to fix it before running the model.

kdahlquist commented 6 years ago

Conclusion is that we need to fix the input workbooks in the repository to fix the problem. We will hold off on implementing the input workbook check because that is a much larger issue to address.

@jtorre39 will work on correcting the input workbooks and close this issue when he is done.

im-deepfriedwater commented 6 years ago

Audited all the test files (again) and currently running the test suite!

im-deepfriedwater commented 6 years ago

Test suite is passing, and code from fork is merged into beta! Closing as instructed by @kdahlquist.