kdahlquist / GRNmap

Gene Regulatory Network modeling and parameter estimation
BSD 3-Clause "New" or "Revised" License

Test files Audit #361

Closed kdahlquist closed 6 years ago

kdahlquist commented 7 years ago

We need to perform an audit of the "test_files" directory.

For any files we keep, we need to check that

Assigning this a 0.5 priority because other issues have priority right now. However, when we tackle this, we want to be thorough and detail-oriented so that we don't have to do it again moving forward. Paying off some more technical debt.

im-deepfriedwater commented 7 years ago

Within GRNmap/test_files/matlab_codes/sampleTests there are MATLAB test files that are useful as examples for beginning to write tests. However, we are beyond that level, since we already have a whole test suite to look back on for examples. These files are candidates for removal.

im-deepfriedwater commented 7 years ago

GRNmap\test_files\matlab_codes\calculationTests\newLSETests\GeneralLSETest.m

is a remnant from last semester, when we tried to work on it in tandem. It is no longer needed because a correct and completed version exists at GRNmap\test_files\matlab_codes\calculationTests\GeneralLSETest.m

im-deepfriedwater commented 7 years ago

Directories left for inspection for keeping

im-deepfriedwater commented 7 years ago

@kdahlquist is there a heuristic I could use for checking which test files are no longer needed?
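One possible heuristic, offered only as a sketch and not as the approach the team settled on: flag input workbooks whose file name is never mentioned in any MATLAB test file. The function name `find_unreferenced_inputs` and its parameters are hypothetical, and the check assumes tests refer to workbooks by literal file name.

```python
from pathlib import Path


def find_unreferenced_inputs(test_root, code_ext=".m", data_exts=(".xlsx", ".xls")):
    """Return data files under test_root whose file name appears in no code file.

    Assumption: a test references its input workbook by literal file name.
    """
    root = Path(test_root)
    # Join the text of every code file once so each lookup is a substring test.
    code_text = "\n".join(
        p.read_text(errors="ignore") for p in sorted(root.rglob(f"*{code_ext}"))
    )
    unreferenced = []
    for p in sorted(root.rglob("*")):
        is_data = p.is_file() and p.suffix.lower() in data_exts
        if is_data and p.name not in code_text:
            unreferenced.append(p.relative_to(root).as_posix())
    return unreferenced
```

A test that builds its file names dynamically would not be detected, so the result is a list of candidates to review by hand, not a deletion list.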

im-deepfriedwater commented 6 years ago

folders of test functions in matlab_codes I am looking through

im-deepfriedwater commented 6 years ago

Test files up for removal/discussion:

kdahlquist commented 6 years ago

We can talk about this list of files at the meeting.

dondi commented 6 years ago

Bullet-by-bullet decision on audit

dondi commented 6 years ago

Also add a README.md at the test_files folder to document the contents for future GRNmappers.

im-deepfriedwater commented 6 years ago

Removed the files as detailed in the discussion of the checklist I posted. The next steps are to create an issue to finish LSETest.m and to finish checking the rest of the test files I couldn't get to in the previous work sessions.

UPDATE: Issue for LSETest.m created at #376

im-deepfriedwater commented 6 years ago

I've also added a readme.md within the test_files folder. There were already instructions on how to run the test suite within matlab_codes, in a readme.txt. I will most likely reformat it as Markdown, leave it in the /test_files/matlab_codes folder, and note its availability in the readme.md in test_files.

im-deepfriedwater commented 6 years ago

Second wave of folders/files that are candidates for removal or are otherwise of interest:

im-deepfriedwater commented 6 years ago

With the 2nd wave the first part of the test file audit will have been completed. The next step is to double check that the input workbooks we keep are valid and conform to our new format as denoted in the initial issue.

kdahlquist commented 6 years ago

Round 2 test file audit notes:

im-deepfriedwater commented 6 years ago

I've removed the appropriate files. What's next is to take care of these leftover issues.

im-deepfriedwater commented 6 years ago

I couldn't find an issue that explained the origins of the data sheet in test_files/perturbation_tests/with_manual_calculations. Removing it for now.

im-deepfriedwater commented 6 years ago

Spent quite a bit of time crawling through commit histories. The test_files/perturbation_tests/with_manual_calculations/readme.txt lists files that I cannot find anywhere, and I have not seen them while going about 2 years back in the commit history. The readme.txt does reference graphs from the plots folder, which we marked as a candidate for removal.

Overall, I believe the readme should be removed: after reading through it and not finding the files it lists, it does not seem to have a purpose in the repository.

im-deepfriedwater commented 6 years ago

@dondi For the README.md that goes within test_files am I listing one by one the purposes of each folder?

For example I'd write,

initialize_arrays_test folder provides 3 test input sheets used by test_files/matlab_codes/dataStructureTests/InitializeArraysTest.m

deleted_strains_test folder provides test input sheets used by test_files/matlab_codes/excelTests/DeletedStrainTest.m

etc
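To keep such entries uniform, they could be generated from a small mapping of folder to test file. The following is a minimal sketch under that assumption; the function name `readme_entries` and the exact wording of each entry are hypothetical, and the two folder/test pairs are the ones given in the example above.

```python
def readme_entries(folder_to_tests):
    """Build one Markdown bullet per test-data folder, sorted by folder name."""
    lines = []
    for folder in sorted(folder_to_tests):
        tests = ", ".join(folder_to_tests[folder])
        lines.append(f"- `{folder}`: test input sheets used by {tests}")
    return "\n".join(lines)


# The two pairs quoted in the comment above; further folders would be added by hand.
FOLDERS = {
    "initialize_arrays_test": [
        "test_files/matlab_codes/dataStructureTests/InitializeArraysTest.m"
    ],
    "deleted_strains_test": [
        "test_files/matlab_codes/excelTests/DeletedStrainTest.m"
    ],
}

print(readme_entries(FOLDERS))
```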

kdahlquist commented 6 years ago

@jtorre39 can you point me to where the readme is so that I can take a look at it one last time? Thanks.

im-deepfriedwater commented 6 years ago

@kdahlquist Yes, it is located at GRNmap/test_files/matlab_codes/sampleTests/readme.txt

dondi commented 6 years ago

Test files audit now moves on to verifying that the unculled files comply with the latest input and output sheet formats (https://github.com/kdahlquist/GRNmap/wiki/How-to-format-the-input-file-for-GRNmap-v1.4-and-above, https://github.com/kdahlquist/GRNmap/wiki/How-to-interpret-the-output-file-for-GRNmap).

Brandon has already audited the 16-tests input sheets to that end; just the output sheets remain for those. @kdahlquist will post this information later.

kdahlquist commented 6 years ago

I'm pasting the text of the readme file GRNmap/test_files/matlab_codes/sampleTests/readme.txt here:

The scenario is a four-gene network. Two genes are purely self-regulated; the two others feed back to each other. The data was created by executing two forward simulations, one with all four genes and one with a single gene deleted. This pair gives us the two data sheets (wt and dcin5).

Input_4_gene_inverse.xls is the input sheet for the estimation run. Input_4_gene_inverse_estimation_output_archived.xls is the output sheet. The name has been changed so that when you run the code, you won't overwrite this file. Input_4_gene_inverse_estimation_output_archived.mat is the output MATLAB binary file. Its name has been changed for the same reason. figure_1, figure_2, and figure_3 are saved output; once again I have added the phrase _archived to the name. figure_4 is generated in the output, but there is no saved .jpg file (bug?).

Also included is the file Input_4_gene_testing.xlsx, which was used to generate the forward (model-generated) testing data. In this file, you can see "the right answers" to the estimation problem, in the production_rates and network_weights sheets. fix_b is set to one, so the b's are not estimated in this example.

I still need to look at this and figure out the place where this information needs to reside. In the meantime, since the text is here, you can get rid of the file itself.

im-deepfriedwater commented 6 years ago

Got it!

im-deepfriedwater commented 6 years ago

Directories left for validating input sheet formats:

im-deepfriedwater commented 6 years ago

Many input sheets had an extra row or two. These rows were called sheet and deletion in the optimization_parameters sheet. They are not mentioned in the input sheet format guide for v1.4 and above, and they do not exist in the sixteen_tests sheets, so I have removed them from the input sheets I've gone over.
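A check like this could be scripted. The sketch below assumes the rows of the optimization_parameters sheet have already been read into a list of lists (loading the workbook is left out), and that the row name sits in the first cell. The function name `find_extra_rows`, the header row, and the cell values in the example are hypothetical.

```python
# The two row names reported above as not being part of the documented format.
EXTRA_ROW_LABELS = {"sheet", "deletion"}


def find_extra_rows(rows, extra_labels=EXTRA_ROW_LABELS):
    """Return (row_number, label) for each row whose first cell is an unwanted label.

    Row numbers are 1-based to match spreadsheet row numbering.
    """
    hits = []
    for number, row in enumerate(rows, start=1):
        if not row or row[0] is None:
            continue  # skip empty rows
        label = str(row[0]).strip()
        if label.lower() in extra_labels:
            hits.append((number, label))
    return hits


# Hypothetical sheet contents, for illustration only.
example_rows = [
    ["parameter", "value"],
    ["fix_b", 1],
    ["Sheet", "wt"],
    ["deletion", 0],
]
print(find_extra_rows(example_rows))
```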

im-deepfriedwater commented 6 years ago

Is there a reason why certain data cells are highlighted within GRNmap\test_files\MSE_tests\dHAP4_15_gene_network_deletion_added_input_KD_20160126.xlsx?

im-deepfriedwater commented 6 years ago

Within optimization_diagnostic_test/optimization_diagnostic_under_100_iterations_test.xlsx, the data cells in the network_weights sheet are formatted as Number instead of General. Should they be changed to General?

im-deepfriedwater commented 6 years ago

It might be worth noting in the wiki whether data cells should be formatted as General or Number, for additional clarification.

dondi commented 6 years ago

On the issue of highlighted cells, these are remnants of pre-missing-data workarounds where missing cells were populated with average values then highlighted so that users would know which cells were missing.

Nothing needs to be done with the files themselves; however a note in the new README explaining this will help retain this information for the future.

im-deepfriedwater commented 6 years ago

Took some time, but I checked each of the input sheets! Anything with an extra row has been corrected. I didn't catch any oddities other than the ones mentioned above.

kdahlquist commented 6 years ago

That is great! Thanks for doing this!

kdahlquist commented 6 years ago

I closed a duplicate issue #177. In that issue, @juansc and @trixr4kdz created a wiki page with documentation of the unit tests. We should revisit this wiki page and update it.

We should look into writing a script that would automatically generate testing documentation as has been discussed for the GRNsight team.
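As a rough starting point for such a script, the sketch below lists the test functions found in each MATLAB test file and emits a Markdown outline. It is not an existing GRNmap or GRNsight tool. It assumes test files are named `*Test.m` and that test functions are declared as `function testSomething(...)` with no return values; the names `list_test_functions` and `document_tests` are hypothetical.

```python
import re
from pathlib import Path

# Matches declarations such as "function testAddsRows(testCase)".
TEST_FUNCTION = re.compile(r"^\s*function\s+(test\w*)\s*\(", re.IGNORECASE | re.MULTILINE)


def list_test_functions(matlab_source):
    """Return the names of test functions declared in a MATLAB source string."""
    return TEST_FUNCTION.findall(matlab_source)


def document_tests(test_dir):
    """Return a Markdown outline: one heading per test file, one bullet per test."""
    lines = []
    for path in sorted(Path(test_dir).rglob("*Test.m")):
        lines.append(f"## {path.name}")
        source = path.read_text(errors="ignore")
        lines.extend(f"- {name}" for name in list_test_functions(source))
    return "\n".join(lines)
```

Calling `document_tests` on the folder that holds the MATLAB tests would produce text that could be pasted into the wiki; descriptions of what each test verifies would still need to be written by hand or pulled from comments.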

dondi commented 6 years ago

This issue can finally be closed. Will write the testing documentation task(s) up as a new issue.