rcsb / symmetry

:ferris_wheel: Detect, analyze, and visualize protein symmetry
GNU Lesser General Public License v2.1
26 stars 16 forks source link

Running tests modifies resources #39

Closed sbliven closed 8 years ago

sbliven commented 10 years ago

Running the tests modifies xml files in src/test/resources/jCE*. This is annoying for version control, but more importantly it could introduce dependencies between sequential test runs.

The files are being generated by ProtodomainTest. ResourceList seems to be providing a caching layer (generally a bad idea for unit tests), and uses src/test/resources/<algorithm name> as the default output directory.

@dmyersturnbull, can you clarify the intent behind ResourceList? Is this needed for anything besides the diff methods?

At a minimum, the DEFAULT_DIR needs to be set to a temp directory and all the alignments calculated at each test run.

dmyersturnbull commented 10 years ago

That class is only used for diffs. It was meant as a utility to avoid having to create a huge number of "expected" alignments by hand. My procedure was generally:

  1. Write tests for something.
  2. Run the tests once, and carefully validate the residue numbers in the generated XML files.
  3. Commit the new tests with their associated resources (the alignment XML files).

If it's used this way, it shouldn't introduce dependencies between sequential test runs, and it shouldn't present any problems for version control. I also used that class to generate regression tests based on mistakes in the census (it looks like I deleted that class since it needed so many resource files).

With that said, this isn't documented and is dangerous, so I like your idea of saving to a tmp directory. However, I notice you deleted all the alignments from the src/test/resources/ directory. I don't think we want to do this, because they should be be preserved between tests. The files that have already been committed aren't for caching; they're all manually verified "expected" alignments that are used by tests.

This also means I don't think we should change DEFAULT_DIR to a tmp directory. Instead ResourceList should:

  1. Load from src/test/resources/, but
  2. Not use the method put() when loading, but make it public for convenience, or
  3. Put() resources to a TMP directory when they don't exist in src/test/resources.
sbliven commented 10 years ago

So if I understand correctly, you intended a two pass strategy:

  1. The first test run generates the files in src/test/resources and always passes
  2. Validate the results & check them in
  3. Subsequent runs write files to DEFAULT_DIR (currently either java's temp directory or the PDB_DIR env variable). If they differ from the checked in version, tests fail.

However, I don't think it is actually happening this way, because when I run ProtodomainTest it not only creates two new xml files (which presumably should have been checked in), but it also modifies three more files and still passes!

Actually, now that I understand what you intended I think it's a great idea. Rosetta uses a similar strategy for their integration tests, but without checking anything in to cvs (they had gigs of output): you first run on a clean build to generate an 'expected' directory, then subsequent runs create a fresh 'results' dir which gets diffed against the expected. Of course, this requires the build system keep expected results from the previous build, but I think that we have small enough output we can just store it in git.

I do think we should refactor the code to better differentiate the junit tests from the ResourceList tests. Perhaps different packages (eg org.biojava.outputtest), and clear separation of resources.

sbliven commented 10 years ago

FYI, I'm working on refactoring the test cases in my testframework branch.

Biojava's integration tests use the rather strange package org.biojava3.structure.test. What about integration tests from other modules? I want to be consistent across both projects, but wouldn't org.biojava3.test.<module>... make more sense?

andreasprlic commented 10 years ago

I had to start using the weird package name due to issues with code signing. Can't have the same package name in multiple jars...

sbliven commented 10 years ago

@andreasprlic I get that, but wouldn't test.structure make more sense than structure.test, in case any of the other modules want to add integration tests in the future?

andreasprlic commented 10 years ago

agreed. This can be renamed easily since there should be no external dependencies on testing code.

lafita commented 8 years ago

The current tests do not use resources, so I am closing this issue.