nicocvn / emDNA

Energy minimization software for DNA/proteins complexes by the Olson lab at Rutgers
https://nicocvn.github.io/emDNA/
5 stars 1 forks source link

Project refactoring questions #5

Open nicocvn opened 3 years ago

nicocvn commented 3 years ago

@rty10 So as I moving on with DNASim refactoring I noticed a few things:

  1. There is a sub-dir called dna_force_field_packager in the DNASim source tree; as far as I remember (and can tell) this is a standalone command line tool to create force-field data files to be used in emDNA. The way I see it is that this better belong next to emDNA rather than in the DNASim source tree (which is supposed to be a library only). Would it be fine to relocate that thing? A broader question is how do you guys deal with force field when using emDNA (in terms of workflow)? Do you only use the one shipped with the tool or you use custom ones?

  2. My intention is that once the project is refactored the only build products that get installed are emDNA commnad line tools. That would mean that the DNASim library itself does not get installed. I am fine with installing the test executables as this is convenient to check the build was proper.

  3. There is quite a bit of cruft around single/double precision. I would suggest dropping single precision entirely because: a) I am not sure this was ever fully tested (float suffer from quite limited accuracy) b) realistically I would expect anyone using this code to have access to a machine that supports double precision natively. So the question is: can we drop support for single precision?

  4. Do we have a set of emDNA tests? and by that I mean a set of inputs and outputs computed with the "original" emDNA? This would be very helpful to make sure nothing break and that we are able to reproduce previous results.

rty10 commented 3 years ago

@nicocvn Responses:

  1. I agree that the ffieldpackager should be in emDNA and not DNASim, so the move works for me. We use both pre-loaded and custom, but the only pre-loaded ffields we use regularly is IdealDNA and, after an update on the values, Olson1998. Wilma mentioned issues with the Anisotropic-named ffields in the past and I have not bothered to explore those issues.
  2. I'm confused with the intent on this. Are you saying we remove DNASim from the overall build or just during current testing? If the compilation requires DNASim but it's no longer needed for active emDNA use, I say remove it.
  3. I think dropping single precision is fine so long is there is a mention in the build notes just in case.
  4. I think build tests with emDNA would be great but I'm unclear how this will work. Do you have anything in mind? Do you want the tests to be in a separate directory that can be independently downloaded?
nicocvn commented 3 years ago
  1. I agree that the ffieldpackager should be in emDNA and not DNASim, so the move works for me. We use both pre-loaded and custom, but the only pre-loaded ffields we use regularly is IdealDNA and, after an update on the values, Olson1998. Wilma mentioned issues with the Anisotropic-named ffields in the past and I have not bothered to explore those issues.

Ok; I will take a look at the state of the tooling around force fields just to make sure there is no outstanding issues.

  1. I'm confused with the intent on this. Are you saying we remove DNASim from the overall build or just during current testing? If the compilation requires DNASim but it's no longer needed for active emDNA use, I say remove it.

Sorry for being unclear. What I meant is that DNASim will still be built but not installed; see the installation notes: there are two steps: one is building the whole project and another one is installing which basically copy what was built. My point is that this project is not intended to provide DNASim as a user-side build product but just the emDNA executables.

  1. I think dropping single precision is fine so long is there is a mention in the build notes just in case.

Ok I will remove single precision support as part of #3

  1. I think build tests with emDNA would be great but I'm unclear how this will work. Do you have anything in mind? Do you want the tests to be in a separate directory that can be independently downloaded?

This question was about internal tests for ourselves (we will add examples for user in time). The goal here is to make sure that whatever we change does not break the code. For example, I updated the Eigen version used by the project and although unlikely it is possible this introduced some change in how emDNA behaves. So what I would like to have is a few inputs files along with their outputs as computed by emDNA before we changed anything. Then I can run the new emDNA and these inputs and compare results to make sure it still behaves (numerically speaking as expected).

rty10 commented 3 years ago
  1. There will be a point in the future that I will be expanding to tetrameric forcefields and will need to be addressed.
  2. I talked with Stefi and looked at #3 to understand exactly what you were saying. I have not used anything in the DNASim once the compilation is complete so if moving to a static version will make things lighter, I say go for it.
  3. Noted
  4. I've never built code on this scale so I'm confused about when and how to apply these tests. I understand we want to have a set of standard inputs and known outputs as to ensure the code didn't break. My question is how do we wish to store these and give ourselves and future users access? Do we want to make this a script that does 5 standard tests and checks expected outputs or do we want users to run the tests we provide separately?
nicocvn commented 3 years ago
  1. I've never built code on this scale so I'm confused about when and how to apply these tests. I understand we want to have a set of standard inputs and known outputs as to ensure the code didn't break. My question is how do we wish to store these and give ourselves and future users access? Do we want to make this a script that does 5 standard tests and checks expected outputs or do we want users to run the tests we provide separately?

For now just having the input/ouput files in the repo for ourselves would be sufficient. But in the medium to long term yes we can devise some scripting to automate the computation and check that expected results are obtained. Once the project is made public we can use Github Actions which makes it possible to implement CI (continuous integration) so that this kind of tests along with the unit tests present in the repo can be run automatically when the code is updated. This is a very neat way to make sure no regressions or bugs are introduced.

stodolli commented 3 years ago
2. My intention is that once the project is refactored the only build products that get installed are emDNA commnad line tools. That would mean that the DNASim library itself does not get installed. I am fine with installing the test executables as this is convenient to check the build was proper.

I agree with this. Furthermore, I can imagine down the line we could also offer a pre-built standalone executable for those that do not wish to compile their own code.

4. Do we have a set of emDNA tests? and by that I mean a set of inputs and outputs computed with the "original" emDNA? This would be very helpful to make sure nothing break and that we are able to reproduce previous results.

I agree that we need to come up with some tests for emDNA functionality. Granted, I do not use emDNA much and I am on my way out the lab, but I can help @rty10 set up and implement some test cases.

nicocvn commented 3 years ago
  1. My intention is that once the project is refactored the only build products that get installed are emDNA commnad line tools. That would mean that the DNASim library itself does not get installed. I am fine with installing the test executables as this is convenient to check the build was proper.

I agree with this. Furthermore, I can imagine down the line we could also offer a pre-built standalone executable for those that do not wish to compile their own code.

Yes! Especially on the project is public we can automate that via GH actions.

I agree that we need to come up with some tests for emDNA functionality. Granted, I do not use emDNA much and I am on my way out the lab, but I can help @rty10 set up and implement some test cases.

That would be very helpful because the coverage of the unit tests in DNASim is pretty good but for emDNA I believe it is pretty minimal. I will keep working on addressing #3 and making sure everything compiles properly and pass the current tests.