Luthaf opened 7 years ago
For MC, to go to non-rigid molecules or molecules with more than 3 beads, we need a way to sample intramolecular configurations. See #35.
I noticed a few long-running tests this morning in the `tests` folder of the project; I think that would discourage a nice test-on-save policy. It would be beneficial to reevaluate those and see if smaller sizes or move counts would do for those integration tests, and to move the ones that require longer periods of time into the `benches` folder. (I apologise if I'm a little bit off topic here)
I'd prefer to keep these tests as tests, but we can discuss it.
I usually only run the unit tests when developing (`cd src/core && cargo test`), and only run the integration tests before pushing. I also run the integration tests (the ones in the `tests` folder) in release mode (`cargo test --release`), which cuts a lot of the run time here.

My setup is a bit clumsy at the moment, but it will become easier when cargo gets a `--all` flag (there is a PR for that in the cargo repo). At that point, running the unit tests will be as simple as `cargo test --all --lib` from the root directory, and running the integration tests will still be `cargo test --release`.
You can assign me to the Lennard-Jones cases for NVT and NPT. Before that, I will implement energy caching for `resize_cell_cost`. Maybe I can manage to add that to the open PR #58 in time.
A practical question: in #58, I added an example case. Somehow, I couldn't manage to make this a `#[test]`. It compiles, but is not able to find the configuration file. I placed the config in `lumol/tests/data/` and the test in `lumol/tests`. Using `cargo test --test my_test --release` fails.

I get the same error for existing tests, like `mc-helium.rs`:
```
$ cargo test --test mc-helium
# compiling ...
    Finished debug [unoptimized + debuginfo] target(s) in 73.96 secs
     Running /usr/.../lumol/target/debug/mc_helium-2217974ed8ac85bc

failures:

---- perfect_gaz stdout ----
thread 'perfect_gaz' panicked at 'called `Result::unwrap()` on an `Err` value: TrajectoryError(Error { kind: NullPtr, message: "Could not open the file data/helium.xyz" })', ../src/libcore/result.rs:799
```
Any idea what is going wrong or what I am doing wrong here? Do I have to copy the `data/` folder to the target directory?
> Any idea what is going wrong or what I am doing wrong here?
Let's keep this issue on topic; I'll answer you in #58.
We can use `bencher` to run the benchmarks using a stable Rust compiler, instead of a nightly one.
This is a must-have if we are to do any kind of optimization on Lumol. I would really like to help on this one, but have no real idea what a reasonable configuration for molecules looks like :(
I'd be happy to help you with that. I can create configuration files for all systems mentioned in the OP. To compare Lumol to other codes, we have to make sure that the same force fields are used.
For Lennard-Jones (Argon), we can already cover all cases but the grand-canonical ensemble. Maybe we should also include an atomic mixture (Na, Cl) as an intermediate step towards water?
For water and butane it's more difficult. For SPC/E water we can do NVT and NPT comparisons for MC simulations right now. For butane, we'd need #35 for MC simulations; MD should work for force fields that use non-fixed bond lengths.
I'd say we start with comparison benchmarks of Argon.
I can also perform the simulations in Gromacs, Gromos (maybe DL_Poly) and Cassandra to compare performance.
@g-bauer thanks! I'm OK with starting with argon. How are we doing this? You provide me with the input files and I try to wire it all together?
> How are we doing this? You provide me with the input files and I try to wire it all together?
For argon, we already have inputs in the examples/data folder. I'd use the example configuration `argon.xyz`, which has 500 atoms. We also have an example for MD (`argon.rs`) and for NPT MC (`mc_npt_argon.rs`).
I think it is more convenient to just use input files instead of Rust bins? We can set up an argon force field file (`argon.toml`) only specifying the potential, and then several simulation inputs, like `mc_nvt_argon.toml`, `md_nvt_argon.toml`, ..., that all make use of the force field and configuration file.

We have to use the same cutoff radius for all simulations (say rc = 2.7 * sigma = 9.1935 Angstrom). Also, we should use the same frequencies to write data to files.

Hopefully, I'll get #94 finished tonight, so that we can use it for the MC part.

Does that make sense? If anything is unclear, feel free to ping me (here or on Gitter).
> only specifying the potential and then several simulation inputs,
This is the exact use case of this feature 😃 !
> Also, we should use the same frequencies to write data to files.
I think it is better not to write anything during the benchmark run. We are not benchmarking how the filesystem behaves, and it can add a lot of latency and variation.
File organisation is wonky for the benchmarks right now. We need a clear separation between regression benchmarks and comparison benchmarks (right now comparisons are in an `other` subdirectory inside the regression benchmarks). I agree with @g-bauer that comparison benchmarks should use input files; in my opinion they should use the `lumol` binary, and could even live in a separate repository (they don't have to).
The current way regression benchmarks are done, using `cargo bench` in the `benches` directory, is fine with me. How I would ideally see the file structure for the comparison benchmarks is:

- one folder for each simulation case (e.g. `mc-butane-nve`);
- each folder contains one `lumol.toml` and optionally one input file for each other engine (`LAMMPS.in`, ...) that each corresponds to the same simulation;
- at the root we have a script (please no bash script) that iterates through all the folders and runs the computations with the different engines.

I am OK with this organisation. The current code is pretty old and comes from the very first times of this repository.
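A minimal sketch of what such a runner script could look like in Python (the engine names, commands, and input file names here are assumptions for illustration, not the actual setup):

```python
import os
import subprocess
import time

# Hypothetical engine commands: each maps an engine name to the command run
# inside a benchmark folder, plus the input file that must exist for that
# engine to be benchmarked in that folder.
ENGINES = {
    "lumol": (["lumol", "lumol.toml"], "lumol.toml"),
    "lammps": (["lmp", "-in", "LAMMPS.in"], "LAMMPS.in"),
}

def discover_cases(root):
    """Return the benchmark folders: every direct subdirectory of `root`
    that contains a lumol.toml input file."""
    return sorted(
        name for name in os.listdir(root)
        if os.path.isfile(os.path.join(root, name, "lumol.toml"))
    )

def run_case(root, case):
    """Run every available engine for `case`, returning {engine: seconds}."""
    timings = {}
    folder = os.path.join(root, case)
    for engine, (command, input_file) in ENGINES.items():
        if not os.path.isfile(os.path.join(folder, input_file)):
            continue  # this engine has no input for this benchmark
        start = time.perf_counter()
        subprocess.run(command, cwd=folder, check=True)
        timings[engine] = time.perf_counter() - start
    return timings
```

Driving it would then just be a loop over `discover_cases(...)` calling `run_case` and collecting the timings.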
> please no bash script
What would you use? I'm all in for Python, or even a Rust "script".
I could even reuse this to profile the code; in dreamland, I would even have this run regularly and upload the results to a website without me having to do anything.
This is #49. The main documentation is hosted on GitHub Pages right now; we could use it for benchmarks too. I was thinking we could write benchmark results to a JSON file, and then load them and plot them using some JS plotting library.
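A sketch of what persisting the results to a JSON file could look like; the schema here (commit hash, timestamp, per-case timings appended to a history list) is entirely hypothetical:

```python
import json
import time

def save_results(path, commit, timings):
    """Append one benchmark record to the JSON history at `path`.

    `timings` maps case names to {engine: seconds}, e.g.
    {"mc-argon-nvt": {"lumol": 12.3, "lammps": 9.8}}.
    """
    record = {
        "commit": commit,
        "date": int(time.time()),
        "timings": timings,
    }
    # Append to the existing history so the website can plot trends over time.
    try:
        with open(path) as fd:
            history = json.load(fd)
    except FileNotFoundError:
        history = []
    history.append(record)
    with open(path, "w") as fd:
        json.dump(history, fd, indent=2)
    return history
```

The resulting file could then be fetched directly by whatever JS plotting code runs on the GitHub Pages site.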
@Luthaf yes, I'm 100% in favor of a Python script (I just find bash scripts unreadable).
Plotting the benchmarks would be super nice, but in case you guys don't know: the world of JS graph plotting is HELL.
Sounds good to me.
> each folder contains one `lumol.toml` and optionally one input file for each other engine (`LAMMPS.in`, ...) that each corresponds to the same simulation
Other codes may need plenty of different files (special configuration format, multiple inputs for simulation setup and force fields). Just dropping the files inside a directory will be messy. Maybe another subfolder for every engine?
> at the root we have a script (please no bash script) that iterates through all the folders and runs the computations with the different engines
You would also rerun the simulations using other engines? I imagine that is very tedious to set up. To start, could we go with a single run (maybe on different systems), store benchmarks and compare against those?
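The "store benchmarks and compare against those" idea could be as simple as this sketch (the 10% tolerance is an arbitrary assumption, and the baseline would be the stored single run):

```python
def find_regressions(baseline, current, tolerance=0.10):
    """Compare fresh timings against a stored baseline.

    Both arguments map case names to seconds; return the cases that are
    slower than the baseline by more than `tolerance`, as
    {case: (baseline_seconds, measured_seconds)}.
    """
    regressions = {}
    for case, reference in baseline.items():
        measured = current.get(case)
        if measured is not None and measured > reference * (1.0 + tolerance):
            regressions[case] = (reference, measured)
    return regressions
```

The caveat, as mentioned above, is that a stored baseline is only meaningful if the new runs happen on comparable hardware.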
> Plotting the benchmarks would be super nice, ...
If we go with Python, why not use matplotlib (or Jupyter notebooks)? It is easy to use and set up.
> Other codes may need plenty of different files (special configuration format, multiple inputs for simulation setup and force fields). Just dropping the files inside a directory will be messy. Maybe another subfolder for every engine?
Can the other engines take a single file as input, with links to other files inside this file (as we do with lumol), or do they need everything to be in the same directory? In the second case, I agree with having a subdirectory for each engine. I don't mean that in the first case we would not have subdirectories; I just mean that we require the input file in each case, and on a case-by-case basis we can choose the directory structure.
> You would also rerun the simulations using other engines? I imagine that is very tedious to set up. To start, could we go with a single run (maybe on different systems), store benchmarks and compare against those?
What do you mean by tedious? If you mean long, yeah probably, but that's not really an issue; we expect them to be super long anyway, right? And I don't really know how consistent performance is on Travis CI across builds, so I can see a lot of issues coming from not running them each time (admittedly, Travis CI performance may not even be consistent within the same build, which would be an issue).
> If we go with Python, why not use matplotlib (or Jupyter notebooks)? It is easy to use and set up.
How would you integrate them into a web page?
> Can the other engines have a single file as input, and have links to other files inside this file?
As far as I know (at least for Gromacs and Cassandra) that is not possible.
> What do you mean by tedious?
I might not understand the whole procedure for the benchmarks, but we'd need installations of all the codes, right? They often depend on a bunch of additional libraries and are compiled to fit the architecture.
> I might not understand the whole procedure for the benchmarks but we'd need installations of all codes, right? They often depend on a bunch of additional libraries and are compiled to fit the architecture.
OK, but to benchmark we would have to install all of this on the same machine we want to benchmark Lumol on, so running the benchmarks each time may not be much more painful than running them once. However, installing the other engines on Travis may be a huge pain. Is there something I don't get?
> However installing the other engines on Travis may be a huge pain.
That's something I'm not experienced in (Travis). Are there limits on what we can run using Travis? Simulation times, resources, libraries?
Yes, there are limitations (Travis is intended as a testing service, not a benchmarking service):
I think that we should run specific benchmarks (energy computation for different system, and one complex simulation) on Travis, and run benchmarks comparing with other codes from time to time on our machines, and upload the results.
There might be other providers more suitable for benchmarks too.
I didn't know about the 1 hour limit; this can be a problem. I tried to search for Travis-like providers oriented towards performance testing, but I didn't find much :(
Should we run the current benchmarks on Travis? From my point of view, it would be extremely valuable to be able to run them on a remote machine for each commit: when I run them on my local machine I can basically do nothing else, so my productivity is kind of ruined.
We need two kinds of benchmarks:

Benchmarks live in `benches`, and can be run using a nightly compiler with `cargo bench`. We currently have one regression benchmark for Ewald summation here, and one comparison benchmark against LAMMPS for the MD simulation of Helium.
Here are some additional ideas:
Regression benchmarks
We already have energy computation for a molecular fluid with charges (water)
Simulation benchmarks
It would be nice to have all the combinations of MD/MC -- Lennard-Jones/butane/water -- NVE/NVT/NPT/μVT. Here is a full list:
That is already 18 different simulations that we should compare against already existing MD and MC codes.
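For what it's worth, the count of 18 can be checked by enumerating the combinations, assuming MD skips the grand-canonical (μVT) ensemble and MC skips NVE (since it does not integrate equations of motion):

```python
from itertools import product

methods = ["MD", "MC"]
systems = ["Lennard-Jones", "butane", "water"]
ensembles = ["NVE", "NVT", "NPT", "uVT"]

# 2 * 3 * 4 = 24 raw combinations, minus the 3 MD/uVT and 3 MC/NVE cases.
simulations = [
    (method, system, ensemble)
    for method, system, ensemble in product(methods, systems, ensembles)
    if not (method == "MD" and ensemble == "uVT")
    and not (method == "MC" and ensemble == "NVE")
]
assert len(simulations) == 18
```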
Maybe we can also have tests for bio-molecules, like a small peptide, a DNA strand, and a bigger protein.
Please comment with more ideas, and open PRs to add the benchmarks!