stfc / janus-core

Tools for machine learnt interatomic potentials
https://stfc.github.io/janus-core/
BSD 3-Clause "New" or "Revised" License
10 stars 7 forks source link

Investigate slow tests #234

Closed ElliottKasoar closed 1 week ago

ElliottKasoar commented 1 month ago

A few tests, e.g. descriptors for a trajectory, are very slow, so we should check if they can be simplified, while retaining the same utility.

ElliottKasoar commented 1 month ago

For reference, one example on my laptop:

226.07s call     tests/test_descriptors_cli.py::test_traj
66.50s call     tests/test_phonons_cli.py::test_plot
63.05s call     tests/test_train_cli.py::test_fine_tune
22.88s call     tests/test_eos_cli.py::test_minimising_all
20.36s call     tests/test_correlator.py::test_md_correlations
12.99s call     tests/test_md_cli.py::test_seed
12.53s call     tests/test_phonons_cli.py::test_bands
10.89s call     tests/test_geom_opt.py::test_str_optimizer
10.45s call     tests/test_eos_cli.py::test_eos
9.44s call     tests/test_eos.py::test_calc_eos
9.29s call     tests/test_eos_cli.py::test_writing_structs
8.80s call     tests/test_train_cli.py::test_train
8.75s call     tests/test_phonons_cli.py::test_bands_simple
8.57s call     tests/test_md.py::test_minimize_every
8.38s call     tests/test_eos_cli.py::test_setting_lattice

So out of 950s, around half is due to the slowest five tests, with almost a a quarter from tests/test_descriptors_cli.py::test_traj alone

ElliottKasoar commented 1 month ago

(tests/test_descriptors_cli.py::test_traj appears to have been resolved by #244)

alinelena commented 1 month ago

ok for descriptors the summary seems to misbehave when more than one frame is present

head -n 20 descriptors_summary.yml 
command: janus descriptors
inputs:
  calc:
    arch: mace_mp
    calc_kwargs:
      model: ../models/ft_medium.5.model
    device: cuda
    model_path: null
    read_kwargs: {}
  calc_per_element: false
  invariants_only: true
  log: descriptors.log
  struct:
  - !!python/object:ase.atoms.Atoms
    _calc: &id001 !!python/object:mace.calculators.mace.MACECalculator
      _directory: .
      atoms: null
      charges_key: Qs
      device: !!python/object/apply:torch.device
      - cuda
alinelena commented 1 month ago

ok issue is caused by this https://github.com/stfc/janus-core/blob/b45d5fdcea3fe5f1df176d0ae2f1621dd9be8027/janus_core/cli/descriptors.py#L144

I suspect is a left over from refactoring... since single point seems to use a different strategy... using the same here... brings back the speed

ElliottKasoar commented 1 month ago

ok for descriptors the summary seems to misbehave when more than one frame is present

head -n 20 descriptors_summary.yml 
command: janus descriptors
inputs:
  calc:
    arch: mace_mp
    calc_kwargs:
      model: ../models/ft_medium.5.model
    device: cuda
    model_path: null
    read_kwargs: {}
  calc_per_element: false
  invariants_only: true
  log: descriptors.log
  struct:
  - !!python/object:ase.atoms.Atoms
    _calc: &id001 !!python/object:mace.calculators.mace.MACECalculator
      _directory: .
      atoms: null
      charges_key: Qs
      device: !!python/object/apply:torch.device
      - cuda

That's what #244 should have fixed. Has it not?

So the incorrect struct is added to inputs in the line you linked, but thensave_struct_calc should clean it up, so we don't have to do it for each calc individually

alinelena commented 1 month ago

is this still active?

ElliottKasoar commented 1 month ago

tests/test_descriptors_cli.py::test_traj is fixed, which was the biggest concern, and an actual bug. The slowest tests for me currently are:

53.40s call     tests/test_phonons_cli.py::test_plot
45.64s call     tests/test_train_cli.py::test_fine_tune
16.52s call     tests/test_eos_cli.py::test_minimising_all
9.80s call     tests/test_md_cli.py::test_seed
9.22s call     tests/test_phonons_cli.py::test_phonons

I'm still hoping to reduce the time of the slowest e.g. tests/test_phonons_cli.py::test_plot and tests/test_train_cli.py::test_fine_tune a little if possible at some point, but they're a lower priority, as I'm not sure how much they can be sped up without losing functionality.