materialsproject / pymatgen

Python Materials Genomics (pymatgen) is a robust materials analysis code that defines classes for structures and molecules with support for many electronic structure codes. It powers the Materials Project.
https://pymatgen.org
Other
1.52k stars 864 forks source link

Slow tests on pymatgen #1553

Closed shyuep closed 1 year ago

shyuep commented 5 years ago

There are some tests which are ridiculously slow on pymatgen. Based on pytest, these are the slowest tests. Most are for the defect package, which I have highlighted quite a few times already. But the fragmenter test also takes a full 4 mins to run (@samblau can you speed this up?).

244.87s call     pymatgen/analysis/tests/test_fragmenter.py::TestFragmentMolecule::test_babel_PC_with_RO_depth_0_vs_depth_10
93.16s call     pymatgen/analysis/defects/tests/test_thermo.py::DefectsThermodynamicsTest::test_suggest_charges
88.57s call     pymatgen/analysis/defects/tests/test_corrections.py::DefectsCorrectionsTest::test_kumagai
68.99s call     pymatgen/analysis/defects/tests/test_thermo.py::DefectsThermodynamicsTest::test_suggest_larger_supercells
67.63s setup    pymatgen/analysis/tests/test_interface.py::InterfaceBuilderTest::test_film_and_substrate_sites
64.83s call     pymatgen/analysis/defects/tests/test_thermo.py::DefectsThermodynamicsTest::test_solve_for_non_equilibrium_fermi_energy
61.05s call     pymatgen/analysis/defects/tests/test_defect_compatibility.py::DefectCompatibilityTest::test_check_kumagai_delocalized
60.37s call     pymatgen/analysis/defects/tests/test_thermo.py::DefectsThermodynamicsTest::test_solve_for_fermi_energy
55.63s call     pymatgen/analysis/defects/tests/test_thermo.py::DefectsThermodynamicsTest::test_plot
54.23s call     pymatgen/analysis/defects/tests/test_thermo.py::DefectsThermodynamicsTest::test_good_test_data
53.58s call     pymatgen/analysis/defects/tests/test_generators.py::InterstitialGeneratorTest::test_int_gen
53.43s call     pymatgen/analysis/defects/tests/test_thermo.py::DefectsThermodynamicsTest::test_get_dopability_limits
53.43s call     pymatgen/analysis/defects/tests/test_thermo.py::DefectsThermodynamicsTest::test_entries
45.04s call     pymatgen/analysis/tests/test_structure_analyzer.py::MiscFunctionTest::test_average_coordination_number
36.73s call     pymatgen/analysis/elasticity/tests/test_elastic.py::DiffFitTest::test_fit
30.93s call     pymatgen/command_line/tests/test_bader_caller.py::BaderAnalysisTest::test_automatic_runner
samblau commented 5 years ago

I'm actually long overdue on a general pymatgen PR, and one of the things I've done is substantially overhaul the fragmenter as well as made some key additions to graphs, both of which result in substantial speedups. That test now takes 43 seconds on my laptop. I can work on getting the PR together on Monday.

shyamd commented 5 years ago

The defects stuff is never going to be fast. If it's causing a significant failure of tests, then we need to put it on a conditional skip to run every once in a while.

shyuep commented 5 years ago

Never is a word used too often. There was once a time when people said distance computations were never going to be fast.

The idea is to design the tests to ensure correct functionality using small model systems.

shyamd commented 5 years ago

OK, I will not be spending time to make them faster since they already work fast enough for defect calculations. The defect thermodynamics are already constructed using QHull, so they're unlikely to get much faster. The defect corrections will take a decent bit of understanding and effort to convert from python to Cython. The interstitial generator algorithms are just hard to develop. Danny and I spent a lot of time getting this all to something reasonable.

shyuep commented 5 years ago

@shyamd It is the responsibility of all coders to write reasonable tests. No one is asking you to rewrite the code in cython. I am asking for the tests to be sped up, not the code.

Just for reference, I merely changed one thing from setup to setupClass in one of the defect tests, and that essentially took a full 4mins or so off the tests because the same defect generator is used for all tests - the tests were not changing the defect class at all.

These are the kind of common sense things that should be done. Rather than just fighting me on this trivial stuff.

If you want me to disable the defect tests or to junk the entire defect code, that is fine by me.

mkhorton commented 5 years ago

I'm pretty sure the interstitial generator algorithms specifically could be significantly sped up (by an order of magnitude or more). They are excessively slow and missing some obvious optimizations. But I echo @shyamd; there's nobody currently working on this who can take the lead.

shyuep commented 5 years ago

@mkhorton @shyamd If no one is working on the defects, I will begin to deprecate the entire functionality. I will remove the package in Oct 2019.

mkhorton commented 5 years ago

I don't think that's necessary, the defects package is broadly excellent and a result of a substantial body of work, it just doesn't have a primary maintainer to add new features or performance optimizations right now. We can commit to address any bugs as necessary however.

shyuep commented 5 years ago

@mkhorton As I said, I don't expect major performance optimizations - I am expecting working and reasonable tests that does not affect the overall ability to iterate on the rest of pmg code. Right now, that is not the case. I would rather sacrifice the defects package than sacrifice our ability to iterate quickly.

So someone - anyone - has to take the lead to make it happen.

mkhorton commented 5 years ago

We already have EXCLUDE_TESTS so that we can skip long-running tests in Travis that would otherwise slow down development. This is not ideal, but it seems like a sensible, pragmatic solution until the tests can be improved rather than removing an entire submodule which is deeply integrated into our workflows and has been peer-reviewed, referenced in papers, etc.

shyuep commented 5 years ago

There is a contradiction here. If it is "deeply integrated" into MP, then I would assume there is at least one person who is actively using and working on it, and that person can fix this. If there is "no one who has time to fix this", then it is essentially static code that no one is really caring about. Seriously, the time we spent debating on this, someone could have already done a 80% fix to satisfy what I asked for.

shyamd commented 5 years ago

I am using the defects package regularly and am responsible for both this and interfaces. I'm currently focused on MP infrastructure since this is on-fire. Please just skip these tests for now, and I'll come back and fix them.