materialsproject / emmet

Be a master builder of databases of material properties. Avoid the Kragle.
https://materialsproject.github.io/emmet/
Other
49 stars 63 forks source link

A builder for classical md workflows #1010

Open orionarcher opened 1 month ago

orionarcher commented 1 month ago

This PR implements an ElectrolyteBuilder class for analysis of trajectories generated by atomate2.openmm.

The core ElectrolyteBuilder class is supported by a variety of utility functions that interact with MDAnalysis and SolvationAnalysis. In addition to the usual get_items, process_items, update_targets workflow, users can also manually create an MDAnalysis.Universe directly from the taskdoc with ElectrolyteBuilder.instantiate_universe. This makes manual analysis of a small number of systems much easier.

Everything is reasonably well tested and once CI passes this will be ready for review.

orionarcher commented 1 month ago

I'd need to add a few dependencies for this PR and would appreciate some guidance from @tschaume or @munrojm

Two dependencies are only available on conda: openff-toolkit and openff-interchange. I am currently installing these directly in the CI alongside openbabel, testing.yml:36. I'll add instructions in the tutorial to tell users to install these separately.

Two dependencies are on pip, solvation-analysis and MDAnalysis, but I am honestly not sure where they need to be added. I added them to emmet-core/setup.py but that doesn't seem to be used by CI. I can see there are several autogenerated requirements.txt files but I can't figure out where the feedstock for those files is. Where do I need to put pip dependencies so they make it into CI?

Thank you!

tschaume commented 1 month ago

@orionarcher I added the two dependencies to the setup.py in the main branch and re-run the action to upgrade all requirements files. They should install during testing when you merge main into your branch for this PR.

orionarcher commented 1 month ago

Thanks @tschaume.

It looks like the CI failed with the added dependencies, with the same error I had here. I suspect there is some hidden dependency clash between MDAnalysis and chgnet?

I could: 1) Try moving the new dependencies to be installed via conda in the CI. Might work but a bit hacky. 2) Add pytest.importorskip inside the new tests and remove the dependencies from CI. This would require users to install a few extra packages when running MD and run the tests locally. This isn't a mainstream code so that wouldn't be the end of the world.

tschaume commented 1 month ago

This seems to be an issue with the latest chgnet release, see https://github.com/CederGroupHub/chgnet/issues/160. I'm pinning it on main and will update the dependency files.

tschaume commented 4 weeks ago

@orionarcher the chgnet installations succeeds now (see here). The tests are now failing during coverage, though:

  File "/home/runner/.local/lib/python3.11/site-packages/coverage/data.py", line 25, in <module>
    from coverage.sqldata import CoverageData
  File "/home/runner/.local/lib/python3.11/site-packages/coverage/sqldata.py", line 16, in <module>
    import sqlite3
  File "/usr/share/miniconda/envs/test/lib/python3.11/sqlite3/__init__.py", line 57, in <module>
    from sqlite3.dbapi2 import *
  File "/usr/share/miniconda/envs/test/lib/python3.11/sqlite3/dbapi2.py", line 27, in <module>
    from _sqlite3 import *
ImportError: /usr/share/miniconda/envs/test/lib/python3.11/lib-dynload/_sqlite3.cpython-311-x86_64-linux-gnu.so: undefined symbol: sqlite3_deserialize

I'll see if I can resolve this and keep you posted. FYI @munrojm

tschaume commented 4 weeks ago

@orionarcher @munrojm I think I was able to resolve the issue. The tests should pass after merging in the main branch. 🤞

codecov-commenter commented 4 weeks ago

Codecov Report

Attention: Patch coverage is 93.20000% with 17 lines in your changes missing coverage. Please review.

Project coverage is 90.12%. Comparing base (703cf7c) to head (f7a653b).

Files Patch % Lines
...mmet-builders/emmet/builders/classical_md/utils.py 90.47% 6 Missing :warning:
emmet-core/emmet/core/classical_md/solvation.py 91.22% 5 Missing :warning:
...uilders/emmet/builders/classical_md/openmm/core.py 96.07% 4 Missing :warning:
emmet-core/emmet/core/classical_md/openmm/tasks.py 71.42% 2 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #1010 +/- ## ========================================== + Coverage 90.02% 90.12% +0.09% ========================================== Files 142 147 +5 Lines 13390 13634 +244 ========================================== + Hits 12055 12287 +232 - Misses 1335 1347 +12 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

orionarcher commented 4 weeks ago

Thank you @tschaume!

I'm afraid I wasn't clear about the dependencies.

It looks like MDAnalysis and solvation-analysis are only in the emmet-core requirements and emmet-core CI, but they are also required in emmet-builders. Is this something I can fix by updating the testing.yml or will an action need to be triggered to rewrite the emmet-builders requirements?

tschaume commented 3 weeks ago

@orionarcher no problem, I just added the same dependencies to emmet-builders and ran the action to update its dependencies in the main branch.

tsmathis commented 3 weeks ago

Do these packages need to be in the base install for emmet-core and emmet-builders? I just recently moved the ml requirements out of the default install for emmet-builders to try to keep dependency bloat down for the core functionality of emmet.

Is this going to end up being a core builder/set of builders that we run frequently?

Also, I'm not familiar with MDAnalysis are there going to be any additional binaries that will be needed to compile these dependencies correctly? (@tschaume, e.g. the library dependencies in our docker images)

tschaume commented 3 weeks ago

Yes, I agree that these dependencies should be separated into their own extra. In emmet-core they already should be, feel free to do the same for emmet-builders. I don't think @orionarcher had any objections!?

And yes, we might have to add openff (I think) in our docker images if this builder is going to run as part of our cloud pipeline. For now, I wanted to make sure @orionarcher can proceed with testing.

tsmathis commented 3 weeks ago

Gotcha, sounds good!

orionarcher commented 3 weeks ago

No objections here!

MDAnalysis should be pip installable with no additional binaries. As Patrick said openff.tookit and openff.interchange will need to be installed separately, but that can be done from conda with no manual binary compilation.

orionarcher commented 3 weeks ago

Finally!

I have all the currently desired functionality here so I'd love to merge this when the dependency issues are figured out.

tschaume commented 3 weeks ago

@tsmathis @munrojm I think, in its current form, the two additional dependencies could be removed from emmet-builders/setup.py since they're already included in emmet-core[all] in install_requires. I'll leave it up to you how to best separate the dependencies into extras if needed. I don't think it's a blocking issue for merging this PR though.

orionarcher commented 2 weeks ago

Good catch @tschaume. I removed the redundant dependencies from emmet-builders.