openforcefield / openff-interchange

A project (and object) for storing, manipulating, and converting molecular mechanics data.
https://docs.openforcefield.org/projects/interchange
MIT License
71 stars 23 forks source link

[ENH] Improve logging / user experience for charge assignment #1048

Closed IAlibay closed 1 week ago

IAlibay commented 1 month ago

Context

Follow-up from https://github.com/openforcefield/standards/issues/68 and some DMs between @mattwthompson and I.

What currently happens

When creating an interchange object, charges will be assigned to Molecules based on the charge assignemnt hiearchy. However, it not possible to easily know which molecules were assigned charges through which method - something which users would like to know as it could affect the quality of their results.

The ask

The base request here would be add logger messages at the INFO level during Interchange.from_smirnoff creation which tells us what how partial charges were assigned for each unique molecule.

Alternative

For now I'm just checking what has library charges manually (outside for the interchange creation step), but that makes for a difficult developer experience.

mattwthompson commented 1 month ago

I think this is a great idea - precisely how each atom got its charges assigned is a common pain point for users, and one that's likely to grow if we release force fields that support other charge methods than the current Sage line (AM1-BCC everywhere, except simple library charges for water and monoatomic ions). I fully agree this is a subpar user experience

j-wags commented 1 month ago

Agree that there should be a way to see how charges were assigned. Since this seems like it'll be a load-bearing feature, do we want to go deeper than logging to INFO? The partial charge source could be written to the output Molecule/Topology/Interchange objects as a more general solution.

IAlibay commented 1 month ago

The partial charge source could be written to the output Molecule/Topology/Interchange objects as a more general solution.

If that's possible, from a developer standpoint, that would be great.

Logging would be good for your base users that probably have a script lying around to create gromacs/amber input files (edit: I'm not sure how well I'm explaining this, can expand if necessary).

mattwthompson commented 1 month ago

Jeff and I chatted about this briefly and decided to initially try the logging approach because it should be significantly more straightforward to implement but also provide downstream users with enough information to reason through how charges were assigned.

A separate solution might involve storing full provenance on the ElectrostaticsCollection itself. This would be significantly richer in data but more difficult to implement and potentially bloat the memory/disk size of the resulting models. If straightforward logging shapes up nicely, I don't think I'm going to further explore this idea. But it might be worth prototyping to see if the actual thing is different than I expect.

mattwthompson commented 1 week ago

Expect this in 0.4.1, unknown date

This is a sizable feature addition that intersects with some of the most complicated parts of our products, so, even though I wrote more tests than about any feature I've ever written, I expect some rough edges when this is rolled out. I'll happily take feedback on how this could be improved, everything from obvious bugs to ergonomics.

mattwthompson commented 1 week ago

https://github.com/openforcefield/openff-interchange/issues/1088 is the only major thing not well-supported right now. I think finding ways to consolidate logging of large molecules is another easy avenue for improvement, I just deferred it originally.