materialsproject / emmet

Be a master builder of databases of material properties. Avoid the Kragle.
https://materialsproject.github.io/emmet/
Other
49 stars 63 forks source link

Revive `ValueEnum.as_dict` method #974

Closed esoteric-ephemera closed 3 months ago

esoteric-ephemera commented 3 months ago

A bug was reported in atomate2 where during a call to monty.json.jsanitize (specifically with strict=True and enum_values = False), an error was thrown during deserialization of VaspObject because it lacks an as_dict method. I reproduced this with versions of emmet-core > 0.77.1

Since VaspObject inherits from emmet.core.utils.ValueEnum, and its as_dict method was removed in PR #944, this is the source of the error

TL;DR: This PR adds back the kludgey as_dict -> str method of ValueEnum. Explainer below, I think these points merit discussion:

Discussion / resolution (???):

codecov-commenter commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 90.02%. Comparing base (5cc66a2) to head (17d7a86). Report is 1 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #974 +/- ## ======================================= Coverage 90.01% 90.02% ======================================= Files 138 138 Lines 13226 13228 +2 ======================================= + Hits 11906 11908 +2 Misses 1320 1320 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Andrew-S-Rosen commented 3 months ago

This is all fair.

I think it is worth us strongly considering adding a deprecation warning on this class, however, and committing to getting rid of it. We can be generous with the deprecation warning (e.g. 6 months). I'll still stand by my original point that the downstream codes relying on this hack should handle things in a better way. Namely, that means switching to Enum as the base class where appropriate and having jobflow call jsanitize with enums=True if the goal is to serialize enums to str (or, if this is not desirable, have atomate2 be able to pass a customize set of jsanitize kwargs to Jobflow somehow).

Reviving the .as_dict() method makes sense for now to prevent breaking downstream packages, but it also causes significant problems when objects containing a ValueEnum are used with various monty utilities. It is worth us doing things the right way in the long term. As an ecosystem, we should strive to avoid hacking .as_dict() methods with non-dictionary outputs, and I think that's a point we can all get behind.

Also tagging @jmmshn and @munrojm on this discussion.

utf commented 3 months ago

Hi @Andrew-S-Rosen, thanks for your thoughts. I still have concerns about deprecating this class entirely. There is a desire to support both enum value and enum dict serialisation at the same time in atomate2/jobflow. I believe that means having enum_values=False and some other Enum-like object for achieving this.

A couple of thoughts going forward:

Either way, I think merging this for now would be helpful for atomate2 and we can explore alternatives in the meantime.

utf commented 3 months ago

Thanks @munrojm !

Andrew-S-Rosen commented 3 months ago

@utf: Thanks for the follow-up!

Is your main objection at the moment the confusing naming of the as_dict method, rather than the fact that these objects aren't serialisable by themselves? A potential solution could be to create a new object that behaves as ValueEnum but which doesn't have this method and therefore which won't confuse users.

The misleading name for the method is certainly one reason, although not the major one. The major issue is that, by adding in this non-dictionary .as_dict() method, objects (e.g. dictionaries, Pydantic models) containing a ValueEnum will not play nicely with any monty routine that internally expects .as_dict() to return a dictionary. Of course, I understand the .as_dict() method was chosen because of the MSONable spec, but if there is a way to have a new object serialize like the existing ValueEnum without relying on a bolted on .as_dict()/.from_dict() method, that would certainly be great. This was the issue demonstrated via https://github.com/materialsproject/emmet/issues/914, but that is merely one of several instances. The issue is broader than CrystalSystem specifically.

Do we actually need specific enums for crystal system etc? Could we just use literal["triclinic", "monoclinic", ...] in the SymmetryData schema. E.g., here

Indeed, this is a separate but also important point. There are several times where there is no need to be using ValueEnum. This is likely one of those instances.