materialsproject / emmet

Be a master builder of databases of material properties. Avoid the Kragle.
https://materialsproject.github.io/emmet/
Other
51 stars 63 forks source link

Fix JSON serialization of `ValueEnum`s #944

Closed Andrew-S-Rosen closed 4 months ago

Andrew-S-Rosen commented 4 months ago

Closes #914.

from monty.serialization import dumpfn
from emmet.core.symmetry import CrystalSystem

sym = CrystalSystem("Triclinic")

dumpfn(sym, "test.json")

This now works.

Andrew-S-Rosen commented 4 months ago

Tagging @munrojm. Fixed it, I think! If you wouldn't mind minting a new version after this is merged, that would be a huge help.

Thank you @jmmshn for your suggestion. It worked like a charm! Not sure what I was doing wrong before.

jmmshn commented 4 months ago

I will never end the crusade against custom as_dict / from_dict

esoteric-ephemera commented 3 months ago

Hey I think this broke atomate2, see this atomate2 issue which I've reproduced with emmet>=0.78.0, and couldn't reproduce with emmet<=0.77.1

Andrew-S-Rosen commented 3 months ago

It's probably better to solve the root cause of the problem in atomate2 (or monty, if relevant) rather than patch back in a custom .as_dict() method that also doesn't serialize to a dictionary but rather a string.

utf commented 3 months ago

Maybe I'm missing something, but this PR changed the behaviour of these objects. That doesn't seem like an atomate2 issue as we were using these objects as intended.

The docstring specifies that ValueEnum should serialise to a string.

esoteric-ephemera commented 3 months ago

There's a jsanitize call in jobflow with strict=True passed that would require VaspObject to have an as_dict method, which was removed here. This appears as a bug in atomate2, so I agree with @Andrew-S-Rosen and @utf that the placement of a fix should change. I'll try to work out where that is best suited

utf commented 3 months ago

The point is that we want to serialise these objects as strings. That's why we were using ValueEnum.

I'd ask if you want to have enums that serialise as dicts then please just use regular Enums (now supported in monty).

Andrew-S-Rosen commented 3 months ago

I guess there are a few things to unpack here.

In emmet, and therefore in atomate2 and other packages, there are a few objects inheriting from ValueEnums. The example I mentioned in #944 is one example. When a dictionary or other object containing a CrystalSystem is dumped with dumpfn, an error is raised. That's because the custom .as_dict() method is returning a str. @jmmshn suggested removing the method, which was what was done in this PR.

The ValueEnum in emmet is, admittedly, advertised as a way to serialize the object to str. So, I can see the argument there for retaining the .as_dict() method to be a str return. However, the method is called .as_dict(). In my opinion, we should not have a method like .as_dict() returning something that is not a dictionary. This has downstream negative effects, as mentioned in the example above, and is a hack.

I suppose the question to ask is there a better way to achieve the desired str serialization without relying on such a hacky/fake .as_dict() method? I would argue the answer is yes --- inherit from Enum and take advantage of monty's enum_values=True flag.

Now that the situation is a bit clearer to me and we also see some of the downstream effects, perhaps the better approach would simply be to re-add the .as_dict() method but deprecate the entire ValueEnum class, although maybe that is considered to be even more drastic.

utf commented 3 months ago

I think this PR and issue #914 are misunderstanding what ValueEnum is for. There many objects in Python that cannot be serialised as json by themselves. For example, str, int, float, list etc. Instead, they have to be serialised as part of a dict. The fact that this does not work

from monty.serialization import dumpfn
from emmet.core.symmetry import CrystalSystem

sym = CrystalSystem("Triclinic")
dumpfn(sym, "test.json")

is not an issue in itself. E.g., the following code does not mean that string serialisation is broken:

dumpfn("Triclinic", "test.json")  # will fail

The goal of ValueEnum is to define a set of string literals in the form of an enum. When we serialise these in a task document, we want to only store the enum value. E.g.

{
    "crystal_system": "Triclinic"
}

rather than

{
    "crystal_system": {
        "@module": "emmet.core.utils",
        "@class": "ValueEnum",
        "value": "Triclinic",
    }
}

The latter makes it convoluted to query the task document easily.

Perhaps there is a better way to achieve this and I agree that the as_dict labelling could be confusing. However, the point is this code has been in production for several years and works absolutely fine. The suggestion of enabling enum_values=True is not a good fix in this case, since this will mean it isn't possible to store ANY enums in their dict form. ValueEnum combined with enum_values=False provides a mechanism to have two types of Enum serialisation behaviour.

To summarise, the issue raised in #914 is not a bug it is the intended behaviour as per the docstring and existing usage. This change has broken atomate2 and other downstream packages. Please can we revert it.