Open gpetretto opened 1 month ago
Sorry, I forgot to mention that the behaviour can be made uniform if selecting enum_values=True
in jsanitize
, but
1) this still leaves the inconsistency between the objects
2) does not solve the error when the MontyEncoder is used.
There has already been some discussion on related issues:
I advised on not removing as_dict
at the time since it will break downstream packages. I think this summarises my feelings: https://github.com/materialsproject/emmet/pull/944#issuecomment-2024859332
Personally, I think the fix is to add back as_dict
. It has been working fine for many years.
That said, I feel like I'm fighting a losing battle. If moving over to the emmet ValueEnum and using enum_values=True
in jsanitize doesn't cause any regressions, I'm happy to go with it.
@janosh would you have any time to look into this?
I see, I apologize for not searching further for previous discussions. I had found the last PR and assumed it was just a recent update.
Just to give a bit more context, I got to this point, since a user reported having the TypeError: 'str' object does not support item assignment
error message in jobflow-remote. Not knowing of this issue, it was annoying to track down and is probably hard to fix in jobflow-remote itself. The reason is that I need to serialize to a json file the replace/addition/detour
in the Response
on the remote worker and reconstruct it properly in the Runner to apply it. The user that reported the issue is passing part of the output as an argument to the addition
, and this contains subclasses of ValueEnum
. The solution might be to make sure that no such object is present there, but this should be an ad-hoc solution for jobflow-remote and I would have preferred to find a general fix.
I should add that I have also realized that enum_values
is always set to True when inserting the output into the store (https://github.com/materialsproject/jobflow/blob/053451721600751aaeda7eca0586635163f7a6f5/src/jobflow/core/job.py#L653), so is there even a point in having the objects in the output document being ValueEnum
? Whether it is a ValueEnum
or a simple Enum
it will always be converted to an Enum when stored by jobflow. Am I wrong? Are there other points where being a ValueEnum
makes a difference?
In general, I understand the idea behind the ValueEnum
, but I am worried about breaking the MSONable API, as it can lead to unpredictable behaviours, as in this case. Maybe if this a relevant feature to have, the ValueEnum
could be moved to monty and be explicitly supported in the serialization process. This would remove the ambiguity on different ValueEnum
present in different packages and prevent unexpected errors. But then I suppose the enum_values=True
should be removed from the jobflow serialization to actually take advantage of this feature.
No need to appologise. I didn't expect you to be familiar with the previous discussions, I was just putting them for the complete history of this problem.
Thanks for digging into this @gpetretto. I think one can imagine a use case for having both ValueEnum (e.g., something which serialises to a string, can be compared to a string, but which is an enum object) vs a normal Enum (something which is MSONable in that you can recover the original object) be present in jobflow. I thought this is what we had but I'm now lost with the changes to monty/emmet/jobflow as to what is still possible. There doesn't seem to be consistent use of enum_values
in jobflow (e.g., https://github.com/materialsproject/jobflow/blob/053451721600751aaeda7eca0586635163f7a6f5/src/jobflow/core/store.py#L300) but I suppose even one call with it enabled means regular Enums haven't been possible for a while.
In that case, I'm happy to switch over emmet ValueEnum
and enable enum_values=True
everywhere.
@janosh would you have any time to look into this?
@utf sorry, not immediately. would have to be inserted fairly low on the todo list.
I think one can imagine a use case for having both ValueEnum (e.g., something which serialises to a string, can be compared to a string, but which is an enum object)
@utf that already exists actually. big fan of standard lib's StrEnum
!
@gpetretto been meaning to recommend to switch jf-remote
to that actually. not just because it auto-serializes to a normal string but also because you don't have to litter the code with Enum.<key>.value
. Enum.<key>
already yields the string value.
import json
from enum import StrEnum
class State(StrEnum):
completed = "completed"
failed = "failed"
waiting = "waiting"
submitted = "submitted"
running = "running"
assert State.completed == "completed" # no need for .value
assert json.dumps(State.completed) == '"completed"' # no need for special serialization logic
only downside is it's 3.11+. but there's a backport package on pypi or you could just copy-paste the StrEnum
class into your code as I did in pymatviz
I have noticed that the behaviour of
ValueEnum
, whose use is widespread in atomate2 for all the codes, is incosistent between emme-core and jobflow. The former now does not have an as_dict that returns a string, while the latter still has that. TheTaskState
is aValueEnum
for all the codes, but for VASP, it is the emmet version, for most of the other is the jobflow version.The code below demonstrates the potential consequences of the problem:
When using the latest version of emmet-core and jobflow the output is:
Where the two lead to different results. Trying to use
dumpfn
(that relies on the MontyEncoder) will be even worse, since the
dumpfn(p, "p.json")` leads to the error:Which is the same reported in https://github.com/materialsproject/emmet/pull/1051
In general, I would probably support the option of not having the
as_dict
method return a string, as this breaks theMSONable
API. Maybe just switching all to the emmet version of the object, if it should stay like this for jobflow? However I am not sure about all the implications of dropping that method in jobflow as well. In any case, this will change the structure of the data in the DB.@utf @janosh