sciexp / flytezen

🪽hydra-zen configuration of flyte workflows 🧘
GNU Affero General Public License v3.0

enable JSON serialization of dataclasses produced by hydra-zen `builds` #5

Open cameronraysmith opened 9 months ago

cameronraysmith commented 9 months ago

flytekit's dataclass transformer requires JSON-serializable dataclasses (docs). Currently, we construct JSON-serializable dataclasses for individual arguments or whole function interfaces along the lines of the following pseudocode:

from dataclasses import make_dataclass
from dataclasses_json import DataClassJsonMixin
# OR
# from mashumaro.mixins.json import DataClassJSONMixin
from hydra_zen import builds
from sklearn.linear_model import LogisticRegression

# ...
# passing DataClassJsonMixin to bases via make_dataclass
logistic_regression_fields = create_dataclass_from_callable(
    LogisticRegression, custom_types_defaults
)

LRI_DataClass = make_dataclass(
    "LRI_DataClass",
    logistic_regression_fields,
    bases=(DataClassJsonMixin,),
)
LRI_DataClass.__module__ = __name__

This approximates the usage in the logistic regression example. It behaves as expected, but it is verbose once you account for the dependency on create_dataclass_from_callable:

https://github.com/sciexp/flytezen/blob/526203da8fb3bc543dc3546fc9bf1056c1216f07/src/flytezen/configuration.py#L36-L117
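For readers who do not want to follow the link, the general pattern can be sketched with the standard library alone. This is not the flytezen implementation: `JsonMixin` is a hypothetical stand-in for `DataClassJsonMixin`, `fields_from_callable` is a hypothetical helper that only loosely mirrors `create_dataclass_from_callable`, and `fit_model` is a toy callable standing in for `LogisticRegression`.

```python
import inspect
import json
from dataclasses import asdict, field, fields, make_dataclass
from typing import Any


class JsonMixin:
    """Hypothetical stand-in for dataclasses_json.DataClassJsonMixin."""

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str):
        return cls(**json.loads(text))


def fields_from_callable(func, overrides=None):
    """Build make_dataclass field specs from the signature of func.

    overrides maps a parameter name to a (type, default) pair.
    Mutable defaults (lists, dicts) are not handled in this sketch.
    """
    overrides = overrides or {}
    specs = []
    for name, param in inspect.signature(func).parameters.items():
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue  # *args / **kwargs have no dataclass equivalent
        if name in overrides:
            typ, default = overrides[name]
        else:
            typ = Any if param.annotation is param.empty else param.annotation
            default = param.default
        if default is param.empty:
            specs.append((name, typ))
        else:
            specs.append((name, typ, field(default=default)))
    # Dataclasses require fields without defaults to come first.
    specs.sort(key=len)
    return specs


def fit_model(penalty: str = "l2", C: float = 1.0, max_iter: int = 100):
    """Toy callable standing in for LogisticRegression."""
    return penalty, C, max_iter


FitModelConfig = make_dataclass(
    "FitModelConfig",
    fields_from_callable(fit_model, {"max_iter": (int, 500)}),
    bases=(JsonMixin,),
)
FitModelConfig.__module__ = __name__

config = FitModelConfig(C=0.5)
round_trip = FitModelConfig.from_json(config.to_json())
```

Even in this reduced form, the signature inspection and override handling have to be maintained by hand, which is the verbosity the issue is about.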

Based on the documentation for the zen_dataclass argument of hydra_zen.builds, it seems like it should be possible to use the dataclasses constructed by hydra-zen instead

# passing DataClassJsonMixin to bases via zen_dataclass
Builds_LRI = builds(
    LogisticRegression,
    populate_full_signature=True,
    dataclass_name="Builds_LRI",
    zen_dataclass={"bases": (DataClassJsonMixin,), "module": __name__},
)

and to eliminate create_dataclass_from_callable from flytezen altogether, but this currently raises

TypeError: dataclass option `bases` must be a tuple of dataclass types

rsokl commented 8 months ago

Sorry for lurking but I just came across this!

I think you can make it inheritable via builds by making a dataclass-typed subclass of it

from dataclasses import dataclass

from dataclasses_json import DataClassJsonMixin as _DataClassJsonMixin

@dataclass
class DataClassJsonMixin(_DataClassJsonMixin): pass

then you can pass DataClassJsonMixin into bases
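A minimal, standard-library-only sketch of why this should help: the error message suggests that each entry in `bases` must itself be a dataclass type, and decorating a field-free subclass with `@dataclass` satisfies that without changing the mixin's behavior. Everything below is an assumption made for illustration: `PlainJsonMixin` stands in for `dataclasses_json.DataClassJsonMixin`, `require_dataclass_bases` is a guess at the kind of validation behind the error (not hydra-zen's actual code), and `make_dataclass` stands in for `builds`.

```python
import json
from dataclasses import asdict, dataclass, field, is_dataclass, make_dataclass


class PlainJsonMixin:
    """Hypothetical stand-in for dataclasses_json.DataClassJsonMixin."""

    def to_json(self) -> str:
        return json.dumps(asdict(self))


@dataclass
class JsonMixin(PlainJsonMixin):
    """Field-free dataclass wrapper around the plain mixin."""


def require_dataclass_bases(bases):
    """Assumed approximation of the check that produces the TypeError."""
    ok = isinstance(bases, tuple) and all(
        isinstance(b, type) and is_dataclass(b) for b in bases
    )
    if not ok:
        raise TypeError(
            "dataclass option `bases` must be a tuple of dataclass types"
        )
    return bases


# The undecorated mixin is rejected by the assumed check ...
try:
    require_dataclass_bases((PlainJsonMixin,))
    plain_accepted = True
except TypeError:
    plain_accepted = False

# ... while the dataclass-decorated subclass passes and keeps to_json.
Config = make_dataclass(
    "Config",
    [("C", float, field(default=1.0))],
    bases=require_dataclass_bases((JsonMixin,)),
)
payload = Config(C=0.5).to_json()
```

Whether hydra-zen accepts the real `DataClassJsonMixin` wrapped this way still needs to be confirmed against `builds` itself.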

cameronraysmith commented 8 months ago

> Sorry for lurking but I just came across this!

Not at all! Thank you for chiming in. I would have posted this in a discussion on the hydra-zen repository, but had not yet invested enough time to understand what I was missing, so I logged it here for future work.

> I think you can make it inheritable via builds by making a dataclass-typed subclass of it
>
> from dataclasses import dataclass
>
> from dataclasses_json import DataClassJsonMixin as _DataClassJsonMixin
>
> @dataclass
> class DataClassJsonMixin(_DataClassJsonMixin): pass
>
> then you can pass DataClassJsonMixin into bases

That sounds like it should essentially lead to the resolution of this issue. Thanks again!