lovasoa / marshmallow_dataclass

Automatic generation of marshmallow schemas from dataclasses.
https://lovasoa.github.io/marshmallow_dataclass/html/marshmallow_dataclass.html
MIT License

Need a way for class_schema to deserialize to dictionary, to support gradual conversion of legacy code #275

Open ikriv opened 1 month ago

ikriv commented 1 month ago

I converted legacy code that used dictionaries to use dataclasses. However, the conversion is not yet complete, so the same type is sometimes represented as a dataclass and sometimes as a dictionary.

In the new code path, I use class_schema(MyType)().loads(input), which gives me the dataclass, and all is good. But in the old, not yet converted code paths, I either need to maintain the legacy schema manually (bad), or I need a way to obtain an 'old style' marshmallow schema from class_schema(MyType) that produces a dictionary. The only reliable way I found to achieve that is to call loads() and then dump():

schema = class_schema(MyType)()  # class_schema() returns a schema class, so instantiate it
d = schema.dump(schema.loads(input))

Needless to say, this is cumbersome and inefficient. A better solution would be to have an option to provide a class schema that does not override load() of the base schema, something along the lines of:

dict_schema = class_schema(MyType, use_base_schema_load=True)

This would create a schema class that derives from the provided base schema (marshmallow.Schema by default), defines all necessary fields, but does not override load().

Example code before conversion:

# before conversion
import marshmallow
from marshmallow import fields
from marshmallow.validate import Range

class ConfigSchema(marshmallow.Schema):
    threshold = fields.Float(validate=Range(0, 1), required=True)

def frequent_function(data):
    threshold = data["threshold"]
    # use threshold in some way
    ...

def rare_function(data):
    threshold = data["threshold"]
    # use threshold in some other way
    ...

data = ConfigSchema().loads('{"threshold": 0.2}')
frequent_function(data)
rare_function(data)

Then we convert frequent_function() to use dataclass, but we leave rare_function() as is:

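# after partial conversion
from dataclasses import dataclass, field
from typing import Any, Dict

from marshmallow.validate import Range
from marshmallow_dataclass import class_schema

@dataclass
class Config:
    threshold: float = field(metadata={"validate": Range(0, 1)})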

def frequent_function(data: Config) -> None:
    threshold = data.threshold
    # use threshold in some way
    ...

def rare_function(data: Dict[str, Any]) -> None:
    threshold = data["threshold"]
    # use threshold in some other way
    ...

input = '{"threshold": 0.2}'
schema = class_schema(Config)()  # instantiate the generated schema class
frequent_function(schema.loads(input))
rare_function(schema.dump(schema.loads(input)))  # convoluted

PROPOSAL:

Let's have:

input = '{"threshold": 0.2}'
schema = class_schema(Config)()
dict_schema = class_schema(Config, use_base_schema_load=True)()  # proposed option
frequent_function(schema.loads(input))
rare_function(dict_schema.loads(input))  # better

I studied the code and am ready to provide a pull request if you are on board with the idea.

dairiki commented 4 weeks ago

Would the use of dataclasses.asdict help in your case?

dairiki commented 4 weeks ago

Related, possible dup: #213

ikriv commented 3 weeks ago

Indeed, asdict would probably work even better than the loads()/dump() round trip, because it keeps the converted types: e.g. a datetime will remain a datetime, and not become a string. Not sure why I missed it, thanks for the idea!
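
A minimal sketch of what this looks like, using only the standard library. The Event type and its fields are made up for illustration and are not part of the code discussed above; in the real code path the dataclass instance would come from schema.loads(...).

```python
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import Any, Dict


@dataclass
class Event:  # hypothetical type, for illustration only
    name: str
    created: datetime


def rare_function(data: Dict[str, Any]) -> int:
    # Legacy code path that still expects a plain dictionary.
    return data["created"].year


# Stand-in for the object that schema.loads(...) would return.
event = Event(name="deploy", created=datetime(2023, 5, 17, 12, 0))

# asdict() produces a dictionary whose values keep their Python types.
as_dict = asdict(event)
legacy_result = rare_function(as_dict)
```

Here as_dict["created"] is still a datetime object, whereas schema.dump() would serialize it to a string.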

But still, we would first deserialize to a dataclass and then convert it to a dict, which is clearly not ideal for performance, especially for large data graphs.
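
To make the cost concern concrete, here is a small standard-library sketch (Point and Route are hypothetical types): asdict() walks the whole object graph, converts nested dataclasses recursively, and builds new containers rather than reusing the existing ones.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Point:  # hypothetical type, for illustration only
    x: float
    y: float


@dataclass
class Route:  # hypothetical type, for illustration only
    points: List[Point] = field(default_factory=list)


route = Route(points=[Point(0.0, 0.0), Point(1.0, 2.0)])

# Every nested dataclass is converted and every list is rebuilt.
converted = asdict(route)

# The resulting list is a new object, not the original one.
same_list = converted["points"] is route.points
```

So the second pass is proportional to the size of the graph, on top of the work already done by loads().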

And indeed, my request looks very similar to #213, thanks for checking! Do you recall why #213 was not merged?