marshmallow-code / marshmallow-sqlalchemy

SQLAlchemy integration with marshmallow
https://marshmallow-sqlalchemy.readthedocs.io
MIT License
549 stars 94 forks

Deserializing nested json #588

Open CarolusMagnuss opened 1 month ago

CarolusMagnuss commented 1 month ago

I am having trouble setting up schemas that properly deserialize my SQLAlchemy models. I have a simple one-to-many relationship between Study and RedcapField, and I set up my schemas like this:

class RedCapFieldSchema(marsh.SQLAlchemyAutoSchema):
    Con_Result = fields.Str(dump_only=True, dump_default=None, allow_none=True)
    transfer = fields.Boolean(dump_only=True, dump_default=True)

    class Meta:
        model = RedcapField
        exclude = ('id',)
        unknown = EXCLUDE
        load_instance = True

class StudySchema(marsh.SQLAlchemyAutoSchema):
    redcap_fields = Nested(RedCapFieldSchema, many=True)

    class Meta:
        model = Study
        exclude = ('id',)
        load_instance = True

This produces the JSON format I want, but when I try to load such a JSON file, the schema fails to create a Study instance because it passes redcap_fields to Study.__init__, which does not accept that argument.
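A plain-Python sketch of the failure described above, assuming the schema builds the model by passing every loaded field as a keyword argument (which the error suggests). `Study` and `FlexibleStudy` here are stand-in classes rather than the real mapped models, and the `**extra` constructor is only one hypothetical way around the problem.

```python
class Study:
    """Stand-in for the mapped model, with an __init__ that only takes title."""

    def __init__(self, title):
        self.title = title
        self.redcap_fields = []


class FlexibleStudy:
    """Hypothetical variant whose __init__ tolerates extra attributes."""

    def __init__(self, title, **extra):
        self.title = title
        self.redcap_fields = []
        # Assign any additional loaded attributes, including the relationship list.
        for name, value in extra.items():
            setattr(self, name, value)


# Assumed shape of the data once the nested schema has produced its objects.
loaded = {"title": "Demo", "redcap_fields": ["field-1", "field-2"]}

try:
    Study(**loaded)
    error = None
except TypeError as exc:
    # The strict constructor rejects the unexpected keyword argument.
    error = str(exc)

study = FlexibleStudy(**loaded)
```

SQLAlchemy's default declarative constructor accepts mapped attributes as keyword arguments, so on a real model forwarding `**extra` to `super().__init__` may serve the same purpose, though that is not verified against this particular setup.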

I expected the schemas to create all the RedcapField instances from the nested data (which, judging by what I see in the debugger, does seem to happen) and then simply append them to the Study instance's redcap_fields attribute (a SQLAlchemy relationship). So I have two questions:

  1. What is the best practice for serializing relationships so that they can easily be deserialized?
  2. What do I need to do to make this nested JSON deserialize into the SQLAlchemy models?

CarolusMagnuss commented 1 month ago

I managed to answer question 2 by altering my code like this:

class RedCapFieldSchema(marsh.SQLAlchemyAutoSchema):
    Con_Result = fields.Str(dump_only=True, dump_default=None, allow_none=True)
    transfer = fields.Boolean(dump_only=True, dump_default=True)

    class Meta:
        model = RedcapField
        exclude = ('id',)
        unknown = EXCLUDE
        load_instance = True

class StudySchema(marsh.SQLAlchemyAutoSchema):
    redcap_fields = Nested(RedCapFieldSchema, many=True)

    class Meta:
        model = Study
        exclude = ('id',)
        load_instance = True

    @pre_load
    def pop_redcapfields(self, data, **kwargs):
        # Keep the original payload on the schema instance for make_study.
        self.original_data = data
        # Study.__init__ only accepts title; passing anything more raises an error.
        return {'title': data['title']}

    @post_load
    def make_study(self, data, **kwargs):
        # Build the Study instance from the title alone.
        study = Study(title=self.original_data["title"])
        # Pull the nested field data out before copying the remaining attributes.
        rc_field_data = self.original_data.pop("redcap_fields")
        for key, value in self.original_data.items():
            # start and end are my two datetime fields; string values need converting.
            if key in ('start', 'end') and isinstance(value, str):
                value = datetime.strptime(value, '%Y-%m-%d').date()
            setattr(study, key, value)
        rc_schema = RedCapFieldSchema()
        for field in rc_field_data:
            # Strangely, this instantiates a RedcapField without issue,
            # probably because RedcapField has no explicit __init__.
            redcap_field = rc_schema.load(field)
            # Without adding it to the session, the fields are missing.
            self.session.add(redcap_field)
            study.redcap_fields.append(redcap_field)
        return study

This works now, but it seems overly complicated. I expected the library to handle the nesting, the custom __init__, or at least the conversion to datetime. I would be very interested in a more concise solution.
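One detail of the workaround is that it pops redcap_fields out of the caller's dictionary and keeps state on the schema instance. A minimal stdlib-only sketch of the same splitting and date handling without those side effects follows; `split_payload`, `DATE_KEYS`, and the `name` key in the sample data are hypothetical names used for illustration, not part of marshmallow-sqlalchemy.

```python
from datetime import date

# Keys assumed to hold ISO-formatted dates (YYYY-MM-DD), as in the workaround.
DATE_KEYS = ("start", "end")


def split_payload(payload):
    """Return (study attributes, nested field dicts) without mutating payload."""
    attrs = {key: value for key, value in payload.items() if key != "redcap_fields"}
    for key in DATE_KEYS:
        value = attrs.get(key)
        if isinstance(value, str):
            attrs[key] = date.fromisoformat(value)
    nested = list(payload.get("redcap_fields", []))
    return attrs, nested


# Sample input; the shape of each nested dict is made up for this sketch.
payload = {
    "title": "Demo",
    "start": "2024-01-15",
    "end": "2024-06-30",
    "redcap_fields": [{"name": "age"}, {"name": "sex"}],
}
attrs, nested = split_payload(payload)
```

The returned `attrs` could then be applied to the Study instance and each entry in `nested` loaded through RedCapFieldSchema, as in the post_load hook above, while the original payload stays intact for a second load.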