inveniosoftware / flask-resources

REST APIs for Flask
https://flask-resources.readthedocs.io
MIT License

Refactor serialiser/deserializer interface #117

Open lnielsen opened 1 year ago

Problem

The current serializer interface has a couple of issues with unclear responsibilities and naming.

Design

Serializer

Conceptually, a serializer takes a result item as input and outputs one of 1) bytes, 2) str or 3) etree. A deserializer should conceptually do the reverse.

A serializer is composed of:

class BaseSerializer:
    def __init__(self, formatter=...):
        self.formatter = formatter

    def serialize_bytes(self, item):
        return self.formatter.to_bytes(self.dump_obj(item))

    def serialize_bytes_list(self, item_list):
        return self.formatter.to_bytes_list(self.dump_obj_list(item_list))

    def serialize_str(self, item):
        return self.formatter.to_str(self.dump_obj(item))

    def serialize_str_list(self, item_list):
        return self.formatter.to_str_list(self.dump_obj_list(item_list))

    def serialize_etree(self, item):
        return self.formatter.to_etree(self.dump_obj(item))

    def serialize_etree_list(self, item_list):
        return self.formatter.to_etree_list(self.dump_obj_list(item_list))

    def dump_obj(self, item):
        # Same as MarshmallowSerializer today
        ...

    def dump_obj_list(self, item_list):
        # Same as MarshmallowSerializer today
        ...

A deserializer should conceptually do the same in reverse:

class BaseDeserializer:
    def deserialize_bytes(self, data_bytes):
        ...

    def deserialize_bytes_list(self, data_bytes):
        ...

    def deserialize_str(self, data_str):
        ...

    def deserialize_str_list(self, data_str):
        ...

    def deserialize_etree(self, data_etree):
        # Skip this one for now as I don't think we use it
        raise NotImplementedError

    def deserialize_etree_list(self, data_etree):
        ...

    def load_obj(self, dumped_obj):
        ...

    def load_obj_list(self, dumped_obj_list):
        ...
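
To make the reverse direction concrete, here is a minimal sketch of a JSON-backed deserializer. `JSONDeserializer` and its `load_fn` parameter are hypothetical names, not part of flask-resources; `load_fn` stands in for something like a Marshmallow schema's `load`, and loading a list item by item is an assumption.

```python
import json


class JSONDeserializer:
    """Hypothetical deserializer: parse the payload, then load the parsed object."""

    def __init__(self, load_fn=None, encoding="utf-8"):
        # load_fn is an assumption standing in for e.g. a schema's load();
        # it defaults to returning the parsed object unchanged.
        self.load_fn = load_fn or (lambda obj: obj)
        self.encoding = encoding

    def deserialize_bytes(self, data_bytes):
        return self.deserialize_str(data_bytes.decode(self.encoding))

    def deserialize_bytes_list(self, data_bytes):
        return self.deserialize_str_list(data_bytes.decode(self.encoding))

    def deserialize_str(self, data_str):
        return self.load_obj(json.loads(data_str))

    def deserialize_str_list(self, data_str):
        return self.load_obj_list(json.loads(data_str))

    def deserialize_etree(self, data_etree):
        raise NotImplementedError

    def deserialize_etree_list(self, data_etree):
        raise NotImplementedError

    def load_obj(self, dumped_obj):
        return self.load_fn(dumped_obj)

    def load_obj_list(self, dumped_obj_list):
        # Assumption: a list is loaded item by item.
        return [self.load_obj(obj) for obj in dumped_obj_list]


deserializer = JSONDeserializer(load_fn=lambda d: {**d, "loaded": True})
record = deserializer.deserialize_bytes(b'{"title": "Test"}')
records = deserializer.deserialize_str_list('[{"title": "A"}, {"title": "B"}]')
```

The bytes variants only decode and defer to the str variants, mirroring how `to_bytes` defers to `to_str` on the formatter side.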

Transformer

For now, the serializer could perform the transform step (step 1) itself. If we need something other than Marshmallow (e.g. dojson) for the transformation, we could consider extracting it into a dedicated object, similar to the formatter below. Conceptually, the transformer consists of the two methods dump_obj and load_obj.
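
If the transform step were extracted, it might look roughly like the sketch below. `Transformer` and `SchemaTransformer` are hypothetical names; the only assumption about the wrapped schema is that it exposes `dump` and `load`, as Marshmallow schema instances do. `UpperTitleSchema` is a stand-in so the example runs without Marshmallow, and the item-by-item list handling is an assumption rather than what MarshmallowSerializer does today.

```python
class Transformer:
    """Hypothetical interface for the transform step (dump_obj/load_obj)."""

    def dump_obj(self, item):
        raise NotImplementedError

    def load_obj(self, dumped_obj):
        raise NotImplementedError

    def dump_obj_list(self, item_list):
        # Assumption: lists are transformed item by item.
        return [self.dump_obj(item) for item in item_list]

    def load_obj_list(self, dumped_obj_list):
        return [self.load_obj(obj) for obj in dumped_obj_list]


class SchemaTransformer(Transformer):
    """Wraps any object exposing dump()/load(), e.g. a Marshmallow schema instance."""

    def __init__(self, schema):
        self.schema = schema

    def dump_obj(self, item):
        return self.schema.dump(item)

    def load_obj(self, dumped_obj):
        return self.schema.load(dumped_obj)


class UpperTitleSchema:
    """Stand-in schema used only to keep the example self-contained."""

    def dump(self, item):
        return {"title": item["title"].upper()}

    def load(self, data):
        return {"title": data["title"].lower()}


transformer = SchemaTransformer(UpperTitleSchema())
dumped = transformer.dump_obj({"title": "hello"})
loaded = transformer.load_obj(dumped)
```

A dojson-based transformer would then be another `Transformer` subclass, leaving the serializer and formatter untouched.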

Formatter

A serializer should delegate the formatting step (step 2) to a dedicated object. Formatters take the transformed result item and output bytes, str or etree (where the formatter supports it).

class Formatter:
    def to_bytes(self, dumped_item):
        raise NotImplementedError

    def to_bytes_list(self, dumped_item_list):
        raise NotImplementedError

    def to_str(self, dumped_item):
        raise NotImplementedError

    def to_str_list(self, dumped_item_list):
        raise NotImplementedError

    def to_etree(self, dumped_item):
        raise NotImplementedError

    def to_etree_list(self, dumped_item_list):
        raise NotImplementedError

Then we should likely have two formatters:

import json

class JSONFormatter:
    def _encoder(self):
        ...

    def to_bytes(self, dumped_item):
        return self.to_str(dumped_item).encode('utf8')

    def to_bytes_list(self, dumped_item_list):
        ...

    def to_str(self, dumped_item):
        # The encoder class goes in via the keyword-only `cls` argument;
        # further json.dumps options are elided here.
        return json.dumps(dumped_item, cls=self._encoder())

    def to_str_list(self, dumped_item_list):
        raise NotImplementedError

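from lxml import etree

class LXMLFormatter:
    def __init__(self, dump_etree, etree_options=None):
        self.dump_etree = dump_etree
        self.etree_options = etree_options or {}

    def to_etree(self, dumped_item):
        return self.dump_etree(dumped_item)

    def to_bytes(self, dumped_item):
        # Assumes etree_options sets a byte encoding (e.g. encoding='utf-8'),
        # so that etree.tostring returns bytes.
        return etree.tostring(self.to_etree(dumped_item), **self.etree_options)

    def to_bytes_list(self, dumped_item_list):
        ...

    def to_str(self, dumped_item):
        # Assumes the bytes above are UTF-8 encoded.
        return self.to_bytes(dumped_item).decode('utf8')

    def to_str_list(self, dumped_item_list):
        raise NotImplementedError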
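
As a rough, runnable illustration of the two formatters, the sketch below uses only the standard library. It swaps in `xml.etree.ElementTree` for lxml so that it runs without third-party packages, so `XMLFormatter` is a stand-in for the proposed `LXMLFormatter` rather than an implementation of it. `DateEncoder` and `dump_title` are hypothetical helpers introduced for the example.

```python
import json
import xml.etree.ElementTree as ET
from datetime import date


class DateEncoder(json.JSONEncoder):
    """Hypothetical encoder: serializes dates as ISO 8601 strings."""

    def default(self, o):
        if isinstance(o, date):
            return o.isoformat()
        return super().default(o)


class JSONFormatter:
    def _encoder(self):
        return DateEncoder

    def to_str(self, dumped_item):
        # sort_keys is used here only to make the output deterministic.
        return json.dumps(dumped_item, cls=self._encoder(), sort_keys=True)

    def to_bytes(self, dumped_item):
        return self.to_str(dumped_item).encode("utf8")


class XMLFormatter:
    """Standard-library stand-in for the proposed LXMLFormatter."""

    def __init__(self, dump_etree, encoding="utf-8"):
        self.dump_etree = dump_etree
        self.encoding = encoding

    def to_etree(self, dumped_item):
        return self.dump_etree(dumped_item)

    def to_bytes(self, dumped_item):
        return ET.tostring(self.to_etree(dumped_item), encoding=self.encoding)

    def to_str(self, dumped_item):
        # encoding="unicode" makes ElementTree return str instead of bytes.
        return ET.tostring(self.to_etree(dumped_item), encoding="unicode")


def dump_title(dumped_item):
    """Hypothetical etree dumper: builds <record><title>...</title></record>."""
    root = ET.Element("record")
    ET.SubElement(root, "title").text = dumped_item["title"]
    return root


json_out = JSONFormatter().to_str({"title": "Test", "published": date(2022, 1, 31)})
xml_out = XMLFormatter(dump_etree=dump_title).to_str({"title": "Test"})
```

Both formatters accept the same dumped item, which is what lets one schema feed several output formats.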

Usage

You can now compose new serializers:

datacite43json = Serializer(
    object_schema_cls=DataCite43Schema,
    list_schema_cls=BaseListSchema,
    formatter=JSONFormatter()
)
datacite43xml = Serializer(
    object_schema_cls=DataCite43Schema,
    list_schema_cls=None,
    formatter=LXMLFormatter(
        dump_etree=schema43.dump_etree, 
        etree_options=dict(pretty_print=True, xml_declaration=True, encoding='utf-8')
    )
)
dublincorexml = Serializer(
    object_schema_cls=DublinCoreSchema,
    list_schema_cls=None,
    formatter=LXMLFormatter(
        dump_etree=simpledc.dump_etree, 
    )
)
dcatxml = Serializer(
    object_schema_cls=DataCite43Schema,
    list_schema_cls=None,
    formatter=LXMLFormatter(
        dump_etree=apply_xslt(
            schema43.dump_etree,
            "invenio_rdm_records.resources.serializers",
            "dcat/datacite-to-dcat-ap.xsl"
        )
    )
)
from dojson.contrib.to_marc21.utils import dumps_etree
marcxml = Serializer(
    object_schema_cls=MARCXMLSchema,
    list_schema_cls=None,
    formatter=LXMLFormatter(
        dump_etree=dumps_etree
    )
)
bibtex = Serializer(
    object_schema_cls=BibTeXSchema,
    list_schema_cls=None,
    formatter=BibTeXFormatter()
)
geojson = Serializer(
    object_schema_cls=GeoJSONSchema,
    formatter=JSONFormatter()
)
csljson = Serializer(
    object_schema_cls=CSLJSONSchema,
    formatter=JSONFormatter()
)
citationstr = Serializer(
    object_schema_cls=CSLJSONSchema,
    formatter=CitationStringFormatter(url_args_retriever=...)
)
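
The point of the composition above, as with csljson and citationstr sharing CSLJSONSchema, is that one schema can feed different formatters. Below is a self-contained sketch of that idea. The `Serializer` class here is a hypothetical minimal version, not the flask-resources implementation; `CitationSchema` is a stand-in exposing only `dump`, and `PlainCitationFormatter` is an invented formatter for illustration.

```python
import json


class Serializer:
    """Hypothetical minimal serializer: dump with a schema, then format."""

    def __init__(self, object_schema_cls, formatter, list_schema_cls=None):
        self.object_schema_cls = object_schema_cls
        self.list_schema_cls = list_schema_cls
        self.formatter = formatter

    def dump_obj(self, item):
        return self.object_schema_cls().dump(item)

    def serialize_str(self, item):
        return self.formatter.to_str(self.dump_obj(item))

    def serialize_bytes(self, item):
        return self.formatter.to_bytes(self.dump_obj(item))


class CitationSchema:
    """Stand-in for a Marshmallow schema; only dump() is assumed."""

    def dump(self, item):
        return {
            "author": item["creator"],
            "issued": item["year"],
            "title": item["title"],
        }


class SortedJSONFormatter:
    """Hypothetical JSON formatter with deterministic key order."""

    def to_str(self, dumped_item):
        return json.dumps(dumped_item, sort_keys=True)

    def to_bytes(self, dumped_item):
        return self.to_str(dumped_item).encode("utf8")


class PlainCitationFormatter:
    """Hypothetical formatter rendering a one-line citation string."""

    def to_str(self, dumped_item):
        return "{author} ({issued}). {title}.".format(**dumped_item)

    def to_bytes(self, dumped_item):
        return self.to_str(dumped_item).encode("utf8")


record = {"creator": "Doe, J.", "year": 2022, "title": "Test record"}
as_json = Serializer(CitationSchema, SortedJSONFormatter()).serialize_str(record)
as_text = Serializer(CitationSchema, PlainCitationFormatter()).serialize_str(record)
```

Only the formatter argument changes between the two serializers; the transform step is shared.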

You can now compose new deserializers as well:

rocrate = Deserializer(
    schema=ROCrateSchema,
)

Additional issues

Unresolved questions