Problem
The current serialiser interface has a couple of issues related to unclear responsibilities and naming:
JSONSerializer and MarshmallowSerializer, for instance, both inherit BaseSerializer, but JSONSerializer is injected
Some serializers output bytes and others str, so it is inconsistent whether you have to call encode/decode (which in turn means serializers/deserializers can't easily be interchanged)
Deserialiser uses mixins and wasn't changed when the serializer mixin was changed to a base class.
Design
Serializer
At a conceptual level, a serializer takes a result item as input and outputs one of 1) bytes, 2) str or 3) etree. A deserialiser should conceptually do the reverse.
A serializer is composed of:
1. A transformation step - takes as input a result item dict, and outputs a transformed result item dict
2. A formatting step - takes as input a transformed result item dict, and outputs one of the three (bytes, str or etree)
    class BaseSerializer:
        def __init__(self, formatter=...):
            # The default formatter is left unspecified here.
            self.formatter = formatter

        def serialize_bytes(self, item):
            return self.formatter.to_bytes(self.dump_obj(item))

        def serialize_bytes_list(self, item_list):
            return self.formatter.to_bytes_list(self.dump_obj_list(item_list))

        def serialize_str(self, item):
            return self.formatter.to_str(self.dump_obj(item))

        def serialize_str_list(self, item_list):
            return self.formatter.to_str_list(self.dump_obj_list(item_list))

        def serialize_etree(self, item):
            return self.formatter.to_etree(self.dump_obj(item))

        def serialize_etree_list(self, item_list):
            return self.formatter.to_etree_list(self.dump_obj_list(item_list))

        def dump_obj(self, item):
            # Same as MarshmallowSerializer today
            raise NotImplementedError

        def dump_obj_list(self, item_list):
            # Same as MarshmallowSerializer today
            raise NotImplementedError
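To make the two steps concrete, here is a minimal sketch of how the delegation might look with a trivial formatter. `TitleSerializer` and `JSONFormatter` are hypothetical names used only for illustration, the shape of the result item is assumed, and the stdlib `json` module stands in for whatever a real formatter would use.

```python
import json


class JSONFormatter:
    """Hypothetical formatter: transformed dict -> JSON str or bytes."""

    def to_str(self, obj):
        return json.dumps(obj, sort_keys=True)

    def to_bytes(self, obj):
        return self.to_str(obj).encode("utf-8")


class TitleSerializer:
    """Hypothetical serializer following the BaseSerializer shape."""

    def __init__(self, formatter):
        self.formatter = formatter

    def dump_obj(self, item):
        # Step 1 (transformation): keep only the fields we want to expose.
        return {"title": item["metadata"]["title"]}

    def serialize_str(self, item):
        # Step 2 (formatting) is delegated to the formatter.
        return self.formatter.to_str(self.dump_obj(item))

    def serialize_bytes(self, item):
        return self.formatter.to_bytes(self.dump_obj(item))


# Assumed result item layout, for illustration only.
item = {"id": "1", "metadata": {"title": "Test", "internal": True}}
serializer = TitleSerializer(formatter=JSONFormatter())
text = serializer.serialize_str(item)
data = serializer.serialize_bytes(item)
```

The caller picks `serialize_str` or `serialize_bytes` explicitly, so there is no guessing about whether an extra encode/decode is needed.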
A deserialiser should conceptually do the same in reverse:
    class BaseDeserializer:
        def deserialize_bytes(self, data_bytes):
            ...

        def deserialize_bytes_list(self, data_bytes):
            ...

        def deserialize_str(self, data_str):
            ...

        def deserialize_str_list(self, data_str):
            ...

        def deserialize_etree(self, data_etree):
            # Skip this one for now as I don't think we use it.
            raise NotImplementedError

        def deserialize_etree_list(self, data_etree):
            ...

        def load_obj(self, dumped_obj):
            ...

        def load_obj_list(self, dumped_obj_list):
            ...
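As an illustration of the "reverse" direction, the sketch below parses JSON and then applies `load_obj`. `JSONDeserializer` and the shape of the loaded item are assumptions made for this example; the RFC itself leaves the method bodies open.

```python
import json


class JSONDeserializer:
    """Hypothetical deserializer mirroring the serializer's two steps."""

    def deserialize_str(self, data_str):
        # Reverse of formatting: parse the text, then reverse the transform.
        return self.load_obj(json.loads(data_str))

    def deserialize_bytes(self, data_bytes):
        return self.deserialize_str(data_bytes.decode("utf-8"))

    def deserialize_str_list(self, data_str):
        return self.load_obj_list(json.loads(data_str))

    def deserialize_etree(self, data_etree):
        # Not needed for now, so it is left unimplemented.
        raise NotImplementedError

    def load_obj(self, dumped_obj):
        # Assumed reverse of dump_obj: wrap the flat dict in "metadata".
        return {"metadata": dict(dumped_obj)}

    def load_obj_list(self, dumped_obj_list):
        return [self.load_obj(obj) for obj in dumped_obj_list]


deserializer = JSONDeserializer()
item = deserializer.deserialize_bytes(b'{"title": "Test"}')
items = deserializer.deserialize_str_list('[{"title": "A"}, {"title": "B"}]')
```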
Transformer
The serializer could for now do the transform step (step 1) itself. If we need something other than Marshmallow (e.g. dojson) for the transformation, we could consider extracting it into a dedicated object similar to the formatter below. The transformer is conceptually the two methods (dump_obj/load_obj).
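If the transformer were extracted, it might look roughly like the sketch below. `RenameTransformer`, `ComposedSerializer` and the stand-in `JSONFormatter` are hypothetical names, and a plain field-renaming dict replaces Marshmallow or dojson so that the example runs on its own.

```python
import json


class RenameTransformer:
    """Hypothetical transformer bundling dump_obj/load_obj in one object."""

    def __init__(self, mapping):
        # mapping: internal field name -> serialized field name
        self.mapping = mapping
        self.reverse = {out: src for src, out in mapping.items()}

    def dump_obj(self, item):
        return {self.mapping[k]: v for k, v in item.items() if k in self.mapping}

    def load_obj(self, dumped_obj):
        return {self.reverse[k]: v for k, v in dumped_obj.items() if k in self.reverse}


class JSONFormatter:
    """Stand-in formatter so the example is self-contained."""

    def to_str(self, obj):
        return json.dumps(obj, sort_keys=True)


class ComposedSerializer:
    """Hypothetical serializer that delegates step 1 to a transformer."""

    def __init__(self, transformer, formatter):
        self.transformer = transformer
        self.formatter = formatter

    def serialize_str(self, item):
        return self.formatter.to_str(self.transformer.dump_obj(item))


transformer = RenameTransformer({"title": "dc_title"})
serializer = ComposedSerializer(transformer, JSONFormatter())
text = serializer.serialize_str({"title": "Test", "owner": 1})
restored = transformer.load_obj(json.loads(text))
```

Keeping both directions on one object makes it easy to share the same mapping between a serializer and its deserializer.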
Formatter
A serializer should delegate the formatting step to a dedicated object responsible for step 2. Formatters are responsible for taking the transformed result item and outputting bytes, str or etree (if possible for the formatter).
We should likely have two formatters.
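As a rough illustration of the formatting step, the sketch below shows what a JSON formatter and an XML formatter might look like, and how either can be handed to a serializer. The class names `JSONFormatter`, `XMLFormatter` and `Serializer`, the `root_tag` parameter, the item layout, and the use of the stdlib `xml.etree.ElementTree` (rather than whichever etree implementation the project uses) are all assumptions made for this example.

```python
import json
import xml.etree.ElementTree as ET


class JSONFormatter:
    def to_str(self, obj):
        return json.dumps(obj, sort_keys=True)

    def to_bytes(self, obj):
        return self.to_str(obj).encode("utf-8")

    def to_etree(self, obj):
        # JSON has no element-tree form, so this formatter cannot support it.
        raise NotImplementedError


class XMLFormatter:
    def __init__(self, root_tag="record"):
        self.root_tag = root_tag

    def to_etree(self, obj):
        # Flat dict -> one child element per key (illustrative only).
        root = ET.Element(self.root_tag)
        for key, value in obj.items():
            child = ET.SubElement(root, key)
            child.text = str(value)
        return root

    def to_str(self, obj):
        return ET.tostring(self.to_etree(obj), encoding="unicode")

    def to_bytes(self, obj):
        return ET.tostring(self.to_etree(obj), encoding="utf-8")


class Serializer:
    """Hypothetical serializer composed with any formatter."""

    def __init__(self, formatter):
        self.formatter = formatter

    def dump_obj(self, item):
        return dict(item["metadata"])

    def serialize_str(self, item):
        return self.formatter.to_str(self.dump_obj(item))

    def serialize_etree(self, item):
        return self.formatter.to_etree(self.dump_obj(item))


item = {"metadata": {"title": "Test"}}
json_text = Serializer(JSONFormatter()).serialize_str(item)
xml_serializer = Serializer(XMLFormatter())
xml_text = xml_serializer.serialize_str(item)
root = xml_serializer.serialize_etree(item)
```

Swapping the formatter passed to the serializer is then enough to change the output format while reusing the same transformation.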
Usage
You can now compose new serializers, and new deserializers as well.
Additional issues
[ ] XML formats with list results. Only MARCXML can produce an XML list result; most other formats cannot. DataCite XML, for instance, does not have an XML schema to validate a list result (OAI DataCite XML might have a schema for list results).
[ ] Fix usage of serializers in the OAIServer. E.g. they can now directly dump an etree if the formatter supports it: return DublinCoreXMLSerializer.serialize_etree(item)
Unresolved questions
Perhaps Formatter should be renamed to Writer so that the deserializer/serializer has consistent naming.