turicas / rows

A common, beautiful interface to tabular data, no matter the format
GNU Lesser General Public License v3.0

Create Schema class #344

Open turicas opened 4 years ago

turicas commented 4 years ago

Table.fields is currently an OrderedDict, but some operations, such as deserializing an entire row, cannot be implemented on a plain OrderedDict. We should create a Schema class, expose it as Table.schema, and use it internally wherever possible (for example in pgimport and in every function that currently calls load_schema; those should receive a Schema object instead).

I've created a basic implementation to start:

rows.fields:

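from .utils import load_schema

class Schema:
    @classmethod
    def from_file(cls, filename):
        # Build a Schema from a schema file readable by load_schema
        obj = cls()
        obj.filename = filename
        obj.fields = load_schema(str(filename))
        return obj

    def deserialize(self, row):
        # `row` maps raw header names to raw values
        field_names = list(row.keys())
        # Map each raw header name to the field type registered under its
        # normalized name (make_header lives in this module, rows.fields)
        field_mapping = {
            old: self.fields[new]
            for old, new in zip(field_names, make_header(field_names))
        }
        # Keep the original keys; convert each value with its field type
        return {
            key: field_mapping[key].deserialize(value) for key, value in row.items()
        }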
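To make the intended behavior of deserialize concrete, here is a minimal, self-contained sketch that runs without rows installed. IntegerField, TextField and simple_make_header are hypothetical stand-ins for the real rows field types and make_header (the real normalization rules may differ), and the `__init__` taking a list of pairs is an assumption for illustration only, not a proposed API.

```python
import re
from collections import OrderedDict


class IntegerField:
    """Hypothetical, simplified stand-in for a rows field type."""

    @staticmethod
    def deserialize(value):
        return int(value)


class TextField:
    """Hypothetical, simplified stand-in for a rows field type."""

    @staticmethod
    def deserialize(value):
        return str(value)


def simple_make_header(field_names):
    """Simplified stand-in for make_header: lowercase the name and
    replace each run of non-alphanumeric characters with "_"."""
    return [
        re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
        for name in field_names
    ]


class Schema:
    def __init__(self, fields):
        # Assumption: fields is an iterable of (normalized_name, field_type)
        self.fields = OrderedDict(fields)

    def deserialize(self, row):
        field_names = list(row.keys())
        # Raw header name -> field type registered under the normalized name
        field_mapping = {
            old: self.fields[new]
            for old, new in zip(field_names, simple_make_header(field_names))
        }
        return {
            key: field_mapping[key].deserialize(value)
            for key, value in row.items()
        }


schema = Schema([("id", IntegerField), ("full_name", TextField)])
result = schema.deserialize({"ID": "42", "Full Name": "Ada Lovelace"})
print(result)
```

Note that the returned dict keeps the original (non-normalized) keys, as in the draft above; whether it should return normalized names instead is an open design question.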

We may also need to look at schema serialization formats such as datapackage, Avro, and Protocol Buffers, so that schema definitions can be serialized to and deserialized from these formats (and maybe also to SQL, at least for serialization, reusing code from the postgresql and sqlite plugins).
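As a rough sketch of what serialization could look like, the snippet below turns an ordered mapping of field names to type names into a datapackage-style Table Schema descriptor and into a SQLite CREATE TABLE statement. The type names ("integer", "float", "text", "date"), both mapping tables, and the function names are assumptions made for illustration; they are not taken from rows or from its plugins.

```python
import json
from collections import OrderedDict

# Assumed mappings from illustrative type names to each target format
TABLE_SCHEMA_TYPES = {
    "integer": "integer",
    "float": "number",
    "text": "string",
    "date": "date",
}
SQLITE_TYPES = {
    "integer": "INTEGER",
    "float": "REAL",
    "text": "TEXT",
    "date": "TEXT",
}


def to_table_schema(fields):
    """Serialize fields to a datapackage-style Table Schema JSON string."""
    descriptor = {
        "fields": [
            {"name": name, "type": TABLE_SCHEMA_TYPES[type_name]}
            for name, type_name in fields.items()
        ]
    }
    return json.dumps(descriptor, indent=2)


def to_sqlite_ddl(table_name, fields):
    """Serialize fields to a SQLite CREATE TABLE statement."""
    columns = ", ".join(
        f'"{name}" {SQLITE_TYPES[type_name]}'
        for name, type_name in fields.items()
    )
    return f'CREATE TABLE "{table_name}" ({columns})'


fields = OrderedDict([("id", "integer"), ("name", "text")])
print(to_table_schema(fields))
print(to_sqlite_ddl("people", fields))
```

Keeping one mapping table per target format would make it straightforward to add further formats later, and the reverse direction (deserialization) could invert the same tables where the mapping is one-to-one.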