guilt / colfer-python

colf: A strongly typed version of Colfer serialization/deserialization for Python.
https://pypi.org/project/colf/

Use Pydantic for model #9

Open namngh opened 1 year ago

namngh commented 1 year ago

I suggest replacing colf_base with Pydantic. Here is my fork: namngh/colfer-python

pros:

cons:

guilt commented 1 year ago

This is great, have you been able to run the tox tests? (The OG version also runs on Python 2.7, long before it got deprecated).

guilt commented 1 year ago

If this works, may close #5. One of the things it needs to do is pass all the tests.

The portion of this project that needed to get implemented was how to support nested types better, and to test against the golden tests of the actual colfer project. If those are done, this can be integrated into the colfer project.

I guess, if we are dropping 2.7 at this point, we should be okay. Will cut a tag so Python 2.7 remains unsupported in the newer versions.

Supporting all the types is necessary to be in line with the Golang version. I just want to have wire-for-wire canaries working with that project. That helps us improve confidence levels when integrating it with the colfer project.

@pascaldekloe would be happy if we can get this stuff working with the golden tests.

namngh commented 1 year ago

> This is great, have you been able to run the tox tests? (The OG version also runs on Python 2.7, long before it got deprecated).

Let me update the test cases, and I will notify you when I'm done. Unfortunately, it may take a few days, as I've only covered some happy cases so far.

pascaldekloe commented 1 year ago

Oversized integers create an error scenario—the range check—for the marshal operation, which would otherwise be error free. Also, values take a bit more memory than required. Not ideal but acceptable, no?

Float32 conversion may require a bit of cleverness. 🤓
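
One way the float32 conversion could work in Python, where `float` is always 64-bit, is to round-trip the value through the standard `struct` module. This is only a sketch: the helper names are hypothetical, and the big-endian byte order is an assumption that should be checked against the Colfer specification.

```python
import struct


def encode_float32(x: float) -> bytes:
    """Pack a Python float into 4 bytes as an IEEE 754 single.

    Assumption: big-endian byte order ('>'); verify against the Colfer spec.
    struct raises OverflowError if x is finite but too large for float32.
    """
    return struct.pack('>f', x)


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest value a float32 can hold."""
    return struct.unpack('>f', encode_float32(x))[0]
```

Values such as 0.5 survive unchanged, while 0.1 comes back slightly different because it has no exact float32 representation. A model field would need to decide whether to store the rounded value or reject inputs that do not round-trip.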

Since we have 64-bit integers, why not implement timestamps with a simple nano-second int? Flexible, correct and error free. 😇

namngh commented 1 year ago

> Oversized integers create an error scenario—the range check—for the marshal operation, which would otherwise be error free. Also, values take a bit more memory than required. Not ideal but acceptable, no?

I've just implemented an Int32 type for Pydantic that actually converts the int to 4 bytes (with an out-of-bound check), which may save some bytes.

import math

class Int32(bytes):

    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def __modify_schema__(cls, field_schema):
        pass

    @classmethod
    def validate(cls, v):
        if not isinstance(v, int) and not isinstance(v, bytes):
            raise TypeError('must be int or bytes')

        if isinstance(v, bytes):
            if len(v) > 4:
                raise ValueError('convert out-of-bound')

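        if isinstance(v, int):
            # Number of bytes needed for the magnitude of v (the sign is ignored).
            byte_length = math.ceil(v.bit_length() / 8.0)
            if byte_length > 4:
                raise ValueError('convert out-of-bound')

            # Unsigned little-endian encoding; a negative v raises OverflowError here.
            v = v.to_bytes(4, 'little')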

        return cls(v)

    def __repr__(self):
        return f'Int32({super().__repr__()})'

from pydantic_missing_type import int32
from pydantic import BaseModel

class TestModel(BaseModel):
    number: int32.Int32

model = TestModel(number=32767)

print(model.number)  # b'\xff\x7f\x00\x00'

> Since we have 64-bit integers, why not implement timestamps with a simple nano-second int? Flexible, correct and error free. 😇

Sure, I can implement a Timestamps class that converts a datetime object to an int, but it may not suit date and time objects.
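
A minimal sketch of what such a conversion might look like, using only the standard library. The function names are hypothetical, and it assumes timezone-aware datetimes measured from the Unix epoch in UTC. Integer arithmetic is used throughout to avoid float rounding.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def datetime_to_ns(dt: datetime) -> int:
    """Convert an aware datetime to integer nanoseconds since the Unix epoch."""
    if dt.tzinfo is None:
        raise ValueError('naive datetime: a timezone is required')
    # timedelta // timedelta yields an exact int count of microseconds.
    return (dt - EPOCH) // timedelta(microseconds=1) * 1000


def ns_to_datetime(ns: int) -> datetime:
    """Convert integer nanoseconds back to a UTC datetime.

    datetime only has microsecond resolution, so the last three
    decimal digits of ns are dropped.
    """
    return EPOCH + timedelta(microseconds=ns // 1000)
```

The resolution mismatch is one reason a plain int may be the better fit for the wire value: the integer keeps full nanosecond precision, and conversion to `datetime` becomes an optional, lossy convenience.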

pascaldekloe commented 1 year ago

Clever trick. 🙂 Allocating an array for each integer will be super slow though.

Can speed up byte count with (v.bit_length() + 7) // 8 instead of math.ceil.
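
To illustrate both points, here is a small sketch with hypothetical helper names. It computes the byte count with integer arithmetic only, and it validates the range without allocating a `bytes` object per value. The signed 32-bit bounds are an assumption about the intended Int32 semantics.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def byte_count(v: int) -> int:
    """Bytes needed for the magnitude of v, using integer arithmetic only."""
    return (v.bit_length() + 7) // 8


def check_int32(v: int) -> int:
    """Return v unchanged if it fits a signed 32-bit integer, else raise."""
    if not INT32_MIN <= v <= INT32_MAX:
        raise ValueError('convert out-of-bound')
    return v
```

With this approach the field value stays a regular `int`, and the conversion to bytes happens once, at marshal time.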