Open dekkers opened 1 year ago
Interesting, that blog post makes a persuasive case for dataclasses. Their syntax (both with and without slots) is pretty clean as well - better than Pydantic or regular classes. It's another reason why we should consider upgrading to a higher version (https://github.com/minvws/nl-kat-coordination/issues/627). Dropping support for <3.10 altogether would save us work, and I think it's becoming more and more interesting to consider providing/requiring a native Python 3.11 environment for Debian and Ubuntu.
I have experienced the performance hits of pydantic as well. I must say that we were working with a large amount of objects though and we experienced the overhead in writing these to a large json file.
We concluded the same thing at the time: why use Pydantic after initial validation has been performed? So yes I would be all for such a convention.
Having said that, regarding slots I would first want to see that using them actually has significant impact on performance before deciding to implement them everywhere.
+1 for switching to data classes. For our container images this is another reason to update to Python 3.10+ (#627).
Having said that, regarding slots I would first want to see that using them actually has significant impact on performance before deciding to implement them everywhere.
I’d say that the tests in the linked post make a convincing argument that slots make a big difference, primarily in memory usage.
@praseodym I see and I must say that my opinion also depends on the impact/complexity of the implementation as well. (Quick wins are always great.) But every time I do optimisations I learn not to assume I know the bottlenecks of my workload or impact of certain changes without profiling.
“Pydantic V2 is 5-50x faster than Pydantic V1”, according to https://docs.pydantic.dev/latest/blog/pydantic-v2-alpha/. But I’m not sure if we should wait for it, or switch to data classes before the V2 release.
Meeting notes (23-05-2023):
In the future, we will upgrade our codebase to the highest possible version (e.g. pyupgrade) to eliminate tech debt.
We will leave the data classes optimization for later, as we have easier performance improvements we can make. Pydantic should not be used as internal data container #738
Pydantic primary purpose is data validation. This is great for things like parsing and validating JSON, but we don't need to do data validation for internal data structures and pydantic will have an unnecessary overhead.
This is a good post that describes different Python data containers and shows the overhead of pydantic:
https://towardsdatascience.com/battle-of-the-data-containers-which-python-typed-structure-is-the-best-6d28fde824e
For mutable internal data containers the best option would be to use dataclasses with slots. Given that the slots are only supported since Python 3.10 we should probably define our own dataclass decorator that will use slots on Python 3.10 and higher and fallback to dataclass without slots on earlier Python versions.
If the data container can be immutable then namedtuple is the best option.