narwhals-dev / narwhals

Lightweight and extensible compatibility layer between dataframe libraries!
https://narwhals-dev.github.io/narwhals/
MIT License

[Enh]: Consider adding `__slots__` #500

Open FBruzzesi opened 3 months ago

FBruzzesi commented 3 months ago

We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?

n.a.

Please describe the purpose of the new feature or describe the problem to solve.

To keep the overhead introduced by Narwhals objects as low as possible (both in memory and in attribute-access speed), we could consider using `__slots__`, which removes the per-instance `__dict__`.

If a library needs to add attributes to a Narwhals object dynamically for some use case, we can suggest inheriting from the original Narwhals class: a subclass that does not declare `__slots__` gets a `__dict__` again.
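To illustrate the idea, here is a minimal sketch. `SlottedFrame` and `ExtensibleFrame` are hypothetical stand-ins, not actual Narwhals classes, and the attribute names are assumptions made for the example only.

```python
class SlottedFrame:
    """Hypothetical stand-in for a Narwhals wrapper object."""

    # Only these attributes can exist on an instance; no __dict__ is created.
    __slots__ = ("_native", "_level")

    def __init__(self, native, level="full"):
        self._native = native
        self._level = level


class ExtensibleFrame(SlottedFrame):
    """Subclass that declares no __slots__, so instances get a __dict__ again."""


frame = SlottedFrame(native=[1, 2, 3])
try:
    frame.note = "not allowed"  # not listed in __slots__
except AttributeError:
    dynamic_attr_rejected = True
else:
    dynamic_attr_rejected = False

extended = ExtensibleFrame(native=[1, 2, 3])
extended.note = "allowed"  # stored in the subclass instance's __dict__
```

In the subclass, `_native` and `_level` still live in the inherited slots; only the extra attribute ends up in `__dict__`.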

Suggest a solution if possible.

No response

If you have tried alternatives, please describe them below.

No response

Additional information that may help us understand your needs.

Hard to figure out how to benchmark this
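One possible starting point for the memory side, as a rough sketch only: compare the bytes allocated for many instances of a plain class and a slotted class using `tracemalloc` from the standard library. `PlainWrapper` and `SlottedWrapper` are hypothetical toy classes, not Narwhals code, and the instance count is arbitrary.

```python
import tracemalloc


class PlainWrapper:
    """Toy wrapper with a regular per-instance __dict__."""

    def __init__(self, native, level):
        self._native = native
        self._level = level


class SlottedWrapper:
    """Same toy wrapper, but with __slots__."""

    __slots__ = ("_native", "_level")

    def __init__(self, native, level):
        self._native = native
        self._level = level


def allocated_bytes(cls, n=50_000):
    """Return the bytes allocated while n instances of cls are alive."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    instances = [cls(None, "full") for _ in range(n)]
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del instances
    return after - before


plain_bytes = allocated_bytes(PlainWrapper)
slotted_bytes = allocated_bytes(SlottedWrapper)
print(f"plain:   {plain_bytes / 50_000:.1f} bytes/instance")
print(f"slotted: {slotted_bytes / 50_000:.1f} bytes/instance")
```

The absolute numbers depend on the Python version and platform, and a real benchmark would need to use the actual Narwhals classes rather than these toys.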

FBruzzesi commented 3 months ago

I added this locally and the diff is smaller than expected. All tests pass. If there is interest, I will open a PR.

MarcoGorelli commented 3 months ago

sure - I don't really know what it does but if there's no regression and it improves things, sounds like a good idea

mikeweltevrede commented 1 month ago

Hi all! I saw this issue, which reminded me of an investigation I did for my team a few months back. Our conclusion was that `__slots__` really only makes sense if you expect to create many instances of a class in the same code (e.g. many Car objects in a traffic-flow simulation). Otherwise, it introduces additional overhead (e.g. with child classes) and complexity. Slots are not a concept that many people are familiar with, and they also block access to the `__dict__` attribute, which does have its uses. In the end, we concluded that using `__slots__` was not worth the trouble.

FBruzzesi commented 1 month ago

Hey @mikeweltevrede thanks for your input. This is very valuable.

My main concern is about both creating new instances and accessing attributes, since the Narwhals layer adds one or more attribute accesses for each method call and returns a new instance most of the time.

While in most cases this overhead is negligible, I would like to make sure that it is as low as possible and that it does not add up to something significant in a complex series of steps.
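A rough way to measure that pattern is sketched below with `timeit`. `PlainExpr` and `SlottedExpr` are hypothetical toy classes whose `step` method reads one attribute and returns a new instance; they only mimic the shape of a wrapped method call and are not Narwhals code. The chain length and repeat counts are arbitrary.

```python
import timeit


class PlainExpr:
    """Toy wrapper with a regular per-instance __dict__."""

    def __init__(self, native):
        self._native = native

    def step(self):
        # One attribute read plus one new instance per call.
        return self.__class__(self._native)


class SlottedExpr:
    """Same toy wrapper, but with __slots__."""

    __slots__ = ("_native",)

    def __init__(self, native):
        self._native = native

    def step(self):
        return self.__class__(self._native)


def chain(cls, steps=20):
    """Simulate a series of chained method calls on a wrapper."""
    obj = cls(0)
    for _ in range(steps):
        obj = obj.step()
    return obj


def best_time(cls, number=2_000, repeat=5):
    """Return the best total time in seconds over several timeit runs."""
    return min(timeit.repeat(lambda: chain(cls), number=number, repeat=repeat))


plain_s = best_time(PlainExpr)
slotted_s = best_time(SlottedExpr)
print(f"plain:   {plain_s:.4f} s")
print(f"slotted: {slotted_s:.4f} s")
```

Timings vary between machines and Python versions, so any difference seen here would need to be confirmed on the real classes before drawing conclusions.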