green-code-initiative / ecoCode-challenge

Embark on the hackathon series for improving ecoCode

[Python] __slots__ should be declared on data classes (team 904000m2) #32

Open SeBii91 opened 1 year ago

SeBii91 commented 1 year ago

Extract from https://wiki.python.org/moin/UsingSlots

The __slots__ declaration allows us to explicitly declare data members, causes Python to reserve space for them in memory, and prevents the creation of __dict__ and __weakref__ attributes. It also prevents the creation of any variables that aren't declared in __slots__.

Why Use __slots__? The short answer is slots are more efficient in terms of memory space and speed of access, and a bit safer than the default Python method of data access.
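
The behaviour described in this extract can be checked with a minimal sketch. The class names Point and SlottedPoint are illustrative and do not come from the wiki page.

```python
class Point:
    """Regular class: each instance carries its own __dict__."""
    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlottedPoint:
    """Slotted class: only the names listed in __slots__ can be assigned."""
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y


regular = Point(1, 2)
slotted = SlottedPoint(1, 2)

regular.z = 3                            # allowed: stored in regular.__dict__
print(hasattr(regular, '__dict__'))      # True
print(hasattr(slotted, '__dict__'))      # False
print(hasattr(slotted, '__weakref__'))   # False

try:
    slotted.z = 3                        # 'z' is not declared in __slots__
    undeclared_rejected = False
except AttributeError as exc:
    undeclared_rejected = True
    print(f'AttributeError: {exc}')
```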

HiitCat commented 1 year ago

Optimized API: Use slots in class definitions

Platform

OS OS version Language
- - Python 3

Main characteristics

ID Title Category Sub-category
EOPT001 Use __slots__ in class definitions Environment Optimized API

Severity / Remediation Cost

Severity Remediation Cost
Minor Minor

Rule short description

Rule complete description

Text

Python is an interpreted, high-level, general-purpose programming language. One of its main advantages is its flexibility and dynamic nature. However, this flexibility comes at a cost in terms of performance, especially in memory usage. One way to optimize memory usage is by using the __slots__ attribute in class definitions.

By default, instances of a Python class store their attributes in a per-instance __dict__. This means that every instance of the class carries its own dictionary, which can be quite memory-intensive.

By using __slots__, we can explicitly declare the data members of the class, allowing Python to allocate memory for each instance more efficiently. When we use __slots__, Python reserves a fixed set of slots in each instance for the declared attributes instead of creating a per-instance dictionary, reducing memory usage.
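
To see how this looks in practice, here is a minimal sketch that inspects one instance of each kind. It assumes CPython: the descriptor type name and the byte counts are implementation details, and sys.getsizeof only gives a shallow estimate, so the exact numbers vary between versions and platforms.

```python
import sys


class MyClass:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


class MySlottedClass:
    __slots__ = ('a', 'b', 'c')

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


regular = MyClass(1, 2, 3)
slotted = MySlottedClass(1, 2, 3)

# Each declared slot is exposed on the class as a descriptor
# (CPython names this type member_descriptor).
print(type(MySlottedClass.a).__name__)

# Shallow sizes in bytes: the regular instance also pays for its __dict__.
regular_size = sys.getsizeof(regular) + sys.getsizeof(regular.__dict__)
slotted_size = sys.getsizeof(slotted)
print(f'regular instance + __dict__: {regular_size} bytes')
print(f'slotted instance:            {slotted_size} bytes')
```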

This optimization is especially important for classes that are instantiated many times, as it can significantly reduce memory consumption. It is also useful when working with large datasets or building memory-intensive applications.
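
The effect over many instances can be estimated with the standard tracemalloc module. This is a rough sketch, not a rigorous benchmark: the helper name peak_memory and the instance count are arbitrary choices, and the measured values depend on the Python version.

```python
import tracemalloc


class MyClass:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


class MySlottedClass:
    __slots__ = ('a', 'b', 'c')

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


def peak_memory(cls, n):
    """Return the peak traced memory (bytes) while n instances are alive."""
    tracemalloc.start()
    instances = [cls(i, i + 1, i + 2) for i in range(n)]
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del instances
    return peak


n = 100_000
regular_peak = peak_memory(MyClass, n)
slotted_peak = peak_memory(MySlottedClass, n)
print(f'regular: {regular_peak / 1e6:.1f} MB for {n} instances')
print(f'slotted: {slotted_peak / 1e6:.1f} MB for {n} instances')
```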

In the case of classes with the @dataclass decorator, which automatically generates boilerplate code for classes, we can also use __slots__ to optimize memory usage. By default, @dataclass uses the same __dict__ approach for attribute storage. However, by setting the slots parameter to True (available since Python 3.10), the generated class declares __slots__ for its fields and reduces memory usage.

Here is the signature of the @dataclass decorator with its default values (slots defaults to False):

@dataclasses.dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False)

Using __slots__ can be an effective way to optimize memory usage in Python classes and improve the performance of memory-intensive applications. Adding __slots__ to a class definition has a minor remediation cost and can bring significant benefits in terms of memory efficiency.

from dataclasses import dataclass

# Non-compliant
class MyClass:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

# Compliant
class MySlottedClass:
    __slots__ = ('a', 'b', 'c')
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

# Compliant
@dataclass(slots=True)
class MyDataClassSlottedClass:
    a: int
    b: int
    c: int

HTML

<p>
    Python is an interpreted, high-level, general-purpose programming language. One of its main advantages is its flexibility and dynamic nature. However, this flexibility comes at a cost in terms of performance, especially in memory usage. One way to optimize memory usage is by using the <code>__slots__</code> attribute in class definitions.
</p>

<p>
    By default, instances of a Python class store their attributes in a per-instance <code>__dict__</code>. This means that every instance of the class carries its own <strong>dictionary</strong>, which can be quite memory-intensive.
</p>

<p>
    By using <code>__slots__</code>, we can explicitly declare the data members of the class, allowing Python to allocate memory for each instance more efficiently. When we use <code>__slots__</code>, Python reserves a fixed set of slots in each instance for the declared attributes instead of creating a per-instance dictionary, reducing memory usage.
</p>

<p>
    This optimization is especially important for classes that are instantiated many times, as it can significantly reduce memory consumption. It is also useful when working with large datasets or building memory-intensive applications.
</p>

<p>
    In the case of classes with the <code>@dataclass</code> decorator, which automatically generates boilerplate code for classes, we can also use <code>__slots__</code> to optimize memory usage. By default, <code>@dataclass</code> uses the same <code>__dict__</code> approach for attribute storage. However, by setting the <code>slots</code> parameter to <code>True</code> (available since Python 3.10), the generated class declares <code>__slots__</code> for its fields and reduces memory usage.
</p>

<p>
    Here is the signature of the <code>@dataclass</code> decorator with its default values (<code>slots</code> defaults to <code>False</code>):
</p>

<pre>
    <code>
        @dataclasses.dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False)
    </code>
</pre>

<p>
    Using <code>__slots__</code> can be an effective way to optimize memory usage in Python classes and improve the performance of memory-intensive applications. Adding <code>__slots__</code> to a class definition has a minor remediation cost and can bring significant benefits in terms of memory efficiency.
</p>

<pre>
    <code>
        from dataclasses import dataclass

        # Non-compliant
        class MyClass:
            def __init__(self, a, b, c):
                self.a = a
                self.b = b
                self.c = c

        # Compliant
        class MySlottedClass:
            __slots__ = ('a', 'b', 'c')
            def __init__(self, a, b, c):
                self.a = a
                self.b = b
                self.c = c

        # Compliant
        @dataclass(slots=True)
        class MyDataClassSlottedClass:
            a: int
            b: int
            c: int
    </code>
</pre>

Implementation principle

References

POC

import sys
import time
from dataclasses import dataclass

# Non-compliant
class MyClass:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

# Compliant
class MySlottedClass:
    __slots__ = ('a', 'b', 'c')
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

# Compliant
@dataclass(slots=True)
class MyDataClassSlottedClass:
    a: int
    b: int
    c: int

def run_instantiations(n, class_type):
    # Create and immediately discard n instances of the given class
    for i in range(n):
        class_type(i, i+1, i+2)

if __name__ == '__main__':
    n = 50000000

    # Get the version from the command line
    if len(sys.argv) != 2:
        sys.exit(f'Usage: {sys.argv[0]} v1|v2|v3')
    version = sys.argv[1]

    # Switch on the version
    if version == 'v1':
        class_type = MyClass
    elif version == 'v2':
        class_type = MySlottedClass
    elif version == 'v3':
        class_type = MyDataClassSlottedClass
    else:
        raise ValueError(f'Unknown version {version}')

    # Run the instantiations
    start = time.time()
    run_instantiations(n, class_type)
    end = time.time()

    # Print the time
    print(f'Ran in {end-start:.2f} seconds with {n} instanciations')

The results are as follows:

vincent@vincent-HP-ProBook-450-G7:~/dev/poub$ vjoule --no-gpu python3 joules_slots.py v1 # Non compliant class
Ran in 15.33 seconds with 50000000 instanciations
CGroup CPU RAM
Global 230.80J 13.23J
Process 136.08J 0.14J
vincent@vincent-HP-ProBook-450-G7:~/dev/poub$ vjoule --no-gpu python3 joules_slots.py v2 # Compliant class with slots
Ran in 12.01 seconds with 50000000 instanciations
CGroup CPU RAM
Global 176.98J 10.25J
Process 107.34J 0.11J
vincent@vincent-HP-ProBook-450-G7:~/dev/poub$ vjoule --no-gpu python3 joules_slots.py v3 # Compliant dataclass with slots
Ran in 11.71 seconds with 50000000 instanciations
CGroup CPU RAM
Global 174.60J 10.19J
Process 102.00J 0.10J
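
The relative savings implied by the figures above can be worked out with a few lines of arithmetic. This is only a sketch over the single run reported here; the dictionary layout and the helper name reduction_percent are illustrative.

```python
# Run time (s) and process CPU energy (J) copied from the vjoule output above.
results = {
    'v1': {'seconds': 15.33, 'process_cpu_j': 136.08},  # non-compliant class
    'v2': {'seconds': 12.01, 'process_cpu_j': 107.34},  # class with __slots__
    'v3': {'seconds': 11.71, 'process_cpu_j': 102.00},  # dataclass(slots=True)
}


def reduction_percent(baseline, value):
    """Relative reduction of value compared to baseline, in percent."""
    return (baseline - value) / baseline * 100


baseline = results['v1']
for version in ('v2', 'v3'):
    time_gain = reduction_percent(baseline['seconds'], results[version]['seconds'])
    energy_gain = reduction_percent(baseline['process_cpu_j'],
                                    results[version]['process_cpu_j'])
    print(f'{version}: {time_gain:.1f}% less time, {energy_gain:.1f}% less CPU energy')

# v2: 21.7% less time, 21.1% less CPU energy
# v3: 23.6% less time, 25.0% less CPU energy
```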

jhertout commented 1 year ago

This rule is in development now.