Open SeBii91 opened 1 year ago
OS | OS version | Language |
---|---|---|
- | - | Python 3 |
ID | Title | Category | Sub-category |
---|---|---|---|
EOPT001 | Use __slots__ in class definitions | Environment | Optimized API |
Severity | Remediation Cost |
---|---|
Minor | Minor |
Case 1: Declare the data members of a class explicitly with __slots__ when defining it. Instances then use far less memory than with the default behavior, which relies on per-instance __dict__ and __weakref__ attributes.
Case 2: When a class is annotated with @dataclass, set the slots parameter to True so that instances use far less memory than with the standard implementation, which relies on per-instance __dict__ and __weakref__ attributes.
Python is an interpreted, high-level, general-purpose programming language. One of its main advantages is its flexibility and dynamic nature. However, this flexibility comes at a cost in terms of performance, especially in memory usage. One way to optimize memory usage is by using the __slots__ attribute in class definitions.
By default, every instance of a Python class carries a __dict__ that stores its attributes (methods are stored on the class itself, not on each instance). This per-instance dictionary can be quite memory-intensive.
By using __slots__, we can explicitly declare the data members of the class, allowing Python to allocate memory for each instance of the class more efficiently. When we use __slots__, Python reserves a fixed amount of space in each instance for the declared attributes and exposes them through descriptors on the class, instead of creating a per-instance dictionary. This reduces memory usage.
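As a rough illustration of the difference, the sketch below compares a regular instance with a slotted one using sys.getsizeof. The class names PlainPoint and SlottedPoint are made up for this example, and the exact byte counts depend on the Python version and platform, so only the relative difference is meaningful.

```python
import sys


class PlainPoint:
    """Regular class: each instance gets its own __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlottedPoint:
    """Slotted class: attributes live in fixed slots, no per-instance __dict__."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


plain = PlainPoint(1, 2)
slotted = SlottedPoint(1, 2)

# getsizeof does not follow references, so the instance dictionary
# has to be added explicitly for the regular class.
plain_size = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slotted_size = sys.getsizeof(slotted)

print(f"plain instance + __dict__: {plain_size} bytes")
print(f"slotted instance:          {slotted_size} bytes")
print(f"slotted has __dict__:      {hasattr(slotted, '__dict__')}")
```

sys.getsizeof only measures the object itself, not the values it references, which is enough here because both instances hold the same two integers.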
This optimization matters most for classes that are instantiated many times, as it can significantly reduce memory consumption. It is also useful when working with large datasets or building memory-intensive applications.
In the case of classes with the @dataclass decorator, which automatically generates boilerplate code for classes, we can also use __slots__ to optimize memory usage. By default, @dataclass uses the same __dict__ approach for attribute storage. However, by setting the slots parameter to True (available since Python 3.10), the decorator generates a class that declares __slots__ for its fields, which reduces memory usage.
Here is the signature of the @dataclass decorator with its default values:
@dataclasses.dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False)
Using __slots__ can be an effective way to optimize memory usage in Python classes and improve the performance of memory-intensive applications. Adding __slots__ to class definitions has a minor remediation cost and can bring significant benefits in terms of memory efficiency.
from dataclasses import dataclass

# Non-compliant
class MyClass:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

# Compliant
class MySlottedClass:
    __slots__ = ('a', 'b', 'c')

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

# Compliant
@dataclass(slots=True)
class MyDataClassSlottedClass:
    a: int
    b: int
    c: int
<p>
Python is an interpreted, high-level, general-purpose programming language. One of its main advantages is its flexibility and dynamic nature. However, this flexibility comes at a cost in terms of performance, especially in memory usage. One way to optimize memory usage is by using the <code>__slots__</code> attribute in class definitions.
</p>
<p>
By default, every instance of a Python class carries a <code>__dict__</code> that stores its attributes (methods are stored on the class itself, not on each instance). This per-instance <strong>dictionary</strong> can be quite memory-intensive.
</p>
<p>
By using <code>__slots__</code>, we can explicitly declare the data members of the class, allowing Python to allocate memory for each instance of the class more efficiently. When we use <code>__slots__</code>, Python reserves a fixed amount of space in each instance for the declared attributes and exposes them through descriptors on the class, instead of creating a per-instance dictionary. This reduces memory usage.
</p>
<p>
This optimization matters most for classes that are instantiated many times, as it can significantly reduce memory consumption. It is also useful when working with large datasets or building memory-intensive applications.
</p>
<p>
In the case of classes with the <code>@dataclass</code> decorator, which automatically generates boilerplate code for classes, we can also use <code>__slots__</code> to optimize memory usage. By default, <code>@dataclass</code> uses the same <code>__dict__</code> approach for attribute storage. However, by setting the <code>slots</code> parameter to <code>True</code> (available since Python 3.10), the decorator generates a class that declares <code>__slots__</code> for its fields, which reduces memory usage.
</p>
<p>
Here is the signature of the <code>@dataclass</code> decorator with its default values:
</p>
<pre>
<code>
@dataclasses.dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False)
</code>
</pre>
<p>
Using <code>__slots__</code> can be an effective way to optimize memory usage in Python classes and improve the performance of memory-intensive applications. Adding <code>__slots__</code> to class definitions has a minor remediation cost and can bring significant benefits in terms of memory efficiency.
</p>
<pre>
<code>
from dataclasses import dataclass

# Non-compliant
class MyClass:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

# Compliant
class MySlottedClass:
    __slots__ = ('a', 'b', 'c')

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

# Compliant
@dataclass(slots=True)
class MyDataClassSlottedClass:
    a: int
    b: int
    c: int
</code>
</pre>
Iterate over the AST and identify all class definitions.
For each class definition that is not decorated with @dataclass, check if it defines the __slots__ attribute. If it does not, flag it as a violation of the rule.
For classes decorated with @dataclass, check if the slots parameter is set to True. If it is not, flag it as a violation of the rule.
For each violation, generate an appropriate warning message.
import sys
import time
from dataclasses import dataclass

# Non-compliant
class MyClass:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

# Compliant
class MySlottedClass:
    __slots__ = ('a', 'b', 'c')

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

# Compliant
@dataclass(slots=True)
class MyDataClassSlottedClass:
    a: int
    b: int
    c: int

def run_instanciations(n, classType):
    for i in range(n):
        classType(i, i+1, i+2)

if __name__ == '__main__':
    n = 50000000
    # Get the version from the command line
    version = sys.argv[1]
    # Switch on the version
    if version == 'v1':
        classType = MyClass
    elif version == 'v2':
        classType = MySlottedClass
    elif version == 'v3':
        classType = MyDataClassSlottedClass
    else:
        raise ValueError(f'Unknown version {version}')
    # Run the instanciations
    start = time.time()
    run_instanciations(n, classType)
    end = time.time()
    # Print the time
    print(f'Ran in {end-start:.2f} seconds with {n} instanciations')
The results are as follows:
vincent@vincent-HP-ProBook-450-G7:~/dev/poub$ vjoule --no-gpu python3 joules_slots.py v1 # Non compliant class
Ran in 15.33 seconds with 50000000 instanciations
CGroup | CPU | RAM |
---|---|---|
Global | 230.80J | 13.23J |
Process | 136.08J | 0.14J |
vincent@vincent-HP-ProBook-450-G7:~/dev/poub$ vjoule --no-gpu python3 joules_slots.py v2 # Compliant class with slots
Ran in 12.01 seconds with 50000000 instanciations
CGroup | CPU | RAM |
---|---|---|
Global | 176.98J | 10.25J |
Process | 107.34J | 0.11J |
vincent@vincent-HP-ProBook-450-G7:~/dev/poub$ vjoule --no-gpu python3 joules_slots.py v3 # Compliant dataclass with slots
Ran in 11.71 seconds with 50000000 instanciations
CGroup | CPU | RAM |
---|---|---|
Global | 174.60J | 10.19J |
Process | 102.00J | 0.10J |
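To put the three runs side by side, the small helper below computes the relative savings of v2 and v3 against v1, using only the run times and the process-level CPU energy reported above. The function name relative_saving and the dictionary layout are choices made for this sketch.

```python
# Values copied from the vjoule output above (Process row, CPU column).
measurements = {
    "v1": {"seconds": 15.33, "cpu_joules": 136.08},
    "v2": {"seconds": 12.01, "cpu_joules": 107.34},
    "v3": {"seconds": 11.71, "cpu_joules": 102.00},
}


def relative_saving(baseline, value):
    """Percentage saved compared to the baseline value."""
    return (baseline - value) / baseline * 100


baseline = measurements["v1"]
for version in ("v2", "v3"):
    run = measurements[version]
    time_saving = relative_saving(baseline["seconds"], run["seconds"])
    energy_saving = relative_saving(baseline["cpu_joules"], run["cpu_joules"])
    print(f"{version}: {time_saving:.1f}% less time, {energy_saving:.1f}% less CPU energy")
```

On these figures, both slotted variants save a bit more than a fifth of the run time and of the process CPU energy compared to the non-compliant class.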
This rule is in development now.
Extract from https://wiki.python.org/moin/UsingSlots
The __slots__ declaration allows us to explicitly declare data members, causes Python to reserve space for them in memory, and prevents the creation of __dict__ and __weakref__ attributes. It also prevents the creation of any variables that aren't declared in __slots__.
Why Use __slots__? The short answer is slots are more efficient in terms of memory space and speed of access, and a bit safer than the default Python method of data access.
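The two restrictions mentioned in the extract can be observed directly. The following sketch uses a hypothetical class named Slotted: assigning an undeclared attribute raises AttributeError, and creating a weak reference raises TypeError because no __weakref__ slot exists.

```python
import weakref


class Slotted:
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value


obj = Slotted(42)

# Attributes that are not listed in __slots__ cannot be created.
try:
    obj.other = 1
    undeclared_rejected = False
except AttributeError:
    undeclared_rejected = True

# Without a __weakref__ slot, the instance cannot be weakly referenced.
try:
    weakref.ref(obj)
    weakref_rejected = False
except TypeError:
    weakref_rejected = True

print("undeclared attribute rejected:", undeclared_rejected)
print("weak reference rejected:      ", weakref_rejected)
```

If weak references are needed, adding "__weakref__" to the __slots__ declaration restores support for them.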