green-code-initiative / ecoCode-challenge

Embark on the hackathon series for improving ecoCode

[Hackathon 2024][Indium][Python] Avoid basic REGEX usages #119

Open max-perrin opened 1 month ago

max-perrin commented 1 month ago

Rule title

Avoid basic REGEX usages.

Language and platform

The PoC was made in Python, but the rule can be applied the same way to PHP and Java.

Rule description

Using regex functions for basic string checks is not time efficient. When the pattern is a fixed (literal) string, prefer built-in string operations such as startswith, endswith, or the in operator, which are faster.

Noncompliant Code Example

import re

string = 'abcdef'
if re.search(r'^abc', string):
    print('string starts with abc')

Compliant Solution

string = 'abcdef'
if string.startswith('abc'):
    print('string starts with abc')
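
The same substitution applies to the suffix and substring checks named in the rule description. The sketch below is illustrative only (the variable `text` and the `checks` dictionary are assumptions, not part of the PoC) and shows each regex call next to its string-method equivalent for a plain literal pattern:

```python
import re

text = 'abcdef'

# Each pair holds (regex result, string-method result) for the same question.
checks = {
    'prefix': (re.search(r'^abc', text) is not None, text.startswith('abc')),
    'suffix': (re.search(r'def$', text) is not None, text.endswith('def')),
    'substring': (re.search(r'cd', text) is not None, 'cd' in text),
}

for name, (with_regex, with_str) in checks.items():
    print(f'{name}: regex={with_regex}, string method={with_str}')
```

The equivalence only holds when the pattern contains no regex metacharacters. Note also that `$` matches just before a trailing newline as well as at the very end of the string, whereas `endswith` compares the final characters exactly.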

Rule short description

Avoid using REGEX for basic string manipulation.

Rule justification

We measured the execution time using the time module in Python. The input was a list of 1.1 million words found online. To obtain representative results, each test was run several times.
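
A measurement of this kind could be reproduced roughly as follows. This is a minimal sketch, not the exact script used for the PoC: the helper names (`time_ms`, `count_with_regex`, `count_with_startswith`), the use of `time.perf_counter`, and the small in-memory word list are all assumptions made for illustration.

```python
import re
import time


def time_ms(func, words):
    """Run func(words) once and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = func(words)
    return result, (time.perf_counter() - start) * 1000


def count_with_regex(words, prefix='te'):
    # re.escape keeps the prefix literal even if it contains metacharacters.
    regex = re.compile('^' + re.escape(prefix))
    return sum(1 for word in words if regex.search(word) is not None)


def count_with_startswith(words, prefix='te'):
    return sum(1 for word in words if word.startswith(prefix))


# Hypothetical stand-in for the 1.1 million word list file.
words = ['test', 'tea', 'apple', 'term', 'stone'] * 10_000

for func in (count_with_regex, count_with_startswith):
    timings = []
    for _ in range(5):  # repeat each measurement, as in the PoC
        count, elapsed = time_ms(func, words)
        timings.append(elapsed)
    average = sum(timings) / len(timings)
    print(f'{func.__name__}: count={count}, average={average:.2f} ms')
```

Absolute timings depend on the machine and on the input, so only the ratio between the two methods is meaningful.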

Noncompliant Code

import re

prefix = 'te'
regex = re.compile(fr'^{prefix}')
with open('1.1million word list.txt', 'r', encoding='utf-8') as file:
    count = 0
    for word in file:
        if regex.search(word) is not None:
            count += 1

Compliant Code

prefix = 'te'
with open('1.1million word list.txt', 'r', encoding='utf-8') as file:
    count = 0
    for word in file:
        if word.startswith(prefix):
            count += 1

We searched the 1.1 million word list for words starting with 'te', using either regex.search or str.startswith. Each method was run 5 times, giving the results below:

using regex search

Iteration   Time (ms)
1           560.47
2           670.64
3           675.09
4           831.12
5           517.98
Average     651.06

using string startswith

Iteration   Time (ms)
1           318.57
2           259.16
3           287.76
4           253.95
5           245.64
Average     273.02

Conclusion: in this test session, the string method was on average about 2.4x faster than the regex (273.02 ms vs 651.06 ms). More tests are needed to study the energy consumption, but we expect it to be roughly proportional to the execution time.
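
The averages and the speedup factor can be checked directly from the timings reported in the two tables above. The variable names are illustrative; the numbers are copied from the tables.

```python
from statistics import mean

# Timings in milliseconds, copied from the two result tables.
regex_ms = [560.47, 670.64, 675.09, 831.12, 517.98]
startswith_ms = [318.57, 259.16, 287.76, 253.95, 245.64]

regex_avg = mean(regex_ms)
startswith_avg = mean(startswith_ms)
speedup = regex_avg / startswith_avg

print(f'regex average:      {regex_avg:.2f} ms')       # 651.06 ms
print(f'startswith average: {startswith_avg:.2f} ms')  # 273.02 ms
print(f'speedup:            {speedup:.2f}x')           # 2.38x
```

The ratio comes out at about 2.38, which rounds to the 2.4x quoted in the conclusion. With only five runs per method and a wide spread in the regex timings (517.98 ms to 831.12 ms), the factor should be read as an estimate.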

Severity / Remediation Cost

Estimate the severity and remediation cost of your issue.

Implementation principle