joke2k / faker

Faker is a Python package that generates fake data for you.
https://faker.readthedocs.io
MIT License
17.57k stars 1.92k forks

Distribution of `pydecimal` is very far from optimal #2090

Open sshishov opened 2 weeks ago

sshishov commented 2 weeks ago

The distribution of pydecimal is very far from optimal, which makes it difficult to use in tests. For instance, if the initial value is max_value and the updated value is also max_value, the test will "break" because the value will not actually change.

I can recommend the following approaches (imho):

Steps to reproduce

import faker
import collections
import decimal as dec

fake = faker.Faker()

counter = collections.Counter(
    fake.pydecimal(left_digits=0, right_digits=4, min_value=dec.Decimal('0.1'), max_value=1)
    for _ in range(1000000)
)
for value, count in counter.most_common(10):
    print(value, ':', count)

Expected behavior

0.1437 : 76
0.3199 : 76
0.2477 : 75
0.7345 : 75
0.1284 : 74
0.6271 : 74
0.1597 : 74
0.4462 : 74
0.6293 : 74
0.4967 : 74

Actual behavior

1 : 500105
0.1 : 50284
0.1437 : 76
0.3199 : 76
0.2477 : 75
0.7345 : 75
0.1284 : 74
0.6271 : 74
0.1597 : 74
0.4462 : 74
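
To put a number on how often a "value did not change" test would break, here is a rough calculation from the counts reported above. It assumes the two draws (initial and updated value) are independent, and it only counts collisions on the two boundary values; the variable names are illustrative.

```python
# Counts taken from the "Actual behavior" output above (1,000,000 draws).
total_draws = 1_000_000
max_hits = 500_105  # draws equal to max_value (1)
min_hits = 50_284   # draws equal to min_value (0.1)

p_max = max_hits / total_draws
p_min = min_hits / total_draws

# Assumption: the initial and the updated value are independent draws.
# Probability that both land on the same boundary value:
p_same_boundary = p_max ** 2 + p_min ** 2

print(f"P(both draws hit the same boundary) = {p_same_boundary:.4f}")
```

Under that assumption, roughly one update in four leaves the value unchanged.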

sshishov commented 2 weeks ago

This is how we are handling it for our tests:

def get_value() -> dec.Decimal:
    """Generate a fake decimal, rejecting `min_value` and `max_value`, which are returned in case of underflow/overflow."""
    return next(
        item
        for item in iter(
            lambda: fake['en'].pydecimal(
                left_digits=0,
                right_digits=4,
                min_value=dec.Decimal('0.0001'),
                max_value=dec.Decimal(1),
            ),
            None,
        )
        if item not in {dec.Decimal('0.0001'), dec.Decimal(1)}
    )
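
As an alternative to rejecting the boundary values, the expected distribution can be sketched without pydecimal by drawing a random integer on the scaled grid and shifting the decimal point. This is a minimal sketch, not Faker's implementation and not a proposed patch: `uniform_decimal` is a hypothetical helper, and it uses the standard library `random` module rather than Faker's own generator.

```python
import collections
import decimal as dec
import random


def uniform_decimal(min_value, max_value, right_digits, rng=random):
    """Hypothetical helper (not part of Faker).

    Draw uniformly from all decimals with `right_digits` fractional digits
    that lie in the closed interval [min_value, max_value].
    """
    scale = dec.Decimal(10) ** right_digits
    # Convert the bounds to integers on the scaled grid, rounding inwards
    # so the result can never fall outside the requested interval.
    low = int((dec.Decimal(min_value) * scale).to_integral_value(rounding=dec.ROUND_CEILING))
    high = int((dec.Decimal(max_value) * scale).to_integral_value(rounding=dec.ROUND_FLOOR))
    if low > high:
        raise ValueError("no representable value between min_value and max_value")
    # randint is inclusive on both ends; scaleb shifts the decimal point back.
    return dec.Decimal(rng.randint(low, high)).scaleb(-right_digits)


# Same check as in "Steps to reproduce", with fewer draws to keep it quick.
counter = collections.Counter(
    uniform_decimal(dec.Decimal('0.1'), 1, 4) for _ in range(100_000)
)
for value, count in counter.most_common(10):
    print(value, ':', count)
```

With this approach the boundary values are still possible, but they are no more likely than any other value on the grid.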