joke2k / faker

Faker is a Python package that generates fake data for you.
https://faker.readthedocs.io
MIT License
17.41k stars 1.9k forks source link

`fake.pystr()` returns unexpected length when using `max_chars` with `prefix` | `suffix` #1877

Closed themattmorris closed 1 year ago

themattmorris commented 1 year ago

Love this library, thanks for the hard work!

I came across a situation today where strings returned by fake.pystr(...) are not what I would have expected. This line adds the prefix and/or suffix to a string with length already established according to max_chars/min_chars. Because of this, the resulting string is longer than expected when max_chars is used in conjunction with either prefix or suffix.

If you're open to a PR, I'd be willing to submit one to update this. Just lemme know...thanks again!

Steps to reproduce

from faker import Faker

fake = Faker()
Faker.seed()

fake.pystr(max_chars=3, prefix="a")
fake.pystr(max_chars=3, suffix="z")
fake.pystr(max_chars=3, prefix="a", suffix="z")

Expected behavior

aRu
TTz
aFz

Actual behavior

aRuB
VTTz
aFmDz
fcurella commented 1 year ago

Thank you for the report! Yes, a Pull Request would be much appreciated!

stefan6419846 commented 1 year ago

I am not sure if this should be considered a documentation issue or bug. The prefix and suffix implementation has been added by me and follows the same convention regarding the length as factory_boy.fuzzy.FuzzyText (which is deprecated and I needed a replacement for). https://github.com/FactoryBoy/factory_boy/blob/53b34d504fad05dd1887a7eb0056ce9516fd4646/factory/fuzzy.py#L59 just has better documentation on where the length limit actually apply to.

The tests actually ensure that the prefix and suffix indeed do not count towards the specified limits: https://github.com/joke2k/faker/blob/d193bba77917f28504a036d1c11e0c517cc61373/tests/providers/test_python.py#L474

In my opinion, updating the documentation here to point out the the length limits are only for the random part should be easier, as we would have additionally ensure the prefix and suffix actually allow for the requested length otherwise.

themattmorris commented 1 year ago

Thanks for the feedback and for the reference in the unit tests!

You are correct that changing this would require an additional check to ensure that max_chars >= len(prefix) + len(suffix). There is already a check for relationship between max_chars and min_chars:

https://github.com/joke2k/faker/blob/d193bba77917f28504a036d1c11e0c517cc61373/faker/providers/python/__init__.py#L119

so I considered adding this check to be trivial.

However, I didn't realize there was a precedent set for this. I find it counterintuitive that adding prefix and/or suffix will cause the returned string to be longer than max_chars, but maybe it's just me, I'd be curious to know what others think. I just workaround it right now by specifying a max_chars that takes into account the length of the prefix and/or suffix. I recognize updating it would be a breaking change.

I can submit a PR for either situation (docs or update to logic). Any thoughts on this @fcurella ?

fcurella commented 1 year ago

I can see how one would expect prefix/suffixes to be included in the count, but in this case I’d rather keep it consistent with other tools in the ecosystem. I’d accept a PR on the docs clarifying the behavior :)

On Fri, Jun 16, 2023 at 8:33 AM Matt Morris @.***> wrote:

Thanks for the feedback and for the reference in the unit tests!

You are correct that changing this would require an additional check to ensure that max_chars >= len(prefix) + len(suffix). There is already a check for relationship between max_chars and min_chars:

https://github.com/joke2k/faker/blob/d193bba77917f28504a036d1c11e0c517cc61373/faker/providers/python/__init__.py#L119

so I considered adding this check to be trivial.

However, I didn't realize there was a precedent set for this. I find it counterintuitive that adding prefix and/or suffix will cause the returned string to be longer than max_chars, but maybe it's just me, I'd be curious to know what others think. I just workaround it right now by specifying a max_chars that takes into account the length of the prefix and/or suffix. I recognize updating it would be a breaking change.

I can submit a PR either way. Any thoughts on this @fcurella https://github.com/fcurella ?

— Reply to this email directly, view it on GitHub https://github.com/joke2k/faker/issues/1877#issuecomment-1594685535, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAV4B5UZ3XNCLUPUPMHVNTXLROEBANCNFSM6AAAAAAZIMXP54 . You are receiving this because you were mentioned.Message ID: @.***>