Open richardxia opened 8 months ago
@richardxia thanks for the writeup! I haven't had a chance to look into it in detail yet, but I'll do that soon :)
Also, feel free to change the API if it's necessary. Or add a new function to handle these cases. For example, you mentioned passes_pydantic_multiple_validator
- you can change that to handle floats and Decimal types differently if that'll make the implementation easer.
@richardxia sorry for the delay, but I've had a chance to look into the PR.
So I think the biggest issues that we're facing here is the opaque nature of method
in generate_constrained_number
. I am comfortable with deprecating that and then using dedicated functions for each numerical type i.e. generate_constrained_integer
, generate_constrained_float
etc. These would have no need to take in a method
but could decide what method to create the constrained number depending on the constraints and other context. Furthermore, each of them could determine whether the generated number passes the constraints however they wish without relying on pydantic_pydantic_multiple_validator
(this is a faulty implementation in pydantic as well as seen in this issue though I think that's only for pydantic v1). WDYT?
However, these would have to be deterministic since users have to be able to reliably create the same instances with the same values if needed (based on the seeding of the Random
instance on the factory).
I think we have room for making whatever changes we need here, and deprecate accordingly, without too many issues since most, if not all, users wouldn't be using these functions but only the APIs directly related to the factories.
Also, I think we should be fine as long as reasonable scenarios are working, If hypothesis keeps coming up with extreme cases that won't work, then we can just restrict hypothesis or something along those lines and just document that such extreme cases may fail.
@guacs, thanks for taking the time to review my PR and leaving comments. What you said sounds good to me, in terms of deprecating the method
argument to generator_constrained_number()
. I think I'll probably be busy for the rest of the work week, but I may have some time over the weekend to revisit this and hopefully come up with a simpler change.
I pushed a completely rewritten solution to this branch, but it looks like I'm currently failing the Python 3.8 tests because I accidentally used a function that was only implemented in Python 3.9. I'll have to revisit this tomorrow.
In the meantime, please let me know if the new changes I made look like they're in the right direction. I made some breaking changes to the functions in constrained_numbers.py
. I wasn't sure if it was important to keep these functions backwards-compatible, but let me know if it is, and I can think about how to do that. get_increment()
will be tricky to preserve backwards compatibility, since it's technically impossible for that function to do its job correctly with its current function signature.
Documentation preview will be available shortly at https://litestar-org.github.io/polyfactory-docs-preview/505
Pull Request Checklist
Description
Library Code Changes
Previously,
generate_constrained_number()
would potentially generate invalid numbers whenmultiple_of
is not None and exactly one of eitherminimum
ormaximum
is not None, since it would just returnmultiple_of
without respecting the upper or lower bound.This significantly changes the implementation of the code to correctly handle this code. The
generate_constrained_number()
method has been completely removed, being replaced with agenerate_constrained_multiple_of()
function. A major difference between the old function and the new function is that the new one does not accept amethod
argument for generating random numbers. This is because in the new function, we always usecreate_random_integer()
, since the problem reduces to generating a random integer multiplier.The high-level algorithm behind
generate_constrained_multiple_of()
is that we need to constrain the random integer generator to generate numbers such that when they are multiplied withmultiple_of
, they still fit within the original bounds constraints. This simplify involves dividing the original bounds bymultiple_of
, with some special handling for negativemultiple_of
numbers as well as carefully chosen rounding behavior.We also need to make some changes to other functions.
get_increment()
needs to take an additional argument for the actual value that the increment is for. This is because floating-point numbers can't use a static increment or else it might get rounded away if the numbers are too large. Python fortunately provides amath.ulp()
function for computing this for a given float value, so we make use of that function. We still use the originalfloat_info.epsilon
constant as a lower bound on the increment, though, since in the case that the value is too close to zero, we still need to make sure that the increment doesn't disappear when used against other numbers.Finally, we rename and modify
passes_pydantic_multiple_validator()
tois_almost_multiple_of()
, modifying its implementation to defer the casting of values tofloat()
to minimize rounding errors. This specifically affects Decimal numbers, where casting to float too early causes too much loss of precision.Test Changes
A significant number of changes were made to the tests as well, since the original tests missed the bug being fixed here. Each of the integer, floating-point, and decimal tests has been updated to assert that the result is actually within the minimum and maximum constraints. In addition, we remove some unnecessary sorting of the randomly generated test input values, since this was unnecessarily constraining
multiple_of
to be greater than or less than the minimum and maximum values. This was causing a lot of the scenarios involving negative values to be skipped.Lastly, the floating-point and decimal tests need additional constraints to avoid unrealistic extreme values from hitting precision issues. This was done by adding a number of constraints on the number of significant digits in the input numbers and on the relative magnitudes of the input numbers.
Close Issue(s)
503