Open galipremsagar opened 1 month ago
I had a look at this. It boils down to:
x = 9180.52952127610660682
y = 91.8052952127610666366
floordiv = x // y
print(f"{floordiv:.20f}") # => 99.00000000000000000000
#include <cmath>
#include <iostream>
#include <iomanip>
int main() {
double x = 9180.52952127610660682;
double y = 91.8052952127610666366;
double result = std::floor(x / y);
std::cout << std::fixed << std::setprecision(20);
std::cout << result << std::endl;
}
// => 100.00000000000000000000
The libcudf implementation is (matching the C++ I wrote):
$$ q_\text{floor} := \left\lfloor \frac{x}{y} \right\rfloor $$
The pandas/numpy/python implementation uses divmod
, which finds a float $q{\text{divmod}}$ such that for a pair $x$ and $y$, $q{\text{divmod}} y + (x \mod y) \approx x$. Usually this is the same as $q_\text{floor}$ but is sometimes one less, this bug is one such example.
In particular, note that since $y$ in this example is slightly greater than $x / 100$ (if computed exactly), then $x \mod y$ is only slightly less than $y$ (and not close to zero), which means that $q_\text{floor} y + (x \mod y) = x + y$.
If we don't care about preserving the sign of zero, we could implement this as:
template<typename T>
auto py_floor_div(T x, T y) -> decltype(x / y) {
auto const mod = std::fmod(x, y);
auto div = (x - mod) / y;
// fixup for mixed signs
div -= (mod != 0 && (std::signbit(x) ^ std::signbit(y)));
auto const floordiv = std::floor(div);
// fixup
return floordiv + (div - floordiv > 0.5);
}
I haven't checked spark. cudf.polars wants the libcudf implementation.
A previous issue just for reference: https://github.com/rapidsai/cudf/issues/12120 We try to keep libcudf consistent with std C++ library behavior when possible.
We try to keep libcudf consistent with std C++ library behavior when possible.
Just to note, we already have carveouts for different modulo behaviour in the binop implementation:
MOD, ///< operator %
PMOD, ///< positive modulo operator
///< If remainder is negative, this returns (remainder + divisor) % divisor
///< else, it returns (dividend % divisor)
PYMOD, ///< operator % but following Python's sign rules for negatives
If we don't want to incorporate all the additional operations that might be required (for example, I would like an implementation of GREATER/LESS/etc..
for floating point types that uses the IEEE total order on floats in the presence of NaN values), this seems like a good place to consider the ongoing jitify/nvrtc improvements (cc @lamarrr): we have binary_operation(..., std::string const& ptx, ...)
but that looks to be quite numba-specific right now.
Describe the bug When a
floordiv
is performed in libcudf, it seems that there is an off-by-1 error.Steps/Code to reproduce bug pickle files: Archive.zip
Expected behavior Match pandas
Environment overview (please complete the following information)
Environment details Please run and paste the output of the
cudf/print_env.sh
script here, to gather any other relevant environment detailsClick here to see environment details
Additional context Add any other context about the problem here.