Open abstractqqq opened 3 weeks ago
Not that easy imo.
Integer + Float -> Float
This should hold no matter what the Float value is. We should not have a special case where Integer + Float<0.0> -> Integer.
Integer + 0.0 is therefore equivalent to "cast Integer to Float", which is not a no-op afaik, because the underlying bits of the number representation change 🤔
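A quick stdlib-only illustration of the point above. It uses plain Python numbers rather than Polars, so it only shows the general int/float behavior, not Polars internals: the integer `3` and the float `3.0` compare equal, yet adding `0.0` changes the type, and the two values have different 64-bit encodings.

```python
import struct

# An integer plus a float literal produces a float, even when the literal is 0.0.
value = 3
result = value + 0.0
print(type(value).__name__, type(result).__name__)  # int float

# Same numeric value, different bit patterns (big-endian hex for readability):
# 64-bit two's-complement integer versus IEEE 754 double.
int_bits = struct.pack(">q", value).hex()
float_bits = struct.pack(">d", result).hex()
print(int_bits)    # 0000000000000003
print(float_bits)  # 4008000000000000
print(int_bits != float_bits)  # True
```

So an optimizer that drops `+ 0.0` from an integer column would silently skip a real conversion.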
We have a rule for literal(a) + literal(b) -> literal(a + b).
I guess it should not be that hard, but maybe I'm missing something. I'll be glad to look at it after I finish my current pull request (only tests and CR are left).
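For readers unfamiliar with that kind of rule, here is a minimal sketch of constant folding on a toy expression tree. The `Lit`, `Col`, `Add` classes and `fold_constants` are hypothetical illustrations, not Polars' actual (Rust) optimizer types; the sketch only shows the shape of a bottom-up literal(a) + literal(b) -> literal(a + b) rewrite.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


# Hypothetical toy expression nodes (not the Polars API).
@dataclass(frozen=True)
class Lit:
    value: Union[int, float]


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Add:
    left: Expr
    right: Expr


Expr = Union[Lit, Col, Add]


def fold_constants(expr: Expr) -> Expr:
    """Bottom-up rewrite: Add(Lit(a), Lit(b)) -> Lit(a + b)."""
    if isinstance(expr, Add):
        # Simplify children first so nested literal sums collapse fully.
        left = fold_constants(expr.left)
        right = fold_constants(expr.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(left.value + right.value)
        return Add(left, right)
    return expr


tree = Add(Add(Lit(1), Lit(2)), Col("a"))
print(fold_constants(tree))  # Add(left=Lit(value=3), right=Col(name='a'))
```

Note that this rule never needs to know a column's type, because both operands are literals; the rule requested in this issue does, since it involves a column.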
Not that easy imo.
Integer + Float -> Float
This. It isn't a no-op; it's a cast. These timings should not be the same if you multiply an integer by a float.
@abstractqqq can you share your complete example?
import numpy as np
import polars as pl

df = pl.DataFrame({
    "a": np.random.random(size=1000)
})

# %timeit is an IPython/Jupyter magic; run these lines in an IPython session.
%timeit df.select(pl.col("a"))
%timeit df.select(pl.col("a") + 0.)
Reproducible example
This is not urgent at all. Just a curious observation.
Not sure if this is feasible, to be honest, because we don't technically know the types of the variables.
But when we do know the type, or when we are adding a literal to a column, the optimizer should notice that (pl.col("x1") + 0.) is the same as just pl.col("x1").
12 µs ± 37.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
25.2 µs ± 238 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
This situation arises when 0 is the default value in a more complicated expression. Say, in a linear regression with an intercept (bias term), we compute y = b0 + x1 * b1 + x2 * b2 + x3 * b3. Typically we set b0 = 0 if we don't wish to fit the intercept. We can always check whether b0 is 0, but that is more code.
I am hoping the optimizer can eliminate the addition when we know we are adding a literal 0 to a numeric column.
The same goes for a literal 1 * a numeric column.
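One caveat worth checking before treating `+ 0.0` as a no-op even on a float column: under IEEE 754 arithmetic with default rounding, `-0.0 + 0.0` is `+0.0`, so adding a positive zero is not a bit-for-bit identity, while `* 1.0` and `+ (-0.0)` are. The sketch below checks this with plain Python floats (IEEE 754 doubles on mainstream platforms); it does not exercise Polars, and whether an optimizer should care about the sign of zero is a design choice. `same_float` is a hypothetical helper.

```python
import math

# Representative doubles, including signed zeros, infinities and NaN.
samples = [1.5, -2.0, 0.0, -0.0, math.inf, -math.inf, math.nan]


def same_float(a: float, b: float) -> bool:
    """Strict equality: NaN matches NaN, and 0.0 is distinguished from -0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)


mul_one_is_identity = all(same_float(x * 1.0, x) for x in samples)
add_zero_is_identity = all(same_float(x + 0.0, x) for x in samples)
add_neg_zero_is_identity = all(same_float(x + -0.0, x) for x in samples)

print(mul_one_is_identity)       # True
print(add_zero_is_identity)      # False, because -0.0 + 0.0 == +0.0
print(add_neg_zero_is_identity)  # True
```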
Issue description
See above
Expected behavior
The optimizer should recognize the following cases as no-ops:
literal 0 + numeric column
literal 1 * numeric column
The question is whether we know the column is of numerical type at the time when the optimizer looks at the expression. My suspicion is that we don't...
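A minimal sketch of why the column type matters, assuming a hypothetical schema lookup (column name -> Python type standing in for a dtype). `Lit`, `Col`, `BinOp`, `dtype_of` and `simplify` are illustrative names, not Polars' optimizer. The rule drops `x + 0` or `x * 1` only when x's type is known and equals the literal's type, so no cast is skipped; when the type is unknown, the expression is left alone. For float columns it treats `+0.0` and `-0.0` as interchangeable, in line with the signed-zero caveat raised earlier.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Optional, Union


# Hypothetical toy expression nodes (not the Polars API).
@dataclass(frozen=True)
class Lit:
    value: Union[int, float]


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str  # only "+" and "*" are modelled here
    left: Expr
    right: Expr


Expr = Union[Lit, Col, BinOp]
Schema = Dict[str, type]  # hypothetical: column name -> int or float


def dtype_of(expr: Expr, schema: Schema) -> Optional[type]:
    """Return the expression's output type, or None if it cannot be known."""
    if isinstance(expr, Lit):
        return type(expr.value)
    if isinstance(expr, Col):
        return schema.get(expr.name)  # None when the column is not in the schema
    left, right = dtype_of(expr.left, schema), dtype_of(expr.right, schema)
    if left is None or right is None:
        return None
    return float if float in (left, right) else int  # int op float -> float


def simplify(expr: Expr, schema: Schema) -> Expr:
    """Drop `x + 0` / `x * 1` when x's type is known and no cast is needed."""
    if not isinstance(expr, BinOp):
        return expr
    left, right = simplify(expr.left, schema), simplify(expr.right, schema)
    identity = 0 if expr.op == "+" else 1
    # + and * are commutative, so look for the literal on either side.
    for lit, other in ((right, left), (left, right)):
        if isinstance(lit, Lit) and lit.value == identity:
            other_type = dtype_of(other, schema)
            # int + 0.0 would be a cast, and an unknown type is not safe either.
            if other_type is not None and other_type is type(lit.value):
                return other
    return BinOp(expr.op, left, right)


schema = {"x1": float, "n": int}
print(simplify(BinOp("+", Col("x1"), Lit(0.0)), schema))  # Col(name='x1')
print(simplify(BinOp("*", Lit(1.0), Col("x1")), schema))  # Col(name='x1')
print(simplify(BinOp("+", Col("n"), Lit(0.0)), schema))   # kept: int + float is a cast
print(simplify(BinOp("+", Col("y"), Lit(0.0)), schema))   # kept: type of "y" unknown
```

The last two calls are the cases the thread worries about: without schema information the rule has to back off, and with an integer column `+ 0.0` still changes the dtype. A zero intercept such as `0.0 + x1 * 2.0` collapses to `x1 * 2.0` when `x1` is known to be a float.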
Installed versions