Open · CHDev93 opened 3 months ago
Other things of note: @profile decorators when running with mprof (no need to import anything) get you the little colored function markers.

OK. Can confirm that we see this performance regression with the latest dev version of polars.
Latest seems to perform significantly worse in memory, but run time is comparable.
Ran the following script with @profile enabled. Had to remove coalesce=True on line 17 since this is not supported in 0.19.19.
@adamreeve
# dev version: 1.12.0
source .venv/bin/activate
make build-dist-release
mprof run --python -o polars_1120_small.prof -M --include-children python polars_join_bug_mwe.py
# Augment index: 0.684s
# Process data: 5.261s
mprof plot polars_1120_small.prof --title polars_1_12_0 -o polars_1_12_0_small.png -w 0,12
# Compare with 0.19.19
pip uninstall polars
pip install --force-reinstall -v "polars==0.19.19"
mprof run --python -o polars_1919_small.prof -M --include-children python polars_join_bug_mwe.py
# Augment index: 0.859s
# Process data: 5.787s
mprof plot polars_1919_small.prof --title polars_0_19_19 -o polars_0_19_19_small.png -w 0,12
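As an aside, the @profile markers mentioned above need no import because mprof makes a profile decorator available when it launches the script (per the note above). A minimal sketch of that pattern, with placeholder function bodies rather than the actual MWE, could look like this:

```python
# Sketch of the @profile pattern for mprof (placeholder bodies only).
# Under `mprof run`, memory_profiler provides `profile` at runtime, so no
# import is needed; the fallback below keeps the script runnable standalone.
import time

try:
    profile  # injected by mprof
except NameError:
    def profile(func):  # no-op stand-in outside mprof
        return func

@profile
def augment_index():
    time.sleep(0.2)  # placeholder work

@profile
def process_data():
    time.sleep(0.2)  # placeholder work

if __name__ == "__main__":
    augment_index()
    process_data()
```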
I cannot reproduce. First bump is 1.14, second bump is 0.19.19
Performance difference:
1.14: 4.899s
0.19.19: 5.804s
@corwinjoy the plot for 0.19.19 shows process_data being essentially instant. Are you sure the same thing is being run in both cases? The graph for polars 1.12.0 looks far too slow compared with 0.19.19.
@CHDev93 The same script was being run in both cases, but I agree that the time for process_data was odd from mprof. I think the sleep command was throwing off the timings there. Re-running 0.19.19 with the sleep commands removed, I get more reasonable timings. However, the memory issue remains, as discussed on Discord. It looks like @ritchie46 has some ideas there.
The mprof command outputs (with sleep removed, using 0.19.19):
Augment index: 0.879s
Process data: 3.900s
Yes, we now go into the row encoding for the group-by. The new algorithm is faster when data doesn't fit in your cache size anymore. You must increase the dataset (depending on the beefiness of your machine) to see the result. Latest Polars is 1.3-1.5x faster for me, but does indeed require more memory (~1.5x).
The old code had to go, though. The memory requirements will improve with the new streaming engine and with the fixed-size row encoding we plan to add as well.
In any case, it isn't a bug but the cost of our new algorithm. We have to be able to remove old code branches if they hurt us, and sometimes this comes with a different memory footprint.
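A rough way to see the crossover being described here is to run the same compound-key join at increasing sizes until the keys no longer fit in cache. The sketch below does that; all row counts, key ranges, and the schema are assumptions, not the original benchmark:

```python
# Sketch: time the same compound-key left join at increasing sizes to see
# where the newer row-encoding-based approach pulls ahead.
# All row counts and key ranges are arbitrary assumptions.
import time
import numpy as np
import polars as pl

def make_frames(n: int) -> tuple[pl.DataFrame, pl.DataFrame]:
    rng = np.random.default_rng(0)
    keys = {
        "a": rng.integers(0, n // 10 + 1, n),
        "b": rng.integers(0, 1_000, n),
        "c": rng.integers(0, 1_000, n),
    }
    left = pl.DataFrame({**keys, "x": rng.random(n)})
    right = pl.DataFrame({**keys, "y": rng.random(n)})
    return left, right

for n in (1_000_000, 5_000_000, 20_000_000):
    left, right = make_frames(n)
    t0 = time.perf_counter()
    out = left.join(right, on=["a", "b", "c"], how="left")
    print(f"n={n:>11,}: {time.perf_counter() - t0:.3f}s, {out.height} rows")
```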
Checks
Reproducible example
I installed memory-profiler.
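The script itself isn't included in this extract. A rough sketch of what the MWE plausibly looks like — a left join of two tables on a compound [int, int, int] key, with @profile markers for mprof and the per-phase timing prints seen in the thread — is below; all sizes, column names, and function bodies are assumptions:

```python
# Hypothetical reconstruction for illustration only, not the author's MWE:
# left join on a compound [int, int, int] key, decorated for mprof, with
# per-phase timing prints. Sizes and names are made up.
import time
import numpy as np
import polars as pl

try:
    profile  # provided by mprof at runtime
except NameError:
    def profile(func):  # no-op fallback outside mprof
        return func

RNG = np.random.default_rng(42)
N = 5_000_000  # assumed row count

@profile
def augment_index(left: pl.DataFrame) -> pl.DataFrame:
    # Placeholder for whatever index augmentation the original script does.
    return left.with_columns((pl.col("k1") + pl.col("k2")).alias("idx"))

@profile
def process_data(left: pl.DataFrame, right: pl.DataFrame) -> pl.DataFrame:
    # Newer polars also accepts coalesce=True here (not supported in 0.19.19).
    return left.join(right, on=["k1", "k2", "k3"], how="left")

def timed(label, func, *args):
    t0 = time.perf_counter()
    result = func(*args)
    print(f"{label}: {time.perf_counter() - t0:.3f}s")
    return result

if __name__ == "__main__":
    keys = {k: RNG.integers(0, 10_000, N) for k in ("k1", "k2", "k3")}
    left = pl.DataFrame({**keys, "value": RNG.random(N)})
    right = pl.DataFrame({**keys, "other": RNG.random(N)})
    left = timed("Augment index", augment_index, left)
    out = timed("Process data", process_data, left, right)
```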
Log output
Issue description
I'm doing a join of two tables on a compound key of [int, int, int]. In newer versions of polars (including polars==1.4.1) it uses much more memory than I'd expect. I confirmed by rolling back to polars==0.19.19 and found it did use significantly less memory.
Expected behavior
I'd expect a left join for this problem to use something like 2x the space of the left table, as it did in 0.19.19. I ran the script with python's memory-profiler package and used the commands below.
Running:
mprof run --python -o polars_141_small.prof -M --include-children python polars_join_bug_mwe.py
Plotting:
mprof plot polars_141_small.prof --title polars_1_4_1 -o polars_141_small.png -w 0,12
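A quick way to sanity-check the "roughly 2x the left table" expectation without reading it off the mprof plot is to compare the process RSS growth across the join against the left table's estimated in-memory size. A sketch follows; psutil and the sizes are my additions, and end-of-join RSS is only a lower bound on the peak:

```python
# Sketch: compare extra resident memory taken by the join against the
# left table's in-memory size. RSS after the join understates the peak
# (mprof's timeline is better for that), but it gives a rough check.
import numpy as np
import polars as pl
import psutil

rng = np.random.default_rng(1)
n = 5_000_000  # assumed size
keys = {k: rng.integers(0, 10_000, n) for k in ("k1", "k2", "k3")}
left = pl.DataFrame({**keys, "value": rng.random(n)})
right = pl.DataFrame({**keys, "other": rng.random(n)})

proc = psutil.Process()
rss_before = proc.memory_info().rss
out = left.join(right, on=["k1", "k2", "k3"], how="left")
rss_after = proc.memory_info().rss

print(f"left table estimated size: {left.estimated_size('mb'):8.1f} MB")
print(f"extra RSS across the join: {(rss_after - rss_before) / 2**20:8.1f} MB")
```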
Installed versions