Open AtollRewe opened 3 months ago
This will always be slower for a single-element dict lookup. We have to go to Rust and do checks on that side.
However, there is still much to gain. Our implementation is rather naive and does some checks needlessly (e.g. keys coming from a dict are already guaranteed to be unique). Will look at this a bit later.
I understand that a simple Python dict lookup would always be faster than involving Polars in some way. That's why I suggest adding a warning about this case to the documentation of `replace` and `replace_strict` and/or offering some kind of pre-computed mapping (i.e. what you describe as "go to Rust and do checks").
To elaborate on my use case: I first need to run some data processing on a large DataFrame once. After that, I need to run the same data processing on single-row DataFrames several times. So what I'm doing now as a workaround is branching: I use `replace` for DataFrames with more than 10 elements and `map_elements` otherwise, but that's very hacky. It'd be great if there were some way to provide a uniform implementation that can be used efficiently in both cases.
Checks
Reproducible example
Log output
Issue description
When replacing a small number of data points using a large replacement mapping, the recommended `replace` (or `replace_strict`) becomes extremely slow: about 2000x slower than using `map_elements` with a dict lookup.
Expected behavior
I'm not sure if one could expect a naive `replace` call as in my example to always be fast. I assume there is a notable overhead because the Python dict needs to be converted into some kind of Rust structure. But I feel like there should be some way to define an efficient Polars-native mapping object (represented internally by a HashMap or something similar) that could be pre-computed once and then reused to perform fast mappings with `replace`.
At the very least, imo there should be a warning in the documentation of `replace`, like there is for `map_elements`, warning of this use case and directing the user to another function.
Installed versions