Open avhz opened 2 weeks ago
just use
map = { "a": "x", "b": "y", "c": "z", }
df.with_columns( polars.col("A").replace(map) )
Good suggestion - we can do the same 'trick' as we do in replace.
Is there a way that we can make this work with regex?
I have tried something like:
import regex
import polars
df = polars.DataFrame({
"x": ["a", "b", "c", "1", "2", "3"],
})
map = {
regex.compile(r"[a-z]"): "alpha",
regex.compile(r"[0-9]"): "digit",
}
df.with_columns(
polars.col("x").replace(map)
)
For my personal use case, I need to replace a large number of regex patterns, and it's not very ergonomic to use two lists because it can be hard to keep track of what is replacing what.
Another possibility is a list of tuples:
map = [
(r"[a-z]", "alpha"),
(r"[0-9]", "digit"),
]
This (in my opinion) is nicer to follow than something like:
old = [r"[a-z]", r"[0-9]"]
new = ["alpha", "digit"]
A dict is just going to be syntactic sugar. You can just define your map and then call str.replace_many(map.keys(), map.values()).
That gives me:
TypeError: cannot create expression literal for value of type dict_keys:
...
Hint: Pass `allow_object=True` to accept any value and create a literal of type Object.
Call list() on each input then. I'm just saying: this is just some syntactic sugar that you can do yourself. You don't need us to take care of it. Though it would be nice if we did.
I must be missing something, as none of the following work for me:
## ============================================================================
import regex
import polars
## ============================================================================
df = polars.DataFrame(
{
"x": ["a", "b", "c", "1", "2", "3"],
}
)
## ============================================================================
map = {
regex.compile(r"[a-z]"): "alpha",
regex.compile(r"[0-9]"): "digit",
}
df.with_columns(polars.col("x").replace(map))
df.with_columns(polars.col("x").replace(map.keys(), map.values()))
df.with_columns(polars.col("x").replace(list(map.keys()), list(map.values())))
## ============================================================================
map = {
r"[a-z]": "alpha",
r"[0-9]": "digit",
}
df.with_columns(polars.col("x").str.replace_many(map))
df.with_columns(polars.col("x").str.replace_many(map.keys(), map.values()))
df.with_columns(polars.col("x").str.replace_many(list(map.keys()), list(map.values())))
## ============================================================================
All throw an exception except the third (when calling list()), which does not throw an exception, but also does not match the regex pattern, so I am left with the original dataframe.
There are a few different issues:
regex.compile() objects, which Polars does not understand. Polars uses the Rust crate https://github.com/rust-lang/regex, so you must pass plain strings when using the regex functions.
.str.replace_many does not work with regular expressions (perhaps a note could be added to the docs?). It uses https://github.com/BurntSushi/aho-corasick, which works with literal strings only.
It sounds like you may really be asking for:
Description
Currently the str.replace_many method takes two lists for the original and replacement strings. It would be handy to include the ability to just pass a dictionary which defines the replacement mapping: