pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Add conversion from `binary` #9298

Open wKollendorf opened 1 year ago

wKollendorf commented 1 year ago

Problem description

Hello,

I work a lot with binary data, and it would make things easier if conversion from binary to at least all the common numeric dtypes were possible. Here is an example:

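import polars as pl

data = {"test": [b'\xFD\x00\xFE\x00', b'\x10\x00\x20\x00']}
schema = {"test": pl.Binary}

df = pl.DataFrame(data, schema=schema)

# Requested: reinterpret each 4-byte binary value as an Int32
# (this cast was not supported when the issue was opened)
df = df.with_columns(
    pl.col("test").cast(pl.Int32)
)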
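For reference, the cast requested above amounts to reinterpreting each 4-byte value as a signed 32-bit integer. The following is a minimal pure-Python sketch of those semantics, not Polars code. It assumes little-endian byte order (the example does not specify one) and that every non-null value is exactly 4 bytes long; `binary_to_int32` is a hypothetical name.

```python
def binary_to_int32(values):
    """Reinterpret each 4-byte value as a little-endian signed 32-bit integer (hypothetical helper)."""
    out = []
    for v in values:
        if v is None:
            out.append(None)  # keep nulls as nulls
            continue
        if len(v) != 4:
            raise ValueError(f"expected 4 bytes for Int32, got {len(v)}")
        out.append(int.from_bytes(v, byteorder="little", signed=True))
    return out


print(binary_to_int32([b'\xFD\x00\xFE\x00', b'\x10\x00\x20\x00']))
# [16646397, 2097168]
```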

@ritchie46 recently added binary -> List(u8) conversion in #9161. An additional parameter for endianness would also be necessary.
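Why endianness matters: once the bytes are available as a `List(u8)`, the same list folds into different numbers depending on byte order. The sketch below illustrates this in plain Python; `u8_list_to_int` and its `endianness` argument are hypothetical, modelled on the parameter requested here rather than on any existing Polars API.

```python
def u8_list_to_int(byte_list, endianness="little", signed=False):
    """Fold a list of u8 values into one integer (hypothetical helper)."""
    if endianness not in ("little", "big"):
        raise ValueError("endianness must be 'little' or 'big'")
    # Big-endian stores the most significant byte first, so fold in list order;
    # little-endian stores it last, so fold the reversed list.
    ordered = byte_list if endianness == "big" else list(reversed(byte_list))
    result = 0
    for b in ordered:
        if not 0 <= b <= 255:
            raise ValueError("every element must be a u8 (0..255)")
        result = (result << 8) | b
    # Two's complement: if signed and the top bit is set, the value is negative.
    bits = 8 * len(byte_list)
    if signed and bits and result >= 1 << (bits - 1):
        result -= 1 << bits
    return result


as_u8 = list(b'\xFD\x00\xFE\x00')  # [253, 0, 254, 0], the List(u8) view
print(u8_list_to_int(as_u8, "little", signed=True))  # 16646397
print(u8_list_to_int(as_u8, "big", signed=True))     # -50266624
```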

Thanks!

wKollendorf commented 1 year ago

At least converting a list<u8> to f64 is already covered by Rust's `f64::from_le_bytes` or `f64::from_be_bytes`.
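In Python, the standard-library `struct` module plays the same role as Rust's `f64::from_le_bytes` / `f64::from_be_bytes`. The following is a sketch assuming each list holds exactly 8 u8 values; `u8_list_to_f64` is a hypothetical helper name.

```python
import struct


def u8_list_to_f64(byte_list, endianness="little"):
    """Interpret 8 u8 values as an IEEE 754 double, analogous to f64::from_le_bytes / from_be_bytes."""
    if endianness not in ("little", "big"):
        raise ValueError("endianness must be 'little' or 'big'")
    if len(byte_list) != 8:
        raise ValueError(f"f64 needs exactly 8 bytes, got {len(byte_list)}")
    fmt = "<d" if endianness == "little" else ">d"  # '<' / '>' select byte order
    (value,) = struct.unpack(fmt, bytes(byte_list))
    return value


one_le = [0, 0, 0, 0, 0, 0, 240, 63]  # 1.0 encoded as little-endian bytes
print(u8_list_to_f64(one_le))                          # 1.0
print(u8_list_to_f64(one_le[::-1], endianness="big"))  # 1.0
```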

joshualeond commented 1 month ago

I'm new to Polars but was wondering about this use case. Given that it's not built-in, would there be a performant way to use map_batches and something like np.frombuffer? I'm unsure how to provide additional arguments in map_batches.
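A sketch of the `np.frombuffer` idea, written with the standard-library `struct` module so it runs without NumPy or Polars. The approach joins the fixed-width values into one buffer and decodes it in a single pass. `struct.iter_unpack` here does roughly what `np.frombuffer(b"".join(values), dtype="<i4")` would. Extra arguments are bound up front with `functools.partial`, leaving a one-argument callable. `decode_fixed_width`, its dtype names, and the format table are hypothetical. The sketch assumes no nulls and that every value has exactly the dtype's width.

```python
import struct
from functools import partial

# Hypothetical dtype name -> struct format character (standard sizes with '<'/'>').
_FORMATS = {
    "i8": "b", "u8": "B", "i16": "h", "u16": "H", "i32": "i",
    "u32": "I", "i64": "q", "u64": "Q", "f32": "f", "f64": "d",
}


def decode_fixed_width(values, dtype="i32", endianness="little"):
    """Decode equal-width byte strings by unpacking one joined buffer."""
    if endianness not in ("little", "big"):
        raise ValueError("endianness must be 'little' or 'big'")
    fmt = ("<" if endianness == "little" else ">") + _FORMATS[dtype]
    width = struct.calcsize(fmt)
    values = list(values)
    # A single wrong-length value would shift every later value, so check first.
    if any(len(v) != width for v in values):
        raise ValueError(f"every value must be exactly {width} bytes for {dtype}")
    return [item[0] for item in struct.iter_unpack(fmt, b"".join(values))]


# Bind the extra arguments now; the result takes only the values.
decode_be_i32 = partial(decode_fixed_width, dtype="i32", endianness="big")
print(decode_be_i32([b'\xFD\x00\xFE\x00', b'\x10\x00\x20\x00']))
# [-50266624, 268443648]
```

In Polars, a callable like this could then be wired in roughly as `pl.col("test").map_batches(lambda s: pl.Series(decode_be_i32(s.to_list()), dtype=pl.Int32))`, with the lambda (or a `partial`) carrying the extra arguments. This is untested here, so check the `map_batches` signature for your installed version.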