pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Support for Map DataType #8385

Open chitralverma opened 1 year ago

chitralverma commented 1 year ago

Problem description

Most data processing systems and dataframe libraries have a non-strict MapType (dict/HashMap). Are there any plans to support this in Polars (Rust/Python) as well?

Ref arrow type:
https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Map
https://arrow.apache.org/docs/python/generated/pyarrow.map_.html#pyarrow.map_

ritchie46 commented 1 year ago

I don't think it is worth the extra code bloat and complexity. It is a List<struct<2>> physically, so that's how we read them in polars.

I think FixedSizeList and Decimal have much, much higher prio.

chitralverma commented 1 year ago

> I don't think it is worth the extra code bloat and complexity. It is a List<struct<2>> physically, so that's how we read them in polars.
>
> I think FixedSizeList and Decimal have much, much higher prio.

Sure, I just added a tracking ticket for now or later.

With a List<struct<2>>, the user will lose random key lookup, right, since lists are sequential?
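
To make the question above concrete, here is a minimal plain-Python sketch of the trade-off. It is only an illustration of the layout being discussed, not Polars or Arrow internals; the names `entries`, `lookup_linear`, and `index` are hypothetical.

```python
# A "map" value laid out as a list of (key, value) entries,
# mirroring the List<struct<2>> shape discussed in this thread.
entries = [("a", 1), ("b", 2), ("c", 3)]


def lookup_linear(entries, key):
    """Scan the entries in order; cost grows with the number of entries."""
    for k, v in entries:
        if k == key:
            return v
    return None  # key not present


# A hash-based map gives direct key lookup, at the cost of building an index.
index = dict(entries)

print(lookup_linear(entries, "b"))  # sequential scan
print(index["b"])                   # hashed lookup
```

Both calls return the same value; the difference is that the list layout has to scan the entries, while the dict looks the key up through a hash index.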

john-sungjin commented 8 months ago

I was wondering if there are any plans to implement maps, now that FixedSizeList and Decimal have been added. I have some dicts with variable keys that I'd like to work with; is there a simple workaround with the List<struct<2>> type?

theelderbeever commented 1 month ago

> I don't think it is worth the extra code bloat and complexity. It is a List<struct<2>> physically, so that's how we read them in polars.
>
> I think FixedSizeList and Decimal have much, much higher prio.

@ritchie46 Would the Map/List<struct<2>> type solve issues like #10234? I run into the empty struct issue quite regularly when dealing with API response data (which Polars has been a game changer for). However, if there are any empty dictionaries in the response payload, Polars outright fails to parse the data into a dataframe. And when responses are deeply nested, it can be pretty untenable to attempt to remove all empty dictionaries.