Open wolliq opened 1 month ago
Thanks for the issue. This would definitely be good to support.
Hey, can I take this up? I assume I would only need to support polars.datatypes.FLOAT_DTYPES and polars.datatypes.INTEGER_DTYPES inside the List, right?
I have made a draft pull request and would appreciate any comments :) If you think I am heading in the right direction, I can work on test cases and the other functionality associated with this feature.
Sure! Lists can contain anything though (also strings, decimals, ...). So it's not just constrained to floats/integers.
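To illustrate the point that list cells are not limited to floats and integers, here is a minimal sketch of parsing one list cell from a repr string with a per-element converter. The helper name parse_list_cell is hypothetical and is not part of polars; the sketch also assumes flat lists whose elements contain no commas, so nested lists and strings with embedded commas would need a real tokenizer.

```python
from datetime import date


def parse_list_cell(cell, convert):
    """Parse a flat list cell such as "[1, 2, 3]" into a Python list.

    Hypothetical helper, not a polars API. Assumes elements contain no
    commas and the list is not nested.
    """
    inner = cell.strip()[1:-1].strip()  # drop the surrounding brackets
    if not inner:
        return []
    return [convert(item.strip()) for item in inner.split(",")]


# The same parser handles different inner dtypes by swapping the converter.
ints = parse_list_cell("[1, 2, 3]", int)
floats = parse_list_cell("[0.5, 1.25]", float)
dates = parse_list_cell("[2022-07-05, 2023-02-05]", date.fromisoformat)
strings = parse_list_cell('["a", "b"]', lambda s: s.strip('"'))
```

Passing the converter in keeps the bracket handling independent of the inner dtype, which is one way to avoid special-casing floats and integers.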
Got it, I'm working on it. I have a question about truncation in the string representation of a polars DataFrame. The column data is cut off as follows:
shape: (2, 3)
┌─────────────────────────────────┬─────────────────────────────────┬─────────────────────────────────┐
│ f                               ┆ g                               ┆ h                               │
│ ---                             ┆ ---                             ┆ ---                             │
│ list[date]                      ┆ list[time]                      ┆ list[datetime[ns]]              │
╞═════════════════════════════════╪═════════════════════════════════╪═════════════════════════════════╡
│ [2022-07-05, 2023-02-05, 2023-… ┆ [00:00:00.000001, 12:30:45, 23… ┆ [2022-07-05 10:30:45.004560, 2… │
│ [2022-07-05, 2023-02-05, 2023-… ┆ [00:00:00.000001, 12:30:45, 23… ┆ [2022-07-05 10:30:45.004560, 2… │
└─────────────────────────────────┴─────────────────────────────────┴─────────────────────────────────┘
Because of this, the data is truncated. Any suggestions on how I can handle this?
The reasonable thing to do is load only the whole/valid data; truncated columns (when a frame has more cols than can be displayed) are similarly dropped. There is, after all, no way (at all) to reconstruct the truncated values, so...
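A minimal sketch of that rule, assuming the repr has already been split into a header and rows of cell strings, and assuming any cell containing the "…" marker counts as truncated. The function name drop_truncated_columns and the column "a" are hypothetical and used only for illustration; this is not how polars implements it.

```python
ELLIPSIS = "…"  # marker that appears in a truncated cell


def drop_truncated_columns(header, rows):
    """Keep only the columns in which no cell contains the ellipsis marker.

    Hypothetical helper. A string value that legitimately contains "…"
    would also be dropped, so this errs on the side of loading less data.
    """
    keep = [
        i for i in range(len(header))
        if all(ELLIPSIS not in row[i] for row in rows)
    ]
    kept_header = [header[i] for i in keep]
    kept_rows = [[row[i] for i in keep] for row in rows]
    return kept_header, kept_rows


# Column "a" is complete; column "f" is truncated as in the frame above.
header = ["a", "f"]
rows = [
    ["1", "[2022-07-05, 2023-02-05, 2023-…"],
    ["2", "[2022-07-05, 2023-02-05, 2023-…"],
]
kept_header, kept_rows = drop_truncated_columns(header, rows)
```

Dropping the whole column rather than individual cells keeps every remaining column the same length.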
Got it, thanks! I have raised a pull request, could you please review and let me know if there are any suggestions?
Description
In many ML/NLP use cases it would be useful for from_repr to support the List type, for example when unit tests read data from a feature store that holds numerical representations such as embedding vectors. Today if we run
we have
Thanks