Closed pchalasani closed 10 months ago
Hmm, if I could I would add a flatten parameter to to_pandas(), but we don't control that method (it's in PyArrow).
Other DataFrames do have decent support for nested columns, such as Polars. So I don't think flattening in general is what we want.
Perhaps we can provide a helpful snippet to teach them how to unnest a column? IIRC it's just something like:
df.assign(nested = lambda df: [x['key'] for x in df['struct']])
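To make the one-liner above concrete, here is a minimal sketch of it running end to end. The column names ("id", "struct", "key") and the sample values are made up for illustration; it assumes each row of the struct column comes back from to_pandas() as a Python dict.

```python
import pandas as pd

# Hypothetical data: each row of "struct" holds a dict with a "key" entry,
# which is how a struct column typically looks after to_pandas().
df = pd.DataFrame({
    "id": [1, 2, 3],
    "struct": [{"key": "a"}, {"key": "b"}, {"key": "c"}],
})

# Pull the nested "key" field out into its own top-level column.
df = df.assign(nested=lambda d: [x["key"] for x in d["struct"]])

print(df[["id", "nested"]])
```

This only lifts one field at a time, so it suits cases where a single nested value is needed rather than the whole struct.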
@pchalasani I think we can do this in the LanceDB repo instead of at the format level (please see the referencing PR)
Nice, thanks, I seem to be conflating the two repos in my mind 😀
For future reference for Lance users, you can write:
dataset.to_table(...).flatten().to_pandas()
If you have multiple levels of nested fields, you may need to call flatten() multiple times.
Maybe I can make this a tip in the user guide?
Yes, an example in the user guide would help, thanks. (Replying to Will Jones, Dec 20, 2023.)
In the where clause, the SQL query can access arbitrarily nested fields, but this is not possible with the dataframe, e.g. I want to be able to do:
and be able to see all nested fields of the schema as top-level columns in df. Among other things this would enable pandas queries like df.query(...)
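As a rough illustration of the kind of pandas query this would enable, here is a sketch that starts from a DataFrame whose nested fields are already top-level columns. The column names and values are invented, and the dotted naming ("meta.lang") is an assumption about how flattened fields would be exposed. Dots are swapped for underscores so the names can be used directly in a query expression; recent pandas versions also accept backtick-quoted names, which may make the rename unnecessary.

```python
import pandas as pd

# Hypothetical frame shaped like a flattened table: nested fields
# show up as dotted top-level column names.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "meta.lang": ["en", "fr", "en"],
    "meta.author.age": [30, 41, 25],
})

# Replace dots so every column name is a valid identifier in df.query().
df.columns = [c.replace(".", "_") for c in df.columns]

# Filter on what were originally nested fields.
result = df.query("meta_lang == 'en' and meta_author_age > 26")

print(result)
```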