dataning closed this issue 1 year ago.
With the same dataframe output:
- pl.read_csv().columns - I can get the column names
- tp.read_csv().columns - I cannot get the column names
You can extract column names using tidypolars_df.names (much like using names(df) in R).
However, I cannot apply the mutate() directly to a polars dataframe if it's not directly coming through tidypolars
You can convert data to/from polars DataFrames with tp.from_polars() / .to_polars():
# Option 1
tp.from_polars(polars_df).mutate().to_polars().agg()
# Option 2 (using `.pipe()` method)
polars_df.pipe(tp.from_polars).mutate().to_polars().agg()
But regarding the bigger overall question:
Short answer: I think I'm going to build this functionality in some form (see #208).
Long answer:
In Python, methods (accessed using .method() syntax) belong to a specific class. .mutate() is built for the Tibble class, and therefore won't work on polars DataFrames, just like polars .with_columns() won't work on a pandas DataFrame or a tidypolars Tibble. It is impossible for me to add a method to the polars DataFrame class since I don't own that code (there is technically a hacky way, but it is almost guaranteed to break internal polars code). That's why the Tibble class is necessary.
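A minimal sketch of why a method only exists on the class that defines it, using plain Python stand-ins rather than the real libraries. Frame is a hypothetical substitute for a polars DataFrame, Tibble is a hypothetical subclass that adds mutate(), and the helpers from_frame() / to_frame() (also hypothetical) play the role that tp.from_polars() / .to_polars() play above. This is an illustration of the idea, not how tidypolars is actually implemented.

```python
# Hypothetical stand-ins, NOT polars or tidypolars:
#   Frame  ~ a "native" data frame class you don't control
#   Tibble ~ a subclass that adds tidyverse-style methods


class Frame:
    """A tiny column store: maps column name -> list of values."""

    def __init__(self, data):
        self._data = {name: list(values) for name, values in data.items()}

    @property
    def columns(self):
        return list(self._data)

    def with_columns(self, **new_cols):
        # "Native" method: return a new Frame with added/replaced columns.
        data = dict(self._data)
        data.update({name: list(values) for name, values in new_cols.items()})
        return Frame(data)


class Tibble(Frame):
    """Subclass that adds a mutate() method the base class does not have."""

    def mutate(self, **exprs):
        # Each expression is a function of one row (a dict) -> new value.
        n_rows = len(next(iter(self._data.values()), []))
        rows = [{k: v[i] for k, v in self._data.items()} for i in range(n_rows)]
        new_cols = {name: [fn(row) for row in rows] for name, fn in exprs.items()}
        # with_columns() returns a plain Frame, so re-wrap it as a Tibble.
        return Tibble(self.with_columns(**new_cols)._data)

    def to_frame(self):
        # Hypothetical counterpart of .to_polars(): drop back to the base class.
        return Frame(self._data)


def from_frame(frame):
    # Hypothetical counterpart of tp.from_polars(): wrap a base-class object.
    return Tibble(frame._data)


df = Frame({"x": [1, 2, 3]})
print(hasattr(df, "mutate"))  # False: Frame never defined mutate()

# Round trip: wrap, use the subclass method, unwrap.
out = from_frame(df).mutate(y=lambda row: row["x"] * 2).to_frame()
print(out.columns)  # ['x', 'y']
```

In this sketch, calling df.mutate(...) on the plain Frame raises AttributeError, while the wrapped object has it; the two print the same way, which is why they can look identical from the outside.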
This is one massive disadvantage of building tools in Python that try to extend the functionality of an existing data frame library, and Python's object-oriented structure is what causes this limitation. In R, all data frame libraries (dplyr, data.table) are built on top of R's base data.frame class, and functions can be written that behave differently depending on the type of object passed to them. This is what the S3 object-oriented system allows. It's also more or less what the Julia language implements for its OOP system.
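For comparison, Python's standard library offers a limited form of type-based function dispatch, functools.singledispatch, which is loosely similar in spirit to an S3 generic: implementations are registered per type, including types you don't own, such as the built-in dict and list used below (you cannot add a method to dict itself). It gives function-call syntax, column_names(df), rather than the .method() chaining discussed in this thread, so it only illustrates the dispatch idea and is not a way to add .mutate() to polars DataFrames. The column_names() function is hypothetical.

```python
from functools import singledispatch


@singledispatch
def column_names(table):
    # Fallback, similar in spirit to an S3 default method.
    raise TypeError(f"column_names() is not implemented for {type(table).__name__}")


@column_names.register
def _(table: dict):
    # Column-oriented table: {column_name: [values, ...]}
    return list(table)


@column_names.register
def _(table: list):
    # Row-oriented table: [{column_name: value, ...}, ...]
    return list(table[0]) if table else []


print(column_names({"x": [1, 2], "y": [3, 4]}))  # ['x', 'y']
print(column_names([{"a": 1, "b": 2}]))         # ['a', 'b']
```

The same call works on both built-in types because dispatch happens on the argument's type at call time, not on methods attached to the class.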
Even the solution proposed in #208 is sort of hacky, but it will allow people to work directly on polars DataFrames.
If you have any further questions or need something clarified feel free to ask in this issue.
Big fan of your tidypolars, especially for people coding in both tidyverse and polars.
I am trying to find a way to use tidypolars and the polars approach together on a polars dataframe. I thought it would work nicely because they're both just polars dataframes and look the same. However, it sometimes gives me an error.
Starting from a simple one:
With the same dataframe output:
- pl.read_csv().columns - I can get the column names
- tp.read_csv().columns - I cannot get the column names
More interestingly, I was trying to use mutate() because the tidyverse style would be nicer. However, I cannot apply mutate() directly to a polars dataframe if it's not coming directly through tidypolars; I can apply mutate() to the dataframe if I first convert the polars dataframe to a pandas dataframe and then convert it back to a tidypolars dataframe. I suspect it might have something to do with tibble formatting in the background, but because the output dataframe looks identical to a typical polars dataframe, it got me confused.
Would it be possible to use tidypolars alongside polars dataframes?