Open omrihar opened 4 years ago
Hey @omrihar--thanks for checking out siuba! I tried to compare key differences in this key feature doc--does that cover your question?
In general, siuba goes deeper than previous ports in 3 ways:
I think all the ports to python are similar on the surface, but what siuba has tried to nail is the ability to execute on different backends (pandas, sql, down the line spark & dask). Because this was a focus from the beginning, this kind of extendability is a part of siuba's architecture :).
If you're interested in helping with implementing pivot_wider and pivot_longer, @breichholf contributed a PR (#238) with the bulk of pivot_longer. I think I dropped the ball there, but if you're interested in the pivot_ functions, I don't mind helping with it again!
Hey @machow thanks for the quick reply! I saw the docs on the key feature but since it was compared with dplython rather than dfply, I was wondering if you were aware of that (I'm asking since it seems that dfply implemented a larger subset of dplyr than dplython, so is maybe a better benchmark).
Be that as it may, I'm quite interested in anything that will make data wrangling painless in python - so I'm quite interested in following siuba :) I also really like the idea of supporting SQL, Spark and Dask in the future! That's a very nice addition...
I found siuba while searching for something that implements the pivot_* functions, but I ended up building a dataframe that fits my needs directly in pandas (maybe also because pivoting has always been a bit confusing to me...). I'm less interested in this specific application than in a general framework for "grammar of data" style libraries.
Thank you for the good work :) If I find somewhere I can contribute, I will definitely try to!
> I saw the docs on the key feature but since it was compared with dplython rather than dfply, I was wondering if you were aware of that
Ah, that's fair! I have a blog post draft sitting around comparing siuba to dplython, dfply, and plydata, so this is helpful to hear. I'll try to push it out in the next couple days! I think the main things missing in siuba are bind_rows/cols, row_slice, and sample.
RE pivot functions and creating what you need in pandas, there's a fairly in-depth discussion in #233 about what these would look like in pandas. From what I remember, pivot_longer is fairly straightforward to implement (just some kind of convoluted resetting of indexes). pivot_wider could be largely a wrapper around the pandas .pivot_table method. The challenge is that it's hard to do anything beyond what would be a simple values_fn arg in dplyr.
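For anyone following along, here is a rough pandas-only sketch of the two ideas above. It is not siuba's implementation and not what #233 or #238 propose: the helper names `pivot_longer_sketch` and `pivot_wider_sketch`, their parameters, and the toy data are all hypothetical, and the longer direction uses `DataFrame.melt` rather than the index-resetting approach mentioned above. The wider direction just forwards a single aggregation to `DataFrame.pivot_table`, which is roughly the "simple values_fn" case.

```python
import pandas as pd

# Hypothetical toy data: one row per country, one column per year.
wide = pd.DataFrame({
    "country": ["A", "B"],
    "2019": [1, 3],
    "2020": [2, 4],
})


def pivot_longer_sketch(df, id_cols, names_to="name", values_to="value"):
    """Hypothetical helper: stack every non-id column into name/value pairs."""
    return df.melt(id_vars=id_cols, var_name=names_to, value_name=values_to)


def pivot_wider_sketch(df, id_cols, names_from, values_from, values_fn="first"):
    """Hypothetical helper: thin wrapper around DataFrame.pivot_table.

    values_fn is passed straight through as aggfunc, so only one simple
    aggregation per cell is supported.
    """
    out = df.pivot_table(
        index=id_cols,
        columns=names_from,
        values=values_from,
        aggfunc=values_fn,
    )
    out.columns.name = None  # drop the leftover "year" label on the columns
    return out.reset_index()


long = pivot_longer_sketch(wide, ["country"], names_to="year", values_to="cases")
round_trip = pivot_wider_sketch(long, ["country"], "year", "cases")
```

Note that `pivot_table` aggregates with the mean by default, so the sketch passes `"first"` explicitly to avoid silently averaging duplicate cells.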
Good Morning,
I came across this project while searching for an up-to-date library that ports dplyr to python. I've been using dfply for a while (before starting to use dplyr directly in R), and I was looking for a library that implements pivot_wider and pivot_longer, since dfply does not implement them (and seems to be inactive at the moment).
Since this library seems to be quite close to dfply (more than to, say, dplython), I was wondering what some of the key differences between the two libraries are. It seems that every few years another library pops up that tries to port dplyr to python, which I guess is a difficult task, but it seemed to me that the dfply approach was already quite good, so maybe building on top of it would have been a good option?
Thank you for the effort of bringing some tidyverse goodness to python :)