has2k1 / plotnine

A Grammar of Graphics for Python
https://plotnine.org
MIT License

How does this package differ from ggpy? #1

Closed: naught101 closed this issue 7 years ago

naught101 commented 7 years ago

Can you please give a quick overview of how plotnine differs from ggpy, and what are the pros and cons of each approach?

has2k1 commented 7 years ago

I think I do not have the requisite objectivity to compare the two, plus I have not used ggpy in its current iteration. The about-plotnine page describes the history and goals of this project.

shoyer commented 7 years ago

@has2k1 you may not be objective, but you are the best advocate for your work!

From the API docs, it looks like you have a lot of awesome functionality here. One thing that would help show it off is a gallery of examples, à la seaborn. You might even try reproducing figures from the seaborn docs (conceptually, not pixel for pixel) to show the breadth of your work.

has2k1 commented 7 years ago

I agree.

There will be a gallery as examples are added. I want mostly "rich" examples (narrative- or task-driven), so they must be Jupyter notebooks. The integration between the notebooks and the documentation is already in place.

I'm still thinking about how to do the gallery. It must draw from the plots in the examples and also serve its core duty, which is to be the facade of the plotting package.

shoyer commented 7 years ago

@has2k1 awesome, I'm looking forward to it!

naught101 commented 7 years ago

Yeah, I appreciate the desire to be objective, but from a user perspective, all that really matters is:

a) Can I make the plot I want with this software? (I'd assume both packages are capable here.)
b) How do I go about doing that? (What are the API differences?)
c) What do I need to be careful about? (e.g. are there particular kinds of memory optimisations that are more important for one package than the other? I'm thinking here of my previous problems with memory management in R's ggplot2, since it stores copies of the data frames passed to it.)

The gallery might be most useful if it could show off some of the things that plotnine can do really well/easily, that other packages struggle with.

has2k1 commented 7 years ago

@naught101, in general I agree about what the documentation must convey to the different types of users. For now it seems rather inadequate simply because that vision is not yet realised. I will put up issues/wiki pages that can serve as a reference and also gather some input.

The bummer about your memory issues is that if you had them with ggplot2, you will likely have them with plotnine too. The only caveat: if the ggplot2 you had them with was a version from before around 2011, you may be in luck. We also make a copy of the data. This should be alleviated when Pandas implements libpandas (and gets copy-on-write), which is currently expected around version 2.0.
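To illustrate why copy-on-write would help with the memory cost of defensive copies, here is a minimal, stdlib-only sketch of the idea. `CowColumn` and `_Buffer` are hypothetical names made up for this illustration; this is not how pandas or plotnine actually implement copying. A "copy" only shares the underlying buffer, and the data is duplicated the first time one of the sharers is modified, so a library that copies input data but never writes to it would not double memory use.

```python
from array import array


class _Buffer:
    """Shared storage plus a count of how many columns point at it."""

    def __init__(self, data):
        self.data = data
        self.refs = 1


class CowColumn:
    """Hypothetical column type: copies share one buffer until a write."""

    def __init__(self, values):
        self._buf = _Buffer(array("d", values))

    def copy(self):
        # Cheap "copy": share the existing buffer and bump its reference count.
        new = CowColumn.__new__(CowColumn)
        new._buf = self._buf
        self._buf.refs += 1
        return new

    def __len__(self):
        return len(self._buf.data)

    def __getitem__(self, i):
        return self._buf.data[i]

    def __setitem__(self, i, value):
        # Only duplicate the data if another column still shares this buffer.
        # Simplification: refs is never decremented when a column is garbage
        # collected, so a write may copy unnecessarily (safe, just wasteful).
        if self._buf.refs > 1:
            self._buf.refs -= 1
            self._buf = _Buffer(self._buf.data[:])
        self._buf.data[i] = value

    def shares_memory_with(self, other):
        return self._buf is other._buf


original = CowColumn([1.0, 2.0, 3.0])
plot_copy = original.copy()  # no values are duplicated here
print(plot_copy.shares_memory_with(original))  # True
plot_copy[0] = 99.0  # the first write triggers the real copy
print(plot_copy.shares_memory_with(original))  # False
print(original[0], plot_copy[0])  # 1.0 99.0
```

The design trade-off is that every write has to check whether the buffer is shared, in exchange for making copies that are never modified essentially free.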

has2k1 commented 7 years ago

@naught101, I have put up an example at https://github.com/has2k1/plotnine-examples/blob/master/plotnine_examples/notebooks/geom_tile.ipynb

That example is purely organised information presentation aimed at users who are very familiar with the package. It is definitely on the advanced level, but there can be more than one use case per documented object, and the gallery images will be selected from such examples.

I think that has a longer lifetime than a comparison with other packages. If there is a comparison on a blog or elsewhere, we can always link to it and/or adapt whatever good examples it uses for our documentation.

has2k1 commented 7 years ago

The gallery is in place and is seeded with the first entries.

pteehan commented 7 years ago

I just wrote an article in which I briefly compare plotnine and ggpy. In short, plotnine appears to be a functional port of ggplot2, whereas ggpy is buggy and incomplete, at least in the few examples that I tried -- so the assumption that both packages are equally capable is actually not true. This is not an exhaustive analysis by any means but it strongly points to plotnine as the better choice right now. http://pltn.ca/plotnine-superior-python-ggplot/

has2k1 commented 7 years ago

@pteehan, I saw it; it is a nice and revealing comparison that gets at the core motivation of plotnine. I see that you have also clarified the title. I will make a page in the documentation that links to it.

jankatins commented 7 years ago

This is another page with a comparison of plotnine against other Python plotting libraries and R's ggplot2: http://pythonplot.com/

plotnine: "plotnine is an implementation of a grammar of graphics in Python, it is based on ggplot2." plotnine is a recent attempt to directly translate ggplot2 to Python; despite some quirks and bugs, it works very well for a young product.

dhimmel commented 7 years ago

One nice aspect of ggpy is that it's released under a BSD 2-Clause License, which is more permissive than this repository's GPLv2 license. However, in the past I do remember yhat's ggplot package being buggy (although I haven't tried the latest ggpy-branded versions).

has2k1 commented 7 years ago

A BSD 2-Clause License would definitely be preferable, but there was little choice in the matter. The GPLv2 License propagates as follows: R -> ggplot2 -> plotnine. ggplot2 has/had some code snippets from the R source, and plotnine qualifies as a derivative work of ggplot2.

shoyer commented 7 years ago

> plotnine qualifies as a derivative work of ggplot2.

Have you actually copied algorithms/code from ggplot2, or just the API? If only the latter, whether APIs are protected by copyright is a matter of ongoing dispute. Of course, how to handle this is entirely your call for this library.

has2k1 commented 7 years ago

The data transformation pipeline that facilitates the "grammar" is copied from ggplot2. I considered isolating it into a separate package, but that could not happen without making things more complicated. Instead, I want to make it possible for other packages to be built on top of this one, and those need not have the same license.

abubelinha commented 2 years ago


The link to the comparison by @pteehan above is now dead, but I found this archive.org backup:

http://web.archive.org/web/20181012022314/http://pltn.ca/plotnine-superior-python-ggplot/