Closed by naught101 7 years ago
I do not think I have the requisite objectivity to compare the two, plus I have not used ggpy in its current iteration. There is the about-plotnine page that states the history and goals of this project.
@has2k1 you may not be objective, but you are the best advocate for your work!
Looking at the API docs, you have a lot of awesome functionality here. One thing that would help to show it off is a gallery of examples, à la seaborn. You might even try reproducing figures from the seaborn docs (conceptually, not pixel for pixel) to show the breadth of your work.
I agree.
There will be a gallery as examples are added. I want mostly "rich" examples (narrative/task driven), hence they must be Jupyter notebooks. The integration between the notebooks and documentation is already in place.
I'm still thinking of how to do the gallery. It must draw from the plots in the examples and also serve its core duty, that is, to be the facade of the plotting package.
@has2k1 awesome, I'm looking forward to it!
Yeah, I appreciate the desire to be objective, but from a user perspective, all that really matters is:
a) Can I make the plot I want with this software? (I'd assume both packages are capable here.)
b) How do I go about doing that? (What are the API differences?)
c) What do I need to be careful about? (e.g. are there particular kinds of memory optimisations that are more important for one package than another? I'm thinking here of my previous problems with memory management using R's ggplot2, since it stores copies of dataframes that are passed to it.)
The gallery might be most useful if it could show off some of the things that plotnine can do really well/easily, that other packages struggle with.
@naught101, in general I agree about what the documentation must convey to the different types of users. For now, the reason it seems rather inadequate is simply that the vision is not yet realised. I will put up issues/wiki that can be a source of reference and also get some input.
As for your memory issues, the bummer is that if you had them with ggplot2, you will likely have them with plotnine. The only caveat is when you had them: if it was with ggplot2 before around 2011, then you may be in luck. We also make a copy of the data. This should be abated when Pandas implements libpandas (and gets copy-on-write), currently expected to be around version 2.0.
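A minimal sketch of what the copy means in practice, using plain Python containers rather than pandas or plotnine. The table and the copy step here are hypothetical stand-ins for a plotting library copying the data it is handed, not plotnine's actual code. The only point is that trimming the data to the columns a plot needs before passing it on limits how much gets duplicated.

```python
import copy
import sys


def column_bytes(table):
    """Rough size of a dict-of-lists table (list containers only)."""
    return sum(sys.getsizeof(col) for col in table.values())


# Hypothetical wide dataset: only 'x' and 'y' are needed for the plot.
table = {name: list(range(10_000)) for name in ("x", "y", "a", "b", "c", "d")}

# A plotting layer that defensively copies its input duplicates every column.
full_copy = copy.deepcopy(table)

# Selecting the needed columns first limits what gets copied.
needed = {name: table[name] for name in ("x", "y")}
small_copy = copy.deepcopy(needed)

print(column_bytes(full_copy), column_bytes(small_copy))
```

With a pandas DataFrame the analogous step would be selecting the relevant columns before building the plot; how much that saves depends on the data and on the library version.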
@naught101, I have put up an example at https://github.com/has2k1/plotnine-examples/blob/master/plotnine_examples/notebooks/geom_tile.ipynb
That example is purely organised information presentation aimed at users very familiar with the package. It is definitely on the advanced level, but there can be more than one use case per documented object, and it is from such examples that the gallery images will be selected.
I think that has a longer lifetime than a comparison with other packages. If there is a comparison on a blog or elsewhere, we can always link to it and/or adapt whatever good examples they use for our documentation.
I just wrote an article in which I briefly compare plotnine and ggpy. In short, plotnine appears to be a functional port of ggplot2, whereas ggpy is buggy and incomplete, at least in the few examples that I tried -- so the assumption that both packages are equally capable is actually not true. This is not an exhaustive analysis by any means but it strongly points to plotnine as the better choice right now. http://pltn.ca/plotnine-superior-python-ggplot/
@pteehan, I saw it. It is a nice and revealing comparison, and it gets at the core motivation of plotnine. I see that you have also clarified the title; I will make a page in the documentation that links to it.
This is another page which has some comparison of plotnine against other Python plotting libs + R's ggplot2: http://pythonplot.com/
plotnine: "plotnine is an implementation of a grammar of graphics in Python, it is based on ggplot2." plotnine is a recent attempt to directly translate ggplot2 to Python; despite some quirks and bugs, it works very well for a young product.
One nice aspect of ggpy is that it's released under a BSD 2-Clause License, which is more permissive than this repository's GPLv2 license. However, in the past I do remember yhat's ggplot package being buggy (although I haven't tried the latest ggpy-branded versions).
A BSD 2-Clause License would definitely be preferable, but there was little choice in the matter. The GPLv2 License propagates through as follows: R -> ggplot2 -> plotnine. ggplot2 has/had some code snippets from the R source, and plotnine qualifies as a derivative work of ggplot2.
plotnine qualifies as a derivative work of ggplot2.
Have you actually copied algorithms/code from ggplot2, or just the API? If only the latter, whether APIs are protected by copyright is a matter subject to ongoing dispute. Of course, how to handle this is certainly entirely your call for this library.
The data transformation pipeline that facilitates the "grammar" is copied from ggplot2. I considered isolating it into a separate package, but that could not happen without making it more complicated. Instead, I want to make it possible for other packages to be built on top of this one, and those need not have the same license.
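For readers unfamiliar with what such a pipeline does, here is a toy sketch of the general idea behind a layered grammar: data is mapped to aesthetics, passed through a statistical transformation, then handed to a geometry. This is not plotnine's or ggplot2's code; every function name below is hypothetical and the steps are greatly simplified.

```python
from collections import Counter


def map_aesthetics(rows, mapping):
    """Rename data columns to aesthetic names, e.g. {'x': 'cyl'}."""
    return [{aes: row[col] for aes, col in mapping.items()} for row in rows]


def stat_count(rows):
    """Statistical transformation: count rows per x value."""
    counts = Counter(row["x"] for row in rows)
    return [{"x": x, "y": n} for x, n in sorted(counts.items())]


def geom_bar(rows):
    """Geometry step: describe one drawable bar per transformed row."""
    return [{"shape": "bar", "x": r["x"], "height": r["y"]} for r in rows]


# Hypothetical data: one record per car, with its cylinder count.
data = [{"cyl": 4}, {"cyl": 6}, {"cyl": 4}, {"cyl": 8}, {"cyl": 4}]

# Each stage consumes the previous stage's output.
bars = geom_bar(stat_count(map_aesthetics(data, {"x": "cyl"})))
print(bars)
```

Because each stage only depends on the output of the previous one, stats and geoms can be mixed freely, which is the property a "grammar" is meant to provide.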
The link to the comparison posted by @pteehan above is now dead, but I found this archive.org backup:
http://web.archive.org/web/20181012022314/http://pltn.ca/plotnine-superior-python-ggplot/
Can you please give a quick overview of how plotnine differs from ggpy, and what are the pros and cons of each approach?