Open acangros opened 5 years ago
Do you mean graphs as in graphics, or graphs as in vertices and edges?
For Python you have matplotlib, seaborn, bokeh, plotly, etc.
Do you mean graphs as in graphics, or graphs as in vertices and edges? Graphs as plots. ggplot is an amazing library for data science, newspapers, researchers... anything you can imagine, you can draw with it. Python has nice tools for plotting, but they are not as cool at the moment.
Two other things:
Integration with third parties, for example Neo4j for graph databases. I have no idea how good R or Python are at this in general.
Community: R has an amazing and proactive community that is very active on Twitter with #rstats or #tidyTuesday. I have no idea about Python.
For me, the strong points of R are dplyr, ggplot and the community.
Yes, both R and Python have great add-on packages for graphics. My point about graphics being built-in to R merely meant that, e.g. one can draw a histogram immediately in R, whereas for Python one has to learn an add-on.
The sense of community in R is wonderful, but is threatened by the language unity issue I brought up. There is tension among some leaders of both groups, quite alarming to me.
I used to force myself to read every piece of your book until I saw R for Data Science one day. It has a better introduction to regex, and more. To me, regex is an essential part of any data language.
I must say I hate regex, and try not to use it much beyond the basics. I'm told that Python's regex is much better than R's, and was considering including this in my comparison, but due to lack of much knowledge of "advanced regex," chose not to do so.
In general you should always seek to use ggplot2 as your plotting library in R. It's nice that there are base graphics included in R but I've found that in almost every case they become a hindrance rather than a benefit. Base R plotting, along with grid based plotting, is antiquated and extremely difficult to work with as soon as you need to interact with plots that someone else's code or library produces. The fact that you can assign your ggplot to an object and interact with it is a huge benefit; by contrast most other plotting packages involve writing directly to the graphics device, leaving you with massive headaches trying to do downstream manipulation of it. Further, ggplot2 has the great benefit of a single syntax for all types of plots, and it can act as a path to interactive JavaScript based plots via Plotly. If you are writing R code that produces plots then you need to be using ggplot2.
Note that this negates some of the benefit of having built in graphs vs. Python.
Even ggplot graphs are not great for academic publications. That’s why people use commercial software. It’s a shame actually.
R is good for learning statistics and doing some quick analysis on small datasets. That’s my impression.
I'm a big user of ggplot2, have been since it first came out. But please note: (a) I didn't say one should not use R add-ons, quite the contrary. (b) For the beginner, ggplot2 is very abstract, difficult to pick up and poorly documented, with mystifying, frustrating error messages; the lattice package is just as powerful, and is more intuitive. (c) One can use Plotly without ggplot2 (see my cdparcoord pkg). (d) Once again: My comments are mainly regarding beginners; a new R user can type 'hist(Nile)' right away, without add-ons.
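For comparison with 'hist(Nile)', a minimal Python sketch of the same first step might look like the following. It assumes matplotlib is installed (an add-on, which is the point being made above), and the data values are made up for illustration, since R's Nile dataset is not bundled with Python.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Made-up values for illustration; this is NOT the Nile dataset from R.
values = [1120, 1160, 963, 1210, 1160, 813, 1230, 1370, 1140, 995]

# plt.hist draws the histogram and returns the bin counts,
# the bin edges, and the drawn bar patches.
counts, edges, patches = plt.hist(values, bins=5)
plt.xlabel("flow")
plt.ylabel("frequency")
```

In an interactive session one would follow this with plt.show(); the extra import and the explicit data setup are the overhead a beginner meets before the first plot.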
Advantage to R for vanilla plotting in the base language, which base Python lacks. However, Pandas provides plotting routines similar to base R, so a single library brings Python much closer to base R, and from there everything is extra libraries on both sides.
Re Pandas: Don't you need NumPy as well? And NumPy is pretty complicated. And a large number of functions for both? Saying "just one single extra library" seems unfair.
Pandas needs NumPy, but to do base R type plotting the programmer doesn't need to know much about NumPy. The base R type plot functionality consists of methods of Pandas dataframes, e.g. someDf.plot.line(), where someDf is a dataframe. Not as obvious as R but not too bad, either.
Yeah you can definitely get quite far with just pandas and ignoring other niggles, e.g. df.hist(). Might not infer as much cool stuff as the R equiv, but gets you out of the starting blocks all the same (and no direct numpy needed).
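As a rough sketch of the pandas-only route described in the last two comments, the snippet below calls the dataframe plotting methods without touching NumPy directly. The dataframe contents and column names are invented for illustration, and pandas still relies on matplotlib being installed to draw anything.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import pandas as pd

# Hypothetical toy data, made up for illustration only.
df = pd.DataFrame({
    "year": [2015, 2016, 2017, 2018, 2019],
    "flow": [1120, 1160, 963, 1210, 1160],
})

# Line plot through the dataframe's plot accessor; returns a matplotlib Axes.
ax_line = df.plot.line(x="year", y="flow")

# Histogram of one numeric column; returns an array of matplotlib Axes.
axes_hist = df.hist(column="flow", bins=5)
```

Both calls hand back matplotlib objects, so any further tweaking still happens through matplotlib.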
That being said, one of my great python gripes is matplotlib, especially after ggplot2. The sooner I get stuck into plotnine the easier my life will be I think. Actually, df.hist() illustrates one of my problems as it's matplotlib underneath: can only specify # bins, not bin width as a single number, which is nuts.
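One possible workaround for the bin-width gripe is to compute the bin edges yourself and pass them in place of a bin count. The helper below is a hypothetical sketch (the name edges_for_width and the sample values are made up here), written with the standard library only.

```python
import math

def edges_for_width(values, width):
    """Return equally spaced bin edges of the given width covering all values."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Snap the lower edge down to a multiple of the width.
    lo = math.floor(min(values) / width) * width
    hi = max(values)
    # Enough bins to reach the maximum; at least one bin.
    n_bins = max(1, math.ceil((hi - lo) / width))
    return [lo + i * width for i in range(n_bins + 1)]

# Made-up sample values for illustration.
flows = [1120, 1160, 963, 1210, 1160, 813, 1230, 1370]
edges = edges_for_width(flows, 100)
print(edges)
```

The resulting list can then be passed as bins=edges to df.hist() or to matplotlib's hist, both of which accept a sequence of bin edges as well as a bin count.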
Thank you for your effort. I’m convinced by you and will start to learn base R again.
@smartgamer:
Even ggplot graphs are not great for academic publications. That’s why people use commercial software. It’s a shame actually.
That's just not true. I've seen lots of academic publications with ggplot / base R graphics (including my own).
R is good for learning statistics and doing some quick analysis on small datasets. That’s my impression.
Also not true. R is being used in large corporations and (academic) research institutes on small and large datasets. For example, I'm using R on datasets with millions of rows without a problem.
Yes, R is definitely used in large corporations. Ever heard of Google? :-) Actually, you might want to look at the large corps. in the R Consortium.
R packages such as ggplot2 and lattice are used all the time in academic publications, including mine.
However, I don’t see any beautiful graphics in your book.
Charles
Well, Charles, most of my books don't have many graphics, but the one that does won a major award in 2017. Presumably that means the graphics were publication quality.
Great article! Short and very concise. I think that one of the strong points of R is its great capacity to generate amazing graphs. Is it the same for Python?