matloff / R-vs.-Python-for-Data-Science


graphs #6

Open · acangros opened this issue 5 years ago

acangros commented 5 years ago

Great article! Short and very concise. I think one of the strong points of R is its great capacity to generate amazing graphs. Is it the same for Python?

matloff commented 5 years ago

Do you mean graphs as in graphics, or graphs as in vertices and edges?

jasonjiang8866 commented 5 years ago

For Python you have matplotlib, seaborn, bokeh, plotly, etc.

acangros commented 5 years ago

"Do you mean graphs as in graphics, or graphs as in vertices and edges?"

Graphs as plots. ggplot is an amazing library for data science, newspapers, researchers... anything you can imagine, you can draw with it. Python has nice tools for plotting, but they are not as cool at the moment.

Two other things:

For me, the strong points of R are dplyr, ggplot and community.

matloff commented 5 years ago

Yes, both R and Python have great add-on packages for graphics. My point about graphics being built into R merely meant that, e.g., one can draw a histogram immediately in R, whereas in Python one has to learn an add-on.

The sense of community in R is wonderful, but it is threatened by the language unity issue I brought up. There is tension among some leaders of both groups, which I find quite alarming.
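To make the contrast concrete: base R draws a histogram with hist(Nile) and nothing else loaded, while in Python the usual route is the matplotlib add-on. The sketch below is a minimal illustration, assuming matplotlib is installed; the sample values are hypothetical stand-ins for a dataset such as Nile.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: render without needing a display
import matplotlib.pyplot as plt

# Hypothetical sample values standing in for a dataset such as R's Nile
values = [2, 3, 3, 5, 7, 7, 7, 8, 9, 12]

# plt.hist draws the histogram and returns the bin counts,
# the bin edges, and the drawn bar patches
counts, edges, patches = plt.hist(values, bins=5)
plt.title("Histogram of values")
```

Interactively, plt.show() displays the figure; in a script, plt.savefig("hist.png") writes it to a file.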

smartgamer commented 5 years ago

I used to force myself to read every part of your book until one day I saw R for Data Science. It has a better introduction to regex, and more... To me, regex is an essential part of any data language.

(Sent by email in reply to Norm Matloff's comment above, dated Jun 14, 2019, 1:35 PM.)

matloff commented 5 years ago

I must say I hate regex, and try not to use it much beyond the basics. I'm told that Python's regex is much better than R's, and I was considering including this in my comparison, but since I know little about "advanced regex," I chose not to.

stevekm commented 5 years ago

In general, you should always seek to use ggplot2 as your plotting library in R. It's nice that base graphics are included in R, but I've found that in almost every case they become a hindrance rather than a benefit. Base R plotting, along with grid-based plotting, is antiquated and extremely difficult to work with as soon as you need to interact with plots that someone else's code or library produces. Being able to assign your ggplot to an object and interact with it is a huge benefit; by contrast, most other plotting packages write directly to the graphics device, leaving you with massive headaches when you try to manipulate the plot downstream. Further, ggplot2 has the great benefit of a single syntax for all types of plots, and it offers a path to interactive JavaScript-based plots via Plotly. If you are writing R code that produces plots, then you need to be using ggplot2.

Note that this negates some of the benefit of R having built-in graphics vs. Python.

smartgamer commented 5 years ago

Even ggplot graphs are not great for academic publications. That’s why people use commercial software packages. It’s a shame, actually.

R is good for learning statistics and doing some quick analysis on small datasets. That’s my impression.

(Sent by email in reply to Stephen Kelly's comment above, dated Jun 14, 2019, 5:52 PM.)

matloff commented 5 years ago

I've been a big user of ggplot2 since it first came out. But please note: (a) I didn't say one should not use R add-ons; quite the contrary. (b) For the beginner, ggplot2 is very abstract, difficult to pick up and poorly documented, with mystifying, frustrating error messages; the lattice package is just as powerful and more intuitive. (c) One can use Plotly without ggplot2 (see my cdparcoord package). (d) Once again, my comments mainly concern beginners: a new R user can type hist(Nile) right away, without add-ons.

chasbecker commented 5 years ago

R has the advantage of vanilla plotting in the base language, which base Python lacks. However, Pandas provides plotting routines similar to base R's, so a single library brings Python much closer to base R; from there, everything is extra libraries on both sides.

matloff commented 5 years ago

Re Pandas: Don't you need NumPy as well? And NumPy is pretty complicated. And don't both have a large number of functions? Saying "just one single extra library" seems unfair.

chasbecker commented 5 years ago

Pandas needs NumPy, but to do base-R-style plotting the programmer doesn't need to know much about NumPy. The base-R-style plotting functionality consists of methods on Pandas DataFrames, e.g. df.plot.line() for a DataFrame df. Not as obvious as R, but not too bad, either.
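The point about plotting through DataFrame methods can be sketched as follows, assuming pandas and matplotlib are installed; the DataFrame and its column names are hypothetical. The plotting calls are methods on the DataFrame itself, and pandas hands the drawing to matplotlib underneath, so no direct NumPy code is needed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [1, 4, 9, 16, 25]})

# Line plot through the DataFrame.plot accessor; returns a matplotlib Axes
ax = df.plot.line(x="x", y="y", title="y versus x")

# One histogram per numeric column; returns a 2-D array of Axes
hist_axes = df.hist(bins=3)
```

Because both calls return matplotlib Axes objects, the plots can still be adjusted afterwards with matplotlib methods, e.g. ax.set_xlabel("x value").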

Zylatis commented 5 years ago

Yeah, you can definitely get quite far with just pandas, ignoring other niggles, e.g. df.hist(). It might not infer as much cool stuff as the R equivalent, but it gets you out of the starting blocks all the same (and no direct NumPy needed).

That being said, one of my biggest Python gripes is matplotlib, especially after ggplot2. I think the sooner I get stuck into plotnine, the easier my life will be. Actually, df.hist() illustrates one of my problems, since it's matplotlib underneath: you can only specify the number of bins, not the bin width as a single number, which is nuts.
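One common workaround for the bin-width complaint, sketched here under stated assumptions rather than as an official API: both matplotlib's hist and pandas' hist accept a sequence of bin edges for bins, so a fixed width can be turned into explicit edges first. The helper name bin_edges is hypothetical.

```python
import math


def bin_edges(values, width):
    """Return histogram bin edges spaced `width` apart covering all `values`.

    Hypothetical helper: since `bins` may be a sequence of edges,
    a fixed bin width can be expressed by building the edges explicitly.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    lo, hi = min(values), max(values)
    # Number of bins needed so the last edge reaches (or passes) the maximum;
    # at least one bin, even when all values are equal
    n_bins = max(1, math.ceil((hi - lo) / width))
    return [lo + i * width for i in range(n_bins + 1)]
```

With pandas, the result can be passed straight through, e.g. df["y"].hist(bins=bin_edges(df["y"].tolist(), 2.5)) for a hypothetical column y. The first edge is the minimum value; aligning edges to round numbers would need an extra starting-point argument.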

smartgamer commented 5 years ago

Thank you for your effort. You've convinced me, and I'll start learning base R again.

jaapwalhout commented 5 years ago

@smartgamer:

"Even ggplot graphs are not great for academic publications. That’s why people use commercial software packages. It’s a shame, actually."

That's just not true. I've seen lots of academic publications with ggplot / base R graphics (including my own).

"R is good for learning statistics and doing some quick analysis on small datasets. That’s my impression."

Also not true. R is being used in large corporations and (academic) research institutes on small and large datasets. For example, I'm using R on datasets with millions of rows without a problem.

matloff commented 5 years ago

Yes, R is definitely used in large corporations. Ever heard of Google? :-) Actually, you might want to look at the large corporations in the R Consortium.

R packages such as ggplot2 and lattice are used all the time in academic publications, including mine.

smartgamer commented 5 years ago

However, I don’t see any beautiful graphics in your book.

Charles

(Sent by email in reply to Norm Matloff's comment above, dated Oct 18, 2019, 1:34 PM.)

matloff commented 5 years ago

Well, Charles, most of my books don't have many graphics, but the one that does won a major award in 2017. Presumably that means the graphics were publication quality.