Open Enchufa2 opened 5 years ago
Your need for linked data structures again brings up the question of what constitutes data science. I have no answer. I may add a point about package development. Re language unity, there's more I can say, but this gets into the realm of the personal. Note that R vs. Python is largely stat vs. CS; NNs have mostly been developed by CS people, thus Python, but as I said, I consider R superior in random forests and gradient boosting.
The comments about language unity are quite pertinent. In particular: (i) commercial influence, and (ii) the example of data.table, an outstanding package.
Elegance
I appreciate Python's elegance too, but it's also true that spaces-vs-tabs issues are a real pain.
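To make the spaces-vs-tabs pain concrete, here is a minimal sketch for Python 3. The helper name `mixes_tabs_and_spaces` is made up for illustration; it relies only on the built-in `compile` and on the `TabError` exception that Python 3 raises when indentation mixes tabs and spaces ambiguously.

```python
def mixes_tabs_and_spaces(source: str) -> bool:
    """Return True if Python 3 rejects `source` with a TabError.

    Hypothetical helper, for illustration only.
    """
    try:
        compile(source, "<snippet>", "exec")
    except TabError:
        return True
    return False


# Both body lines look aligned in an editor that shows tabs as 8 columns,
# but the first is indented with a tab and the second with eight spaces.
mixed = "if True:\n\tx = 1\n        y = 2\n"

# Same code, indented consistently with four spaces.
consistent = "if True:\n    x = 1\n    y = 2\n"

print(mixes_tabs_and_spaces(mixed))
print(mixes_tabs_and_spaces(consistent))
```

The two snippets can look identical on screen, which is what makes this kind of error annoying to track down.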
Learning curve
I wouldn't call it a huge win for R. It's true that, for data science, base R gives you almost everything you need, which is not the case for Python. But how many people work with plain R?
And putting aside the tools that are specific to data science, i.e., talking about the language itself (which is the first thing you need to master to start learning data science), that's a win for Python, because I think it's far more intuitive and easier to learn. R has many, many strange things that are unique to R, such as the ability to modify itself, NSE, etc. These are versatile features, but hard to understand and master.
All in all, I would call it a tie.
Available libraries
I don't see many Python data science libraries backed by an academic publication, and that's a small win for R, in my opinion.
Machine learning
The big actors are pushing for Python here, that's the truth. R tries to follow, but it's still behind.
Statistical correctness
I reaffirm what I said before about academic publications. I think it's important to highlight this point.
Object orientation, metaprogramming
I think that these categories deserve separate comments. I also very much like R's metaprogramming capabilities (which are great, but make it harder to learn, as I argued before). But I don't think it's fair to defend R's seriousness in treating functions as objects and, at the same time, to defend R's OOP mess over Python's seriousness in this regard.
Language unity
Whether RStudio people see data.table as a competitor, that I don't know. But I don't think so, for several reasons. I don't think that dplyr and data.table are in the same league, or serve the same purpose, because dplyr does not provide a new data frame backend (that would be tibble, but tibbles are just data frames with attributes, so it's not competing either). dplyr's purpose is to define a standard data wrangling interface that is independent from the source: a data frame, a database... or even a data table, because there's even the dtplyr package, a data.table backend for dplyr, developed by Hadley himself.
I don't understand what exactly The Tidyverse Curse is. Is it the pipe? (Which was there before the tidyverse, BTW.) Because you can use tidyverse functions without the pipe, and the look and feel would be very similar to the subset/transform/aggregate/reshape workflow you could do with base R. And you could use base R with the pipe too.
Linked data structures
Many times I need this, probably due to my CS background, and I would call it a big win for Python.
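As an illustration of the kind of structure meant here, below is a minimal sketch of a singly linked list in Python, where each node simply holds a reference to the next one. The names `Node`, `LinkedList` and `push_front` are chosen for this example and are not taken from any library.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Node:
    """One cell of the list: a value plus a reference to the next cell."""
    value: int
    next: Optional["Node"] = None


class LinkedList:
    """Singly linked list supporting insertion at the front and iteration."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None

    def push_front(self, value: int) -> None:
        # The new node points at the old head and becomes the new head.
        self.head = Node(value, self.head)

    def __iter__(self) -> Iterator[int]:
        # Follow the `next` references until the end of the chain.
        node = self.head
        while node is not None:
            yield node.value
            node = node.next


items = LinkedList()
for v in (3, 2, 1):
    items.push_front(v)

print(list(items))
```

Because Python variables and attributes are references to objects, chaining nodes like this needs no special machinery.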
Packages
Package development and the CRAN infrastructure are a huge win for R. I was surprised that there's no mention of this.