Python or R? – Ian Blumenfeld – Medium http://ift.tt/2xEIpUx
STEM aimlnn Data Science

Python or R?
Ian Blumenfeld, Apr 17, 2016

One of the debates du jour in the data science (DS) community is whether you should be using Python or R. I don't think this debate is going to be resolved, nor do I offer a solution. But as someone who has used both extensively, I do have some thoughts. It boils down to… (drum roll, please) it depends. Now that I've given you basically the same answer you give everyone who asks you for an opinion on data, let me expand on how I see the tradeoffs.

R is more expressive

Because R is built on a more functional paradigm, I've found it to be a more expressive and compact language. You can pack a lot more into one line, which gives your code a tightness that is hard to achieve in Python. I enjoyed this aspect of working in R, although it can be hard to approach as a newcomer. In Python, functions are first-class objects, but functional programming is not a first-class style (in fact, many of the common functional tools exist only because the community wanted them, not the language's designers).

R is more mature for analysis

This is currently true, but it is becoming less so every day. I still find R data frames more intuitive than pandas (although pandas has improved a lot over the past few years). The statistical modeling packages are still better in R, especially if you want to go beyond standard regression. Useful methods such as survival analysis have standard, well-tested packages in R, but not in Python.

Python supports frameworks

Sometimes the tool you want is better implemented as a framework you configure than as a library you stitch together. MCMC is a great example of this. I found it much easier to use a framework such as PyMC, where I instantiated distribution and sampler objects to get a result, than something like the MCMC package in R, where functions needed to be tied together.
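To make the framework-versus-library distinction concrete, here is a minimal sketch in plain Python (standard library only). It is not PyMC's API: the class names `Normal` and `MetropolisSampler` and their parameters are hypothetical, chosen only to mimic the style of instantiating a distribution object and a sampler object and then asking the sampler for a result. The sampler itself is a basic random-walk Metropolis algorithm.

```python
from __future__ import annotations

import math
import random


class Normal:
    """Hypothetical distribution object: a normal density with a log-pdf method."""

    def __init__(self, mu: float, sigma: float) -> None:
        self.mu = mu
        self.sigma = sigma

    def logpdf(self, x: float) -> float:
        z = (x - self.mu) / self.sigma
        return -0.5 * z * z - math.log(self.sigma * math.sqrt(2 * math.pi))


class MetropolisSampler:
    """Hypothetical sampler object: random-walk Metropolis over one parameter."""

    def __init__(self, target, step: float = 1.0, seed: int | None = None) -> None:
        self.target = target          # anything with a logpdf(x) method
        self.step = step              # std. dev. of the Gaussian proposal
        self.rng = random.Random(seed)

    def sample(self, n: int, start: float = 0.0, burn_in: int = 1000) -> list[float]:
        x = start
        logp = self.target.logpdf(x)
        draws = []
        for i in range(n + burn_in):
            proposal = x + self.rng.gauss(0.0, self.step)
            logp_new = self.target.logpdf(proposal)
            # Accept with probability min(1, p(proposal) / p(x)).
            # 1.0 - random() lies in (0, 1], so the log is always defined.
            if math.log(1.0 - self.rng.random()) < logp_new - logp:
                x, logp = proposal, logp_new
            if i >= burn_in:          # discard the warm-up draws
                draws.append(x)
        return draws


# Configure the pieces, then ask the sampler for a result.
posterior = Normal(mu=2.0, sigma=0.5)
sampler = MetropolisSampler(posterior, step=0.5, seed=42)
draws = sampler.sample(20_000)
print(round(sum(draws) / len(draws), 1))
```

Swapping in a different target only means passing a different object with a `logpdf` method, which is roughly the appeal of the configurable-framework style described above; a real framework adds far more (multiple parameters, tuning, diagnostics).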
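Back on the expressiveness point, here is a short standard-library sketch of what "functions are first class, functional programming is not" looks like in practice. Functions can be passed around and composed, but in Python 3 `reduce` lives in `functools` rather than among the built-ins, and a generator expression is usually the more idiomatic spelling of a `map`/`filter` chain. The `compose` helper is hypothetical; Python has no built-in composition operator.

```python
from functools import reduce


def compose(*funcs):
    """Hypothetical helper: compose(f, g, h)(x) == f(g(h(x)))."""
    return reduce(lambda f, g: lambda x: f(g(x)), funcs, lambda x: x)


values = [3, -1, 4, -1, 5, -9, 2, 6]

# Functional style: functions are values, so they can be handed to map/filter/reduce.
positives = filter(lambda v: v > 0, values)
squares = map(lambda v: v * v, positives)
total_functional = reduce(lambda acc, v: acc + v, squares, 0)

# The more idiomatic Python spelling of the same computation.
total_idiomatic = sum(v * v for v in values if v > 0)

# First-class functions composed into a small pipeline: multiply by 10, then add 1.
scale_then_shift = compose(lambda x: x + 1, lambda x: x * 10)

print(total_functional, total_idiomatic, scale_then_shift(4))  # prints: 90 90 41
```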
Python interfaces cleanly with databases

This became important to me only later in my career, but I found it much easier to connect to databases in Python than in R. In particular, Python has well-supported drivers for every major database, SQL and NoSQL alike, and the SQL drivers generally follow the standard DB-API. It also has SQLAlchemy, a fantastic ORM and SQL abstraction library that makes working with a database seamless.

Python is used by developers

Again, this became important to me only later on, but Python is commonly used by software engineers for everything from data pipelines and machine learning frameworks to web development. This makes it MUCH easier to deploy Python-built models in production environments. It also lets you take advantage of the engineering toolchain, from connection management to profiling to unit testing. New technologies like Spark and TensorFlow treat Python as a first-class language, on par with Java, Scala, and the like, which means Python gives you immediate access to the state of the art. All of this enables you to build scalable data products as part of a team, which is my preferred way of getting my work out into the world.

So in the end, it depends on what you want to do. I work exclusively in Python these days. If you'll be working on web technologies (or near them), I think it's a very solid choice. But I don't see any reason not to use R (running it on your servers is another question; I don't recommend that). There are some other options floating around, Julia being the most notable. Again, I'm not sure it really matters. The most important thing is to find a place where you can be comfortable and productive.
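Returning to the database point, here is a small sketch using `sqlite3` from the standard library, which implements the Python DB-API 2.0 interface (PEP 249). The table and its rows are made up for illustration. Other DB-API drivers follow the same connect/cursor/execute pattern, though their parameter placeholder style can differ (`sqlite3` uses `?`), and SQLAlchemy builds on top of drivers like this one.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table of model scores.
cur.execute("CREATE TABLE scores (model TEXT, auc REAL)")
cur.executemany(
    "INSERT INTO scores (model, auc) VALUES (?, ?)",
    [("logistic", 0.81), ("gbm", 0.86), ("random_forest", 0.84)],
)
conn.commit()

# Parameterized query: the driver binds the value, so no manual quoting is needed.
cur.execute("SELECT model, auc FROM scores WHERE auc > ? ORDER BY auc DESC", (0.82,))
rows = cur.fetchall()
print(rows)  # [('gbm', 0.86), ('random_forest', 0.84)]

conn.close()
```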
Python or R? – Ian Blumenfeld – Medium
Label: AI-NN-ML
Date: September 01, 2017 at 10:37PM