matloff / R-vs.-Python-for-Data-Science


Available libraries a tie? #8

Open onnokleen opened 5 years ago

onnokleen commented 5 years ago

Thank you very much for your overview!

However, you call the "available libraries" category a tie, even though you mention that quite basic statistical procedures are not available (or are hard to find) in Python:

The following searches in PyPI turned up nothing: log-linear model; Poisson regression; instrumental variables; spatial data; familywise error rate; etc.

I would say that this is a huge minus for Python?! In my own experience, the package availability for time series modeling is even worse.

In general, a search for "statistics" on PyPI (https://pypi.org/search/?q=statistics) returns only 2,541 packages, including irrelevant ones such as plone.event (https://pypi.org/project/plone.event/) 😄.

(Sidenote: I am a fan of the tidyverse and would say that it is also beneficial for professional users, but I recognize that this is open to debate.)

maxnoe commented 5 years ago

Maybe counting packages biases the comparison against Python: there are no small single-topic packages, because many of the functionalities you mention live in three to four huge packages.

So why search for packages on PyPI rather than for the functionality itself?

onnokleen commented 5 years ago

Good point! For example, Poisson regression can be found in statsmodels (https://www.statsmodels.org/stable/generated/statsmodels.discrete.discrete_model.Poisson.html).

Nonetheless, the point about time series data still holds. Just two examples:

1) Bayesian VARs: (https://www.google.com/search?client=firefox-b-d&q=bayesian+var+model+python+primiceri)
2) Volatility modeling: (https://www.financialriskforecasting.com/code/RPython3.html)

matloff commented 5 years ago

Again: It was just an issue of what could be found via PyPI, compared to CRAN. Actually, there are many good R packages not on CRAN.