hayd closed this issue 10 years ago.
I guess R is famous for obfuscation (of syntax)?
I suspect it'll be a many-to-one table :)
Ha! You guys are funny. What the heck is attach? Is it like attach(x) == globals()['x'] = x?
isn't R intuitive?
Where the heck are cyl and vs coming from? This
attach(mtcars)
aggdata <- aggregate(mtcars, by=list(cyl,vs), FUN=mean, na.rm=TRUE)
detach(mtcars)
works only if you do the attach(mtcars)? WTF are the scoping rules in R? No such thing exists in Python without a lot of magic...
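attach basically says 'make all of the columns of the data frame global variables' (strictly, it puts the data frame on R's search path, so its columns resolve as bare names until you detach it).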
It has a companion function, detach. There's also with(data, expr), which makes the columns visible only while evaluating that one expression. Have you seen the model syntax yet? a ~ b
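For comparison, the closest pandas analogue to R's with() is probably DataFrame.eval / DataFrame.query: bare names inside the expression string resolve to columns, and nothing leaks into the Python namespace. A minimal sketch with a toy frame (column values are illustrative, not real mtcars data):

```python
import pandas as pd

# Toy frame standing in for mtcars (values illustrative only)
df = pd.DataFrame({"mpg": [21.0, 22.8, 18.7], "wt": [2.62, 2.32, 3.44], "cyl": [6, 4, 8]})

# R: with(df, mpg / wt)  -- names resolve to columns only inside the expression
ratio = df.eval("mpg / wt")

# R: subset(df, cyl > 4)
big = df.query("cyl > 4")

print(ratio)
print(big)
```

After this runs, mpg and cyl are still not Python names; every column reference stays inside a string tied to df.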
I totally get that it's useful, but it's a little unsettling when you are used to being able to explicitly trace all names in the document.
patsy + statsmodels + pandas >>>>> R
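Since patsy comes up: it is the library that implements the a ~ b formula syntax in Python. A minimal sketch, assuming patsy is installed and using made-up numeric columns a and b:

```python
import pandas as pd
from patsy import dmatrices

# Made-up data: response a, predictor b
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [0.5, 1.5, 2.5, 3.5]})

# Same formula string as R's model syntax; an intercept column is added implicitly
y, X = dmatrices("a ~ b", data=df, return_type="dataframe")

print(X.columns.tolist())
```

statsmodels accepts the same formula strings directly, e.g. statsmodels.formula.api.ols("a ~ b", data=df).fit().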
magic regarding scope and namespaces :-1:
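To make the scoping complaint concrete: emulating attach in Python means dumping every column into the module namespace, which is exactly the kind of untraceable-name magic being objected to. Illustration only, not a recommendation; the toy columns cyl and vs are assumed:

```python
import pandas as pd

df = pd.DataFrame({"cyl": [6, 4, 8], "vs": [0, 1, 0]})

# Rough imitation of R's attach(df): make each column reachable as a bare name.
# (R puts df on the search path rather than copying into the global environment,
# so this is only an approximation.)
globals().update(df.to_dict(orient="series"))

max_cyl = cyl.max()  # 'cyl' resolves, yet nothing in the source assigns it
print(max_cyl)

# Rough imitation of detach(df): remove the injected names again
for name in df.columns:
    del globals()[name]
```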
anyway comparisons are useful to show people how awesome pandas is :)
related http://stackoverflow.com/questions/17621325/equivalent-pandas-function-to-this-r-aggregation
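For the aggregate call earlier in the thread, the pandas version needs no attach, because the grouping keys are passed by column name. A sketch with a few toy rows standing in for mtcars (the dataset is not bundled with pandas; values are illustrative, with one NaN added on purpose):

```python
import numpy as np
import pandas as pd

# Toy stand-in for mtcars (illustrative values)
mtcars = pd.DataFrame({
    "mpg": [21.0, 22.8, 21.4, 18.7, 18.1, np.nan],
    "cyl": [6, 4, 6, 8, 6, 4],
    "vs":  [0, 1, 1, 0, 1, 1],
    "hp":  [110, 93, 110, 175, 105, 66],
})

# R: aggregate(mtcars, by=list(cyl, vs), FUN=mean, na.rm=TRUE)
# groupby's mean skips NaN by default, which plays the role of na.rm=TRUE.
aggdata = mtcars.groupby(["cyl", "vs"]).mean()
print(aggdata)
```

Unlike R's output (Group.1 and Group.2 columns), the keys end up in a MultiIndex; aggdata.reset_index() turns them back into ordinary columns.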
Anyone fancy spamming the pandas/R/.. mailing lists to see if anyone is interested in doing this?
https://groups.google.com/forum/#!topic/pydata/1eNURQsflNw
A while back I started making some notes on how to do the various recipes in O'Reilly's R Cookbook (http://shop.oreilly.com/product/9780596809164.do) with NumPy, Pandas, SciPy.
I haven't had time to complete them, so I'm sharing them in their current state and hoping to get some community help filling in the gaps.
I think this could be an extremely useful resource to encourage and help transition lots of people from R to Pandas.
So here are the notes:
http://notes.lexual.com/tech/r_numpy_pandas_cookbook.html
And here's the GitHub repo; patches more than welcome!
https://github.com/lexual/sphinx-notes/blob/master/source/tech/r_numpy_pandas_cookbook.rst
Cheers,
Lex.
These look useful. It's a shame some sections are XXX-titled; it would be nice to have a todo list of areas to flesh out.
@chappers, want to add this? http://stackoverflow.com/questions/20905713/equivalent-of-rs-tapply-in-python-pandas
Hmm, would you want this to go under the reshape/cast section, or in the with section, since it could be done in R using dcast as well:
mydf <- data.frame(
Animal = c('Animal1', 'Animal2', 'Animal3', 'Animal2', 'Animal1', 'Animal2', 'Animal3'),
FeedType = c('A', 'B', 'A', 'A', 'B', 'B', 'A'),
Amount = c(10, 7, 4, 2, 5, 6, 2)
)
# Stackoverflow example
with(mydf, tapply(Amount, list(Animal, FeedType), sum))
# Using reshape
require(reshape2)
dcast(mydf, Animal ~ FeedType, sum, fill=NaN)
In either case the solution would be what's in the Stack Overflow answer (and very similar to the solution in the reshape/cast section of the current docs).
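In pandas, both the tapply and the dcast versions map onto a pivot table. A minimal sketch using the same mydf data as the R snippet:

```python
import pandas as pd

mydf = pd.DataFrame({
    "Animal": ["Animal1", "Animal2", "Animal3", "Animal2", "Animal1", "Animal2", "Animal3"],
    "FeedType": ["A", "B", "A", "A", "B", "B", "A"],
    "Amount": [10, 7, 4, 2, 5, 6, 2],
})

# R: with(mydf, tapply(Amount, list(Animal, FeedType), sum))
#    dcast(mydf, Animal ~ FeedType, sum, fill=NaN)
# Combinations that never occur (Animal3 never got feed B) come out as NaN.
table = mydf.pivot_table(values="Amount", index="Animal", columns="FeedType", aggfunc="sum")

# Equivalent groupby spelling: sum per (Animal, FeedType), then spread FeedType into columns
table2 = mydf.groupby(["Animal", "FeedType"])["Amount"].sum().unstack()

print(table)
```

Either spelling works; pivot_table reads closer to dcast, while the groupby/unstack chain makes the two steps explicit.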
You can put it under the more common/useful section and add a link or note in the other (as they are on the same page).
Read it as if you are an R user doing the most common operation (e.g. what is normally recommended to R users) who wants to convert it to pandas.
There are of course similar cases in pandas where multiple solutions exist (e.g. a vectorized function vs. using apply): one solution may be faster or simpler, or both may be appropriate.
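A tiny illustration of that point, reusing the Amount numbers from the feed example: both lines give the same result, but the vectorized one operates on the whole column at once, while apply calls a Python function once per element (typically slower on large data).

```python
import pandas as pd

amount = pd.Series([10, 7, 4, 2, 5, 6, 2], name="Amount")

# Vectorized: a single operation over the whole column
doubled_vec = amount * 2

# apply: the lambda is called once per element
doubled_apply = amount.apply(lambda x: x * 2)

print(doubled_vec.equals(doubled_apply))  # True
```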
I think this is closable after the multiple PRs by @chappers.
I guess quite a lot of people come from an R background, so a good addition to http://pandas.pydata.org/pandas-docs/dev/comparison_with_r.html might be a conversion table of R functions/idioms and their pandas equivalents.
Perhaps this site could offer some functions to consider including: http://www.statmethods.net/management/variables.html
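As an example of the kind of entry such a table could hold, here is a sketch of common variable-management idioms (new variables, recoding, renaming) in pandas. The R lines in the comments and the column names x1, x2 and age are illustrative assumptions, not quotes from that page:

```python
import numpy as np
import pandas as pd

# Hypothetical data frame with assumed column names
mydata = pd.DataFrame({"x1": [1, 2, 3], "x2": [4, 5, 6], "age": [30, 80, 60]})

# R: mydata$sum <- mydata$x1 + mydata$x2
mydata["sum"] = mydata["x1"] + mydata["x2"]

# R: mydata$mean <- (mydata$x1 + mydata$x2) / 2
mydata["mean"] = (mydata["x1"] + mydata["x2"]) / 2

# R: mydata$agecat <- ifelse(mydata$age > 75, "Elder", "Younger")
mydata["agecat"] = np.where(mydata["age"] > 75, "Elder", "Younger")

# R: names(mydata)[names(mydata) == "x1"] <- "first"
mydata = mydata.rename(columns={"x1": "first"})

print(mydata)
```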