kylebutts closed this 2 years ago
Yeah, I've been thinking about this. It might be enough to wrap it into the intro, but regardless, I agree that we should keep it to a sentence or two max. The purpose of the guide is really just to provide direct equivalents for learning by comparison.
Something else along these lines is a (very brief) beginning/prologue section. Like, "Install: install.packages(xxx)" etc. Here is where I'd also like to add one or two options/suggestions for each package, e.g.
setFixest_estimation(data = dat) --> option for setting the dataset globally, which Stata users might like
options(datatable.print.class = TRUE, datatable.print.keys = TRUE) --> show column types and keys at the top of a data.table when printed to screen (a la tibbles)
Keeping in mind the goal of the site, and also offering something that the vignettes don't offer, why don't we make this in the format of how the data.table components translate from Stata? As a brief example of what I mean:
The three main components of a data.table operation are i, j, and by, which go in the order d[i, j, by].^[If you're not currently using j or by you can leave them out.]
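To make the d[i, j, by] shape concrete, here is a minimal sketch. The table d and its columns (id, group, x) are made up purely for illustration.

```r
library(data.table)

# Hypothetical toy data, not from the guide
d <- data.table(
  id    = 1:6,
  group = c("a", "a", "b", "b", "c", "c"),
  x     = c(10, 20, 30, 40, 50, 60)
)

# All three components: rows with x > 10 (i), mean of x (j), within group (by)
res_full <- d[x > 10, .(mean_x = mean(x)), by = group]

# Only i: j and by are simply left out
res_rows <- d[x > 10]

# Only j: i is left empty, but the comma stays so j keeps its position
res_cols <- d[, .(id, x)]
```

None of these calls modifies d itself; each one returns a new data.table.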
i, the first component, selects the rows of the data.table that you'll be working with, like how in Stata the keep if or drop if commands, or the if or in command options, select specific rows of your data to work with.
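A rough side-by-side for i could look like the sketch below. The table d and the age column are hypothetical, and the Stata lines in the comments are approximate equivalents rather than exact translations.

```r
library(data.table)

# Hypothetical toy data
d <- data.table(id = 1:5, age = c(25, 40, 31, 58, 19))

# Stata: keep if age >= 30
kept <- d[age >= 30]

# Stata: drop if age >= 30
dropped <- d[!(age >= 30)]

# Stata: list in 1/3  (rows selected by position)
first3 <- d[1:3]
```

One difference worth flagging for Stata users: unlike keep if, these calls leave d untouched unless the result is assigned back to d.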
j, the second component, both selects and operates on the columns of the data.table, like how in Stata the keep or drop commands select specific columns of your data, or how generate or replace create or modify columns in your data.
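The same kind of comparison for j might be sketched as follows. The column names (wage, hours, earnings) are invented for the example, and the Stata commands in the comments are approximate equivalents.

```r
library(data.table)

# Hypothetical toy data
d <- data.table(id = 1:3, wage = c(10, 20, 30), hours = c(40, 35, 20))

# Stata: keep id wage
keep_cols <- d[, .(id, wage)]

# Stata: drop hours
drop_cols <- d[, !c("hours")]

# Stata: generate earnings = wage * hours
d[, earnings := wage * hours]

# Stata: replace wage = wage * 1.1 if hours < 40
d[hours < 40, wage := wage * 1.1]
```

The := operator adds or updates columns in place, so the last two lines change d directly, which is close to how generate and replace behave in Stata.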
by, the third component, gives the variable(s) designating groups that you'll be doing your calculations within, like how in Stata you can precede a command with bysort.
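For by, a small sketch of the bysort comparison is given below, again with made-up data (firm, year, sales) and approximate Stata equivalents in the comments.

```r
library(data.table)

# Hypothetical toy data
d <- data.table(
  firm  = c("A", "A", "B", "B", "B"),
  year  = c(2001, 2002, 2001, 2002, 2003),
  sales = c(5, 7, 2, 4, 9)
)

# Stata: bysort firm: egen mean_sales = mean(sales)
d[, mean_sales := mean(sales), by = firm]

# Stata: bysort firm (year): gen first_sales = sales[1]
setorder(d, firm, year)                   # sort within firm by year first
d[, first_sales := sales[1], by = firm]
```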
data.table uses these three simple components more flexibly than Stata does, and you can get quite a lot out of data.table using only these three components that would require multiple other commands in Stata to accomplish. But even if you aren't doing anything fancy, data.table offers considerable speed gains over Stata (and many other R data-manipulation packages, for that matter).
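As one possible illustration of the "multiple commands" point, the sketch below does a filter plus a grouped summary in a single call. The data (state, year, wage) are invented, and the Stata lines in the comment are only a rough equivalent.

```r
library(data.table)

# Hypothetical toy data
d <- data.table(
  state = c("CA", "CA", "NY", "NY", "NY"),
  year  = c(1999, 2005, 2001, 2010, 2012),
  wage  = c(12, 18, 15, 21, 24)
)

# Stata (roughly two steps):
#   keep if year > 2000
#   collapse (mean) mean_wage = wage (count) n = wage, by(state)
summary_dt <- d[year > 2000, .(mean_wage = mean(wage), n = .N), by = state]
```

Because the result is a new table, the original d is still available afterwards, whereas the keep if and collapse sequence replaces the data in memory.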
Closing in favour of https://github.com/stata2r/stata2r.github.io/issues/7
I think it might be good to have an overview of how data.table works before the cheatsheet, e.g. i/j/by. There's a delicate line between giving enough that the cheat sheet makes sense and not trying to rewrite the introductory vignettes.