Introduction to each package before cheatsheet

kylebutts commented 2 years ago

I think it might be good to have an overview of how data.table works before the cheatsheet, e.g. i/j/by. It's a fine line, though, between giving enough that the cheat sheet makes sense and trying to rewrite the introductory vignettes.

grantmcdermott commented 2 years ago

Yeah, I've been thinking about this. It might be enough to wrap it into the intro, but regardless, I agree that we should keep it to a sentence or two max. The purpose of the guide is really just to provide direct equivalents for learning by comparison.

Something else along these lines is a (very brief) beginning/prologue section. Like, "Install: install.packages(xxx)" etc. Here is where I'd also like to add one or two options/suggestions for each package, e.g.

setFixest_estimation(data = dat) --> option for setting the dataset globally, which Stata users might like
options(datatable.print.class = TRUE, datatable.print.keys = TRUE) --> show column types and keys at the top of a data.table when printed to screen (a la tibbles)
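
A rough sketch of what such a prologue could look like, assuming both packages are installed from CRAN. The toy dataset `dat` and its columns `id`, `x` and `y` are hypothetical and only there so the snippet runs on its own:

```r
# install.packages(c("data.table", "fixest"))  # one-time install

library(data.table)
library(fixest)

# Show column classes and keys whenever a data.table is printed
options(datatable.print.class = TRUE, datatable.print.keys = TRUE)

# Hypothetical toy data, purely for illustration
dat <- data.table(
  id = 1:6,
  x  = c(1, 2, 3, 4, 5, 6),
  y  = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)
)
setkey(dat, id)
print(dat)  # header now includes the key and the column types

# Set the dataset globally, so later estimation calls can omit `data`
setFixest_estimation(data = dat)
est <- feols(y ~ x)

# Undo the global setting when done
setFixest_estimation(reset = TRUE)
```

Setting the data globally is closest to the Stata habit of having one dataset in memory, though it is probably worth flagging as optional, since most R users pass `data` explicitly.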

NickCH-K commented 2 years ago

Keeping in mind the goal of the site, and also offering something that the vignettes don't offer, why don't we make this in the format of how the data.table components translate from Stata? As a brief example of what I mean:

The three main components of a data.table operation are i, j, and by, which go in the order d[i, j, by].^[If you're not currently using j or by you can leave them out.]

i, the first component, selects the rows of the data.table that you'll be working with, like how in Stata the keep if or drop if commands, or the if or in qualifiers on a command, select specific rows of your data to work with.

j, the second component, both selects and operates on the columns of the data.table, like how in Stata the keep or drop commands select specific columns of your data, or how generate or replace create or modify columns in your data.

by, the third component, gives the variable(s) designating groups that you'll be doing your calculations within, like how in Stata you can precede a command with the by or bysort prefix.
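
To make the three components concrete, here is a minimal sketch with an approximate Stata equivalent in the comment above each line. The table `d` and its columns `grp`, `x` and `y` are made up for illustration, and the Stata lines are rough analogues rather than exact translations:

```r
library(data.table)

# Hypothetical toy data
d <- data.table(
  grp = c("a", "a", "b", "b", "c"),
  x   = c(1, 2, 3, 4, 5),
  y   = c(10, 20, 30, 40, 50)
)

# i only.  Stata: keep if x > 2
d_i <- d[x > 2]

# j only (select columns).  Stata: keep grp x
d_j <- d[, .(grp, x)]

# j only (create a column; := modifies d in place).  Stata: generate z = x + y
d[, z := x + y]

# j and by.  Stata: bysort grp: egen mean_y = mean(y)
d[, mean_y := mean(y), by = grp]

# i, j and by together.  Stata: collapse (mean) mean_y = y if x > 1, by(grp)
d_ijby <- d[x > 1, .(mean_y = mean(y)), by = grp]
```

Note that the rows and columns selected by i and j are returned as a new data.table, whereas `:=` changes `d` itself, which is the closer match to how generate and replace behave in Stata.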

data.table uses these three simple components more flexibly than Stata does, and you can get quite a lot out of data.table using only these three components that would require multiple other commands in Stata to accomplish. But even if you aren't doing anything fancy, data.table offers considerable speed gains over Stata (and over many other data-manipulation packages in R, for that matter).
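
As one illustration of that flexibility, the sketch below filters, computes several grouped summaries, and sorts the result in a single chained expression, leaving the original data untouched. The data and column names are hypothetical; the Stata steps in the comments are an approximate equivalent:

```r
library(data.table)

# Hypothetical toy data
d <- data.table(
  grp = c("a", "a", "b", "b", "c", "c"),
  x   = c(1, 2, 3, 4, 5, 6),
  y   = c(10, 20, 30, 40, 50, 60)
)

# Roughly, in Stata:
#   preserve
#   keep if x > 1
#   collapse (mean) mean_y = y (max) max_x = x (count) n = y, by(grp)
#   gsort -mean_y
#   ... use the result ...
#   restore
out <- d[x > 1,
         .(mean_y = mean(y), max_x = max(x), n = .N),
         by = grp][order(-mean_y)]

print(out)
```

Because the result is assigned to a new object, `d` still holds all six original rows afterwards, so there is no need for anything like preserve and restore.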

grantmcdermott commented 2 years ago

Closing in favour of https://github.com/stata2r/stata2r.github.io/issues/7

stata2r / stata2r.github.io

Introduction to each package before cheatsheet #3