Open bbdaniels opened 2 years ago
Lecture 1
Stata does not distinguish between:
- One empty space and many empty spaces
- One line break and many line breaks
It makes a big difference to the human eye! We would never share:
- A Word document,
- An Excel sheet or
- A PowerPoint presentation
… without thinking about white space – there, we call it formatting
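A minimal sketch of the white-space point, meant to be run from a do-file. It assumes Stata's shipped `auto` example data, which is an illustrative choice and not part of the course materials. The first two `summarize` commands are read identically, and the blank lines between commands are ignored. A line break inside a single command is a different matter and needs `///`.

```stata
* Load Stata's shipped example data (illustrative choice, not from the course)
sysuse auto, clear

* Extra spaces between words do not change the command
summarize price mpg if foreign == 1
summarize     price     mpg     if     foreign == 1



* The blank lines above are ignored; to break ONE command across lines, use ///
summarize price mpg ///
    if foreign == 1
```

Showing the compact and the spaced-out version side by side may make the "formatting" analogy concrete for participants.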
Lecture 2
Lecture 3
Raw
- `\`, but you should always use `/` (reasons)”; second slide: change “Do you understand” to a breakdown: DRIVE; DIRECTORY; NAME; EXTENSION. Also, WHY DO WE USE QUOTES
- `, clear` in `use` commands
Lab 1
- `_` or `-`
Lab 2
- `edit` and to never use the editor
- shortcut?
- `table` here, but unfortunately the Stata 17 syntax and function are quite different; I would also suggest `mean` along with `summarize`, as it is quite handy
- `describe` and `codebook` can also take varlists; note the `missing`, `plot`, and `nolabel` options for `tabulate`; I would add `list [varlist] [if] [in]` here
- `graph pie`: introduce simple options (ie `graph bar, over(foreign) stack [asy]`). Add `graph box`
- `if missing(nr_participants)`?
- `, discrete` in `histogram`, as we know this variable takes integer values (see the misleading gap between 6 and 7)
- `labelbook` takes the LABEL name, not the VARIABLE name, and how to find/distinguish
- `, missing` in the two-way tabulate
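A hedged sketch of the `labelbook` and `missing` points, assuming the shipped `auto` data rather than the course data set. In `auto`, the variable `foreign` carries a value label named `origin`, which makes the label-name versus variable-name distinction easy to show.

```stata
sysuse auto, clear

* describe lists the value-label name in its own column:
* the VARIABLE foreign carries the LABEL origin
describe foreign

* labelbook expects the label name
labelbook origin

* Passing the variable name instead is expected to fail,
* so it is wrapped in capture to let the do-file continue
capture noisily labelbook foreign

* rep78 has missing values; the missing option keeps them in the table
tabulate rep78
tabulate rep78, missing
tabulate rep78 foreign, missing
```

Comparing the two one-way tables makes it visible how many observations silently drop out without `missing`.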
- `format()` in `histogram`: `histogram bid_submission_date , xlab(,format(%tdDD/NN/YY)) discrete`; use international format (not DMY, not US MDY); mention difference between DMY and MDY (ie, for date importing)
Lab 3
- `using` here
- `tag` number is an indicator for how many, not which; show `egen x = group(var)` to identify duplicate groups?
- `if`: Mention missing handling?
- `export excel`)? This is easy and super useful for many people
Lab 4
- `tab , gen` to get the binary encodings if preferred
- `modify` or even use that in an intro course. Instead, prefer to re-define the entire label explicitly: `lab def , replace`
- `recode`; generate a new variable and label it explicitly. `recode var (X=Y "Label") … , gen(var_clean)`
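One way the two labelling suggestions could look in practice, sketched on the shipped `auto` data. The label texts, the three-group coding, and the name `rep78_clean` are hypothetical and chosen only for illustration.

```stata
sysuse auto, clear

* Re-define the whole value label explicitly instead of patching it with modify
label define origin 0 "Domestic car" 1 "Foreign car", replace
label values foreign origin

* Recode into a NEW variable and attach value labels in the same step
recode rep78 (1 2 = 1 "Poor") (3 = 2 "Fair") (4 5 = 3 "Good"), generate(rep78_clean)
label variable rep78_clean "Repair record, 3 groups"

* Cross-check old against new; missing values stay missing
tabulate rep78 rep78_clean, missing
```

Keeping the original variable untouched means the cross-tabulation at the end doubles as a check that the recode did what was intended.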
- `destring`)
- `graph box` is useful as it has an explicit outlier rule
- `order`: Often use `sequential` option; think about this when coming up with varnames
Lab 5
- `^`
- `datediff` is nice but a bit advanced. I would start with simpler stuff, like creation of Booleans or additional categoricals. At least introduce `egen` first, and LABEL ALL NEW VARIABLES. Also, reverse the names for sorting: `init_month` and `init_quart`, etc instead (recall `order , seq`)
- `egen` - move before data handling. Label variables.
- `expand` and `weight`)?
- `collapse`: Screenshot of `codebook` doesn’t fit. Why not show `count` or `codebook, compact` before and after?
- `collapse`: Show syntax such as `collapse var (mean) var = var (min) var_min = var var2_min = var2 (count) var_count = var , by(catvar)`
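The two `collapse` suggestions above could be combined in one short demonstration. This is a sketch on the shipped `auto` data; the new variable names (`price_mean`, `price_min`, `mpg_min`, `price_n`) are hypothetical.

```stata
sysuse auto, clear

* Before: one row per car
count
codebook price mpg foreign, compact

* Named targets: (stat) newvar = oldvar, several per stat, grouped with by()
collapse (mean)  price_mean = price                  ///
         (min)   price_min  = price mpg_min = mpg    ///
         (count) price_n    = price,                 ///
         by(foreign)

* After: one row per value of foreign
count
codebook, compact
list
```

Running `count` and `codebook, compact` on both sides of the `collapse` shows the change in the unit of observation without needing a screenshot.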
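- `merge`: Note that at least one data set MUST be uniquely identified (1:m, m:1, 1:1). There is no usable m:m merge (Stata accepts `merge m:m`, but it pairs rows within each key by position and is almost never what you want), but there is the expansion merge `joinby`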
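A small self-contained sketch of a merge where one side is uniquely identified. Both data sets and all variable names (`district`, `region`, `bid_id`, `amount`) are invented for illustration and are not the course data.

```stata
clear
* Hypothetical district-level data: one row per district
input district str10 region
1 "North"
2 "South"
3 "East"
end
isid district                      // stops with an error if district is not unique
tempfile districts
save `districts'

clear
* Hypothetical bid-level data: several rows per district
input bid_id district amount
1 1 100
2 1 150
3 2 200
4 4 50
end

* m:1 because the USING data are unique on district; bring in only region
merge m:1 district using `districts', keepusing(region)

* Inspect matched and unmatched rows, then remove the indicator
tabulate _merge
drop _merge
```

Calling `isid` before the merge turns the "must be uniquely identified" rule into something participants can check for themselves.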
- `merge`: Note `keepusing()` option; note that having the same variable (name) in both data sets can cause problems
- `merge`: Show how to `tab _merge` and then `drop _merge` (or at least rename and relabel it if you want to keep it)
- `iecodebook` for subsetting variables; show potential Booleans for subsetting observations?
Lab 6
- `tabstat` - introduce idea of stored/accessible results using `, save`; `return list`; `matlist`? “You can build highly customized reports by saving and exporting matrices and results using built-in commands like `putexcel` and `putdocx`.”
- `list varlist`: Note that the sort order should never matter and should be used to get desired results
- `graph bar`: There are lots more possible values of `stat`!
- `graph save`, `graph combine`, `graph export`, and `putdocx`. What about tables here?
- `, clear` in `use` commands
- `graph bar, stack`)
Lab 7 and 8 are very short -- these can probably be cut? I might instead split Lab 6 into two or three sessions, depending on how many are scheduled -- something like:
Lab 6: Creating graphics
Lab 7: Creating non-graphical outputs
Lab 8: Extensions and looking forward
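For the stored-results idea raised in the `tabstat` note under Lab 6, a demonstration could look roughly like this. It is a sketch on the shipped `auto` data, and the matrix name `stats` is an arbitrary choice.

```stata
sysuse auto, clear

* save leaves the statistics behind in r() instead of only printing them
tabstat price mpg, statistics(mean sd) save
return list

* Copy the stored matrix before another command overwrites r()
matrix stats = r(StatTotal)
matlist stats

* Individual cells can be reused, here the first row and first column
display stats[1, 1]
```

From here, a matrix like `stats` is the kind of object that export commands such as `putexcel` or `putdocx` can pick up, which would fit a session on non-graphical outputs.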
General
- `:` should be capitalized