Attempting a list of chapters/topics and where visualizations will fit in.
0. Setup and tools
1. Tokenizing
what tokenizing is, and the various options for defining tokens
maybe introduce the idea of breaking up a text by other units, like chapters, here?
count displays of words and sentences (simple bar charts Lynn can hack together; see the sketch after this list)
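A rough sketch of the tokenizing options (assuming Python with NLTK; the sample text and tokenizer choices are placeholders, not decisions):

```python
import re
import nltk

nltk.download("punkt", quiet=True)  # tokenizer models (newer NLTK may also want "punkt_tab")

text = "Dr. Smith isn't here. She left at 3 p.m., didn't she?"

# Option 1: naive regex tokenizing; breaks contractions and abbreviations apart.
regex_tokens = re.findall(r"\w+", text.lower())

# Option 2: NLTK's word_tokenize; keeps contractions as tokens ("is", "n't")
# and treats punctuation as its own token.
nltk_tokens = nltk.word_tokenize(text.lower())

print(regex_tokens)
print(nltk_tokens)

# Raw material for the word/sentence count bar charts.
sentences = nltk.sent_tokenize(text)
print([len(nltk.word_tokenize(s)) for s in sentences])  # words per sentence
```

Either tokenizer feeds the same count displays, so the choice can stay open for now.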
2. Tokens in Context in Docs (might be collapsible into Ch.1 above, if that isn't meaty enough)
searching
Concordance
KWIC (keyword in context; see the sketch after this list)
Collocates/n-grams with the search term (requires command-line preprocessing unless we hack something together; we may need to do that for the sake of dynamic search)
word tree vis
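A minimal KWIC sketch, plus an in-memory stand-in for the collocate/n-gram preprocessing (the kwic helper, window size, and sample tokens are all illustrative):

```python
from collections import Counter

def kwic(tokens, keyword, window=4):
    """Yield (left context, keyword, right context) for each hit."""
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            yield left, tok, right

tokens = "the whale moved and the whale dove deep below the ship".split()
for left, kw, right in kwic(tokens, "whale"):
    print(f"{left:>25} | {kw} | {right}")

# Collocates: bigrams containing the search term, counted in memory
# rather than via command-line preprocessing.
bigrams = Counter(zip(tokens, tokens[1:]))
print({bg: n for bg, n in bigrams.items() if "whale" in bg})
```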
3. Counting All Words in an Entire Doc
show basic counts (maybe using word clouds, bar charts, or both)
introduce the need for stop word lists
introduce the value of treating high-frequency n-grams as single words here? [we might need a "go list" alongside the stop word list]; could be a future command-line/API improvement
lemmatizing option and the resulting counts (see the sketch after this list)
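A sketch of whole-document counts with stop words removed and an optional lemmatizing pass (assuming NLTK; the stop word list and lemmatizer are stand-ins for whatever we settle on):

```python
from collections import Counter

import nltk
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)  # some NLTK versions also want "omw-1.4"
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

tokens = ["the", "whales", "were", "diving", "and", "the", "whale", "dove"]

stop = set(stopwords.words("english"))
counts = Counter(t for t in tokens if t not in stop)
print(counts.most_common(10))  # raw counts minus stop words

# Lemmatizing collapses inflected forms ("whales" -> "whale").
# lemmatize() treats words as nouns by default, so verbs need a POS tag
# to collapse fully; good enough for a first pass.
lemmatizer = WordNetLemmatizer()
lemma_counts = Counter(lemmatizer.lemmatize(t) for t in tokens if t not in stop)
print(lemma_counts.most_common(10))
```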
4. Comparing Documents With Word Counts
word clouds that allow comparison of doc vocabs
part-of-speech (POS) option, tf-idf (see the sketch after this list)
(noun phrases? negation? could be useful to refer back to in Ch. 5)
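A tf-idf sketch for the document comparison (assuming scikit-learn; the toy docs are placeholders):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the whale dove below the ship and the sea swallowed it",
    "the captain watched the ship and the restless sea",
]

vec = TfidfVectorizer(stop_words="english")
tfidf = vec.fit_transform(docs)
terms = vec.get_feature_names_out()

# Top distinguishing terms per doc: candidates for the comparison clouds.
for row in tfidf.toarray():
    ranked = sorted(zip(row, terms), reverse=True)
    print([t for score, t in ranked[:3] if score > 0])
```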
5. Words Over Time in Docs
timeseries displays
simple sentiment analysis?
counts of single words and classes of words across a document's length (see the sketch after this list)
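A sketch of counts across a document's length, binning tokens into equal slices (bin count, sample tokens, and the "class of words" set are all illustrative; sentiment would just swap a lexicon in for the word set):

```python
def counts_over_length(tokens, words, n_bins=10):
    """Count hits on any word in `words` within each equal slice of the doc."""
    bin_size = max(1, len(tokens) // n_bins)
    return [
        sum(1 for t in tokens[i:i + bin_size] if t in words)
        for i in range(0, len(tokens), bin_size)
    ]

tokens = ("the whale dove and the whale rose again " * 20).split()
print(counts_over_length(tokens, {"whale"}))         # a single word
print(counts_over_length(tokens, {"dove", "rose"}))  # a class of words
```

Each returned list is a ready-made series for the timeseries displays.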
6. Clustering Docs
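No sub-bullets for Ch. 6 yet, but the likely shape is clustering over the Ch. 4 tf-idf vectors; a sketch assuming scikit-learn's KMeans (docs and cluster count are placeholders):

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the whale dove below the ship",
    "the captain watched the sea",
    "whales and ships in the deep sea",
    "the captain and the crew ashore",
]

tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(tfidf)
print(labels)  # one cluster id per doc, for a grouped/cluster display
```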