Closed by marcdotson 2 years ago
@marcdotson Dashboard changes were pushed. I also did some 11th-hour data cleaning. A lot of the transcripts had copyright blurbs at the end, so I wrote some code in 02_tokenizing to get rid of them. I added quite a bit of commentary to explain what I did, which can be deleted or altered as you see fit. Ultimately, I don't think it will have much impact on what we're doing now, though I imagine the blurbs would goof up word embeddings somewhat.
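For reference, a rough sketch of what trimming trailing copyright blurbs could look like. This is not the code in 02_tokenizing: the function name, the marker pattern, and the 500-character tail window are all assumptions made for illustration, and Python is used only as an example language.

```python
import re

# Hypothetical marker: a line near the end of the transcript that starts with
# "Copyright", the (c) abbreviation, or the copyright sign.
BLURB_START = re.compile(
    r"^\s*(?:copyright\b|©|\(c\)\s)", re.IGNORECASE | re.MULTILINE
)


def strip_copyright_blurb(transcript: str, tail_chars: int = 500) -> str:
    """Drop a trailing copyright blurb, looking only at the last `tail_chars` characters."""
    # Restrict the search to the tail so mentions earlier in the transcript
    # are left alone.
    cut = max(0, len(transcript) - tail_chars)
    head, tail = transcript[:cut], transcript[cut:]

    match = BLURB_START.search(tail)
    if match is None:
        return transcript  # no blurb found, return the text unchanged

    # Keep everything before the blurb and trim trailing whitespace.
    return (head + tail[: match.start()]).rstrip()
```

Limiting the search to the tail and to line starts is a deliberate guard: a transcript that merely mentions copyright mid-sentence should come through untouched.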
Closed with #8.
There are a number of things we can consider trying. Use the inital-eda branch and discuss what works here. Some places to start: