Open losDaniel opened 4 years ago
Ok, I need to find the cleaned data and the rated data.
Don't even worry about the exact cleaned docs, only apply the 100 (or 150) character limit. We want to find what percentage of the raw reviews were still insulting once the short ones are dropped. Let's find that proportion.
The old versions of the project are not loading properly. I might have to download the whole thing from Dropbox but I doubt that would make a difference. It might just be the checkpoints, I might have to move them from the archive to the main dir.
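A rough sketch of that proportion, assuming each raw review is a dict with a `text` field and an `insulting` flag taken from the ratings (both field names are made up here, not the project's actual schema), and assuming the limit is a minimum length:

```python
def share_insulting(reviews, min_chars=100):
    """Fraction of reviews with at least `min_chars` characters that are flagged insulting.

    `reviews` is a list of dicts with hypothetical keys "text" and "insulting".
    Returns None when no review meets the length limit.
    """
    long_reviews = [r for r in reviews if len(r["text"]) >= min_chars]
    if not long_reviews:
        return None
    flagged = sum(1 for r in long_reviews if r["insulting"])
    return flagged / len(long_reviews)


# Toy data only, not real reviews.
sample = [
    {"text": "x" * 120, "insulting": True},
    {"text": "x" * 160, "insulting": False},
    {"text": "x" * 40, "insulting": True},
    {"text": "x" * 100, "insulting": False},
]

for limit in (100, 150):
    print(limit, share_insulting(sample, min_chars=limit))
```

Running it for both 100 and 150 makes it easy to see how much the choice of limit moves the number.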
I also need the ids of the cleaned docs for all the rest of the fucking reviews by the way. ARG
It doesn't look like we can get gender from the teacher descriptions. It may just have to be from the reviews (he or she and shit like that).
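One possible way to do the pronoun approach, sketched under the assumption that all review texts for a teacher are available as a list of strings. The pronoun lists and the simple majority rule are illustrative choices, not something already in the project, and ties or pronoun-free reviews come back as "unknown":

```python
import re
from collections import Counter

# Illustrative pronoun sets; extend as needed.
MALE = {"he", "him", "his", "himself"}
FEMALE = {"she", "her", "hers", "herself"}


def guess_gender(review_texts):
    """Guess a teacher's gender from pronoun counts across their reviews."""
    counts = Counter()
    for text in review_texts:
        # Lowercase and split into alphabetic tokens so "She," matches "she".
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in MALE:
                counts["male"] += 1
            elif token in FEMALE:
                counts["female"] += 1
    if counts["male"] > counts["female"]:
        return "male"
    if counts["female"] > counts["male"]:
        return "female"
    return "unknown"
```

This will mislabel teachers whose reviews mostly talk about someone else (a TA, another professor), so it is worth spot checking a few by hand.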
Identified the sample reviews. As easy as reducing the sample by the ratings and then restricting to reviews of at least 100 characters.
For the final cleaned dataset (where reviews shorter than 100 or 150 characters have already been removed), provide the following summary statistics: