losDaniel / Student-Voices

Analyze millions of teacher reviews from every English-speaking country using natural language processing

Summary Statistics for Cleaned Dataset #2

Open losDaniel opened 4 years ago

losDaniel commented 4 years ago

For the final cleaned dataset (where reviews shorter than 100 or 150 characters have already been removed), provide the following summary statistics:

losDaniel commented 4 years ago

Ok, I need to find the cleaned data and the rated data.

losDaniel commented 4 years ago

Don't even worry about the exact cleaned docs, only apply the 100 (or 150) character limit. We want to find what percentage of the raw reviews were still insulting once the short ones are dropped. Let's find that proportion.
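A minimal sketch of that proportion, assuming each raw review is a dict with a `text` field and a boolean `insulting` flag. Both field names and the toy data are hypothetical; the real column names in the project are not given in this thread.

```python
def insulting_share(reviews, min_chars):
    """Share of reviews with at least `min_chars` characters that are flagged insulting.

    Assumes hypothetical keys "text" (str) and "insulting" (bool) on each review.
    Returns None when no review meets the length limit.
    """
    long_enough = [r for r in reviews if len(r["text"]) >= min_chars]
    if not long_enough:
        return None
    flagged = sum(1 for r in long_enough if r["insulting"])
    return flagged / len(long_enough)


# Toy data for illustration only, not taken from the project.
sample_reviews = [
    {"text": "a" * 40, "insulting": True},
    {"text": "b" * 120, "insulting": True},
    {"text": "c" * 120, "insulting": False},
    {"text": "d" * 140, "insulting": False},
    {"text": "e" * 160, "insulting": True},
]

# Compare both candidate limits mentioned above.
for limit in (100, 150):
    print(limit, insulting_share(sample_reviews, limit))
```

Running it for both limits shows how sensitive the proportion is to the choice between 100 and 150 characters.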

losDaniel commented 4 years ago

The old versions of the project are not loading properly. I might have to download the whole thing from Dropbox, but I doubt that would make a difference. It might just be the checkpoints; I might have to move them from the archive to the main dir.

losDaniel commented 4 years ago

I also need the ids of the cleaned docs for all the rest of the fucking reviews by the way. ARG

losDaniel commented 4 years ago

It doesn't look like we can get gender from the teacher descriptions. It may just have to be from the reviews (he or she and shit like that).
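A rough sketch of the pronoun idea, assuming all reviews for one teacher are available as a list of strings. The pronoun sets, the `guess_gender` name, and the majority-vote rule are illustrative assumptions, not something already in the project.

```python
import re
from collections import Counter

# Hypothetical pronoun lists; extend as needed.
FEMALE = {"she", "her", "hers", "herself"}
MALE = {"he", "him", "his", "himself"}


def guess_gender(review_texts):
    """Guess a teacher's gender by majority vote over pronouns in their reviews."""
    counts = Counter()
    for text in review_texts:
        # Letters only, so "he's" is split into "he" and "s".
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in FEMALE:
                counts["female"] += 1
            elif token in MALE:
                counts["male"] += 1
    if counts["female"] > counts["male"]:
        return "female"
    if counts["male"] > counts["female"]:
        return "male"
    return "unknown"  # tie or no gendered pronouns found


print(guess_gender(["She explains well and her exams are fair.", "He said she is great."]))
```

Ties and teachers with no gendered pronouns come back as `"unknown"`, so they can be counted separately instead of being forced into a category.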

losDaniel commented 4 years ago

Identified the sample reviews. It was as easy as reducing the sample to the rated reviews and then restricting it to reviews of at least 100 characters.
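The two-step reduction could look roughly like this, assuming the raw reviews carry an `id` and a `text` field and the rated reviews are known by id. The field names, `select_sample`, and the toy data are all hypothetical.

```python
def select_sample(reviews, rated_ids, min_chars=100):
    """Keep reviews that have a rating, then drop those shorter than `min_chars`."""
    rated = [r for r in reviews if r["id"] in rated_ids]
    return [r for r in rated if len(r["text"]) >= min_chars]


# Toy data for illustration only.
raw_reviews = [
    {"id": 1, "text": "x" * 150},
    {"id": 2, "text": "y" * 50},
    {"id": 3, "text": "z" * 100},
    {"id": 4, "text": "w" * 300},
]
rated_ids = {1, 2, 3}  # assumed set of ids that appear in the rated data

sample = select_sample(raw_reviews, rated_ids)
print([r["id"] for r in sample])
```

Returning the full records keeps the ids around, which also helps when matching the sample back to the cleaned docs.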

losDaniel commented 4 years ago

[image]