iangow / se_features

Linguistic features derived from StreetEvents

Compare liwc_orig and liwc_alt on a random sample (n=50) #37

Open Yvonne-Han opened 4 years ago

Yvonne-Han commented 4 years ago

@iangow I've created a new issue for the code and results from comparing liwc_orig and liwc_alt on a random sample of n=50 utterances from speaker_data.

Yvonne-Han commented 4 years ago

@iangow Let's talk more about this tomorrow during our Zoom/Skype meeting. But here's a brief summary of what I've done:

I've uploaded a notebook (see here) that did the following:

Yvonne-Han commented 4 years ago

Across all 50 files × 73 categories per file = 3650 file-category values, liwc_alt differs from liwc_orig in 52 of them (52/3650 ≈ 1.4%), so I guess it's not too bad? 🤦
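For reference, the mismatch rate above is a cell-level comparison over (file, category) pairs. This is a minimal sketch, not the notebook's actual code. It assumes each LIWC output is a dict mapping a file ID to a dict of category counts, and that both outputs cover the same files and categories. The function name `count_mismatches` and the dict layout are hypothetical.

```python
def count_mismatches(orig, alt):
    """Count (file, category) cells where two LIWC outputs disagree.

    Hypothetical layout: {file_id: {category: count}}; assumes both
    outputs contain the same files and categories.
    """
    total = 0
    mismatches = 0
    for file_id, orig_counts in orig.items():
        alt_counts = alt[file_id]
        for category, value in orig_counts.items():
            total += 1
            if alt_counts[category] != value:
                mismatches += 1
    return mismatches, total


# Arithmetic from the comment: 50 files x 73 categories per file.
n_cells = 50 * 73        # 3650 file-category values
share = 52 / n_cells     # about 0.0142
print(n_cells, f"{share:.1%}")
```

Comparing cell by cell (rather than file by file) is what makes the denominator 3650 instead of 50.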

Also, I looked at the difference/total_word_count results (see here), and the largest error for any category is about 2% of the total word count, with most of the errors below 1%.
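The difference/total_word_count check can be sketched as follows. This is an illustration under stated assumptions, not the notebook's implementation. It assumes the same hypothetical {file_id: {category: count}} layout, plus a separate dict of total word counts per file. It also assumes the error is the absolute difference divided by that file's total word count, and that "most of them" refers only to cells where the two outputs actually differ. The names `relative_errors` and `summarize` are hypothetical.

```python
def relative_errors(orig, alt, word_counts):
    """Return |alt - orig| / total_word_count for every (file, category) cell.

    Hypothetical layout: orig/alt are {file_id: {category: count}},
    word_counts is {file_id: total_word_count}.
    """
    errors = {}
    for file_id, orig_counts in orig.items():
        wc = word_counts[file_id]
        for category, value in orig_counts.items():
            diff = abs(alt[file_id][category] - value)
            # Guard against empty utterances (zero total words).
            errors[(file_id, category)] = diff / wc if wc else 0.0
    return errors


def summarize(errors, threshold=0.01):
    """Largest relative error, and the share of nonzero errors below threshold."""
    nonzero = [v for v in errors.values() if v > 0]
    if not nonzero:
        return 0.0, 1.0
    largest = max(nonzero)
    below = sum(v < threshold for v in nonzero) / len(nonzero)
    return largest, below
```

Scaling by total word count puts every discrepancy on the same footing as a share of the utterance, so a one-word difference in a short utterance shows up as a larger error than the same difference in a long one.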