Yvonne-Han opened 4 years ago
@iangow Let's talk more about this tomorrow during our Zoom/Skype meeting. In the meantime, here's a brief summary of what I've done:
I've uploaded a notebook (see here) that did the following:
1. Computed liwc_alt results for the n=50 sample.
2. Computed liwc_orig results for the same sample.
3. Compared liwc_alt with liwc_orig and stored the results in .csv format (see here for the raw difference and here for the percentage difference).

Across all 50 files × 73 categories per file = 3650 file-category values, liwc_alt differs from liwc_orig in 52 of them (52/3650 ≈ 1.4%), so I guess it's not too bad? 🤦
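For reference, the cell-by-cell comparison above could look roughly like the Python below. This is a minimal sketch, not the notebook's actual code. It assumes each LIWC output is a mapping from file id to {category: count}, that both tables cover the same files and categories, and that the percentage difference is taken relative to liwc_orig. The function name `compare_liwc` and the toy values are hypothetical.

```python
from typing import Dict

# Hypothetical layout: file id -> {LIWC category -> count}.
Scores = Dict[str, Dict[str, float]]


def compare_liwc(orig: Scores, alt: Scores) -> dict:
    """Cell-by-cell comparison of two LIWC result tables (a sketch, not the notebook's code)."""
    raw_diff: Scores = {}
    pct_diff: Scores = {}
    n_cells = 0
    n_differ = 0
    for file_id, orig_row in orig.items():
        alt_row = alt[file_id]  # assumes both tables cover the same files
        raw_diff[file_id] = {}
        pct_diff[file_id] = {}
        for category, o in orig_row.items():
            d = alt_row[category] - o  # assumes the same categories in both tables
            raw_diff[file_id][category] = d
            # Percentage difference relative to liwc_orig (an assumption):
            # 0 when both are 0, NaN when only liwc_orig is 0.
            if o != 0:
                pct_diff[file_id][category] = 100.0 * d / o
            else:
                pct_diff[file_id][category] = 0.0 if d == 0 else float("nan")
            n_cells += 1
            # Exact comparison assumes integer counts; use a tolerance for float scores.
            if d != 0:
                n_differ += 1
    return {
        "raw_diff": raw_diff,
        "pct_diff": pct_diff,
        "n_cells": n_cells,
        "n_differ": n_differ,
        "share_differ": n_differ / n_cells if n_cells else 0.0,
    }


# Toy example with 2 files x 3 categories (made-up values, for illustration only).
orig = {"f1": {"posemo": 3, "negemo": 1, "i": 5}, "f2": {"posemo": 0, "negemo": 2, "i": 4}}
alt = {"f1": {"posemo": 3, "negemo": 2, "i": 5}, "f2": {"posemo": 0, "negemo": 2, "i": 4}}
result = compare_liwc(orig, alt)
print(result["n_cells"], result["n_differ"])  # 6 1
```

On the full sample, `n_cells` would be 50 × 73 = 3650, and 52 differing cells gives `share_differ` = 52/3650 ≈ 0.0142.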
Also, I looked at the difference/total_word_count results (here): it seems the largest error for any category is ~2%, with most below 1%.
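The difference/total_word_count check could be sketched the same way. This is again hypothetical. It assumes the raw differences are category counts (liwc_alt minus liwc_orig), that each file's total word count is available, and that "most of them" refers to the non-zero errors. `normalized_errors`, `summarize`, and the `threshold` parameter are illustrative names, not the notebook's.

```python
from typing import Dict

# Hypothetical inputs: raw differences (liwc_alt - liwc_orig) per file and
# category, plus each file's total word count.
RawDiff = Dict[str, Dict[str, float]]


def normalized_errors(raw_diff: RawDiff, word_counts: Dict[str, int]) -> RawDiff:
    """|liwc_alt - liwc_orig| / total word count, per file and category."""
    # Word counts must be positive; a zero count raises ZeroDivisionError.
    return {
        file_id: {cat: abs(d) / word_counts[file_id] for cat, d in row.items()}
        for file_id, row in raw_diff.items()
    }


def summarize(errors: RawDiff, threshold: float = 0.01) -> dict:
    """Largest normalized error, and the share of non-zero errors below `threshold`."""
    values = [v for row in errors.values() for v in row.values()]
    nonzero = [v for v in values if v > 0]
    below = sum(v < threshold for v in nonzero)
    return {
        "max_error": max(values, default=0.0),
        "share_nonzero_below": below / len(nonzero) if nonzero else 1.0,
    }


# Toy example (made-up values): one error of 1/200 = 0.5%, one of 2/80 = 2.5%.
raw_diff = {"f1": {"posemo": 0, "negemo": 1}, "f2": {"posemo": 2, "negemo": 0}}
word_counts = {"f1": 200, "f2": 80}
summary = summarize(normalized_errors(raw_diff, word_counts))
print(summary)
```

Normalizing by word count keeps a one-word discrepancy in a long utterance from looking as large as it would in a short one.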
@iangow I've created a new issue for the code and results related to comparing liwc_orig and liwc_alt on a randomly selected sample of n=50 utterances from speaker_data.