Closed Yvonne-Han closed 4 years ago
OK. I think u'\u2019' may be just one of the curly versions. So you may need to inspect the data more closely. Do you have a sample "utterance" (file_name, speaker_number, etc.) that's causing problems?
OK. I think u'\u2019' may be just one of the curly versions. So you may need to inspect the data more closely.
I see. I will take a closer look at this.
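One way to inspect the data more closely is to list every non-ASCII punctuation character that actually occurs in a text. This is only a rough sketch using the standard `unicodedata` module; the function name `non_ascii_punctuation` and the sample string are made up for illustration and are not part of the repository.

```python
import unicodedata
from collections import Counter


def non_ascii_punctuation(text):
    """Count non-ASCII punctuation characters, keyed by (char, Unicode name)."""
    counts = Counter()
    for ch in text:
        # Unicode general categories that start with "P" are punctuation.
        if ord(ch) > 127 and unicodedata.category(ch).startswith("P"):
            counts[(ch, unicodedata.name(ch, "UNKNOWN"))] += 1
    return counts


# Illustrative sample only (not taken from the real call data).
sample = "We don\u2019t expect that; it\u2019s unlikely."
for (ch, name), n in non_ascii_punctuation(sample).items():
    print(repr(ch), name, n)
```

Note that some apostrophe look-alikes are not classified as punctuation (for example U+02BC MODIFIER LETTER APOSTROPHE is a modifier letter), so a filter like this may miss a few candidates.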
Do you have a sample "utterance" (file_name, speaker_number, etc.) that's causing problems?
I don't have one at the moment, as I used some other texts to detect the differences. I managed to find the Unicode documentation for general punctuation, so let me quickly go through everything that looks similar to an apostrophe.
Punctuation that looks like apostrophes:
This should be everything related to apostrophes (hopefully) so I'm closing this now.
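For anyone reading later, the general idea of handling these characters can be sketched as mapping each look-alike to a plain ASCII apostrophe before counting words. The set of characters below is my own guess at plausible candidates, not necessarily the list tested in this issue, and `normalize_apostrophes` is a hypothetical name rather than the function in `liwc_functions.py`.

```python
# Assumed set of apostrophe look-alikes (an illustration, not the tested list):
# U+2018 LEFT SINGLE QUOTATION MARK, U+2019 RIGHT SINGLE QUOTATION MARK,
# U+201B SINGLE HIGH-REVERSED-9 QUOTATION MARK, U+2032 PRIME,
# U+02BC MODIFIER LETTER APOSTROPHE, U+00B4 ACUTE ACCENT, U+0060 GRAVE ACCENT
APOSTROPHE_LIKE = "\u2018\u2019\u201b\u2032\u02bc\u00b4\u0060"

# Translation table that maps each look-alike to the ASCII apostrophe.
_APOSTROPHE_TABLE = str.maketrans({ch: "'" for ch in APOSTROPHE_LIKE})


def normalize_apostrophes(text):
    """Return text with apostrophe look-alikes replaced by "'"."""
    return text.translate(_APOSTROPHE_TABLE)


print(normalize_apostrophes("don\u2019t"))
```

Text that already uses the ASCII apostrophe passes through unchanged, so the function can be applied to every input.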
To close this properly, you should have some (previously) problematic text where the issue is (or appears to be) curly apostrophes and then confirm that now the LIWC software and our Python function give the same results. Do you have a sample of problematic text? Or is it just these .txt files?
To close this properly, you should have some (previously) problematic text where the issue is (or appears to be) curly apostrophes and then confirm that now the LIWC software and our Python function give the same results.
Yes, I did compare the results of the LIWC software and my code (in a notebook) and confirmed that they are the same now. (I’m sorry that I forgot to post the notebook here...)
Do you have a sample of problematic text? Or is it just these .txt files?
No, it’s just these .txt files (with file names indicating the type of punctuation being tested).
But I did try a couple of randomly selected con call texts and compared the results of the LIWC software and my code. The differences in the affected categories either became smaller or went away, so I think we are now one step closer.
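A comparison like this can be sketched as a small helper that reports which categories still disagree between two sets of results. Everything here is assumed for illustration: the function name `differing_categories`, the dict-of-category-scores input shape, and the category names and values in the example are all made up, not real output from either tool.

```python
def differing_categories(a, b, tol=1e-9):
    """Return {category: (a_value, b_value)} for categories that disagree."""
    diffs = {}
    for cat in sorted(set(a) | set(b)):
        # A category missing from one side is treated as a score of 0.
        va, vb = a.get(cat, 0), b.get(cat, 0)
        if abs(va - vb) > tol:
            diffs[cat] = (va, vb)
    return diffs


# Made-up example values, only to show the shape of the inputs.
software_result = {"negate": 2, "pronoun": 5}
python_result = {"negate": 1, "pronoun": 5}
print(differing_categories(software_result, python_result))
```

An empty result would mean the two sets of scores agree within the tolerance for every category.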
I have to deal with apostrophes again as I believe they are causing some (or maybe all) of the differences between liwc_orig and liwc_alt.
_Originally posted by @iangow in https://github.com/iangow/se_features/issues/18#issuecomment-517072302_
https://github.com/iangow/se_features/blob/32e2db2660f506265c890fd7286ae06a9371b864/liwc_2015/liwc_functions.py#L39
_Originally posted by @iangow in https://github.com/iangow/se_features/issues/18#issuecomment-517086556_