tomzhang255 / CCR

An R package for NLP

new issues--August 22nd, 2022 #3

Closed m-atari closed 2 years ago

m-atari commented 2 years ago
  1. While the model is running, the user sees the following: "Downloading: 100%|██████████| 1.18k/1.18k [00:00<00:00, 123kB/s] Downloading: 100%|██████████| 190/190 [00:00<00:00, 36.1kB/s] ..." These messages are not self-explanatory. Please explain to the user what exactly is being downloaded...

  2. Incorrect identification of language: when the questionnaire file contains NAs, the user gets this warning: "1: In encode_column(model, q_file, q_col, "q") : Non-English language detected in column Individualism from Desktop/CCR/CCR Experiment/CCR experiment measures.xlsx . Languages detected by cld3: en, NA". NA should not be reported as a detected language, of course. Please resolve.

  3. Another problem with language detection: I supplied a fully English dataset and got this warning: "In encode_column(model, data_file, data_col, "d") : Non-English language detected in column texts from Desktop/CCR/data_test.csv . Languages detected by cld3: en, NA, pt, ht " . Maybe we should use a different package for language detection?
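
For point 2, one possible fix is to drop missing entries before detection and to ignore undetermined results when building the warning. The sketch below is illustrative only: it is written in Python rather than the package's R, and `drop_missing` and `non_english_languages` are hypothetical helper names, not functions from CCR or cld3.

```python
import math


def drop_missing(values):
    """Return only the entries that contain real text.

    Skips None, float NaN, and empty or whitespace-only strings,
    which stand in here for R's NA values.
    """
    cleaned = []
    for v in values:
        if v is None:
            continue
        if isinstance(v, float) and math.isnan(v):
            continue
        if isinstance(v, str) and v.strip() == "":
            continue
        cleaned.append(v)
    return cleaned


def non_english_languages(detected):
    """Return the sorted non-English language codes.

    `detected` holds one code per row; None means the detector
    could not decide and must not be reported as a language.
    """
    return sorted({lang for lang in detected if lang is not None and lang != "en"})
```

With this logic, a column whose detected codes are only `en` and undetermined values would produce no warning, while a genuinely mixed column would still be flagged.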

tomzhang255 commented 2 years ago

"3": Yes, I just realized cld3 has a better function; this one actually gives probabilities; so we can keep only the reliable language predictions.