kbenoit / quanteda.dictionaries

Dictionaries for text analysis

CORPUS IN CATALAN #28

Closed LRiver15 closed 5 years ago

LRiver15 commented 5 years ago

Hello!

I am working with a corpus in Catalan and would like to use quanteda. I have tried it with other corpora in English and Spanish without any problems. However, I would like to know whether it is recommended in this case, given the constraints. For example, there is no dictionary available for download.

Thank you in advance.

kbenoit commented 5 years ago

It should work fine. Stopwords are available in Catalan:

> stopwords::stopwords("ca", source = "misc") %>%
+     head(20)
 [1] "a"            "abans"        "abans-d'ahir" "abintestat"   "ací"          "adesiara"    
 [7] "adés"         "adéu"         "adàgio"       "ah"           "ahir"         "ai"          
[13] "aitambé"      "aitampoc"     "aitan"        "aitant"       "aitantost"    "aixà"        
[19] "això"         "així"        
> stopwords::stopwords("ca", source = "stopwords-iso") %>%
+     head(20)
 [1] "a"         "abans"     "ací"       "ah"        "així"      "això"      "al"        "aleshores"
 [9] "algun"     "alguna"    "algunes"   "alguns"    "alhora"    "allà"      "allí"      "allò"     
[17] "als"       "altra"     "altre"     "altres"   
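
As a minimal sketch of how one of these lists could be applied, the snippet below removes Catalan stopwords from a tokenized text before building a document-feature matrix. The sample sentence and the object names (`txt`, `ca_stop`, `toks_nostop`, `dfmat`) are made up for illustration; it assumes the quanteda and stopwords packages are installed.

```r
library(quanteda)

# Hypothetical one-document corpus, for illustration only
txt <- c(doc1 = "Això és un exemple de text en català")

# Catalan stopword list from the "stopwords-iso" source
ca_stop <- stopwords::stopwords("ca", source = "stopwords-iso")

# Tokenize, then drop any token matching the stopword list
# (tokens_remove() matches case-insensitively by default)
toks <- tokens(txt, remove_punct = TRUE)
toks_nostop <- tokens_remove(toks, pattern = ca_stop)

# Build a document-feature matrix from the remaining tokens
dfmat <- dfm(toks_nostop)
```

Swapping `source = "stopwords-iso"` for `source = "misc"` would use the other list instead; which one suits a given corpus is a judgment call.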

Stemming in Catalan is not provided in quanteda.
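
quanteda delegates stemming to the SnowballC package, so the set of supported languages depends on the SnowballC version installed. One way to check a given installation, sketched here with a hypothetical variable name (`has_catalan`) and assuming SnowballC is installed:

```r
# List the stemmer languages known to the installed SnowballC
stem_langs <- SnowballC::getStemLanguages()

# TRUE only if this installation ships a Catalan stemmer
has_catalan <- "catalan" %in% stem_langs
print(has_catalan)
```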

LRiver15 commented 5 years ago

Thank you for your rapid and efficient response. I greatly appreciate your help! Regards.