mjockers / syuzhet

An R package for the extraction of sentiment and sentiment-based plot arcs from text

Error in get_nrc_sentiment for Hungarian language #46

Closed: balazsjonas closed this issue 10 months ago.

balazsjonas commented 10 months ago

I am trying to get the sentiment of Hungarian words. It works properly in English and Italian, but not in Hungarian.

packageVersion('syuzhet')
[1] ‘1.0.7’
syuzhet::get_nrc_sentiment(c('love', 'hate', 'apple'))
  anger anticipation disgust fear joy sadness surprise trust negative positive
1     0            0       0    0   1       0        0     0        0        1
2     1            0       1    1   0       1        0     0        1        0
3     0            0       0    0   0       0        0     0        0        0
syuzhet::get_nrc_sentiment(c('love', 'hate', 'apple'), language='english')
  anger anticipation disgust fear joy sadness surprise trust negative positive
1     0            0       0    0   1       0        0     0        0        1
2     1            0       1    1   0       1        0     0        1        0
3     0            0       0    0   0       0        0     0        0        0
syuzhet::get_nrc_sentiment(c('amore', 'odio', 'mela'), language='italian')
  anger anticipation disgust fear joy sadness surprise trust negative positive
1     0            0       0    0   0       0        0     0        0        0
2     2            0       2    2   0       2        0     0        2        0
3     0            0       0    0   0       0        0     0        0        0
syuzhet::get_nrc_sentiment(c('love', 'hate', 'apple'), language='hungarian')
#Error in value[[jvseq[[jjj]]]] : subscript out of bounds
syuzhet::get_nrc_sentiment(c('szeretet', 'utálat', 'alma'), language='hungarian')
#Error in value[[jvseq[[jjj]]]] : subscript out of bounds

Related Stack Overflow question: https://stackoverflow.com/questions/77631640/r-package-syuzhet-does-not-work-in-hungarian

mjockers commented 10 months ago

?get_dct_transform

"....At the time of this release, Syuzhet will only work with languages that use Latin character sets. This effectively means that "Arabic", "Bengali", "Chinese_simplified", "Chinese_traditional", "Greek", "Gujarati", "Hebrew", "Hindi", "Japanese", "Marathi", "Persian", "Russian", "Tamil", "Telugu", "Thai", "Ukranian", "Urdu", "Yiddish" are not supported even though these languages are part of the extended NRC dictionary."