mjockers / syuzhet

An R package for the extraction of sentiment and sentiment-based plot arcs from text

Scores zero for any non-English sentence #26

Open hichemmkhalyd opened 7 years ago

hichemmkhalyd commented 7 years ago

I tried to use get_nrc_sentiment with lang = "portuguese" and the NRC method, but it always returns zero for all sentiments. I was wondering whether I should do something more to use it.

mjockers commented 7 years ago

This is the dev version on GitHub, and we're still testing the various languages available through the expanded NRC lexicon. So far, only Spanish and German have been tested for functionality. I'll keep this issue open until Portuguese has been tested. Thanks.

FelixPeckitt commented 6 years ago

Hi Matthew, I'm happy to take up this issue and continue testing in other languages. Is there a standardised way of running tests for this project that I should use? If not, I'll document my approach.

I'm very new to contributing to open source, so do let me know if I've missed anything, including OSS etiquette. Thanks!

mjockers commented 6 years ago

Thanks codellama23. If you are fluent in another language and would like to test it, that would be great. I don't have a standardized test (though that is a good idea), so the way we have been testing the output is by having native speakers process a fairly well-known text (e.g., Don Quixote for Spanish and Kafka's The Metamorphosis for German) and then scrutinize the results to see whether the program's output aligns with their expectations of the sentiment as native speakers of the language.
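Since there is no standardized test yet, one cheap, repeatable first check is the share of sentences that receive a nonzero score: if the lexicon never matches words in a language (the symptom reported in this issue), that share sits at or near zero. The sketch below uses a hypothetical helper, `check_nonzero_share()`, which is not part of syuzhet, and an arbitrary 5% threshold chosen only for illustration; it flags an obvious failure but does not replace a native speaker's review.

```r
# Hypothetical helper (not part of syuzhet): report how many sentences got a
# nonzero sentiment score, as a first sanity check for a newly tested language.
# The default 5% threshold is an arbitrary assumption, not a project standard.
check_nonzero_share <- function(scores, min_share = 0.05) {
  stopifnot(is.numeric(scores), length(scores) > 0)
  # Proportion of sentences whose score is not exactly zero (NAs ignored)
  share <- mean(scores != 0, na.rm = TRUE)
  list(
    n_sentences = length(scores),
    share_nonzero = share,
    # FALSE suggests the lexicon is not matching words in this language
    looks_ok = share >= min_share
  )
}

# Usage sketch (requires syuzhet and a local copy of a text):
# library(syuzhet)
# sentences <- get_sentences(get_text_as_string("path/to/text.txt"))
# scores <- get_sentiment(sentences, method = "nrc", lang = "portuguese")
# check_nonzero_share(scores)
```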

FelixPeckitt commented 6 years ago

Hi @hichemmkhalyd, I was unable to reproduce your error. I used the text of Os Maias by Eça de Queirós.

Attached: os_maias_text.txt (source: https://www.gutenberg.org/ebooks/40409)

Here is the code I used:

```r
library(syuzhet)

# Use the path of the copy of the text on your own machine
os_maias_path <- "/Users/Username/Documents/os_maias_text.txt"

os_maias_string <- get_text_as_string(os_maias_path)
om_sentences <- get_sentences(os_maias_string)

# This may take a few minutes
syuzhet_vector <- get_sentiment(om_sentences, method = "nrc", lang = "portuguese")

# Name each sentiment score with its corresponding sentence
names(syuzhet_vector) <- om_sentences

# Have a look at a few
syuzhet_vector[100:105]
```

And here is the output I get:

O antepassado, cujos olhos se enchiam agora d'uma luz de ternura diante das suas rosas, e que ao canto do lume relia com gosto o seu Guisot, fôra, na opinião de seu pae, algum tempo, o mais feroz Jacobino de Portugal!
0

E todavia, o furor revolucionario do pobre moço consistira em lêr Rousseau, Volney, Helvetius, e a Encyclopedia; em atirar foguetes de lagrimas á Constituição; e ir, de chapeu á liberal e alta gravata azul, recitando pelas lojas maçonicas Odes abominaveis ao Supremo Architecto do Universo.
0

Isto, porém, bastára para indignar o pae.
0

Caetano da Maia era um portuguez antigo e fiel que se benzia ao nome de Robespierre, e que, na sua apathia de fidalgo beato e doente, tinha só um sentimento vivo--o horror, o odio ao Jacobino, aquem attribuia todos os males, os da patria e os seus, desde a perda das colonias até ás crises da sua gota.
-7

Para extirpar da nação o Jacobino, déra elle o seu amor ao sr. infante D.
1

Miguel, Messias forte e Restaurador providencial...
0
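For the native-speaker review described earlier in the thread, looking at the most extreme sentences may be more informative than an arbitrary slice such as `[100:105]`, where most scores are zero. A minimal base-R sketch, assuming a hypothetical helper `extreme_sentences()` (not part of syuzhet) applied to a named score vector like `syuzhet_vector` from the code above:

```r
# Hypothetical helper (not part of syuzhet): return the n most negative and
# n most positive sentences from a named numeric sentiment vector, so a
# native speaker can judge whether the extremes look plausible.
extreme_sentences <- function(scores, n = 5) {
  stopifnot(is.numeric(scores), !is.null(names(scores)))
  # Never ask for more sentences than the vector contains
  n <- min(n, length(scores))
  list(
    # sort() keeps the sentence names and drops NA scores by default
    most_negative = head(sort(scores), n),
    most_positive = head(sort(scores, decreasing = TRUE), n)
  )
}

# Usage with the named vector built in the code above:
# extreme_sentences(syuzhet_vector, n = 10)
```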

Could you post the code that produced your error?

Thanks :)