programminghistorian / ph-submissions

The repository and website hosting the peer review process for new Programming Historian lessons
http://programminghistorian.github.io/ph-submissions

Sentiment Analysis with 'syuzhet' using R #478

Closed RolRodr closed 1 year ago

RolRodr commented 2 years ago

The Programming Historian has received the following tutorial on 'Sentiment Analysis with 'syuzhet' using R' translated by @acrymble. This lesson is now under review and can be read at:

http://programminghistorian.github.io/ph-submissions/en/drafts/translations/sentiment-analysis-syuzhet

Please feel free to use the line numbers provided on the preview if that helps with anchoring your comments, although you can structure your review as you see fit.

I will act as editor for the review process. My role is to solicit two reviews from the community and to manage the discussions, which will be held here on this forum.

Members of the wider community are also invited to offer constructive feedback, which they should post to this message thread, but they are asked to first read our Reviewer Guidelines (http://programminghistorian.org/reviewer-guidelines) and to adhere to our anti-harassment policy (below). We ask that all reviews stop after the second formal review has been submitted so that the author can focus on any revisions. I will make an announcement on this thread when that has occurred.

I will endeavor to keep the conversation open here on Github. If anyone feels the need to discuss anything privately, you are welcome to email me.

Our dedicated Ombudsperson is (Ian Milligan - http://programminghistorian.org/en/project-team). Please feel free to contact him at any time if you have concerns that you would like addressed by an impartial observer. Contacting the ombudsperson will have no impact on the outcome of any peer review.

Anti-Harassment Policy

This is a statement of the Programming Historian's principles and sets expectations for the tone and style of all correspondence between reviewers, authors, editors, and contributors to our public forums.

The Programming Historian is dedicated to providing an open scholarly environment that offers community participants the freedom to thoroughly scrutinize ideas, to ask questions, make suggestions, or to request clarification, but also provides a harassment-free space for all contributors to the project, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age or religion, or technical experience. We do not tolerate harassment or ad hominem attacks of community participants in any form. Participants violating these rules may be expelled from the community at the discretion of the editorial board. Thank you for helping us to create a safe space.


RolRodr commented 2 years ago

Update: @JoshuaGOB and I will be meeting on Friday (May 13th) regarding this tutorial

anisa-hawes commented 2 years ago

Hello @RolRodr,

I hope you're well?

I noticed yesterday that the images aren't displaying correctly in the Preview. I've been trying to find out what is causing the problem, but I haven't resolved it yet... I have triple-checked the liquid syntax, the file names and the directory folders, and everything is perfect so it is a mystery! I will update you & Joshua here ASAP.

Thank you for your patience.

Anisa

Update

Done! It turns out that translation images are to be located in the 'original' images folder, in this case: images/analisis-de-sentimientos-r. I have moved them there, and the Lesson Preview now displays correctly!

Let's catch up again soon. A.

anisa-hawes commented 2 years ago

Hello @RolRodr. I've added HTML syntax for the link within the Alert Warning box following para. 89.

RolRodr commented 2 years ago

Hi @acrymble & @anisa-hawes, Below are my initial notes. Please, let me know if you have any questions or concerns!

General Notes:

Minor Changes

Lesson Objectives

P 1

P 2

P4

Background Information

P8

P9

P 10:

P 11:

P12:

P14:

P15:

P18:

P19:

P20a:

P20b:

P20d:

P20e:

P21:

A Brief Example

P23:

P24-27:

P28:

P29:

P30:

P31:

Appropriate Research Questions:

P34:

P35:

Obtaining Sentiment and Emotion Scores

P36:

Install and Load Relevant R Packages

P37:

P38:

Load and Prepare the Text

P39:

P40:

P41:

P44:

P46:

P47:

Extracting Data with the NRC Sentiment Lexicon

P48:

Summary of the Text

P50:

P51:

Interpreting the Results

P52:

Bar Chart By Emotion

P53:

P54:

Counting Words by Emotion

P59:

Code Example Box:

P60:

Code Example Box:

P62:

P69:

P77:

P83:

Loading your own Sentiment Lexicon

P86:

P87:

Warning Box after P89:

RolRodr commented 2 years ago

I have also begun reaching out to potential reviewers. I am hoping that one reviewer is fluent in, or at least familiar with, Spanish and that the other has no knowledge of Spanish, as I think the perspectives of both kinds of reviewer would benefit this and future translations.

acrymble commented 2 years ago

Thanks @RolRodr did you want me to make these changes after you've had peer reviewers read it and comment?

RolRodr commented 2 years ago

@acrymble Hi, Adam! I think you could make them before the peer reviewers' feedback—if time allows—as I haven't heard back from folks yet. Thanks for checking!

acrymble commented 2 years ago

@RolRodr thanks, I've fixed the ones I could find. There were a few I wasn't able to locate given the context.

Are P21-22 and Table 1 needed for the English translation?

On this suggestion, I think I'd be inclined to leave the translation table in to highlight the fact that Jenn had to do so in the original for Spanish readers and that we all need to keep in mind those higher barriers to coding in languages other than English.

RolRodr commented 2 years ago

Hi @acrymble ,

Thank you so much for your quick edits. I deeply apologize for the time it has taken me to continue on this thread. I went away and have been dealing with some health issues along with other matters that all seemed to collide.

Since the remaining edits were all minor, I went ahead and made them to save you time. Here is the link to the commit, if you want to see the changes.

As another update: I've been reaching out to folks to be reviewers but have had little luck getting acceptances. I'll be reaching out to other folks on the English editor teams for recommendations and contacting some other folks in my network.

Thank you!

acrymble commented 2 years ago

Thank you for the update @RolRodr

I hope you are feeling better.

acrymble commented 1 year ago

Hi @RolRodr is there any update on this?

acrymble commented 1 year ago

@hawc2 I'm not able to reach the editor. Can you please let me know the status of this submission? It was submitted 16 months ago and still hasn't had a review.

hawc2 commented 1 year ago

@acrymble we're coincidentally meeting today and you should hear back shortly after that

hawc2 commented 1 year ago

@acrymble just to clarify, you said 16 months ago? I see the first issue posted in April of this year.

acrymble commented 1 year ago

@hawc2 yes I submitted this in June 2021. No ticket was opened until April 2022.

RolRodr commented 1 year ago

@acrymble I apologize for my lack of response; finding reviewers has been rather difficult. Fortunately, we now have two reviewers for this article! Shuang Du (@ReneeDu320) and Andrew Janco (@apjanco) have agreed to review this article. Thank you to @hawc2 for recommending that I reach out to @apjanco.

ReneeDu320 commented 1 year ago

Hello all,

I think the tutorial is precise and very helpful. I don't know Spanish at all, but I found the tutorial easy to follow and understand. The simple example for 'sentiment analysis using syuzhet' in the first section is a good start for readers who are not familiar with the concept of sentiment analysis. The following step-by-step case study on ‘Miau’ provides a clear pipeline for users. I also found the visualization plots and their explanations helpful for understanding the quantitative results from `summary of the text`. Overall, I think the tutorial has a clear framing and provides the information first-time users need to follow the lesson.

I have a few suggestions for minor changes:


barplot() paragraph 54 https://www.rdocumentation.org/packages/graphics/versions/3.6.2/topics/barplot
comparison.cloud paragraph 75 https://www.rdocumentation.org/packages/wordcloud/versions/2.6/topics/comparison.cloud

I hope this feedback is helpful. Thanks also to @RolRodr for inviting me to be a reviewer!

apjanco commented 1 year ago

Thank you for the opportunity to review this submission and translation. I have been a fan of PH's efforts to promote original content in Spanish, and it's great to see that content in translation for an Anglophone audience. The translator's notes do an excellent job of addressing context that readers may need.

Overall, this looks ready to publish in English. I have included my notes and thoughts below. They may be helpful when preparing the text and are related to the Spanish original and the English translation.

Arabic, Basque, Bengali, Catalan, Chinese_simplified, Chinese_traditional, Danish, Dutch, English, Esperanto, Finnish, French, German, Greek, Gujarati, Hebrew, Hindi, Irish, Italian, Japanese, Latin, Marathi, Persian, Portuguese, Romanian, Russian, Somali, Spanish, Sudanese, Swahili, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Welsh, Yiddish, Zulu.

https://cran.r-project.org/web/packages/syuzhet/vignettes/syuzhet-vignette.html

acrymble commented 1 year ago

@RolRodr I see we now have both reviewers' responses. Is there anything you'd like me to take into consideration before I revise the text?

acrymble commented 1 year ago

Happy new year @RolRodr. Can you please provide some guidance on how you'd like me to proceed?

RolRodr commented 1 year ago

@acrymble Happy New Year, Adam. I apologize for missing your previous comment. I have looked over both reviewers' comments, and you should be all set to revise the text based on their responses. I'll go over the text again once that has been done, and the tutorial should be able to move on to the next step after that. Thank you for your patience!

acrymble commented 1 year ago

@RolRodr thank you, I will begin revising it soon. I have to teach a class using this tutorial in 3 weeks' time, so I don't want to change it while the students are using it. But I'll do that as soon as possible thereafter.

acrymble commented 1 year ago

In addition to the above comments, I've re-tried the lesson (as my students are using it later this week) and note the following issues to address:

acrymble commented 1 year ago

Line 94 - This section gives some history on Jockers's work and Annie Swafford's critique. Beyond noting the debate, a concise summary of the positions and points of contention would be helpful. As written, the reader needs to leave the tutorial to learn more or to assess the significance of this critique for their use of syuzhet.

I've beefed the examples up a bit, without going into all the detail because the links are there to the original texts: "This included concerns about incorrectly splitting sentences involving quotation marks, and problems with using a sentiment lexicon designed for modern English on a historic text that uses the same words in slightly different ways. Assigning concrete values of measurement to literary texts, which are by their nature quite subjective, is always challenging and potentially problematic. A series of archived blog entries by Jockers outline his thoughts on the method and address some of the criticisms about the degree to which sentiment can accurately be measured when sometimes even humans disagree on a passage of text's effects on the reader."

Line 106 - Mentions that syuzhet is not able to handle negation properly which is a significant limitation that has been addressed by NLP research. Richard Socher's 2013 dissertation offers a solution to the problem of contrastive conjunctions and negation. Since 2017, transformer models can account for context in sentiment prediction tasks. I am glad that this limitation is listed and will help readers. Syuzhet's approach is still a common method of detecting sentiment, but I would give those readers for whom this would be a serious problem a path forward.

I've added a reference to Socher's PhD. The transformer models won't work with this particular solution as far as I can tell, as they're machine-learning based, and this is lexicon based.
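
To make the negation limitation concrete, here is a minimal base-R sketch of how a purely lexicon-based scorer behaves. The `toy_lexicon` values and the `score_sentence()` helper are hypothetical and written only for illustration; they are not syuzhet's lexicon or its internal code.

```r
# Hypothetical four-word lexicon; real lexicons contain thousands of entries.
toy_lexicon <- c(happy = 1, good = 1, sad = -1, bad = -1)

# Hypothetical helper: sum the lexicon values of every word in the sentence.
score_sentence <- function(sentence, lexicon) {
  tokens <- unlist(strsplit(tolower(sentence), "\\W"))
  tokens <- tokens[tokens != ""]
  # Words missing from the lexicon index as NA and are dropped from the sum.
  sum(lexicon[tokens], na.rm = TRUE)
}

score_sentence("I am happy", toy_lexicon)
score_sentence("I am not happy", toy_lexicon)
```

Both calls return the same score, because "not" is not in the lexicon and word order plays no part in the sum. That is the kind of case a lexicon-only approach cannot distinguish.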

Line 275 - Discusses how syuzhet tokenizes the text. Syuzhet's get_tokens() function uses regex. A footnote on what regex considers a non-"word" character ("\W") would be helpful here. This seems to have the effect of dropping spaces and punctuation, which is a common tokenization method for some languages but is problematic for others.

I've added a couple of sentences and a link to Wikipedia to help explain the implications of this approach: "This approach to tokenisation uses regular expressions and is not always appropriate in all use cases. It will, for example, split hyphenated words into two. Depending on your text, you should consider the implications of your chosen method of tokenisation as you can use any method you like as long as the output is in the same format as in the example below."
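
As a rough illustration of the hyphenation point, the sketch below splits a sentence on the regular expression `\W` (any non-word character) using only base R. It approximates the approach described above under the assumption that tokens are lower-cased and empty strings are dropped; it does not call syuzhet itself, and the example sentence is invented.

```r
# Invented example sentence containing two hyphenated words.
example_text <- "The well-known author re-read the letter."

# Split on every non-word character, then drop any empty strings.
tokens <- unlist(strsplit(tolower(example_text), "\\W"))
tokens <- tokens[tokens != ""]

print(tokens)
# "the" "well" "known" "author" "re" "read" "the" "letter"
```

The hyphenated words "well-known" and "re-read" each become two tokens, and the punctuation is discarded, which may or may not suit a given text or language.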

The code to calculate sentiment scores now gives a deprecation warning that's linked to the syuzhet package itself. The code still works for now, but it throws a warning: "Warning message: spread_() was deprecated in tidyr 1.2.0."

I've added a note in the text that this shouldn't affect the running of the code, and that fixing this warning is out of the hands of all but Matthew Jockers.

acrymble commented 1 year ago

@RolRodr

I've now revised the text as best I can. The changes are listed in the above comments, including fixing a few issues that came up during my workshop with my students. The one I can't fix is the suggestion to make the Mac/Windows loading instructions consistent. This isn't my code and I'm not really a heavy Windows user so I wouldn't know how to fix it.

I'd like to note that I think there's an error in that original code:

text_string <- scan(file = "FILEPATH", fileEncoding = "UTF-8", what = character(), sep = "\n", allowEscapes = T)

When checked for number of sentences it returns this:

sentence_vector <- get_sentences(text_string)
length(sentence_vector)
[1] 13136

The result should be 6022 according to the code. When I use that same line on Mac it also returns 13136. The rest of the code runs fine, so I'm not sure of the impact of this. @jenniferisasi wrote the original (which has also been translated into PT) so she may be best placed to update that issue as a bug rather than a translation problem.
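
One thing that may be worth checking, offered as an assumption and not as a confirmed diagnosis: `scan()` with `sep = "\n"` returns one vector element per line of the file, not a single string. The base-R sketch below shows that behaviour on a small temporary file whose contents are invented for the example.

```r
# Write a small invented text in which one sentence wraps across two lines.
tmp <- tempfile(fileext = ".txt")
writeLines(c("It was a long day,", "and she was tired.", "He left."), tmp)

# Same arguments as the line quoted above, plus quiet = TRUE to hide the item count.
lines_vector <- scan(file = tmp, fileEncoding = "UTF-8", what = character(),
                     sep = "\n", allowEscapes = TRUE, quiet = TRUE)
length(lines_vector)   # one element per line of the file

# Collapsing the lines gives a single string holding the whole text.
single_string <- paste(lines_vector, collapse = " ")
length(single_string)

unlink(tmp)
```

Whether this accounts for the difference between 13136 and 6022 has not been verified here. It would depend on how get_sentences() treats a multi-element vector compared with a single string.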

@jenniferisasi there is also now a code warning in the syuzhet package itself, which Matthew Jockers would have to fix (I don't know if you're in contact):

"Warning message: spread_() was deprecated in tidyr 1.2.0. ℹ Please use spread() instead. ℹ The deprecated feature was likely used in the syuzhet package. Please report the issue to the authors. This warning is displayed once every 8 hours. Call lifecycle::last_lifecycle_warnings() to see where this warning was generated. "

Apart from that bug in the original, I believe I've addressed all of the review comments and I pass it back to you @RolRodr

RolRodr commented 1 year ago

Thank you for the update, @acrymble. I will review all the changes and give the lesson another close read-through before continuing. I will follow up soon!

jenniferisasi commented 1 year ago

Dear @acrymble and @RolRodr! Just a note to say that I've read your messages and will try to look at the code late next week or early the following one. And, yes, I can contact Jockers and see what that message is about.

acrymble commented 1 year ago

Hi @RolRodr do you need anything else from me on this translation? I'm going on leave soon so won't be able to do any more work for some time unless I get notice soon.

Thanks.

RolRodr commented 1 year ago

Hi @acrymble The updated lesson is looking good! I'll move it forward to the next steps. Thank you!

anisa-hawes commented 1 year ago

Thank you, @RolRodr!

We only copyedit original lessons, so I will move on to typesetting + generating perma.cc links.

anisa-hawes commented 1 year ago

Hello @hawc2 ,

This lesson is ready for your final review.

Sustainability + accessibility actions status:

@RolRodr :

hawc2 commented 1 year ago

@acrymble congrats, your translation is now published!: https://programminghistorian.org/en/lessons/sentiment-analysis-syuzhet

Note the DOI may not be live for a few days, as our contact is out of the office this week, and they gave me another address to contact.

Congrats to @jenniferisasi for having your lesson translated into English! And big thanks to @RolRodr for editing this lesson (his first lesson as a PH Editor no less!)! And gratitude to @apjanco and @ReneeDu320 for reviewing this translation!

anisa-hawes commented 1 year ago

Congratulations to all! ✨

I'll set myself a reminder to celebrate/promote via our social media channels next week.

acrymble commented 1 year ago

Thanks @RolRodr @hawc2 ! And thanks to @jenniferisasi for writing the great tutorial.