questions regarding repeated commas

Hi,

thanks for the awesome package, which I'm currently using to analyze YouTube comments. As you surely know, social media text is often neither clean nor grammatically correct. Many of the millions of comments I'm analyzing look like this:

',,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, blah'

'i mean no disrespect bigenosoo1,,,,, nictgranz, man , you are very angery person,, dude relax,,, get a girlfriend,,'

and contain a lot of repeated commas. Such comments receive extreme (very high or very low) sentiment scores with sentimentr version 2.3.1. I assume this is not intended: for instance, the two comments above received far more negative sentiment scores than the following ones:

'GO FUCK YOURSELF YOU ARROGANT PRICK GO FUCK YOURSELF YOU ARROGANT PRICK GO FUCK YOURSELF YOU ARROGANT PRICK'

'But that's wrong you fucking retard.'

Why is this the case? And do you recommend that users clean up things like repeated commas before using your package? Perhaps the algorithm could also handle this itself, without additional preprocessing.
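For reference, this is roughly the kind of cleanup I have in mind. It is only a minimal sketch: `collapse_commas` is a hypothetical helper of my own, not part of sentimentr, and I'm assuming that collapsing each run of commas into a single comma (and dropping leading and trailing ones) is an acceptable normalization. The scoring step at the end only runs if sentimentr is installed.

```r
# Hypothetical helper (not part of sentimentr): collapse runs of commas
# and tidy the surrounding whitespace before scoring.
collapse_commas <- function(x) {
  # ",,,," or ", ,," -> ","
  x <- gsub("\\s*,(\\s*,)+", ",", x, perl = TRUE)
  # remove whitespace that sits directly before a comma
  x <- gsub("\\s+,", ",", x, perl = TRUE)
  # strip commas and whitespace at the start and end of the string
  x <- gsub("^[,\\s]+|[,\\s]+$", "", x, perl = TRUE)
  x
}

comments <- c(
  ",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, blah",
  "i mean no disrespect bigenosoo1,,,,, nictgranz, man , you are very angery person,, dude relax,,, get a girlfriend,,"
)

cleaned <- collapse_commas(comments)
print(cleaned)

# Compare scores before and after cleaning (requires sentimentr).
if (requireNamespace("sentimentr", quietly = TRUE)) {
  print(sentimentr::sentiment_by(sentimentr::get_sentences(comments)))
  print(sentimentr::sentiment_by(sentimentr::get_sentences(cleaned)))
}
```

Comparing the two `sentiment_by` results should show whether the repeated commas alone are driving the scores.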

trinker / sentimentr

questions regarding repeated commas #83