micah5 / ace-attorney-reddit-bot

👨🏼‍⚖️ Reddit bot that turns comment chains into Ace Attorney scenes
Do What The F*ck You Want To Public License
772 stars, 54 forks

Added sentiment analysis for other languages #19

Closed (LuisMayo closed this 3 years ago)

LuisMayo commented 3 years ago

This PR adds sentiment analysis support for languages other than English by running the analysis on a translated version of the original text.

I'm well aware that Reddit is mostly English, so this PR may not be useful. I had to write the code anyway for my own purposes, so I thought you might be interested in it. Feel free to reject the PR if you feel it doesn't add anything relevant.
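A minimal sketch of the translate-then-analyze idea follows. It is not the PR's code: `detect_language`, `translate_to_english` and `score_sentiment` are hypothetical callables standing in for whatever detection, translation and sentiment backends the bot actually uses, and the sentiment model is assumed to understand English only.

```python
from typing import Callable


def sentiment_for_any_language(
    text: str,
    detect_language: Callable[[str], str],
    translate_to_english: Callable[[str], str],
    score_sentiment: Callable[[str], float],
) -> float:
    """Score sentiment, translating non-English text to English first.

    All three callables are hypothetical stand-ins, not real library APIs.
    """
    if detect_language(text) != "en":
        # Assumption: the sentiment model only handles English,
        # so we analyse a translated copy instead of the original text.
        text = translate_to_english(text)
    return score_sentiment(text)


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any external service.
    fake_dictionary = {"¡Protesto!": "Objection!"}
    score = sentiment_for_any_language(
        "¡Protesto!",
        detect_language=lambda t: "es" if t.startswith("¡") else "en",
        translate_to_english=lambda t: fake_dictionary.get(t, t),
        score_sentiment=lambda t: -1.0 if "Objection" in t else 0.0,
    )
    print(score)
```

English text skips the translation call entirely, so only non-English comments cost a request to the translation service.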

Thanks

micah5 commented 3 years ago

Haven't tested this one but it looks good, thank you!

LuisMayo commented 3 years ago

There is a bug: if you make a lot of requests, it crashes with an HTTP 429 (Too Many Requests) error.

I advise against using this code in production, and I'm sorry for any problems it may cause.

I plan to push a fix later today.

See https://github.com/LuisMayo/ace-attorney-bot/tree/translation-error-hotfix for a hotfix.

Thanks, and sorry for the trouble.

micah5 commented 3 years ago

Ah cool, no worries. I added the hotfix and it's running smoothly. The other bots look great! Let me know if you'd like me to post an announcement on u/objection-bot that the bot is now available on other platforms.

LuisMayo commented 3 years ago

An announcement would be great! I know people were asking for it on Discord at least, so they may find the info useful.

As for the "Rate exceeded" error, I've found several possible solutions, so I'll list them here for the sake of discussion. In case you aren't fully aware of the problem: when several messages are processed, the `detect_language` function starts crashing because the Google Translate API rate-limits it.

Several solutions are possible:

  1. Make the hotfix permanent: messages are translated until we hit the rate limit; after that, we carry on without translation until the limit resets.
  2. Give up on multi-language support: Reddit probably doesn't need support for any language besides English anyway, so this would be as easy as reverting this PR.
  3. Use https://github.com/ssut/py-googletrans: This project uses the Google Translate AJAX API, which means it isn't rate-limited. The problem is that this API may change without notice, since it isn't meant to be used outside translate.google.com. This is probably one of the best options.
  4. Use the official Google Translate API: I'll probably do this, maybe combined with py-googletrans as a fallback. This option isn't ideal because it requires setting up a Google Cloud account and obtaining an API token.
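Option 1 could be sketched roughly as follows. This is not the actual hotfix from the linked branch: `RateLimitedError`, `BestEffortTranslator` and the 60-second cooldown are hypothetical, and `translate` stands for whatever translation call the bot really makes (the real reset window of the API is not known here).

```python
import time
from typing import Callable, Optional


class RateLimitedError(Exception):
    """Hypothetical exception a translation wrapper raises on HTTP 429."""


class BestEffortTranslator:
    """Translate when possible; return the original text while rate-limited.

    After a rate-limit error, translation is skipped for `cooldown_seconds`
    (an assumed value) instead of hitting the API again immediately.
    """

    def __init__(
        self,
        translate: Callable[[str], str],
        cooldown_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._translate = translate
        self._cooldown = cooldown_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._blocked_until: Optional[float] = None

    def __call__(self, text: str) -> str:
        now = self._clock()
        if self._blocked_until is not None and now < self._blocked_until:
            return text  # still cooling down: go on without translation
        try:
            translated = self._translate(text)
        except RateLimitedError:
            # Limit hit: remember when to retry and keep the original text.
            self._blocked_until = now + self._cooldown
            return text
        self._blocked_until = None
        return translated
```

The cooldown avoids sending a request for every message while the limit is in force; the trade-off is that some messages are analysed untranslated even after the real limit has already reset.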

Which option do you think is best for this project? Let me know and I'll open a PR with whatever you choose (or feel free to do it yourself). For my project I'll probably use the official Google API, falling back to py-googletrans and then to no translation. Language support matters more on Twitter/Telegram/Discord than on Reddit, so I prefer a more stable API.
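The fallback chain described above (official API, then py-googletrans, then no translation) might look like the sketch below. The backends are hypothetical callables ordered by preference; the real wrappers around the Google Cloud Translation client and py-googletrans are not shown, so nothing here reflects their actual APIs.

```python
from typing import Callable, Sequence


def translate_with_fallbacks(
    text: str,
    backends: Sequence[Callable[[str], str]],
) -> str:
    """Try each translation backend in order; return the original text if all fail.

    `backends` is assumed to be ordered by preference, e.g. a wrapper around
    the official API first, then a wrapper around py-googletrans.
    """
    for translate in backends:
        try:
            return translate(text)
        except Exception:
            # Any failure (rate limit, changed unofficial API, missing
            # credentials) simply moves on to the next backend.
            continue
    return text  # last resort: no translation
```

Catching every exception is deliberately broad here, because in this design a failed translation should never crash the bot; a stricter version would catch only the errors each backend is known to raise.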

Regards and thanks!

micah5 commented 3 years ago

Cool, I've created a post with links to each project. Let me know if there's any further info I should add.

For the Reddit bot, I think making the hotfix permanent is probably the best solution. As you said, most of Reddit is in English, and with the hotfix in place it will at least work on non-English subs as traffic on the bot decreases over time.

LuisMayo commented 3 years ago

Thanks. The Discord link is this: https://discord.com/oauth2/authorize?client_id=806980920544460831&permissions=100352&scope=bot

Thank you again! I'll probably open a PR in a few hours.