twintproject / twint

An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
MIT License

Translation Issues #1246

Open vivly opened 3 years ago

vivly commented 3 years ago

Description of Issue

When I enable the translation feature, it throws an AttributeError: 'NoneType' object has no attribute 'group'. If I delete c.Translate = True, everything works.

Console

Traceback (most recent call last):
  File "E:\WorkSpace\Python\twint\twint\run.py", line 410, in Search
    run(config, callback)
  File "E:\WorkSpace\Python\twint\twint\run.py", line 329, in run
    get_event_loop().run_until_complete(Twint(config).main(callback))
  File "D:\Coding\Anaconda\envs\Twint\lib\asyncio\base_events.py", line 488, in run_until_complete
    return future.result()
  File "E:\WorkSpace\Python\twint\twint\run.py", line 235, in main
    await task
  File "E:\WorkSpace\Python\twint\twint\run.py", line 262, in run
    await self.tweets()
  File "E:\WorkSpace\Python\twint\twint\run.py", line 226, in tweets
    await output.Tweets(tweet, self.config, self.conn)
  File "E:\WorkSpace\Python\twint\twint\output.py", line 166, in Tweets
    await checkData(tweets, config, conn)
  File "E:\WorkSpace\Python\twint\twint\output.py", line 131, in checkData
    tweet = Tweet(tweet, config)
  File "E:\WorkSpace\Python\twint\twint\tweet.py", line 158, in Tweet
    ts = translator.translate(text=t.tweet, dest=config.TranslateDest)
  File "D:\Coding\Anaconda\envs\Twint\lib\site-packages\googletransx\client.py", line 182, in translate
    data = self._translate(text, dest, src, kwargs)
  File "D:\Coding\Anaconda\envs\Twint\lib\site-packages\googletransx\client.py", line 78, in _translate
    token = self.token_acquirer.do(text)
  File "D:\Coding\Anaconda\envs\Twint\lib\site-packages\googletransx\gtoken.py", line 194, in do
    self._update()
  File "D:\Coding\Anaconda\envs\Twint\lib\site-packages\googletransx\gtoken.py", line 62, in _update
    code = self.RE_TKK.search(r.text).group(1).replace('var ', '')
AttributeError: 'NoneType' object has no attribute 'group'

My config is:

c = twint.Config()
c.Since = "2020-2-9"
c.Until = "2020-2-19"
c.Near = "London"
c.Lang = "en"
c.Store_csv = True
c.Translate = True
c.TranslateSrc = "en"
c.TranslateDest = "it"
c.Output = ".//test.csv"

Environment Details

Windows 10

lordpeter003 commented 2 years ago

This problem appears to come from py-googletrans. I am not sure why twint's required Google translation library was changed from googletrans to googletransx a while ago, but here is a temporary fix:

  1. Install googletrans with pip install googletrans==4.0.0rc1. This is the only version of googletrans that works without the error.
  2. Change the import from googletransx to googletrans in tweet.py.

At least for me, translation works after these changes. However, Google Translate seems to limit how many translations you can request, so only a small number of tweets can be translated before it stops. Also, when a tweet contains emojis, the translation does not work and NaN is returned as the result.

rsobt commented 2 years ago

This seems to be a bug in googletrans, not in twint.

In my program, I was able to translate some tweets, but certain strings, such as tweets with only URLs, caused errors.

So I think a tentative solution is to wrap the translation in try/except and, when it fails, either return a specific string as the result or skip the tweet.
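A minimal sketch of that idea follows. It is not twint's actual code: safe_translate and flaky_translate are hypothetical names, and the stand-in translator only imitates the failure from the traceback above so the example runs without googletrans installed. In twint, the callable passed in would presumably wrap the translator.translate(...) call in tweet.py.

```python
from typing import Callable, Optional


def safe_translate(
    translate: Callable[[str], str],
    text: str,
    fallback: Optional[str] = None,
) -> Optional[str]:
    """Return translate(text), or `fallback` if the translator raises.

    Hypothetical helper: `translate` is any callable that takes a string
    and returns the translated string.
    """
    try:
        return translate(text)
    except Exception:
        # The traceback in this issue shows an AttributeError, but network
        # errors or rate limiting could surface as other exception types,
        # so this sketch catches broadly and returns the fallback.
        return fallback


def flaky_translate(text: str) -> str:
    """Stand-in translator (assumption): fails on URL-only input."""
    if text.startswith("http"):
        raise AttributeError("'NoneType' object has no attribute 'group'")
    return text.upper()


tweets = ["hello", "https://t.co/abc"]

# Option 1: substitute a specific string when translation fails.
results = [safe_translate(flaky_translate, t, fallback="") for t in tweets]

# Option 2: skip tweets whose translation failed (fallback stays None).
translated_only = [
    r for r in (safe_translate(flaky_translate, t) for t in tweets)
    if r is not None
]
```

Returning a fallback keeps one output row per tweet, which is convenient when storing to CSV; skipping drops the row entirely.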