twintproject / twint

An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
MIT License
15.75k stars, 2.72k forks

"url" issue on Lookup #1003

Open marcoferre opened 3 years ago

marcoferre commented 3 years ago

Command Ran

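import time

import twint

# tweet_list: a list of dicts, each with a 'username' key (built earlier in the script)
for t in tweet_list:
    b = twint.Config()
    b.Username = t['username']
    b.Store_object = True  # keep looked-up users in twint.output.users_list

    twint.run.Lookup(b)

    user = twint.output.users_list[-1]  # most recently stored user
    t.update(user.__dict__)

    print(t)
    time.sleep(1)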
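Reading users_list[-1] assumes the lookup that just ran actually appended a user; if it raised or stored nothing, the loop either crashes or silently reuses an earlier user's data. A minimal sketch that checks the list really grew and logs failures instead of aborting the whole loop. The helper name lookup_new_entry and its lookup/results parameters are hypothetical, not part of twint's API.

```python
import logging


def lookup_new_entry(username, lookup, results):
    """Call lookup(username) and return the entry it appended to `results`.

    Returns None (and logs why) if the lookup raised KeyError or appended
    nothing, instead of handing back a stale results[-1] from an earlier user.
    """
    before = len(results)
    try:
        lookup(username)
    except KeyError as exc:
        # e.g. KeyError: 'url' from twint/user.py, as in the traceback below
        logging.error("lookup failed for %s: missing key %s", username, exc)
        return None
    if len(results) == before:
        logging.error("lookup for %s stored nothing", username)
        return None
    return results[-1]
```

With twint, lookup would be a small function that builds a twint.Config with Store_object = True and calls twint.run.Lookup, and results would be twint.output.users_list; that wiring is an untested assumption here.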

It runs 3 times, then raises this exception:

CRITICAL:root:twint.get:User:'url'
ERROR:root:twint.run:Twint:Lookup:Unexpected exception occurred.
Traceback (most recent call last):
  File "/content/src/twint/twint/run.py", line 307, in Lookup
    await get.User(self.config.Username, self.config, db.Conn(self.config.Database))
  File "/content/src/twint/twint/get.py", line 228, in User
    await Users(j_r, config, conn)
  File "/content/src/twint/twint/output.py", line 177, in Users
    user = User(u)
  File "/content/src/twint/twint/user.py", line 31, in User
    _usr.url = ur['data']['user']['legacy']['url']
KeyError: 'url'

Environment Details

Google Colab, Win

ytaijp commented 3 years ago

same here.

Succeeds with:

c.Username = "twitter"
twint.run.Lookup(c)

Fails with:

c.Username = "jack"
twint.run.Lookup(c)

aabid0193 commented 3 years ago

same issue here

seonake commented 3 years ago

In my case it's not a 'url' problem but a 'data' one, though the two may be related.

CRITICAL:root:twint.get:User:'data'
ERROR:root:twint.run:Twint:Lookup:Unexpected exception occurred.
Traceback (most recent call last):
  File "/root/.local/lib/python3.6/site-packages/twint/run.py", line 307, in Lookup
    await get.User(self.config.Username, self.config, db.Conn(self.config.Database))
  File "/root/.local/lib/python3.6/site-packages/twint/get.py", line 228, in User
    await Users(j_r, config, conn)
  File "/root/.local/lib/python3.6/site-packages/twint/output.py", line 177, in Users
    user = User(u)
  File "/root/.local/lib/python3.6/site-packages/twint/user.py", line 21, in User
    if 'data' not in ur and 'user' not in ur['data']:
KeyError: 'data'

KeyError                                  Traceback (most recent call last)
in ()
     31 c.Username = cuenta
     32 
---> 33 twint.run.Lookup(c)
     34 df = twint.storage.panda.User_df
     35 

11 frames

/root/.local/lib/python3.6/site-packages/twint/user.py in User(ur)
     19 def User(ur):
     20     logme.debug(__name__ + ':User')
---> 21     if 'data' not in ur and 'user' not in ur['data']:
     22         msg = 'malformed json! cannot be parsed to get user data'
     23         logme.fatal(msg)

KeyError: 'data'

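The KeyError: 'data' comes from the guard itself: with `and`, a missing 'data' key makes the left operand True, so Python goes on to evaluate ur['data'] and raises before the "malformed json" message is ever logged. A sketch of the same check written with `or`, which only evaluates ur['data'] once 'data' is known to exist. The function name is hypothetical, and returning a bool (rather than whatever twint does after logging) is a simplification for this sketch; the message text is copied from the traceback.

```python
import logging


def user_payload_is_malformed(ur):
    # Corrected form of the check shown at twint/user.py line 21 above.
    # Original: if 'data' not in ur and 'user' not in ur['data']:
    # `or` short-circuits, so ur['data'] is never evaluated when 'data' is missing.
    if 'data' not in ur or 'user' not in ur['data']:
        logging.critical('malformed json! cannot be parsed to get user data')
        return True
    return False
```

This only turns the confusing KeyError into a clear diagnostic; a response without data.user (for example an error or suspended-account payload) still has no profile to parse.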
MrNullPoint commented 3 years ago

I think it's caused by a JSON parse error. Sometimes when we query a user such as @jack, Twitter returns something like ".... user is suspended ..." (JSON that can still be parsed). So I changed some code in twint/user.py to add try...except, and after reinstalling twint the problem was solved. Some of the code:

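    # Fragment from User() in twint/user.py: fall back to '' when a field is missing.
    # Catching only KeyError/TypeError (instead of a bare except) avoids also
    # swallowing unrelated errors such as KeyboardInterrupt.
    try:
        _usr.name = ur['data']['user']['legacy']['name']
    except (KeyError, TypeError):
        _usr.name = ''
    try:
        _usr.username = ur['data']['user']['legacy']['screen_name']
    except (KeyError, TypeError):
        _usr.username = ''
    try:
        _usr.bio = ur['data']['user']['legacy']['description']
    except (KeyError, TypeError):
        _usr.bio = ''
    try:
        _usr.location = ur['data']['user']['legacy']['location']
    except (KeyError, TypeError):
        _usr.location = ''
    try:
        _usr.url = ur['data']['user']['legacy']['url']
    except (KeyError, TypeError):
        _usr.url = ''
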
himanshudabas commented 3 years ago

@MrNullPoint this has already been fixed here. Also, try/except isn't the right solution for this: if something breaks, we should be able to diagnose the issue. try/except would simply suppress it, which is much worse, because it would leave the final scraped dataset in an inconsistent state.
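One middle ground between crashing on every missing key and a blanket try/except is to decide which fields must be present and which may legitimately be absent. A minimal sketch: the REQUIRED/OPTIONAL split and the parse_legacy name are assumptions for illustration, not twint's code; the field names and the ur['data']['user']['legacy'] layout are the ones in MrNullPoint's snippet and the tracebacks. Treating url as optional assumes, consistent with "twitter" working and "jack" failing, that some profiles simply don't have one.

```python
import logging

# Hypothetical split for this sketch: fields that must exist vs. may be absent.
REQUIRED = ("screen_name", "name")
OPTIONAL = ("description", "location", "url")


def parse_legacy(ur):
    """Extract profile fields from ur['data']['user']['legacy'].

    Missing required fields raise KeyError with a descriptive message, so a
    genuinely broken response is still diagnosable. Missing optional fields
    become '' and are logged rather than silently swallowed.
    """
    try:
        legacy = ur["data"]["user"]["legacy"]
    except (KeyError, TypeError) as exc:
        raise KeyError(f"response has no data.user.legacy object: {exc!r}") from exc

    profile = {}
    for key in REQUIRED:
        if key not in legacy:
            raise KeyError(f"required field {key!r} missing from legacy user object")
        profile[key] = legacy[key]
    for key in OPTIONAL:
        if key not in legacy:
            logging.debug("optional field %r missing for %s", key, legacy["screen_name"])
        profile[key] = legacy.get(key, "")
    return profile
```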

ghost commented 3 years ago

@himanshudabas I keep getting the error when I run twint -u USERNAME --user-full (I'm using the latest Kubuntu):

CRITICAL:root:twint.get:User:'url'
ERROR:root:twint.run:Twint:Lookup:Unexpected exception occurred.
Traceback (most recent call last):
  File "/home/mik/src/twint/twint/run.py", line 307, in Lookup
    await get.User(self.config.Username, self.config, db.Conn(self.config.Database))
  File "/home/mik/src/twint/twint/get.py", line 228, in User
    await Users(j_r, config, conn)
  File "/home/mik/src/twint/twint/output.py", line 177, in Users
    user = User(u)
  File "/home/mik/src/twint/twint/user.py", line 31, in User
    _usr.url = ur['data']['user']['legacy']['url']
KeyError: 'url'
Traceback (most recent call last):
  File "/home/mik/.local/bin/twint", line 11, in
    load_entry_point('twint', 'console_scripts', 'twint')()
  File "/home/mik/src/twint/twint/cli.py", line 339, in run_as_command
    main()
  File "/home/mik/src/twint/twint/cli.py", line 326, in main
    run.Lookup(c)
  File "/home/mik/src/twint/twint/run.py", line 386, in Lookup
    run(config)
  File "/home/mik/src/twint/twint/run.py", line 329, in run
    get_event_loop().run_until_complete(Twint(config).main(callback))
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/mik/src/twint/twint/run.py", line 235, in main
    await task
  File "/home/mik/src/twint/twint/run.py", line 270, in run
    await self.Lookup()
  File "/home/mik/src/twint/twint/run.py", line 307, in Lookup
    await get.User(self.config.Username, self.config, db.Conn(self.config.Database))
  File "/home/mik/src/twint/twint/get.py", line 228, in User
    await Users(j_r, config, conn)
  File "/home/mik/src/twint/twint/output.py", line 177, in Users
    user = User(u)
  File "/home/mik/src/twint/twint/user.py", line 31, in User
    _usr.url = ur['data']['user']['legacy']['url']
KeyError: 'url'

himanshudabas commented 3 years ago

@micaelamaria My patch hasn't been merged into master yet. If you need to use twint urgently, you can install directly from my branch. However, I must warn you that this branch has some other issues. If you run into any of them, try installing from this branch instead. It is still a work in progress, but it will be much more stable.

ghost commented 3 years ago

@himanshudabas How do I install the package from your branch using the command line? I've tried countless options and none of them seem to work :(

himanshudabas commented 3 years ago

@micaelamaria

Try this :

pip3 install --user --upgrade git+https://github.com/himanshudabas/twint.git@origin/twint-fixes#egg=twint
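The tracebacks in this thread come from different install locations (/home/mik/src/twint, ~/.local/lib/python3.6/site-packages, /opt/homebrew/lib/python3.9/site-packages), so after installing from a branch it can help to confirm which copy of twint Python actually resolves. A small sketch using the standard importlib.util.find_spec, which locates a module without importing it; the helper name module_origin is hypothetical.

```python
import importlib.util


def module_origin(name):
    """Return the file Python would load for top-level module `name`, or None if not installed."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Run with the same interpreter that pip3 installed into.
    print(module_origin("twint"))
```

If the printed path is not the freshly installed copy, an older install (for example an editable checkout) is shadowing it.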
PinchOfData commented 3 years ago

Hi guys, I think @MrNullPoint is right to point out that the problem exists for other keys too. Perhaps apply @himanshudabas's solution to all keys?

JesusCoyotzi commented 3 years ago

Same issue from CLI:

twint -u jack --user-full
CRITICAL:root:twint.get:User:'url'
ERROR:root:twint.run:Twint:Lookup:Unexpected exception occurred.
Traceback (most recent call last):
  File "/home/jesus/.local/lib/python3.6/site-packages/twint/run.py", line 307, in Lookup
    await get.User(self.config.Username, self.config, db.Conn(self.config.Database))
  File "/home/jesus/.local/lib/python3.6/site-packages/twint/get.py", line 228, in User
    await Users(j_r, config, conn)
  File "/home/jesus/.local/lib/python3.6/site-packages/twint/output.py", line 177, in Users
    user = User(u)
  File "/home/jesus/.local/lib/python3.6/site-packages/twint/user.py", line 31, in User
    _usr.url = ur['data']['user']['legacy']['url']
KeyError: 'url'
Traceback (most recent call last):
  File "/home/jesus/.local/bin/twint", line 8, in
    sys.exit(run_as_command())
  File "/home/jesus/.local/lib/python3.6/site-packages/twint/cli.py", line 339, in run_as_command
    main()
  File "/home/jesus/.local/lib/python3.6/site-packages/twint/cli.py", line 326, in main
    run.Lookup(c)
  File "/home/jesus/.local/lib/python3.6/site-packages/twint/run.py", line 386, in Lookup
    run(config)
  File "/home/jesus/.local/lib/python3.6/site-packages/twint/run.py", line 329, in run
    get_event_loop().run_until_complete(Twint(config).main(callback))
  File "/usr/lib/python3.6/asyncio/base_events.py", line 484, in run_until_complete
    return future.result()
  File "/home/jesus/.local/lib/python3.6/site-packages/twint/run.py", line 235, in main
    await task
  File "/home/jesus/.local/lib/python3.6/site-packages/twint/run.py", line 270, in run
    await self.Lookup()
  File "/home/jesus/.local/lib/python3.6/site-packages/twint/run.py", line 307, in Lookup
    await get.User(self.config.Username, self.config, db.Conn(self.config.Database))
  File "/home/jesus/.local/lib/python3.6/site-packages/twint/get.py", line 228, in User
    await Users(j_r, config, conn)
  File "/home/jesus/.local/lib/python3.6/site-packages/twint/output.py", line 177, in Users
    user = User(u)
  File "/home/jesus/.local/lib/python3.6/site-packages/twint/user.py", line 31, in User
    _usr.url = ur['data']['user']['legacy']['url']
KeyError: 'url'

edmangog commented 3 years ago

Thank you, @himanshudabas! I installed twint from your branch, and it can scrape users' profiles without the KeyError. However, when I run the profile scraping in a loop thousands of times, a connection error freezes the loop (without breaking out of it); it looks like the Tor connection gets disrupted. How could we fix this?

Error:

Exception in thread RecvLoop_95.216.:
Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\threading.py", line 932, in _bootstrap_inner
    self.run()
  File "C:\Users\user\AppData\Roaming\Python\Python38\site-packages\torpy\circuit.py", line 233, in run
    callback(key.fileobj, mask)
  File "C:\Users\user\AppData\Roaming\Python\Python38\site-packages\torpy\circuit.py", line 220, in _do_recv
    for cell in self._tor_socket.recv_cell_async():
  File "C:\Users\user\AppData\Roaming\Python\Python38\site-packages\torpy\cell_socket.py", line 104, in recv_cell_async
    more_data = self._socket.recv(TorCellSocket.RECV_BUFF_SIZE)
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\ssl.py", line 1226, in recv
    return self.read(buflen)
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\ssl.py", line 1101, in read
    return self._sslobj.read(len)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

Here's my code (the same issue happens with or without threading):

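from multiprocessing.pool import ThreadPool

import pandas as pd
import twint

df = pd.read_csv('file.csv', index_col=0, header=0, encoding='utf-8-sig')
users = df['screen_name'].to_list()

def get_user_info(user):
    # Look up one user's profile and save it to <username>.csv
    try:
        query = twint.Config()
        query.Username = user
        query.Output = user + ".csv"
        query.Store_csv = True
        twint.run.Lookup(query)
    except Exception as exc:
        # Report failures instead of silently discarding them with a bare except/pass
        print(f"{user}: {exc!r}")

ThreadPool().map(get_user_info, users)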
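This doesn't fix the underlying torpy connection reset, but it can at least stop a hung lookup from blocking the caller forever. A minimal sketch using the standard concurrent.futures.wait with an overall deadline; run_lookups and its parameters are hypothetical, and lookup stands in for something like get_user_info.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def run_lookups(usernames, lookup, deadline_s, max_workers=4):
    """Run lookup(name) for every name and report 'ok', 'error: ...' or 'timeout'.

    Lookups still running after deadline_s seconds are reported as timed out.
    Python cannot kill a thread, so a hung lookup keeps running in the
    background; the caller just stops waiting for it.
    """
    executor = ThreadPoolExecutor(max_workers=max_workers)
    future_to_name = {executor.submit(lookup, name): name for name in usernames}
    done, not_done = wait(future_to_name, timeout=deadline_s)

    status = {}
    for fut in done:
        exc = fut.exception()
        status[future_to_name[fut]] = "ok" if exc is None else f"error: {exc!r}"
    for fut in not_done:
        fut.cancel()  # only succeeds for lookups that never started
        status[future_to_name[fut]] = "timeout"

    executor.shutdown(wait=False)  # don't block on hung workers here
    return status
```

The interpreter still waits for hung worker threads at exit. If that matters, running each lookup in a separate process, which can be terminated, is the heavier alternative.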
Kayden-lolasery commented 3 years ago

still facing the issue

Natata commented 3 years ago

Thanks @vassef, this PR can fix the issue https://github.com/twintproject/twint/pull/1255

Ovid commented 2 years ago

I'm also getting this issue. Is anyone working on it? I see that https://github.com/twintproject/twint/pull/1255 fixes the issue, but at the cost of silently ignoring the error.

I should add that my example uses jack, but other usernames (which I'm not sharing for privacy reasons, how ironic) hit this issue too.

$ twint -u jack --user-full
CRITICAL:root:twint.get:User:'url'
ERROR:root:twint.run:Twint:Lookup:Unexpected exception occurred.
Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.9/site-packages/twint/run.py", line 307, in Lookup
    await get.User(self.config.Username, self.config, db.Conn(self.config.Database))
  File "/opt/homebrew/lib/python3.9/site-packages/twint/get.py", line 228, in User
    await Users(j_r, config, conn)
  File "/opt/homebrew/lib/python3.9/site-packages/twint/output.py", line 177, in Users
    user = User(u)
  File "/opt/homebrew/lib/python3.9/site-packages/twint/user.py", line 31, in User
    _usr.url = ur['data']['user']['legacy']['url']
KeyError: 'url'
Traceback (most recent call last):
  File "/opt/homebrew/bin/twint", line 8, in <module>
    sys.exit(run_as_command())
  File "/opt/homebrew/lib/python3.9/site-packages/twint/cli.py", line 339, in run_as_command
    main()
  File "/opt/homebrew/lib/python3.9/site-packages/twint/cli.py", line 326, in main
    run.Lookup(c)
  File "/opt/homebrew/lib/python3.9/site-packages/twint/run.py", line 386, in Lookup
    run(config)
  File "/opt/homebrew/lib/python3.9/site-packages/twint/run.py", line 329, in run
    get_event_loop().run_until_complete(Twint(config).main(callback))
  File "/opt/homebrew/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/opt/homebrew/lib/python3.9/site-packages/twint/run.py", line 235, in main
    await task
  File "/opt/homebrew/lib/python3.9/site-packages/twint/run.py", line 270, in run
    await self.Lookup()
  File "/opt/homebrew/lib/python3.9/site-packages/twint/run.py", line 307, in Lookup
    await get.User(self.config.Username, self.config, db.Conn(self.config.Database))
  File "/opt/homebrew/lib/python3.9/site-packages/twint/get.py", line 228, in User
    await Users(j_r, config, conn)
  File "/opt/homebrew/lib/python3.9/site-packages/twint/output.py", line 177, in Users
    user = User(u)
  File "/opt/homebrew/lib/python3.9/site-packages/twint/user.py", line 31, in User
    _usr.url = ur['data']['user']['legacy']['url']