Open marcoferre opened 4 years ago
Same here.
Success with:
    c.Username = "twitter"
    twint.run.Lookup(c)
Failed with:
    c.Username = "jack"
    twint.run.Lookup(c)
same issue here
In my case the failing key is not 'url' but 'data', although the two problems may be related.
CRITICAL:root:twint.get:User:'data'
ERROR:root:twint.run:Twint:Lookup:Unexpected exception occurred.
Traceback (most recent call last):
File "/root/.local/lib/python3.6/site-packages/twint/run.py", line 307, in Lookup
await get.User(self.config.Username, self.config, db.Conn(self.config.Database))
File "/root/.local/lib/python3.6/site-packages/twint/get.py", line 228, in User
await Users(j_r, config, conn)
File "/root/.local/lib/python3.6/site-packages/twint/output.py", line 177, in Users
user = User(u)
File "/root/.local/lib/python3.6/site-packages/twint/user.py", line 21, in User
if 'data' not in ur and 'user' not in ur['data']:
KeyError: 'data'
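A note on the traceback above: the guard on line 21 of user.py appears to raise the very error it is meant to prevent. With `and`, Python evaluates `ur['data']` precisely when 'data' is missing. The sketch below reproduces that with a made-up response dict. The function names and the payload are hypothetical, and whether `or` is the fix the maintainers chose is an assumption on my part.

```python
def check_buggy(ur):
    # Same condition as in the traceback: when 'data' is absent the left side
    # is True, so `and` goes on to evaluate ur['data'] and raises KeyError.
    if 'data' not in ur and 'user' not in ur['data']:
        return False
    return True


def check_fixed(ur):
    # With `or`, a missing 'data' key short-circuits before ur['data'] is read.
    if 'data' not in ur or 'user' not in ur['data']:
        return False
    return True


# Hypothetical payload with no 'data' key, e.g. an error response.
response_without_data = {'errors': [{'message': 'hypothetical error payload'}]}

try:
    check_buggy(response_without_data)
except KeyError as exc:
    print('buggy check raised KeyError:', exc)

print('fixed check returned:', check_fixed(response_without_data))
```

Both versions behave the same when 'data' is present; they only differ on responses that lack it.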
KeyError Traceback (most recent call last)
I think this is caused by a JSON parsing problem. Sometimes, when we query a user such as @jack, Twitter returns something like "... user is suspended ..." (JSON that can still be parsed), so I changed some code in twint/user.py to add try...except. After reinstalling twint, the problem was solved.
Some of the code:
try:
    _usr.name = ur['data']['user']['legacy']['name']
except:
    _usr.name = ''
try:
    _usr.username = ur['data']['user']['legacy']['screen_name']
except:
    _usr.username = ''
try:
    _usr.bio = ur['data']['user']['legacy']['description']
except:
    _usr.bio = ''
try:
    _usr.location = ur['data']['user']['legacy']['location']
except:
    _usr.location = ''
try:
    _usr.url = ur['data']['user']['legacy']['url']
except:
    _usr.url = ''
@MrNullPoint this has already been fixed here. Also, try/except isn't the desired solution for this: if something breaks, we should be able to diagnose the issue. try/except would simply suppress it, which would be much worse, because the final scraped dataset would be left in an inconsistent state.
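To make the point above concrete, here is a minimal sketch of handling missing keys without blanket try/except: fields assumed to be optional get a default through `dict.get`, while a malformed response still fails with a message that can be diagnosed. The function name `parse_user` and the split between required and optional fields are assumptions for illustration; the thread only confirms that 'url' and 'data' can be missing. This is not the code from the fix branch.

```python
def parse_user(ur):
    """Extract profile fields from a user response dict (illustrative only)."""
    if 'data' not in ur or 'user' not in ur['data']:
        # Fail loudly and say what was actually received.
        raise ValueError(
            'unexpected user response, top-level keys: %s' % sorted(ur)
        )
    legacy = ur['data']['user']['legacy']
    return {
        # Assumed required: a missing key here still raises KeyError.
        'name': legacy['name'],
        'username': legacy['screen_name'],
        # Assumed optional: absent keys become an empty string.
        'bio': legacy.get('description', ''),
        'location': legacy.get('location', ''),
        'url': legacy.get('url', ''),
    }


# Hypothetical response for a profile that has no 'url' set.
sample = {'data': {'user': {'legacy': {'name': 'Jack', 'screen_name': 'jack'}}}}
print(parse_user(sample))
```

Unlike a bare `except:`, this only tolerates the specific absences that are expected, so a changed response format surfaces as an error instead of a row of empty strings.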
@himanshudabas, I keep getting the error when I run twint -u USERNAME --user-full (I'm using the latest Kubuntu):
CRITICAL:root:twint.get:User:'url'
ERROR:root:twint.run:Twint:Lookup:Unexpected exception occurred.
Traceback (most recent call last):
File "/home/mik/src/twint/twint/run.py", line 307, in Lookup
await get.User(self.config.Username, self.config, db.Conn(self.config.Database))
File "/home/mik/src/twint/twint/get.py", line 228, in User
await Users(j_r, config, conn)
File "/home/mik/src/twint/twint/output.py", line 177, in Users
user = User(u)
File "/home/mik/src/twint/twint/user.py", line 31, in User
_usr.url = ur['data']['user']['legacy']['url']
KeyError: 'url'
Traceback (most recent call last):
File "/home/mik/.local/bin/twint", line 11, in
@micaelamaria My patch hasn't been merged to the master yet. If you need to use twint urgently, you can install directly from my branch. Although I must warn you that there will be some other issues in this branch. If you do experience some other issue in the above branch, try installing from this branch. This branch is still a work in progress, but it'll be much more stable.
@himanshudabas - how do I install the package from your branch using the command line? I tried infinite options, and none seems to be working :(
@micaelamaria
Try this :
pip3 install --user --upgrade git+https://github.com/himanshudabas/twint.git@origin/twint-fixes#egg=twint
Hi guys, I think @MrNullPoint is right to point out that the problem exists for other keys, too. Perhaps apply @himanshudabas's solution to all keys?
Same issue from CLI:
twint -u jack --user-full
CRITICAL:root:twint.get:User:'url'
ERROR:root:twint.run:Twint:Lookup:Unexpected exception occurred.
Traceback (most recent call last):
File "/home/jesus/.local/lib/python3.6/site-packages/twint/run.py", line 307, in Lookup
await get.User(self.config.Username, self.config, db.Conn(self.config.Database))
File "/home/jesus/.local/lib/python3.6/site-packages/twint/get.py", line 228, in User
await Users(j_r, config, conn)
File "/home/jesus/.local/lib/python3.6/site-packages/twint/output.py", line 177, in Users
user = User(u)
File "/home/jesus/.local/lib/python3.6/site-packages/twint/user.py", line 31, in User
_usr.url = ur['data']['user']['legacy']['url']
KeyError: 'url'
Traceback (most recent call last):
File "/home/jesus/.local/bin/twint", line 8, in
Thank you, @himanshudabas! I have installed twint from your branch, and it can scrape a user's profile without the KeyError. However, when I run the profile scraping in a loop thousands of times, a connection error freezes the loop (it does not break out of the loop); it seems the Tor connection was disrupted. How could we fix this?
Error:
Exception in thread RecvLoop_95.216.:
Traceback (most recent call last):
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\threading.py", line 932, in _bootstrap_inner
self.run()
File "C:\Users\user\AppData\Roaming\Python\Python38\site-packages\torpy\circuit.py", line 233, in run
callback(key.fileobj, mask)
File "C:\Users\user\AppData\Roaming\Python\Python38\site-packages\torpy\circuit.py", line 220, in _do_recv
for cell in self._tor_socket.recv_cell_async():
File "C:\Users\user\AppData\Roaming\Python\Python38\site-packages\torpy\cell_socket.py", line 104, in recv_cell_async
more_data = self._socket.recv(TorCellSocket.RECV_BUFF_SIZE)
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\ssl.py", line 1226, in recv
return self.read(buflen)
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\ssl.py", line 1101, in read
return self._sslobj.read(len)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
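Here's my code (the same issue happens with or without threading):
import pandas as pd
import twint
# Assumption: ThreadPool is the one from multiprocessing.pool; the import was not shown in the post.
from multiprocessing.pool import ThreadPool

df = pd.read_csv('file.csv', index_col=0, header=0, encoding='utf-8-sig')
users = df['screen_name'].to_list()

def get_user_info(user):
    try:
        query = twint.Config()
        query.Username = user
        query.Output = user + ".csv"
        query.Store_csv = True
        twint.run.Lookup(query)
    except:
        pass  # any error raised by the lookup is silently discarded here

ThreadPool().map(get_user_info, users)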
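One thing that may help with diagnosing a loop like the one above: `except: pass` hides which users failed and why. The sketch below records failures instead of dropping them. `scrape_all` and `fake_lookup` are hypothetical names; in real use `lookup` would be a function that calls `twint.run.Lookup` and lets exceptions propagate. Note the limit: the ConnectionResetError in the traceback is raised inside a torpy receive thread, so this sketch would not unfreeze the loop by itself. It only makes failures visible.

```python
import logging

logger = logging.getLogger("profile_scrape")


def scrape_all(users, lookup):
    """Call lookup(user) for each user and return {user: exception} for failures."""
    failed = {}
    for user in users:
        try:
            lookup(user)
        except Exception as exc:
            # Broad on purpose, but the error is logged and kept, not discarded.
            logger.warning("lookup failed for %s: %r", user, exc)
            failed[user] = exc
    return failed


def fake_lookup(user):
    # Stand-in for a real lookup, so the sketch runs without network access.
    if user == "jack":
        raise KeyError("url")


failed = scrape_all(["twitter", "jack"], fake_lookup)
print("failed users:", sorted(failed))
```

The returned dict can then be used to retry only the failed users, or to check whether the failures are all the same KeyError or something connection related.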
still facing the issue
Thanks @vassef, this PR can fix the issue https://github.com/twintproject/twint/pull/1255
I'm also getting this issue, Is anyone working on it? I see that https://github.com/twintproject/twint/pull/1255 fixes the issue, but at the cost of silently ignoring the error.
I might add that my example uses jack, but I have other usernames (not sharing them due to privacy; how ironic) that hit this issue too.
$ twint -u jack --user-full
CRITICAL:root:twint.get:User:'url'
ERROR:root:twint.run:Twint:Lookup:Unexpected exception occurred.
Traceback (most recent call last):
File "/opt/homebrew/lib/python3.9/site-packages/twint/run.py", line 307, in Lookup
await get.User(self.config.Username, self.config, db.Conn(self.config.Database))
File "/opt/homebrew/lib/python3.9/site-packages/twint/get.py", line 228, in User
await Users(j_r, config, conn)
File "/opt/homebrew/lib/python3.9/site-packages/twint/output.py", line 177, in Users
user = User(u)
File "/opt/homebrew/lib/python3.9/site-packages/twint/user.py", line 31, in User
_usr.url = ur['data']['user']['legacy']['url']
KeyError: 'url'
Traceback (most recent call last):
File "/opt/homebrew/bin/twint", line 8, in <module>
sys.exit(run_as_command())
File "/opt/homebrew/lib/python3.9/site-packages/twint/cli.py", line 339, in run_as_command
main()
File "/opt/homebrew/lib/python3.9/site-packages/twint/cli.py", line 326, in main
run.Lookup(c)
File "/opt/homebrew/lib/python3.9/site-packages/twint/run.py", line 386, in Lookup
run(config)
File "/opt/homebrew/lib/python3.9/site-packages/twint/run.py", line 329, in run
get_event_loop().run_until_complete(Twint(config).main(callback))
File "/opt/homebrew/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "/opt/homebrew/lib/python3.9/site-packages/twint/run.py", line 235, in main
await task
File "/opt/homebrew/lib/python3.9/site-packages/twint/run.py", line 270, in run
await self.Lookup()
File "/opt/homebrew/lib/python3.9/site-packages/twint/run.py", line 307, in Lookup
await get.User(self.config.Username, self.config, db.Conn(self.config.Database))
File "/opt/homebrew/lib/python3.9/site-packages/twint/get.py", line 228, in User
await Users(j_r, config, conn)
File "/opt/homebrew/lib/python3.9/site-packages/twint/output.py", line 177, in Users
user = User(u)
File "/opt/homebrew/lib/python3.9/site-packages/twint/user.py", line 31, in User
_usr.url = ur['data']['user']['legacy']['url']
pip3 install --user --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint
Command Ran
Environment Details
Google Colab, Win