twintproject / twint

An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
MIT License
15.65k stars, 2.72k forks

Lookup user / 'profile_banner_url #1002

Open seonake opened 3 years ago

seonake commented 3 years ago

Hi, it seems there are still some problems with the Lookup function. Am I doing something wrong?

Command Ran

twint.run.Lookup(c)

Description of Issue

I got this:

CRITICAL:root:twint.feed:Follow:IndexError
CRITICAL:root:twint.feed:Follow:IndexError
CRITICAL:root:twint.get:User:'profile_banner_url'
ERROR:root:twint.run:Twint:Lookup:Unexpected exception occurred.
Traceback (most recent call last):
  File "/content/src/twint/twint/run.py", line 307, in Lookup
    await get.User(self.config.Username, self.config, db.Conn(self.config.Database))
  File "/content/src/twint/twint/get.py", line 228, in User
    await Users(j_r, config, conn)
  File "/content/src/twint/twint/output.py", line 177, in Users
    user = User(u)
  File "/content/src/twint/twint/user.py", line 49, in User
    _usr.background_image = ur['data']['user']['legacy']['profile_banner_url']
KeyError: 'profile_banner_url'


/content/src/twint/twint/user.py in User(ur)
     47     _usr.is_verified = ur['data']['user']['legacy']['verified']
     48     _usr.avatar = ur['data']['user']['legacy']['profile_image_url_https']
---> 49     _usr.background_image = ur['data']['user']['legacy']['profile_banner_url']
     50     # TODO : future implementation
     51     # legacy_extended_profile is also available in some cases which can be used to get DOB of user

KeyError: 'profile_banner_url'

Environment Details

Mac / Google colab

himanshudabas commented 3 years ago

Would fix this today.

seonake commented 3 years ago

Many thanks.

himanshudabas commented 3 years ago

Could you please share the exact script that you ran, so I can replicate this issue?

seonake commented 3 years ago

Now I get this with both twint.run.Lookup(c) and twint.run.Following(c): RefreshTokenException: Could not find the Guest token in HTML

seonake commented 3 years ago

/content/src/twint/twint/token.py in refresh(self)
     66         else:
     67             self.config.Guest_token = None
---> 68         raise RefreshTokenException('Could not find the Guest token in HTML')

RefreshTokenException: Could not find the Guest token in HTML

himanshudabas commented 3 years ago

Are you running this script on the same machine as before? Are you running it under Anaconda or a system-wide Python installation? Also, please go through thread #957 first.

seonake commented 3 years ago

Yes, same script. Google Colab.

seonake commented 3 years ago

!pip install nest_asyncio
!pip install --user --upgrade -e git+https://github.com/twintproject/twint.git@master#egg=twint


import twint
import nest_asyncio
import pandas as pd

# Allow twint's event loop to run inside the notebook's already-running loop
nest_asyncio.apply()
directorio = '/content/drive/My Drive/TFM/Cuentas/'

DataIn = pd.read_csv(directorio + 'cuentas_inicio.csv')

# Run Following once per account listed in the CSV
for fila in DataIn.itertuples():
    config = twint.Config()
    config.Username = fila[2]
    config.Store_object = True
    config.Pandas = True
    config.Store_pandas = True
    config.Hide_output = True
    twint.run.Following(config)

seonake commented 3 years ago

Again, same script, and back to the first error: CRITICAL:root:twint.get:User:'profile_banner_url'

But I have new information. Since I am taking users from a CSV file, I can see that it works fine with some users, but I get this error with, for example, user = "Kronprinsparet". Maybe that helps.

himanshudabas commented 3 years ago

I have fixed the profile_banner_url error in my branch. My PR for that patch hasn't been merged yet. In the meantime, use the command below to install the fix from my branch:

pip3 install --user --upgrade git+https://github.com/himanshudabas/twint.git@origin/fix-parser#egg=twint
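
For context, the KeyError above comes from indexing a key that is absent for some accounts (presumably those without a banner image). The following is a minimal sketch of a defensive lookup, assuming the response nesting shown in the traceback. `extract_banner_url` is a hypothetical helper for illustration and is not necessarily what the branch does:

```python
def extract_banner_url(ur, default=None):
    """Return the banner URL from a user payload, or `default` if it is absent.

    Hypothetical helper: the nesting mirrors the traceback above
    (ur['data']['user']['legacy']['profile_banner_url']).
    """
    legacy = ur.get('data', {}).get('user', {}).get('legacy', {})
    # dict.get returns the default instead of raising KeyError
    return legacy.get('profile_banner_url', default)


# Made-up payloads that only imitate the shape seen in the traceback
with_banner = {'data': {'user': {'legacy': {
    'profile_banner_url': 'https://example.com/banner.jpg'}}}}
without_banner = {'data': {'user': {'legacy': {'verified': False}}}}

print(extract_banner_url(with_banner))
print(extract_banner_url(without_banner))
```

The first call prints the URL and the second prints `None`, so a missing banner no longer aborts the whole lookup.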

As for the Guest_token error, it happens when Twitter blacklists your IP address for making too many requests within a short period of time.

You can confirm this by taking a 15-minute break when you get this error and then running your script again. After those 15 minutes your script should work again.
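
The wait-and-retry advice can be automated. Below is a rough sketch, not part of twint: `run_with_cooldown` and its parameters are hypothetical names, and the 15-minute default simply mirrors the suggestion above.

```python
import time


def run_with_cooldown(task, retry_on, cooldown_seconds=15 * 60,
                      max_attempts=3, sleep=time.sleep):
    """Call task(); if it raises `retry_on`, wait and try again.

    Re-raises the last exception once `max_attempts` is exhausted.
    `sleep` is injectable so the waiting can be skipped in tests.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except retry_on:
            if attempt == max_attempts:
                raise
            # Give the rate limit time to expire before the next attempt
            sleep(cooldown_seconds)
```

With twint this might be called as `run_with_cooldown(lambda: twint.run.Lookup(c), retry_on=Exception)`. Narrowing `retry_on` to the `RefreshTokenException` raised in `twint/token.py` (see the traceback above) would be more precise, assuming it can be imported from that module.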

seonake commented 3 years ago

Many thanks!

seonake commented 3 years ago

Still not working, may I help?

CharleoY commented 3 years ago

Thanks for your effort! But I have a problem here. What if I need to crawl the info of 50k Twitter users, which means sending a lot of requests in quick succession? Is there any way to get around the blacklisting?

himanshudabas commented 3 years ago

@CharleoY The only way I can think of is to use a proxy list: when you receive this exception, simply rotate your proxy.

I don't know if proxies are working right now in the current implementation of twint.

But this is one of the ways to go.
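
A minimal sketch of the rotation idea, independent of twint: `run_with_proxy_rotation` is a hypothetical helper, and `task` stands for whatever function performs the request through a given proxy.

```python
from itertools import cycle


def run_with_proxy_rotation(task, proxies, retry_on, max_attempts=None):
    """Call task(proxy), moving to the next proxy whenever `retry_on` is raised."""
    if not proxies:
        raise ValueError("proxies must not be empty")
    if max_attempts is None:
        max_attempts = len(proxies)  # by default, try each proxy once
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rotation = cycle(proxies)
    last_error = None
    for _ in range(max_attempts):
        proxy = next(rotation)
        try:
            return task(proxy)
        except retry_on as exc:
            last_error = exc  # this proxy looks blocked; try the next one
    raise last_error
```

In twint, `task` would presumably set the proxy on the config before running. `Config` appears to expose `Proxy_host`, `Proxy_port` and `Proxy_type` fields for this, though as said above it is unclear whether they work in the current implementation.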

Moreover, for searching user data, I am planning to add a feature soon that would let you scrape the data of around 100 users in a single API request. You could then get the details of 50,000 users in merely 500 requests, compared to the 50,000 requests that you need to make right now.

So your IP won't be blacklisted.

It'd take some time to implement though.
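
To make the arithmetic concrete, here is a small sketch of how usernames could be grouped into batches of 100. Only the grouping is shown, since the batched lookup itself is not implemented yet; `chunked` and the placeholder usernames are illustrative assumptions.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


usernames = [f"user{i}" for i in range(50_000)]  # placeholder names
batches = list(chunked(usernames, 100))
print(len(batches))  # 500
```

Each batch would then correspond to one request, instead of one request per user.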

gautampal1947 commented 3 years ago

Is there a way to retrieve the video URLs in the Tweets?

himanshudabas commented 3 years ago

@gautampal1947 Videos on Twitter don't have a URL.

gautampal1947 commented 3 years ago

It seems the video URL can be extracted from the embedded video in the iframe: https://steemit.com/technology/@singhpratyush/fetching-url-for-embedded-twitter-videos

zhaojiafu commented 3 years ago

Hello, do you support multiple users in one request now?

agombert commented 3 years ago

EDIT: no problem after all, I misunderstood

himanshudabas commented 3 years ago

@agombert Can you elaborate a little on what you are trying to do here? This error occurs when you don't provide a Username or User_id before calling Lookup.

I am new to twint, so I have no idea what Members_list does. It'd also be nice if you could explain how 'manhack/OSINT' works. Moreover, when the older Twitter endpoints were deprecated (which broke the library), a lot of code in twint changed to fix the library, and due to the lack of proper documentation I wasn't able to grasp how things worked before the library broke.

That's the reason a lot of stuff is in limbo right now.

agombert commented 3 years ago

My bad, @himanshudabas, I mixed two different things:

I edited my comment above. Your branch works perfectly!

batmanscode commented 3 years ago

@CharleoY I ran into this problem today and your branch solved it! Thank you for putting in the time for a fix, much appreciated 😃

And as a side note, hopefully this gets merged sooner rather than later. I'm eager for Twint's next release as there are quite a few good PRs to be merged.