shaikhsajid1111 / twitter-scraper-selenium

Python package to scrape Twitter's front-end easily
https://pypi.org/project/twitter-scraper-selenium
MIT License
299 stars 46 forks

Not scraping every tweet from a user #52

Open wjd157 opened 1 year ago

wjd157 commented 1 year ago

Hello, I am trying to scrape every tweet from a user. From the twitter page, I can see that they have tweeted more than 5000 times. However, even when I set my tweets_count to 5000, I am getting less than 1000 tweets from that user.

My code is below:

scrape_profile(twitter_username = "elonmusk", output_format ="csv", tweets_count = 6000, browser = "chrome", filename = "elonmusk")

(Note that @elonmusk is just a stand-in example)

shaikhsajid1111 commented 1 year ago

Hey @wjd157, that method uses browser automation for scraping, and your tweet count is big, so it might be getting blocked partway through. I suggest you use the scrape_keyword_with_api() method instead. Try the code below, and check elon.json after scraping; you should get the data you want.

from twitter_scraper_selenium import scrape_keyword_with_api

scrape_keyword_with_api('from:elonmusk', output_filename='elon')
wjd157 commented 1 year ago

This appears to generate a JSON file with no data in it. Further, the console tells me I have only scraped 24 tweets, even though the account I am now trying has more than 200 tweets.

shaikhsajid1111 commented 1 year ago

Okay, I think this Twitter feature only returns a few tweets. Currently, I have not added a feature to scrape a Twitter account through Twitter's API, and the browser-automation one gets blocked. I will add a new feature to scrape a Twitter profile from the API in a couple of weeks.

christianmettri commented 1 year ago

I am also really looking forward to this feature. Please let us know once you have had time to implement it. Thanks a lot.

shaikhsajid1111 commented 1 year ago

Hi @christianmettri @wjd157, just updating you on this; I don't know if you're still looking for a solution. Now, you can try

from twitter_scraper_selenium import scrape_profile_with_api

scrape_profile_with_api('elonmusk', output_filename='musk', tweets_count=100)

and check the musk.json file, where the output will be saved.
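For reference, the saved file is a JSON object keyed by tweet ID, with each value holding the tweet's fields; the dict-of-tweets shape assumed here matches the loops used in the snippets further down this thread, and the sample data is made up:

```python
import json

# Made-up sample mirroring the assumed output shape: a JSON object
# keyed by tweet ID, each value holding that tweet's fields.
sample = {
    "1600000000000000001": {"username": "elonmusk", "content": "hello"},
    "1600000000000000002": {"username": "elonmusk", "content": "world"},
}

# scrape_profile_with_api writes this kind of object to musk.json;
# here we write the sample ourselves so the snippet runs stand-alone.
with open("musk.json", "w") as f:
    json.dump(sample, f)

with open("musk.json") as f:
    tweets = json.load(f)

print(f"scraped {len(tweets)} tweets")  # counts top-level entries
```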

SenninOne commented 1 year ago

Hello @shaikhsajid1111, I tried this code and it gives me this error:

2023-02-28 02:33:09,836 - WARNING - Failed to make request!

The code:

from twitter_scraper_selenium import scrape_profile_with_api
import json

scrape_profile_with_api(username="NASA", output_filename="NASA", browser="firefox",tweets_count=50, output_dir="C:/Users/Braulio/Desktop/web scraping python")

with open('NASA.json') as f:
    NASA = json.load(f)

with open('NASAimages.html', 'w') as f:
    f.write('<html>\n')
    f.write('<head>\n')
    f.write('<title>Imágenes</title>\n')
    f.write('</head>\n')
    f.write('<body>\n')
    for tweet_id, tweet_data in NASA.items():  # was caro.items(): undefined name
        if tweet_data['username'] == 'NASA':
            for imagen in tweet_data['images']:
                f.write('<img src="{}?format=jpg&name=medium" alt="">\n'.format(imagen))
    f.write('</body>\n')
    f.write('</html>\n')

print("HTML READY")

I also tried the scrape_keyword_with_api function; here is the code:

from twitter_scraper_selenium import scrape_keyword_with_api
import json

scrape_keyword_with_api(query="from:NASA", output_filename="NASA", tweets_count=50, output_dir="C:/Users/Braulio/Desktop/web scraping python")

with open('NASA.json') as f:
    NASA = json.load(f)

with open('imagenes.html', 'w') as f:
    f.write('<html>\n')
    f.write('<head>\n')
    f.write('<title>Imágenes</title>\n')
    f.write('</head>\n')
    f.write('<body>\n')
    for tweet_id, tweet_data in NASA.items():
        if tweet_data['username'] == 'NASA':
            for imagen in tweet_data['images']:
                f.write('<img src="{}?format=jpg&name=medium" alt="">\n'.format(imagen))
    f.write('</body>\n')
    f.write('</html>\n')

print("HTML READY")

It shows this error:

2023-02-28 02:37:18,021 - twitter_scraper_selenium.keyword_api - WARNING - Failed to make request!
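That warning is logged when the underlying HTTP request is rejected (rate limiting or a changed endpoint on Twitter's side are common causes). For transient failures, wrapping the call in a generic retry with exponential backoff is sometimes enough (a sketch; `retry` and `flaky_fetch` are hypothetical helpers, not part of twitter-scraper-selenium):

```python
import time

def retry(func, attempts=3, base_delay=1.0):
    """Call func(), retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            time.sleep(base_delay * 2 ** attempt)

# Hypothetical flaky call standing in for the scraper's request:
# fails twice, then succeeds on the third try.
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("Failed to make request!")
    return "ok"

result = retry(flaky_fetch, attempts=3, base_delay=0.01)
```

If every attempt fails, the problem is likely a hard block rather than a transient error, and retrying will not help.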