taspinar / twitterscraper

Scrape Twitter for Tweets

Error INFO USER-AGENT #236

Open TrungKien1230 opened 4 years ago

TrungKien1230 commented 4 years ago

Hello @taspinar, when I run this code:

import datetime

import pandas as pd
from twitterscraper import query_tweets

list_of_tweets = query_tweets('HRTechConf', begindate=datetime.date(2019, 9, 26), enddate=datetime.date(2019, 10, 6), lang='en')

tweets_df = pd.DataFrame([vars(x) for x in list_of_tweets])

I get the following output. Could you help me, please?

INFO: {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; x64; fr; rv:1.9.2.13) Gecko/20101203 Firebird/3.6.13'}

Traceback (most recent call last):
  File "C:\Users\nguy\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\contrib\pyopenssl.py", line 456, in wrap_socket
    cnx.do_handshake()
  File "C:\Users\nguy\AppData\Local\Continuum\anaconda3\lib\site-packages\OpenSSL\SSL.py", line 1915, in do_handshake
    self._raise_ssl_error(self._ssl, result)
  File "C:\Users\nguy\AppData\Local\Continuum\anaconda3\lib\site-packages\OpenSSL\SSL.py", line 1647, in _raise_ssl_error
    _raise_current_error()
  File "C:\Users\nguy\AppData\Local\Continuum\anaconda3\lib\site-packages\OpenSSL\_util.py", line 54, in exception_from_error_queue
    raise exception_type(errors)
OpenSSL.SSL.Error: [('SSL routines', 'ssl3_read_bytes', 'tlsv1 alert access denied')]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\nguy\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "C:\Users\nguy\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 343, in _make_request
    self._validate_conn(conn)
  File "C:\Users\nguy\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 839, in _validate_conn
    conn.connect()
  File "C:\Users\nguy\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connection.py", line 344, in connect
    ssl_context=context)
  File "C:\Users\nguy\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\util\ssl_.py", line 345, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "C:\Users\nguy\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\contrib\pyopenssl.py", line 462, in wrap_socket
    raise ssl.SSLError('bad handshake: %r' % e)
ssl.SSLError: ("bad handshake: Error([('SSL routines', 'ssl3_read_bytes', 'tlsv1 alert access denied')])",)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\nguy\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\adapters.py", line 449, in send
    timeout=timeout
  File "C:\Users\nguy\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "C:\Users\nguy\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\util\retry.py", line 399, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='free-proxy-list.net', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'ssl3_read_bytes', 'tlsv1 alert access denied')])")))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/nguy/PycharmProjects/Streaming_Tweets_Data/twitterScraper.py", line 5, in <module>
    from twitterscraper import query_tweets
  File "C:\Users\nguy\AppData\Local\Continuum\anaconda3\lib\site-packages\twitterscraper\__init__.py", line 13, in <module>
    from twitterscraper.query import query_tweets
  File "C:\Users\nguy\AppData\Local\Continuum\anaconda3\lib\site-packages\twitterscraper\query.py", line 72, in <module>
    proxies = get_proxies()
  File "C:\Users\nguy\AppData\Local\Continuum\anaconda3\lib\site-packages\twitterscraper\query.py", line 42, in get_proxies
    response = requests.get(PROXY_URL)
  File "C:\Users\nguy\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\nguy\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\nguy\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\nguy\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\nguy\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\adapters.py", line 514, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='free-proxy-list.net', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'ssl3_read_bytes', 'tlsv1 alert access denied')])")))
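The chained errors show that the failure happens at import time: twitterscraper/query.py calls get_proxies(), which fetches https://free-proxy-list.net/ (the requests.get(PROXY_URL) call in the traceback), and the TLS handshake is rejected before any tweet query runs. A minimal sketch to confirm that this proxy-list fetch, rather than the query itself, is what fails (the URL is taken straight from the traceback; the timeout value is arbitrary):

import requests

# Reproduce just the proxy-list fetch that twitterscraper performs at import
# time. If this raises the same SSLError, the problem is TLS access to
# free-proxy-list.net from this machine/network, not the tweet query.
try:
    resp = requests.get('https://free-proxy-list.net/', timeout=10)
    print('Reached free-proxy-list.net, status:', resp.status_code)
except requests.exceptions.SSLError as exc:
    print('Same handshake failure as in the traceback:', exc)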

Makoto1021 commented 4 years ago

Hi @TrungKien1230, have you solved this issue? I'm having the same problem. I first thought it was caused by trying to scrape too many tweets in one go, so I waited a few days and tried again, but the error persists. It would be nice if you could share the solution! Thanks.

avanibhatnagar commented 4 years ago

@Makoto1021 @TrungKien1230 Have you found a way to solve this? It was working fine last week, but now I get the same error.

JoeCarlPSU commented 4 years ago

Same issue here.

TrungKien1230 commented 4 years ago

As you can read in the Twitter API policy, there are two ways to retrieve tweets: searching the past (up to 7 days back) and streaming (live), and the search endpoint caps how many tweets you can retrieve per unit of time. With streaming you can collect as many as you want. This problem occurs because you have already pulled too many.
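For the streaming mode, a minimal sketch with tweepy 3.x looks like this (an illustration only; it assumes the same twitter_credentials.py module described below, and the track keyword is just an example):

import tweepy
import twitter_credentials

class PrintListener(tweepy.StreamListener):
    """Minimal listener that prints each live tweet as it arrives."""

    def on_status(self, status):
        print(status.text)

    def on_error(self, status_code):
        # Returning False disconnects the stream (e.g. on a 420 rate limit).
        return False

auth = tweepy.OAuthHandler(twitter_credentials.CONSUMER_KEY, twitter_credentials.CONSUMER_SECRET)
auth.set_access_token(twitter_credentials.ACCESS_TOKEN, twitter_credentials.ACCESS_TOKEN_SECRET)
stream = tweepy.Stream(auth=auth, listener=PrintListener())
stream.filter(track=['HRTechConf'])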

You can read my code below; it retrieves tweets from the past:

First, you need to get ACCESS_TOKEN, ACCESS_TOKEN_SECRET, CONSUMER_KEY, and CONSUMER_SECRET from Twitter and put them in twitter_credentials.py to match my code.
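For example, twitter_credentials.py can simply hold the four values as module-level constants (placeholders here; substitute the real keys from your Twitter developer dashboard):

# twitter_credentials.py -- placeholder values only.
# These are the exact names the class below reads.
CONSUMER_KEY = 'your-consumer-key'
CONSUMER_SECRET = 'your-consumer-secret'
ACCESS_TOKEN = 'your-access-token'
ACCESS_TOKEN_SECRET = 'your-access-token-secret'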

import tweepy
import twitter_credentials
import json
import csv
import datetime

class rest_api:
    """This class collects tweets from the past (7 days max)."""

    def __init__(self):
        # Authenticate to Twitter
        self.auth = tweepy.OAuthHandler(twitter_credentials.CONSUMER_KEY, twitter_credentials.CONSUMER_SECRET)
        self.auth.set_access_token(twitter_credentials.ACCESS_TOKEN, twitter_credentials.ACCESS_TOKEN_SECRET)
        # Create API object
        self.api = tweepy.API(self.auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

    def get_tweets(self, file_name, query, lang, date_time, max_tweets):
        # tweets = api.search(q=query, lang=lang, count=max_tweets, until=2019 - 11 - 6)
        # csvFile = open('res_api_csv', 'a')
        # csvWriter = csv.writer(csvFile)

        for i in range(len(query)):
            # cursor = tweepy.Cursor(self.api.search, q=query[i], lang=lang, until=date_time).items(max_tweets)
            for tweet in tweepy.Cursor(self.api.search, q=query[i], lang=lang, until=date_time).items(max_tweets):
                print(f'{tweet}')
                # Append each tweet's raw JSON as one line of the output file.
                with open(file_name, 'a') as tf:
                    tf.write(json.dumps(tweet._json) + '\n')
                # csvWriter.writerow([
                #     tweet.contributors, tweet.coordinates, tweet.created_at, tweet.entities,
                #     tweet.favorite_count, tweet.favorited, tweet.geo,
                #     tweet.id, tweet.id_str, tweet.in_reply_to_screen_name, tweet.in_reply_to_status_id,
                #     tweet.in_reply_to_status_id_str, tweet.in_reply_to_user_id, tweet.in_reply_to_user_id_str,
                #     tweet.is_quote_status, tweet.lang, tweet.metadata, tweet.place, tweet.possibly_sensitive,
                #     tweet.retweet_count, tweet.retweeted, tweet.source,
                #     tweet.text.encode('utf-8'), tweet.user])
        # csvFile.close()

q = ["renault twingo3", "renault twingo 3", "renault twingo III", "renault twingoIII", "renault clio5", "renault clio 5", "renault clio V" , "renault clioV", "renault arkana", "renault zoe2", "renault zoe 2", "re nault zoeII", "renault zoe II", "renault zoé2", "renault zoé 2", "renault zoé II", "renault zoéII", "renault captur2", "renault captur 2", "renault captur II", "renault capturII"]

rest_api = rest_api()
rest_api.get_tweets(file_name='rest_api_csv.csv', query=q, lang='', date_time=datetime.date.today(), max_tweets=500)
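Despite the .csv name, the output file is JSON Lines (one tweet serialized with json.dumps per line), so you can load it back like this, assuming standard tweet JSON fields such as created_at and text:

import json

import pandas as pd

# Read the JSON-Lines file written by get_tweets() above.
with open('rest_api_csv.csv') as f:
    tweets = [json.loads(line) for line in f]

df = pd.DataFrame(tweets)
# 'created_at' and 'text' are standard fields in the REST API's tweet JSON;
# adjust the column names if your payloads differ.
print(df[['created_at', 'text']].head())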