iSarabjitDhiman / TweeterPy

TweeterPy is a python library to extract data from Twitter. TweeterPy API lets you scrape data from a user's profile like username, userid, bio, followers/followings list, profile media, tweets, etc.
MIT License

Login does not work any more #17

Closed nballen-tx closed 11 months ago

nballen-tx commented 11 months ago

Hi,

Looks like the login gateway has changed or is blocked somehow.

```python
twitter = TweeterPy()
twitter.login(account, password)
```

API Updated Successfully.

```
'content-type'3

File /opt/conda/lib/python3.9/site-packages/tweeterpy/login_util.py:33, in TaskHandler._get_flow_token(self)
     22 params = {'flow_name': 'login'}
     23 payload = {'input_flow_data': {
     24     'flow_context': {'debug_overrides': {}, 'start_location': {'location': 'manual_link'}, }, },
     25     'subtask_versions': {'action_list': 2, 'alert_dialog': 1, 'app_download_cta': 1, 'check_logged_in_account': 1,
   (...)
     31                          'settings_list': 7, 'show_code': 1, 'sign_up': 2, 'sign_up_review': 4, 'tweet_selection_urt': 1, 'update_users': 1,
     32                          'upload_media': 1, 'user_recommendations_list': 4, 'user_recommendations_urt': 1, 'wait_spinner': 3, 'web_modal': 1}}
---> 33 return make_request(Path.TASK_URL, method="POST", params=params, json=payload)

File /opt/conda/lib/python3.9/site-packages/tweeterpy/request_util.py:38, in make_request(url, session, method, max_retries, timeout, **kwargs)
     35 if api_limit_stats.get('rate_limit_exhausted'):
     36     print(
     37         f"\033[91m Rate Limit Exceeded:\033[0m {api_limit_stats}")
---> 38 raise error

File /opt/conda/lib/python3.9/site-packages/tweeterpy/request_util.py:22, in make_request(url, session, method, max_retries, timeout, **kwargs)
     20 api_limit_stats = util.check_api_rate_limits(response)
     21 soup = bs4.BeautifulSoup(response.content, "lxml")
---> 22 if "json" in response.headers["Content-Type"]:
     23     return util.check_for_errors(response.json())
     24 response_text = "\n".join(
     25     [line.strip() for line in soup.text.split("\n") if line.strip()])

File /opt/conda/lib/python3.9/site-packages/requests/structures.py:52, in CaseInsensitiveDict.__getitem__(self, key)
     51 def __getitem__(self, key):
---> 52     return self._store[key.lower()][1]

KeyError: 'content-type'
```
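For context on the crash itself: the traceback bottoms out in an unguarded dict-style lookup, `response.headers["Content-Type"]`, which raises `KeyError` when the server omits that header. Using `.get()` would turn the crash into a graceful fallthrough. A minimal sketch with a plain dict (illustrative only, not the library's code; requests' `CaseInsensitiveDict` behaves the same way for a missing key):

```python
# Simulated response headers with no Content-Type, as in this bug report.
headers = {"x-rate-limit-limit": "187", "content-length": "0"}

# Direct indexing raises KeyError on a missing header:
try:
    _ = headers["Content-Type"]
except KeyError as exc:
    print(f"KeyError: {exc}")

# .get() with a default avoids the crash and lets the caller fall through:
content_type = headers.get("Content-Type", "")
print("json" in content_type)  # False, but no exception
```

This only papers over the symptom, of course; the real question in this thread is why the server response carries no Content-Type (or body) at all.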

Thanks,

iSarabjitDhiman commented 11 months ago

It seems to be working fine on my side. Do you mind giving it another try? Let me know if the issue persists.

nballen-tx commented 11 months ago

Just tried it, still the same error.

nballen-tx commented 11 months ago

```
File /opt/conda/lib/python3.9/site-packages/tweeterpy/tweeterpy.py:189, in TweeterPy.login(self, username, password)
    187 if password is None:
    188     password = getpass.getpass()
--> 189 TaskHandler().login(username, password)
    190 util.generate_headers(session=self.session)

File /opt/conda/lib/python3.9/site-packages/tweeterpy/login_util.py:93, in TaskHandler.login(self, username, password)
     91 otp = None
     92 task_flow_mapper = self._create_task_mapper(username,password,otp)
---> 93 response = self._get_flow_token()
     94 self._get_javscript_instrumentation_subtask()
     95 while tasks_pending:

File /opt/conda/lib/python3.9/site-packages/tweeterpy/login_util.py:33, in TaskHandler._get_flow_token(self)
     22 params = {'flow_name': 'login'}
     23 payload = {'input_flow_data': {
     24     'flow_context': {'debug_overrides': {}, 'start_location': {'location': 'manual_link'}, }, },
     25     'subtask_versions': {'action_list': 2, 'alert_dialog': 1, 'app_download_cta': 1, 'check_logged_in_account': 1,
   (...)
     31                          'settings_list': 7, 'show_code': 1, 'sign_up': 2, 'sign_up_review': 4, 'tweet_selection_urt': 1, 'update_users': 1,
     32                          'upload_media': 1, 'user_recommendations_list': 4, 'user_recommendations_urt': 1, 'wait_spinner': 3, 'web_modal': 1}}
---> 33 return make_request(Path.TASK_URL, method="POST", params=params, json=payload)

File /opt/conda/lib/python3.9/site-packages/tweeterpy/request_util.py:38, in make_request(url, session, method, max_retries, timeout, **kwargs)
     35 if api_limit_stats.get('rate_limit_exhausted'):
     36     print(
     37         f"\033[91m Rate Limit Exceeded:\033[0m {api_limit_stats}")
---> 38 raise error

File /opt/conda/lib/python3.9/site-packages/tweeterpy/request_util.py:22, in make_request(url, session, method, max_retries, timeout, **kwargs)
     20 api_limit_stats = util.check_api_rate_limits(response)
     21 soup = bs4.BeautifulSoup(response.content, "lxml")
---> 22 if "json" in response.headers["Content-Type"]:
     23     return util.check_for_errors(response.json())
     24 response_text = "\n".join(
     25     [line.strip() for line in soup.text.split("\n") if line.strip()])

File /opt/conda/lib/python3.9/site-packages/requests/structures.py:52, in CaseInsensitiveDict.__getitem__(self, key)
     51 def __getitem__(self, key):
---> 52     return self._store[key.lower()][1]

KeyError: 'content-type'
```

iSarabjitDhiman commented 11 months ago

I still can't replicate the issue. Here is what you can do:

Go to the request_util.py file and scroll down to the very bottom, to line 149, where the error is raised in the validate_response function.

Just print the response before raising the error and see if it helps. This is the block of code:

```python
except Exception as error:
    # print(f"{error}\n\n{response_text}\n")
    if api_limit_stats and api_limit_stats.get('rate_limit_exhausted'):
        print(f"\033[91m Rate Limit Exceeded:\033[0m {api_limit_stats}")
    raise error
```

Add a `print(response)` just before the `raise error` line. You can also uncomment the first line in the except block; it does the same thing.
nballen-tx commented 11 months ago

I added `print(response.headers)` before the `if "json" in response.headers["Content-Type"]` line:

```python
for retry_count, _ in enumerate(range(max_retries), start=1):
    try:
        response_text = ""
        response = session.request(method, url, timeout=timeout, **kwargs)
        api_limit_stats = util.check_api_rate_limits(response)
        soup = bs4.BeautifulSoup(response.content, "lxml")
        print(response.headers)
        if "json" in response.headers["Content-Type"]:
            return util.check_for_errors(response.json())
        response_text = "\n".join(
            [line.strip() for line in soup.text.split("\n") if line.strip()])
        response.raise_for_status()
        return soup
    except KeyboardInterrupt:
        print("Keyboard Interruption...")
        return
    except Exception as error:
        print(f"Retry No. ==> {retry_count}", end="\r")
        if retry_count >= max_retries:
            print(f"{error}\n\n{response_text}\n")
            if api_limit_stats.get('rate_limit_exhausted'):
                print(
                    f"\033[91m Rate Limit Exceeded:\033[0m {api_limit_stats}")
            raise error
```

And it returns:

```
{'date': 'Tue, 08 Aug 2023 15:10:40 UTC', 'perf': '7626143928', 'server': 'tsa_k', 'cache-control': 'no-cache, no-store, max-age=0', 'backoff-policy': 'backoff=1234;serial-duration=30000;serial-delay=500;no-retry=true', 'content-length': '0', 'x-transaction-id': '21a9f5a511d56b9a', 'x-rate-limit-limit': '187', 'x-rate-limit-reset': '1691508340', 'x-rate-limit-remaining': '186', 'strict-transport-security': 'max-age=631138519', 'x-response-time': '182', 'x-connection-hash': '682316ebfe4492a9ae4389739eb441a0e1d118f69000c31acd4ac1b47cca4617'}
{'date': 'Tue, 08 Aug 2023 15:10:40 UTC', 'perf': '7626143928', 'server': 'tsa_k', 'cache-control': 'no-cache, no-store, max-age=0', 'backoff-policy': 'backoff=1234;serial-duration=30000;serial-delay=500;no-retry=true', 'content-length': '0', 'x-transaction-id': '366793825e2e498f', 'x-rate-limit-limit': '187', 'x-rate-limit-reset': '1691508340', 'x-rate-limit-remaining': '185', 'strict-transport-security': 'max-age=631138519', 'x-response-time': '180', 'x-connection-hash': '682316ebfe4492a9ae4389739eb441a0e1d118f69000c31acd4ac1b47cca4617'}
{'date': 'Tue, 08 Aug 2023 15:10:40 UTC', 'perf': '7626143928', 'server': 'tsa_k', 'cache-control': 'no-cache, no-store, max-age=0', 'backoff-policy': 'backoff=1234;serial-duration=30000;serial-delay=500;no-retry=true', 'content-length': '0', 'x-transaction-id': 'c3943ed796437b89', 'x-rate-limit-limit': '187', 'x-rate-limit-reset': '1691508340', 'x-rate-limit-remaining': '184', 'strict-transport-security': 'max-age=631138519', 'x-response-time': '172', 'x-connection-hash': '682316ebfe4492a9ae4389739eb441a0e1d118f69000c31acd4ac1b47cca4617'}
```
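As a side note: even though `Content-Type` is missing, the dumps above do carry rate-limit state in the `x-rate-limit-*` headers. A small sketch of reading them (values copied verbatim from the first dump; the snippet is purely illustrative):

```python
import datetime

# x-rate-limit-* values copied from the first header dump above.
headers = {
    "x-rate-limit-limit": "187",
    "x-rate-limit-remaining": "186",
    "x-rate-limit-reset": "1691508340",  # Unix timestamp of the window reset
}

remaining = int(headers["x-rate-limit-remaining"])
reset_at = datetime.datetime.fromtimestamp(
    int(headers["x-rate-limit-reset"]), tz=datetime.timezone.utc)
print(remaining)                                   # 186
print(reset_at.strftime("%Y-%m-%d %H:%M:%S UTC"))  # 2023-08-08 15:25:40 UTC
```

So the requests are reaching Twitter and being counted against the quota (the remaining count decrements on each retry); only the body and Content-Type are absent.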
```
'content-type'3
     21 soup = bs4.BeautifulSoup(response.content, "lxml")
     22 print(response.headers)
---> 23 if "json" in response.headers["Content-Type"]:
     24     return util.check_for_errors(response.json())
     25 response_text = "\n".join(
     26     [line.strip() for line in soup.text.split("\n") if line.strip()])

File ~\anaconda3\lib\site-packages\requests\structures.py:54, in CaseInsensitiveDict.__getitem__(self, key)
     53 def __getitem__(self, key):
---> 54     return self._store[key.lower()][1]

KeyError: 'content-type'
```

Looks like `Content-Type` is not in `response.headers` any more.

Is this due to a Twitter backend change?

What data do you get from `print(response.headers)`?

iSarabjitDhiman commented 11 months ago

You are right, `content-type` is not in the response headers, which is strange. By the way, what do you get in the response: NoneType, some HTML, or a JSON object? `print(response.headers)` returns the response headers (the headers sent by the server, i.e. Twitter). Instead of printing `response.headers`, try printing the response itself and check what type of response you get. You can also try:

- logging in from some other account
- logging in with the code from the master branch

nballen-tx commented 11 months ago

Tried `print(response.content)`:

```
b''
b''ry No. ==> 1
b''ry No. ==> 2
'content-type'3
```

Looks like the response is empty.

The above is from the master branch, and I tried other accounts; the results are the same.
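Incidentally, the mangled lines (`b''ry No. ==> 1`, `'content-type'3`) are just a terminal artifact: `make_request` prints the retry counter with `end="\r"`, so the next, shorter print starts at column 0 of the same line and only partially overwrites the old text. A tiny model of that behavior (the helper below is hypothetical, purely to illustrate):

```python
def render_terminal_line(writes):
    """Tiny model of one terminal line: '\r' moves the cursor back to column 0."""
    line, cursor = [], 0
    for text in writes:
        for ch in text:
            if ch == "\r":
                cursor = 0
            elif cursor < len(line):
                line[cursor] = ch   # overwrite an existing character
                cursor += 1
            else:
                line.append(ch)     # extend past the end of the old text
                cursor += 1
    return "".join(line)

# "Retry No. ==> 1" is printed with end="\r", then "b''" lands on top of it:
print(render_terminal_line(["Retry No. ==> 1\r", "b''"]))             # b''ry No. ==> 1
# Likewise "'content-type'" printed on top of "Retry No. ==> 3":
print(render_terminal_line(["Retry No. ==> 3\r", "'content-type'"]))  # 'content-type'3
```

So the real log lines are `b''` (an empty body) and the bare `KeyError` message; the trailing fragments are leftovers of the retry counter.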

nballen-tx commented 11 months ago

The async-await branch is actually fine, though.

What has changed regarding the login mechanism?

iSarabjitDhiman commented 11 months ago

I believe the login mechanism is still the same; I will take another look for you. I mean, it's working for me and for others as well.

Try this out: https://github.com/iSarabjitDhiman/TweeterPy/issues/2#issuecomment-1621880787

```
pip install Brotli
```

Recently a similar issue was opened where we were getting an empty response. We managed to fix it by installing the Brotli library. Let me know if that doesn't fix it; I will try to replicate the issue on my end in the meantime.
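For background on why a missing Brotli package can show up as an empty body: HTTP clients generally only advertise `br` in `Accept-Encoding` (and only decode Brotli-compressed responses) when a Brotli decoder is importable. A hedged sketch of that detection logic (conceptually what urllib3 does, not its actual code):

```python
import importlib.util

def supported_encodings():
    """Content encodings this environment can decode, in Accept-Encoding form."""
    encodings = ["gzip", "deflate"]
    # Brotli support comes from either the `brotli` or `brotlicffi` package.
    if importlib.util.find_spec("brotli") or importlib.util.find_spec("brotlicffi"):
        encodings.append("br")
    return encodings

print(",".join(supported_encodings()))
```

If the server picks `br` and the client cannot decode it, the decoded body the caller sees can end up empty or garbled, which matches the `b''` output above.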

nballen-tx commented 11 months ago

Thanks - I realized it might not be because of the async-await branch itself: when I installed it, it reinstalled all the dependency packages.

So now it is impossible to know what actually caused the issue, as I am no longer able to replicate it.


Update:

I found it is due to urllib3; updating my version to 2.0.3 fixes it:

```
pip install urllib3==2.0.3
```
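When a reinstall silently bumps dependencies like this, a quick runtime check makes version drift visible. A stdlib-only sketch (2.0.3 is simply the version that happened to work here, not a hard requirement of the library):

```python
import importlib.metadata

def installed_version(package):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None

print(installed_version("urllib3") or "urllib3 not installed")
```

Logging this alongside bug reports would have made the urllib3 mismatch obvious from the first traceback.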