jbms / finance-dl

Tools for automatically downloading/scraping personal financial data.
GNU General Public License v2.0

PayPal issue with CSRF-Token #80

Open tbrtje opened 2 years ago

tbrtje commented 2 years ago

The PayPal importer seems to fail when trying to get the CSRF token after logging in. It seems like the whole structure of the webpage has changed. I wasn't able to find the CSRF token manually.

Relevant Traceback:

Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/homebrew/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/cli.py", line 94, in <module>
    main()
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/cli.py", line 90, in main
    module.run(**spec)
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/paypal.py", line 262, in run
    scrape_lib.run_with_scraper(Scraper, **kwargs)
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/scrape_lib.py", line 433, in run_with_scraper
    retry(fetch)
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/scrape_lib.py", line 411, in retry
    return func()
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/scrape_lib.py", line 431, in fetch
    scraper.run()
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/paypal.py", line 258, in run
    self.save_transactions()
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/paypal.py", line 200, in save_transactions
    transaction_list = self.get_transaction_list()
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/paypal.py", line 193, in get_transaction_list
    resp = self.make_json_request(url)
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/paypal.py", line 166, in make_json_request
    'x-csrf-token': self.get_csrf_token(),
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/paypal.py", line 177, in get_csrf_token
    body_element, = self.wait_and_locate((By.ID, "__react_data__"))
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/scrape_lib.py", line 263, in wait_and_locate
    return self.wait_and_return(
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/finance_dl/scrape_lib.py", line 247, in wait_and_return
    WebDriverWait(self.driver, timeout).until(predicate, message=message)
  File "/Users/thies/Finanzen/Beancount/venv/lib/python3.10/site-packages/selenium/webdriver/support/wait.py", line 87, in until
    time.sleep(self._poll)
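
The traceback ends inside `WebDriverWait(...).until`, so the scraper is still polling for an element with id `__react_data__` that apparently is no longer on the page. One way to look for where the token might have moved is to save the page source (for example from Selenium's `driver.page_source`) and scan it for CSRF-like keys. What follows is a minimal diagnostic sketch and not part of finance-dl: the helper name `find_csrf_candidates` and the sample HTML are hypothetical, and the regular expression is only a guess at how a token might be embedded.

```python
import re

# Matches key/value pairs such as "_csrf": "abc" or data-csrf-token="abc".
# The pattern is an assumption about how a token might be embedded,
# not a description of PayPal's actual markup.
_CSRF_PATTERN = re.compile(
    r"""(?P<key>[\w-]*csrf[\w-]*)["']?\s*[:=]\s*["'](?P<value>[^"']+)["']""",
    re.IGNORECASE,
)


def find_csrf_candidates(html: str) -> list[tuple[str, str]]:
    """Return (key, value) pairs in `html` whose key contains 'csrf'."""
    return [
        (m.group("key"), m.group("value"))
        for m in _CSRF_PATTERN.finditer(html)
    ]


if __name__ == "__main__":
    # Hypothetical page fragment, used only to exercise the helper.
    sample = (
        '<script>window.cfg = {"_csrf": "tok123"};</script>'
        '<body data-csrf-token="tok456">'
    )
    for key, value in find_csrf_candidates(sample):
        print(key, value)
```

An empty result would suggest the token is no longer embedded in the HTML at all and is instead fetched by a later request, which would have to be checked in the browser's network tab.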

Zburatorul commented 2 years ago

I was able to modify the CSRF login step to get the token, but obtaining the CSRF token does not work in headless mode. Once it moves past the CSRF step, it fails in get_transaction_list() with:

[0/1] paypal [80s elapsed] 2022-11-23 22:06:41,850 paypal.py:188 [INFO] Getting transaction list
[0/1] paypal [81s elapsed] Traceback (most recent call last):
[0/1] paypal [81s elapsed]   File "/home/user/finance-dl/finance_dl/scrape_lib.py", line 411, in retry
[0/1] paypal [81s elapsed]     return func()
[0/1] paypal [81s elapsed]   File "/home/user/finance-dl/finance_dl/scrape_lib.py", line 431, in fetch
[0/1] paypal [81s elapsed]     scraper.run()
[0/1] paypal [81s elapsed]   File "/home/user/finance-dl/finance_dl/paypal.py", line 261, in run
[0/1] paypal [81s elapsed]     self.save_transactions()
[0/1] paypal [81s elapsed]   File "/home/user/finance-dl/finance_dl/paypal.py", line 202, in save_transactions
[0/1] paypal [81s elapsed]     transaction_list = self.get_transaction_list()
[0/1] paypal [81s elapsed]   File "/home/user/finance-dl/finance_dl/paypal.py", line 195, in get_transaction_list
[0/1] paypal [81s elapsed]     resp = self.make_json_request(url)
[0/1] paypal [81s elapsed]   File "/home/user/finance-dl/finance_dl/paypal.py", line 165, in make_json_request
[0/1] paypal [81s elapsed]     return self.driver.request(
[0/1] paypal [81s elapsed]   File "/home/user/.local/lib/python3.9/site-packages/seleniumrequests/request.py", line 165, in request
[0/1] paypal [81s elapsed]     self.requests_session.headers = get_webdriver_request_headers(self)
[0/1] paypal [81s elapsed]   File "/home/user/.local/lib/python3.9/site-packages/seleniumrequests/request.py", line 77, in get_webdriver_request_headers
[0/1] paypal [81s elapsed]     webdriver.switch_to.window(original_window_handle)
[0/1] paypal [81s elapsed]   File "/home/user/.local/lib/python3.9/site-packages/selenium/webdriver/remote/switch_to.py", line 112, in window
[0/1] paypal [81s elapsed]     self._w3c_window(window_name)
[0/1] paypal [81s elapsed]   File "/home/user/.local/lib/python3.9/site-packages/selenium/webdriver/remote/switch_to.py", line 134, in _w3c_window
[0/1] paypal [81s elapsed]     raise e
[0/1] paypal [81s elapsed]   File "/home/user/.local/lib/python3.9/site-packages/selenium/webdriver/remote/switch_to.py", line 123, in _w3c_window
[0/1] paypal [81s elapsed]     send_handle(window_name)
[0/1] paypal [81s elapsed]   File "/home/user/.local/lib/python3.9/site-packages/selenium/webdriver/remote/switch_to.py", line 119, in send_handle
[0/1] paypal [81s elapsed]     self._driver.execute(Command.SWITCH_TO_WINDOW, {'handle': h})
[0/1] paypal [81s elapsed]   File "/home/user/.local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
[0/1] paypal [81s elapsed]     self.error_handler.check_response(response)
[0/1] paypal [81s elapsed]   File "/home/user/.local/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
[0/1] paypal [81s elapsed]     raise exception_class(message, screen, stacktrace)
[0/1] paypal [81s elapsed] selenium.common.exceptions.NoSuchWindowException: Message: no such window: No target with given id found
[0/1] paypal [81s elapsed]   (Session info: chrome=107.0.5304.110)

ioniua commented 2 years ago

hi @Zburatorul

What changes did you make to the login process to get the CSRF token? With headless=False, I get the following error. The browser window pops up and navigates to the 'My Activities' screen before crashing.

 --connect=http://localhost:38517 --session-id=f80f6426b258d67401df4b05bd656efa
2022-11-30 12:12:58,087 paypal.py:144 [INFO] Finding username field
2022-11-30 12:12:58,268 paypal.py:147 [INFO] Entering username
2022-11-30 12:12:59,287 paypal.py:152 [INFO] Finding password field
2022-11-30 12:13:18,554 paypal.py:155 [INFO] Entering password
2022-11-30 12:13:23,395 paypal.py:159 [INFO] Logged in
2022-11-30 12:13:23,395 paypal.py:186 [INFO] Getting transaction list
2022-11-30 12:13:23,396 paypal.py:174 [INFO] Getting CSRF token
Traceback (most recent call last):
  File "/home/ioniua/.local/lib/python3.9/site-packages/finance_dl/scrape_lib.py", line 411, in retry
    return func()
  File "/home/ioniua/.local/lib/python3.9/site-packages/finance_dl/scrape_lib.py", line 431, in fetch
    scraper.run()
  File "/home/ioniua/.local/lib/python3.9/site-packages/finance_dl/paypal.py", line 258, in run
    self.save_transactions()
  File "/home/ioniua/.local/lib/python3.9/site-packages/finance_dl/paypal.py", line 200, in save_transactions
    transaction_list = self.get_transaction_list()
  File "/home/ioniua/.local/lib/python3.9/site-packages/finance_dl/paypal.py", line 193, in get_transaction_list
    resp = self.make_json_request(url)
  File "/home/ioniua/.local/lib/python3.9/site-packages/finance_dl/paypal.py", line 164, in make_json_request
    return self.driver.request(
  File "/home/ioniua/.local/lib/python3.9/site-packages/seleniumrequests/request.py", line 159, in request
    self.requests_session.headers = get_webdriver_request_headers(self, proxy_host=self.__proxy_host)
  File "/home/ioniua/.local/lib/python3.9/site-packages/seleniumrequests/request.py", line 76, in get_webdriver_request_headers
    webdriver.switch_to.window(original_window_handle)
  File "/home/ioniua/.local/lib/python3.9/site-packages/selenium/webdriver/remote/switch_to.py", line 134, in window
    self._w3c_window(window_name)
  File "/home/ioniua/.local/lib/python3.9/site-packages/selenium/webdriver/remote/switch_to.py", line 143, in _w3c_window
    send_handle(window_name)
  File "/home/ioniua/.local/lib/python3.9/site-packages/selenium/webdriver/remote/switch_to.py", line 139, in send_handle
    self._driver.execute(Command.SWITCH_TO_WINDOW, {'handle': h})
  File "/home/ioniua/.local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 430, in execute
    self.error_handler.check_response(response)
  File "/home/ioniua/.local/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: disconnected: received Inspector.detached event
  (Session info: chrome=107.0.5304.121)

Zburatorul commented 1 year ago

@ioniua, your trace shows that the CSRF token is obtained successfully, so that is not the problem. It's the driver.request method that's failing. I have the same error and don't know why.
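
Both of the later tracebacks in this thread fail inside `seleniumrequests`' `get_webdriver_request_headers`, at the point where it switches back to the original window handle, rather than in the CSRF lookup. One possible workaround, offered only as an untested sketch, is to avoid `driver.request` and build the HTTP request by hand from the browser's cookies. The helper name `build_request` is hypothetical, the cookie dicts are assumed to have the `name`/`value` shape returned by Selenium's `driver.get_cookies()`, and the URL and token below are placeholders. Whether PayPal accepts such a request is not confirmed anywhere in this thread.

```python
import urllib.request


def build_request(url, cookies, csrf_token, user_agent):
    """Build (but do not send) a GET request carrying browser cookies.

    `cookies` is a list of dicts with at least 'name' and 'value' keys,
    the shape Selenium's driver.get_cookies() returns.
    """
    cookie_header = "; ".join(f"{c['name']}={c['value']}" for c in cookies)
    headers = {
        "Cookie": cookie_header,
        # Header name as it appears in the first traceback of this thread.
        "x-csrf-token": csrf_token,
        "User-Agent": user_agent,
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    # Placeholder values; in the scraper they would come from the live session.
    req = build_request(
        "https://example.invalid/activities",
        [{"name": "sid", "value": "abc"}, {"name": "lang", "value": "en"}],
        csrf_token="tok123",
        user_agent="Mozilla/5.0",
    )
    print(req.get_header("Cookie"))
```

Sending the request would then be a matter of `urllib.request.urlopen(req)`. The point of the design is that no extra browser window is opened or switched, which is the step where both traces fail.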