Login broken due to updated Leap Card site

AlanGreene commented 3 years ago

Looks like the login flow was updated again sometime in the last few days, so I'm now consistently getting the following error on login:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/leapcard/lib/python3.9/site-packages/pyleapcard/PyLeapCard.py", line 48, in try_login
    VIEWSTATE = soup.find(id="__VIEWSTATE")['value']
TypeError: 'NoneType' object is not subscriptable

I'm not a Python developer, so it might take me a while to get up to speed and work on a fix. I'll give it a go anyway as a learning experience, but I'm happy to defer to anyone else who gets there first.
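The traceback comes from subscripting the result of a lookup that found nothing: when the page no longer contains an element with id `__VIEWSTATE`, the lookup returns `None`, and `None['value']` raises the `TypeError`. Below is a minimal sketch of a guarded lookup that fails with a descriptive error instead. It uses the standard library `html.parser` rather than the BeautifulSoup call pyleapcard makes, and the names `find_hidden_value` and `LoginPageChanged` are hypothetical, not part of pyleapcard.

```python
from html.parser import HTMLParser


class LoginPageChanged(Exception):
    """Raised when an expected form field is missing (hypothetical name)."""


class _FieldFinder(HTMLParser):
    """Records the value attribute of the first element with a given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.value = None
        self.found = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if not self.found and attributes.get("id") == self.element_id:
            self.found = True
            self.value = attributes.get("value")


def find_hidden_value(html, element_id):
    """Return the element's value, or raise a descriptive error if absent."""
    finder = _FieldFinder(element_id)
    finder.feed(html)
    if not finder.found or finder.value is None:
        raise LoginPageChanged(
            f"No element with id={element_id!r} and a value attribute; "
            "the login page layout has probably changed."
        )
    return finder.value
```

This does not fix the login, it only turns a confusing `TypeError` into an error that says what went wrong.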

AlanGreene commented 3 years ago

So it looks like they're now using JavaScript both to generate the form and to provide a number of required values on the request (available in the global SETTINGS object on the login page). Extracting these by processing the response as text will be very brittle, but the alternative would be to run something that actually executes the JS on the page.
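For illustration, the text-processing route could look roughly like the sketch below. It assumes the page assigns a JSON-compatible object literal to SETTINGS in an inline script (something like `var SETTINGS = {...};`); that layout and the helper name `extract_settings` are assumptions, not something confirmed in this thread. Any change to how the page defines the object would break it, which is the brittleness mentioned above.

```python
import json
import re


def extract_settings(html, variable="SETTINGS"):
    """Return the object literal assigned to `variable` as a dict.

    Assumes the page contains an assignment such as `var SETTINGS = {...};`
    and that the literal is valid JSON. Both are assumptions about the page.
    """
    # Locate the assignment and stop right before the opening brace.
    match = re.search(r"\b" + re.escape(variable) + r"\s*=\s*(?=\{)", html)
    if match is None:
        raise ValueError(f"{variable} assignment not found in page")
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it, so nested braces and braces inside
    # strings are handled by the JSON parser rather than by the regex.
    settings, _end = json.JSONDecoder().raw_decode(html, match.end())
    return settings
```

Letting the JSON parser find the end of the object avoids hand-written brace matching, but it still fails if the literal uses JavaScript syntax that is not valid JSON (single quotes, trailing commas, unquoted keys).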

I've hacked something together that works for my purposes using pyppeteer, but I think it's a little on the heavy side for a generic API wrapper (requires Chromium etc.).

skhg commented 3 years ago

Hi Alan! Thanks for creating the issue. It seems the login flow has changed entirely: they're now using a Microsoft SSO provider at https://transportforireland.b2clogin.com/

Previously the whole thing was done while staying within the leapcard.ie domain.

Looking at the login request, it might be possible to do the same as was done before and send a manually constructed POST to https://transportforireland.b2clogin.com/transportforireland.onmicrosoft.com/B2C_1A_signup_signin/SelfAsserted

skhg commented 3 years ago

I did some digging on this and it appears that it should be possible to fake the whole login process using just HTTP requests, no JavaScript. The flow goes something like this:

1. Click a link to https://www.leapcard.ie/en/SelfServices/CardServices/CardOverView.aspx (or any other login-only URL) while logged out.
2. The browser redirects to the Azure SSO provider at https://transportforireland.b2clogin.com/transportforireland.onmicrosoft.com/b2c_1a_signup_signin/oauth2/v2.0/authorize and provides a raft of URL params, headers and values in the body of the response. Some of these need to be retained for subsequent requests.
3. Credentials must be POSTed to https://transportforireland.b2clogin.com/transportforireland.onmicrosoft.com/B2C_1A_signup_signin/SelfAsserted along with a bunch of other state information. However, this POST returns an empty response.
4. To confirm whether the login worked, a separate GET is made to https://transportforireland.b2clogin.com/transportforireland.onmicrosoft.com/B2C_1A_signup_signin/api/CombinedSigninAndSignup/confirmed with yet more state.
5. If everything worked and the login is confirmed, a POST is made to https://www.leapcard.ie/ and it eventually redirects to the requested page.
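As a rough map of the steps above, the sketch below lists each request with its method and URL and builds (without sending) a `urllib.request.Request` for a given step. The URLs and methods come from the list above. Everything else is an assumption made for illustration: the step names, the `build_request` helper, and the idea that retained state goes into the query string for GET steps and into a form-encoded body for POST steps. Which fields each step actually requires is not established here.

```python
from dataclasses import dataclass
from urllib.parse import urlencode
from urllib.request import Request

B2C_BASE = (
    "https://transportforireland.b2clogin.com/"
    "transportforireland.onmicrosoft.com/"
)


@dataclass(frozen=True)
class LoginStep:
    name: str    # hypothetical label, used only in this sketch
    method: str
    url: str


# Methods and URLs follow the steps described above.
LOGIN_FLOW = [
    LoginStep("request_protected_page", "GET",
              "https://www.leapcard.ie/en/SelfServices/CardServices/CardOverView.aspx"),
    LoginStep("authorize", "GET",
              B2C_BASE + "b2c_1a_signup_signin/oauth2/v2.0/authorize"),
    LoginStep("post_credentials", "POST",
              B2C_BASE + "B2C_1A_signup_signin/SelfAsserted"),
    LoginStep("confirm_login", "GET",
              B2C_BASE + "B2C_1A_signup_signin/api/CombinedSigninAndSignup/confirmed"),
    LoginStep("return_to_leapcard", "POST", "https://www.leapcard.ie/"),
]


def build_request(step, state):
    """Build, but do not send, the request for one step.

    `state` is a dict of values retained from earlier responses. Placing it
    in the query string (GET) or a form body (POST) is an assumption.
    """
    encoded = urlencode(state)
    if step.method == "GET":
        url = f"{step.url}?{encoded}" if encoded else step.url
        return Request(url, method="GET")
    return Request(
        step.url,
        data=encoded.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

A real client would also need to carry cookies between steps and follow the redirects, neither of which this sketch attempts.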

Note that you can also use Google or Facebook to complete the login process; the flow above ignores those options for the time being.

I've tried searching for web scrapers that go through this b2clogin.com provider and can't find any at present.

skhg / pyleapcard

Login broken due to updated Leap Card site #20