AlanGreene opened 3 years ago
So it looks like they're now using JavaScript both to generate the form and to provide a number of required values on the request (available in the global SETTINGS object on the login page). Extracting these by processing the response as text would be very brittle, but the alternative is to run something that actually executes the JS on the page.
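To make the text-processing option concrete, here is a minimal sketch. It assumes the login page assigns the object as `var SETTINGS = {...};` and that the literal is also valid JSON. Both are assumptions about the page markup, not something confirmed in this thread, and `extract_settings` is a hypothetical helper name.

```python
import json
import re

# Assumption: the page contains `var SETTINGS = {...};`. Any change to the
# variable name, minification or layout makes this silently stop matching.
_SETTINGS_RE = re.compile(r"var\s+SETTINGS\s*=\s*(\{.*?\});", re.DOTALL)


def extract_settings(html: str) -> dict:
    """Pull the SETTINGS object out of the raw login-page HTML (hypothetical helper)."""
    match = _SETTINGS_RE.search(html)
    if match is None:
        raise ValueError("SETTINGS object not found in page")
    # Assumption: the object literal is valid JSON. Single-quoted strings or
    # trailing commas (legal in JS) would make json.loads raise.
    return json.loads(match.group(1))
```

A `};` inside one of the string values, or a switch to a different way of declaring the object, would break the non-greedy regex. That fragility is the trade-off against executing the JS in a headless browser.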
I've hacked something together using pyppeteer that works for my purposes, but I think it's a little on the heavy side for a generic API wrapper (it requires Chromium etc.).
Hi Alan! Thanks for creating the issue. It seems the login flow has changed entirely: they're now using a Microsoft SSO provider at https://transportforireland.b2clogin.com/.
Previously the whole thing was done while staying within the leapcard.ie domain.
Looking at the login request, it might be possible to do the same as before and send a manually constructed POST to https://transportforireland.b2clogin.com/transportforireland.onmicrosoft.com/B2C_1A_signup_signin/SelfAsserted
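A minimal sketch of what that manually constructed POST might look like, using only the standard library. The query parameters (`tx`, `p`), the form fields (`request_type`, `signInName`, `password`) and the `X-CSRF-TOKEN` header are assumptions based on how Azure AD B2C self-asserted pages commonly submit, not something verified against this site. The `tx` and CSRF values would have to come from the SETTINGS object on the login page. `build_self_asserted_request` is a hypothetical name, and the function only builds the request; it does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

POLICY = "B2C_1A_signup_signin"
SELF_ASSERTED_URL = (
    "https://transportforireland.b2clogin.com/"
    f"transportforireland.onmicrosoft.com/{POLICY}/SelfAsserted"
)


def build_self_asserted_request(tx: str, csrf: str, username: str, password: str) -> Request:
    """Build (but do not send) the credential POST. Field names are assumptions."""
    # Assumed query params: `tx` carries the transaction state, `p` the policy.
    query = urlencode({"tx": tx, "p": POLICY})
    # Assumed form fields; the real names depend on the site's B2C policy.
    body = urlencode({
        "request_type": "RESPONSE",
        "signInName": username,
        "password": password,
    }).encode("ascii")  # urlencode percent-encodes non-ASCII, so ASCII is safe
    return Request(
        f"{SELF_ASSERTED_URL}?{query}",
        data=body,
        headers={
            "X-CSRF-TOKEN": csrf,  # assumed header name
            "X-Requested-With": "XMLHttpRequest",
            "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        },
        method="POST",
    )
```

To be useful, the request would need to be sent through an opener that already holds the cookies set by the earlier redirect to b2clogin.com.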
I did some digging on this, and it appears that it should be possible to fake the whole login process using just HTTP requests, no JavaScript. The flow goes something like this:
1. Request https://www.leapcard.ie/en/SelfServices/CardServices/CardOverView.aspx (or any other login-only URL) while logged out.
2. You are redirected to https://transportforireland.b2clogin.com/transportforireland.onmicrosoft.com/b2c_1a_signup_signin/oauth2/v2.0/authorize, which provides a raft of URL params, headers and values in the body of the response. Some of these need to be retained for subsequent requests.
3. POST the credentials to https://transportforireland.b2clogin.com/transportforireland.onmicrosoft.com/B2C_1A_signup_signin/SelfAsserted along with a bunch of other state information. However, this POST returns an empty response.
4. Request https://transportforireland.b2clogin.com/transportforireland.onmicrosoft.com/B2C_1A_signup_signin/api/CombinedSigninAndSignup/confirmed with yet more state.
5. You are sent back to https://www.leapcard.ie/, which eventually redirects to the requested page.

Note that you can also use Google or Facebook to complete the login process; this ignores them for the time being.
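Because the flow now hops between leapcard.ie and b2clogin.com, a client has to keep cookies for both domains and follow redirects. Below is a minimal standard-library sketch of that plumbing plus the URL for step 4. The query-parameter names for the `confirmed` endpoint (`rememberMe`, `csrf_token`, `tx`, `p`) are assumptions based on typical Azure AD B2C pages, not confirmed in this thread, and `make_opener` and `build_confirmed_url` are hypothetical helpers.

```python
import http.cookiejar
import urllib.request
from urllib.parse import urlencode

POLICY = "B2C_1A_signup_signin"
B2C_BASE = (
    "https://transportforireland.b2clogin.com/"
    f"transportforireland.onmicrosoft.com/{POLICY}"
)


def make_opener() -> tuple[urllib.request.OpenerDirector, http.cookiejar.CookieJar]:
    """Opener that stores cookies per domain and follows redirects (hypothetical helper)."""
    jar = http.cookiejar.CookieJar()
    # build_opener adds HTTPRedirectHandler by default, so the redirect in
    # step 2 is followed automatically; the cookie processor keeps state
    # from both leapcard.ie and b2clogin.com in the same jar.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_confirmed_url(csrf: str, tx: str, remember_me: bool = False) -> str:
    """URL for step 4. Parameter names are assumptions, not confirmed."""
    query = urlencode({
        "rememberMe": "true" if remember_me else "false",
        "csrf_token": csrf,
        "tx": tx,
        "p": POLICY,
    })
    return f"{B2C_BASE}/api/CombinedSigninAndSignup/confirmed?{query}"
```

In use, the same `opener` would carry cookies through steps 1 to 3, and `opener.open(build_confirmed_url(csrf, tx))` would perform step 4. The final hop back to leapcard.ie is not modelled here because the exact mechanism isn't spelled out above.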
I've tried searching for web scrapers that go through this b2clogin.com provider and can't find any at present.
It looks like the login flow was updated again sometime in the last few days, so I'm now consistently getting the following error on login:
I'm not a Python developer, so it might take me a while to get up to speed and work on a fix. I'll give it a go anyway as a learning experience, but I'm happy to defer to anyone else who gets there first.