Closed AndyRPH closed 1 year ago
Hopefully the API calls to access the data would all be the same and the only difficulty is how to get an auth cookie. One annoying thing with this project as it currently stands is the inability to authenticate via some 3rd party broker like Google. That can't ever really be resolved (Google won't issue a token to just any caller) but if BrightHorizons offers a direct API to log in then it would be possible to pull out the login code such that it can be configured for Tadpoles or BrightHorizons.
Everything here was reverse engineered by studying the API calls in Chrome's DevTools Network tab so it should be possible to evaluate this feature's feasibility without too much trouble.
Could you provide links to the other projects you've seen/used?
It is a direct login on mybrightdayapp.brighthorizons.com, not a login with Google or any other third party.
that is helpful - i'll begin a new branch and we'll see how far this can go
Thanks! Happy to test it whenever or share some credentials for a day if you need to poke around.
@AndyRPH please pull this PR's branch and try the commands:
$ make
$ bin/tadpoles-backup clear cookie
$ bin/tadpoles-backup --debug -p brightHorizons stat
stat will print out a bunch of info about the time range of events, children's names, and the number of images, videos, etc.
Please redact names and jwt token before posting logs here.
Looks like it's not authenticating. We log in at https://mybrightdayapp.brighthorizons.com/login (the domain familyinfocenter.brighthorizons.com is also used for non-app functions like accounting and forms management), not at tadpoles.com. I'm not sure if it's still trying tadpoles.com or if the console prompt for the login credentials just shows a static label.
DEBU[0000] using Bright Horizons login
Input : tadpoles.com login required...
Email : aventura@augustahealth.com
Password :
DEBU[0020] JWT Token:eyJhbGciO...[REDACTED]...VanTARoCQw
DEBU[0020] Validate...
Login failed : Please try again...
Cmd Error : [Error] bright horizons token validation failed POST: /auth/jwt/validate => {"message":"Unauthorized"}
the input prompt was hard-coded - i've updated it to show provider-specific text. it's strange and unfortunate that the JWT token is not validating - could be a number of issues...
I can't say I can read the debug options in Chrome well, but it looks like it's storing an API key. Strange, considering that before 2023 a number of tools worked for both the Tadpoles and Bright Horizons apps thanks to the shared codebase/rebranding of Tadpoles for Bright Horizons' facilities.
i can run the same flow with a tadpoles account (get jwt token, validate it, get auth cookie) without any problems calling the same apis. also, presumably the token you get from https://familyinfocenter.brighthorizons.com/mybrightday/login is not malformed (you can decode it with a site like https://jwt.io/ to see its contents). But perhaps the systems have changed so that the token is no longer acceptable to tadpoles for the validation step.
could you confirm that browsing photos, etc. via bright horizons calls apis like /events, /obj_attachment and /parameters? you should see these calls in the network tab of chrome's developer tools. that will at least confirm that they haven't switched to some other system on the back end.
the flow in this branch and the other projects is:
# should print a jwt token
curl -L -X POST "https://familyinfocenter.brighthorizons.com/mybrightday/login" -H "Content-Type: application/x-www-form-urlencoded" --data-urlencode "username=MY_EMAIL_HERE" --data-urlencode "password=MY_PASSWORD_HERE" --data-urlencode "response=jwt"
# use token from above cmd here
curl -L -X POST "https://www.tadpoles.com/auth/jwt/validate" -H "Content-Type: application/x-www-form-urlencoded" --data-urlencode "token=JWT_TOKEN_HERE"
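For anyone who would rather follow this flow in Go than in curl, here is a minimal sketch of the same two requests. The URLs and form fields are copied from the commands above; the helper names (loginForm, validateForm, newFormPost) are made up for illustration and are not the names used in this branch. It only builds the requests, since sending them needs real credentials.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// Endpoints copied from the curl commands above.
const (
	loginURL    = "https://familyinfocenter.brighthorizons.com/mybrightday/login"
	validateURL = "https://www.tadpoles.com/auth/jwt/validate"
)

// loginForm builds the form fields for step 1 (hypothetical helper name).
func loginForm(email, password string) url.Values {
	return url.Values{
		"username": {email},
		"password": {password},
		"response": {"jwt"},
	}
}

// validateForm builds the form field for step 2 (hypothetical helper name).
func validateForm(token string) url.Values {
	return url.Values{"token": {token}}
}

// newFormPost creates a POST request with a URL-encoded form body.
func newFormPost(endpoint string, form url.Values) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, endpoint, strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req, nil
}

func main() {
	// Only builds the requests; nothing is sent over the network.
	login, err := newFormPost(loginURL, loginForm("MY_EMAIL_HERE", "MY_PASSWORD_HERE"))
	if err != nil {
		panic(err)
	}
	validate, err := newFormPost(validateURL, validateForm("JWT_TOKEN_HERE"))
	if err != nil {
		panic(err)
	}
	fmt.Println(login.Method, login.URL)
	fmt.Println(validate.Method, validate.URL)
}
```

To actually run the flow you would pass each request to an http.Client, read the token from the first response body, and feed it into the second request.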
Hmm, got the token from the first:
Header/algo: { "alg": "HS256", "typ": "JWT" }
payload: { "sub": "[redacted email]", "user_id": "[redacted digits]", "sa_id": "[redacted]", "email": "[redacted email]", "nbf": [redacted digits], "exp": [redacted digits very close to nbf], "iat": [redacted same as nbf], "iss": "bright_horizons", "aud": "http://www.tadpoles.com" }
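If you'd rather not paste a live token into a website, the payload can be decoded locally. This is a rough sketch, not code from this project: the claims struct only covers a few of the fields listed above, decodeClaims is a made-up name, and it does NOT verify the signature. The sample token is the public example from jwt.io, not a real credential.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// claims mirrors a few of the payload fields shown above.
type claims struct {
	Subject   string `json:"sub"`
	Issuer    string `json:"iss"`
	Audience  string `json:"aud"`
	NotBefore int64  `json:"nbf"`
	Expires   int64  `json:"exp"`
}

// decodeClaims reads the payload segment of a JWT without checking the
// signature, so it is only useful for inspection (hypothetical helper).
func decodeClaims(token string) (claims, error) {
	var c claims
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return c, errors.New("token does not have three dot-separated segments")
	}
	// JWT segments are base64url encoded without padding.
	raw, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return c, err
	}
	err = json.Unmarshal(raw, &c)
	return c, err
}

func main() {
	// Public sample token from jwt.io; it has no iss, aud, nbf or exp claims.
	const sample = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9." +
		"eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ." +
		"SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c"
	c, err := decodeClaims(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("sub:", c.Subject)
	fmt.Println("seconds between nbf and exp:", c.Expires-c.NotBefore)
}
```

With a real Bright Horizons token, the last line would show how short the window between nbf and exp is.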
looks like navigating around calls things like
https://mybrightday.brighthorizons.com/api/v2/dependent/[redacted]/daily_reports
and it pulls images from such:
fetch("https://storage.googleapis.com/mbd-attachments-prod/[redacted, I think child ID?]/main.jpg?Expires=1691636051&GoogleAccessId=[redacted]-compute%40developer.gserviceaccount.com&Signature=[redacted]", { "cache": "default", "credentials": "omit", "headers": { "Accept": "image/webp,image/avif,video/*;q=0.8,image/png,image/svg+xml,image/*;q=0.8,*/*;q=0.5", "Accept-Language": "en-US,en;q=0.9", "Priority": "u=5, i", "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.3 Safari/605.1.15" }, "method": "GET", "mode": "cors", "redirect": "follow", "referrer": "https://mybrightdayapp.brighthorizons.com/", "referrerPolicy": "strict-origin-when-cross-origin" })
Hmm, I'm wondering if their change this spring was actually to move away from Tadpoles, and now they're pulling/hosting the content from a Google backend. I see early on in the login process the JWT is returned, but then it's converted to an API key, which looks to be used to then go get a userID, sessionID, siteID, etc.
I'm thinking that means we've hit the end of the line for this being a low hanging fruit =\
maybe they are using a new api provided by tadpoles or a completely new in-house backend (probably this). either way the urls are clearly different: /api/v2/dependent/<id>/daily_reports vs /remote/v1/daily_reports
the google storage stuff is not strictly unexpected, the tadpoles api redirects /remote/v1/obj_attachment to google cloud presigned download urls (this is a common practice for securing image data)
but you are correct and this is no longer just resolving the authentication issue while reusing everything else in the code.
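As a side note on those presigned urls: the one captured in the fetch call earlier carries an Expires query parameter that looks like a Unix timestamp in seconds, so a link can be checked for staleness before it is used. A small sketch of that check follows. The function name is made up, the example url is a placeholder (only the Expires value comes from the captured request), and other signed-url formats use different parameter names, so treat the parameter as an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strconv"
	"time"
)

// signedURLExpired reports whether a signed URL carrying an "Expires" query
// parameter (assumed to be Unix seconds) has expired at nowUnix.
func signedURLExpired(rawURL string, nowUnix int64) (bool, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false, err
	}
	value := u.Query().Get("Expires")
	if value == "" {
		return false, errors.New("no Expires parameter in URL")
	}
	expires, err := strconv.ParseInt(value, 10, 64)
	if err != nil {
		return false, err
	}
	return nowUnix >= expires, nil
}

func main() {
	// Placeholder host and path; only the Expires value is from the thread.
	const example = "https://storage.example.com/main.jpg?Expires=1691636051"
	expired, err := signedURLExpired(example, time.Now().Unix())
	if err != nil {
		panic(err)
	}
	fmt.Println("expired:", expired)
}
```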
one last try - I noticed that the jwt validate api responds differently when called at the tadpoles vs bright horizons domain - some other people's projects do different things here so i figured it's worth experimenting. You can pull the latest code on the branch and try the stat command again:
$ make
$ bin/tadpoles-backup clear cookie
$ bin/tadpoles-backup --debug -p brightHorizons stat
Some progress?
In Debug Mode
DEBU[0000] using Bright Horizons login
DEBU[0000] Admit...
Input : brighthorizons.com login required...
Email : aventura@augustahealth.com
Password :
DEBU[0008] JWT Token: [token redacted]
DEBU[0008] Validate...
DEBU[0009] Validate successful
DEBU[0009] Admit...
Login failed : Please try again...
Cmd Error : [Error] tadpoles admit failed POST: /remote/v1/athome/admit => {"message":"Not logged in"}
it's encouraging that it validated the token - maybe the call to the admit endpoint is no longer required if the cookie set after validation is already enough when accessing the api via the brighthorizons domain. I've pushed a change to remove that step, maybe it helps?
In Debug Mode
DEBU[0000] using Bright Horizons login
Input : brighthorizons.com login required...
Email : aventura@augustahealth.com
Password :
DEBU[0007] JWT Token: [token redacted]
DEBU[0007] Validate...
DEBU[0007] Validate successful
DEBU[0007] Serialize cookies successful
Login expires : Sun Dec 31 07:03:58 PM
Cmd Error : [Error] could not get parameters GET: /remote/v1/parameters => Not Found
Would a dump from any of the chrome debug consoles help flesh out what it's trying to call instead?
ugh - it's frustrating that some of the apis seem to be proxied and some are not: /remote/v1/events and /remote/v1/obj_attachment both work but parameters is missing...
the parameters endpoint is primarily called because it lists the first and last dates of valid events; this is used to only fetch events within the time range we care about.
the response from parameters has the shape:
{
"first_event_time": string,
"last_event_time": string,
"members": [
{
"dependants": [
# info about children, not really needed
]
}
]
}
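In Go terms, that shape could be modelled roughly like this. It is a sketch based only on the outline above, not the project's actual types: the struct and function names are made up, the dependants entries are kept as raw JSON because their fields aren't described here, and the sample values in main are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parameters models the outline of the /remote/v1/parameters response.
type parameters struct {
	FirstEventTime string   `json:"first_event_time"`
	LastEventTime  string   `json:"last_event_time"`
	Members        []member `json:"members"`
}

// member holds the per-member data; only dependants is mentioned above.
type member struct {
	// Info about children, left undecoded since its shape is not described.
	Dependants []json.RawMessage `json:"dependants"`
}

// parseParameters decodes a response body into the struct above.
func parseParameters(body []byte) (parameters, error) {
	var p parameters
	err := json.Unmarshal(body, &p)
	return p, err
}

func main() {
	// Placeholder values, not a real response.
	sample := []byte(`{
		"first_event_time": "EARLIEST",
		"last_event_time": "LATEST",
		"members": [{"dependants": [{}]}]
	}`)
	p, err := parseParameters(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(p.FirstEventTime, p.LastEventTime, len(p.Members))
}
```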
i could probably get away with not calling the parameters api at all - but it will take some work
I've pushed new code that eliminates the need for the parameters endpoint - @AndyRPH please try when you have a chance.
well, no errors at least.
In Debug Mode
DEBU[0000] using Bright Horizons login
Input : brighthorizons.com login required...
Email : [redacted]
Password :
DEBU[0005] JWT Token: [redacted]
DEBU[0005] Validate...
DEBU[0006] Validate successful
DEBU[0006] Serialize cookies successful
Login expires : Sun Dec 31 07:03:58 PM
DEBU[0006] Cursor: initialize
DEBU[0006] Query: https://mybrightday.brighthorizons.com/remote/v1/events?direction=range&earliest_event_time=0&latest_event_time=1692127141&num_events=300
Ah, after several mins I got this too :
DEBU[0606] Get Page Error: [Error] could not get events page GET: /remote/v1/events =>
<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>502 Server Error</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h1>Error: Server Error</h1>
<h2>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.</h2>
<h2></h2>
</body></html>
Cmd Error : [Error] could not get events page GET: /remote/v1/events =>
It's hard to tell if that error is because that endpoint no longer exists or it timed out or you got rate limited.
I've pushed new code that makes each event page fetch smaller (50 items vs 300) which could help. I've added an explicit timeout of 60 seconds per request so that it will error out specifically with a timeout if that's what's happening.
You should see debug output like this:
DEBU[0000] Page: 0 Cursor: initialize
DEBU[0000] Query: https://www.tadpoles.com/remote/v1/events?direction=range&earliest_event_time=0&latest_event_time=1692148289&num_events=50
DEBU[0000] Page: 1 Cursor: qpgj6nsakzdevDrQXaerJ3
DEBU[0000] Query: https://www.tadpoles.com/remote/v1/events?cursor=qpgj6nsakzdevDrQXaerJ3
DEBU[0002] Page: 2 Cursor: Z56oibDJGRFJoG8jAwiWFB
DEBU[0002] Query: https://www.tadpoles.com/remote/v1/events?cursor=Z56oibDJGRFJoG8jAwiWFB
DEBU[0002] Page: 3 Cursor: RuSWtQaSmmEqESX8uZDZSR
DEBU[0002] Query: https://www.tadpoles.com/remote/v1/events?cursor=RuSWtQaSmmEqESX8uZDZSR
there will probably be many, many pages (for example my data has 83)
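For reference, the paging scheme visible in that debug output can be sketched like this: the first request passes a time range and page size, and every later request passes only the cursor from the previous page. The query parameters and the 60 second timeout are taken from this thread; the function names are hypothetical, and how the next cursor is read out of each response is deliberately left out because the response format isn't shown here.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strconv"
	"time"
)

// Events endpoint as it appears in the debug output above.
const eventsURL = "https://www.tadpoles.com/remote/v1/events"

// newClient returns an HTTP client that gives up after 60 seconds per request.
func newClient() *http.Client {
	return &http.Client{Timeout: 60 * time.Second}
}

// eventsPageURL builds the query for one page of events. An empty cursor
// requests the first page; otherwise only the cursor is sent.
func eventsPageURL(cursor string, latestUnix int64, pageSize int) string {
	q := url.Values{}
	if cursor != "" {
		q.Set("cursor", cursor)
	} else {
		q.Set("direction", "range")
		q.Set("earliest_event_time", "0")
		q.Set("latest_event_time", strconv.FormatInt(latestUnix, 10))
		q.Set("num_events", strconv.Itoa(pageSize))
	}
	// Encode sorts parameters by key.
	return eventsURL + "?" + q.Encode()
}

func main() {
	client := newClient()
	fmt.Println("timeout:", client.Timeout)
	fmt.Println(eventsPageURL("", time.Now().Unix(), 50))
	fmt.Println(eventsPageURL("qpgj6nsakzdevDrQXaerJ3", 0, 50))
}
```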
Nope, not many more lines:
` In Debug Mode
DEBU[0000] using Bright Horizons login
Input : brighthorizons.com login required...
Email : [redacted]
Password :
DEBU[0013] JWT Token: e...[redacted]...Qk
DEBU[0013] Validate...
DEBU[0014] Validate successful
DEBU[0014] Serialize cookies successful
Login expires : Sun Dec 31 07:03:58 PM
DEBU[0014] Page: 0 Cursor: initialize
DEBU[0014] Query: https://mybrightday.brighthorizons.com/remote/v1/events?direction=range&earliest_event_time=0&latest_event_time=1692157201&num_events=50
DEBU[0074] Get Page Error: Get "https://mybrightday.brighthorizons.com/remote/v1/events?direction=range&earliest_event_time=0&latest_event_time=1692157201&num_events=50": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Cmd Error : Get "https://mybrightday.brighthorizons.com/remote/v1/events?direction=range&earliest_event_time=0&latest_event_time=1692157201&num_events=50": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
`
well, it seems like although it gets past the authentication layer, the api is not responding. unfortunately i don't think it's possible to resolve this without reverse engineering the website's api calls or sniffing a bright horizons app's api calls.
Well I appreciate your attempt. If you have the curiosity, I wouldn't mind sharing login credentials privately for a day so you can 'see' what's going on, but I know it wouldn't have any point for you unless making your code more flexible for this other purpose scratches a programming itch.
Thanks for your effort attempting it though!
@AndyRPH I was hoping it would not be necessary to share any credentials but I don't like to leave things unfinished. Send me an email at leo@leocov.com and we can arrange something.
Sent, thanks!
@AndyRPH This ended up being a bit of an interesting challenge because of how the bright horizons api segments its data. I had to put in quite a bit of work to parallelize the report lookups so it would not feel unbearably slow, but I think the improvements are well worth it!
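For anyone curious what parallelizing the lookups means in practice, below is a generic worker pool sketch in Go. It is not the task pool code from this branch, just an illustration of the idea: a fixed number of goroutines pull work from a channel so several reports are fetched at once. The fetch function and the ids in main are stand-ins.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchAll runs fetch for every id using at most `workers` goroutines and
// returns the results in the same order as ids. If any fetch fails, the
// error for the earliest failing id is returned.
func fetchAll(ids []string, workers int, fetch func(id string) (string, error)) ([]string, error) {
	if workers < 1 {
		workers = 1
	}
	results := make([]string, len(ids))
	errs := make([]error, len(ids))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is handled by exactly one worker, so the slice
			// writes below never touch the same element twice.
			for i := range jobs {
				results[i], errs[i] = fetch(ids[i])
			}
		}()
	}
	for i := range ids {
		jobs <- i
	}
	close(jobs)
	wg.Wait()

	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return results, nil
}

func main() {
	// Stand-in for a real report lookup.
	fakeFetch := func(id string) (string, error) {
		return "report-" + id, nil
	}
	reports, err := fetchAll([]string{"a", "b", "c", "d"}, 2, fakeFetch)
	if err != nil {
		panic(err)
	}
	fmt.Println(reports)
}
```

Capping the number of workers matters here because an unbounded burst of requests is more likely to hit rate limits or timeouts like the ones seen earlier in this thread.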
You can pull latest on the branch and both stat and backup should (probably...) work.
$ make
$ bin/tadpoles-backup --debug -p brightHorizons stat
$ bin/tadpoles-backup --debug -p brightHorizons backup /a/folder/on/your/computer
On the first login the code will cache your api-key so you don't need to re-enter login creds every time (documentation on the clear command is still valid although clearing cookies is clearing the api-key in this case).
The stat command will cache all the event data so the first time it will be slow but after that it will be fast.
The backup command will ignore the cache because for each media file we need to request a valid pre-signed download url (these are valid for 24 hours so in theory I could check if it's expired but that adds a LOT of code complexity).
Actually there may be a bug in the task pool code, but I don't think it will be hard to resolve.
Cool I'll give it a whirl tomorrow afternoon. Thanks!
At first I got this:
` In Debug Mode
Input : brightHorizons login required... Email : [email] Password : Cmd Error : [Error] Failed to fetch bright horizons user profile GET: /api/v2/user/profile => {"detail":"No matching user types"}`
but thereafter I consistently get the error below, after trying the 'clear all' flag first and then going back to the stat command. I get the same error if I try the backup command too.
` In Debug Mode
Input : brightHorizons login required... Email : [email] Password : Cmd Error : [Error] Failed to fetch bright horizons user profile GET: /api/v2/user/profile => {"detail":"X-API-KEY header invalid"} `
🤦🏼 a silly mistake on my part - it seems i did all my testing after manually creating a cached api key file so i never saw that i missed calling the function that actually fetches the api key....
I've pushed updated code - you should run the clear login command to delete the malformed key saved on your machine.
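To make the api-key step concrete, here is a rough sketch of a profile request that sends the key in the X-API-KEY header named in the error above. The path comes from the error message; the host is an assumption borrowed from the daily_reports url seen earlier in this thread, and the function name is made up.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Path taken from the error message above; the host is an assumption.
const profileURL = "https://mybrightday.brighthorizons.com/api/v2/user/profile"

// newProfileRequest builds a GET request that authenticates with an API key
// header instead of a cookie (hypothetical helper name).
func newProfileRequest(apiKey string) (*http.Request, error) {
	if apiKey == "" {
		return nil, errors.New("api key is empty, log in first")
	}
	req, err := http.NewRequest(http.MethodGet, profileURL, nil)
	if err != nil {
		return nil, err
	}
	// Header names are case-insensitive; Go stores this as "X-Api-Key".
	req.Header.Set("X-API-KEY", apiKey)
	return req, nil
}

func main() {
	// Only builds the request; nothing is sent over the network.
	req, err := newProfileRequest("API_KEY_HERE")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
}
```

Rejecting an empty key up front gives a clearer error than the "X-API-KEY header invalid" response from the server.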
Different, but error still. Should I still be doing the stat to collect data before the backup command?
` In Debug Mode
Input : brightHorizons login required...
Email : [redacted]
Password :
DEBU[0008] Validate...
Cmd Error : [Error] bright horizons token validation failed POST: /api/v2/jwt/validate => {"detail":"Not Found"}`
almost there.... the url was typed in wrong 😭 branch updated
That's odd. Got an error again, so I ran the clear login options again and then stat again, and boom, lots happening now on the console. But it didn't ask me to log in again after I ran the clear login flag.
are you running the clear commands with the -p brightHorizons flag?
Wow. that was WAY WAY faster than the old javascript code I used to run back in 2021/2020. It just pulled down 5 years of one kid and 6.5 years of another in like 15 mins. It also pulled in the mp4 videos the teachers occasionally take.
This is such a gift, thanks so much!
glad to help, enjoy!
I'm going to close this issue and release the changes as version 2.0.0
Any idea how difficult it might be to adapt this workflow for BrightHorizons? They use the tadpole software behind the scenes, and prior to this spring, there were some scripts on GitHub that would work, but they no longer do. I'm hoping there might only be some minor tweaks to this to allow it to login to the my bright day portal and bulk download all the images?
Happy to chat and share a login briefly if it helped assess if this was a small change or not.