iulspop / slack-web-scraper

Puppeteer configured to scrape the posts and threads of any channel on Slack.
MIT License

SSO auth script WIP #8

Closed iulspop closed 1 month ago

iulspop commented 2 years ago

Draft of NPM script for signing in with SSO manually. Currently not working.

It's tricky to sign in with Google SSO on an automated browser, since by default Google blocks third-party apps from accessing Google Accounts to guard against sophisticated man-in-the-middle attacks. See this Google Support thread and this Google Security Blog post for details.

If you try to sign in with Google SSO on Chromium controlled by Puppeteer, you get the message: "Couldn’t sign you in. This browser or app may not be secure. Try using a different browser". The workaround I found was to use the puppeteer-extra package, which enables Puppeteer plugins, together with the puppeteer-extra-plugin-stealth plugin. The plugin applies evasion techniques that make Puppeteer harder to detect, and it successfully bypasses Google's protection.
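For reference, a minimal sketch of how the stealth plugin is typically wired into puppeteer-extra. The helper names `buildLaunchOptions` and `launchStealthBrowser` and the launch options shown are assumptions for illustration, not necessarily what this PR uses.

```javascript
// Hypothetical helper: launch options for a manual sign-in session.
// headless: false is assumed so the SSO form can be completed by hand.
function buildLaunchOptions(overrides = {}) {
  return { headless: false, defaultViewport: null, ...overrides }
}

// Hypothetical helper: launches Chromium through puppeteer-extra with the
// stealth plugin registered.
async function launchStealthBrowser(overrides = {}) {
  // Required lazily so this file can be loaded without the packages installed.
  const puppeteer = require('puppeteer-extra')
  const StealthPlugin = require('puppeteer-extra-plugin-stealth')

  // Register the plugin before launching; it applies its evasions to new pages.
  puppeteer.use(StealthPlugin())

  return puppeteer.launch(buildLaunchOptions(overrides))
}

module.exports = { buildLaunchOptions, launchStealthBrowser }
```

Calling `launchStealthBrowser()` should open a visible browser window in which the Google sign-in can be attempted. Whether the evasions keep working depends on Google's detection at the time.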

However, after I sign in with Google SSO, I'm redirected to https://oauth2.slack.com/ with the message: "There’s been a glitch… We’re not quite sure what went wrong. You can go back, or try looking on our Help Center if you need a hand". I then tried signing in with Google SSO from a regular browser and got the same error. Perhaps my sign-ins were flagged as suspicious through OAuth and I was blocked?

At this point, I can't even test my changes, since SSO currently doesn't work with my Google account. I'll have to wait and try again.

iulspop commented 2 years ago

For clarification, this PR adds an NPM script, npm run auth. The idea is that you use it to log in with SSO, then press Enter in the terminal to save the cookies. That way you don't have to save the cookies yourself manually with JS. After the cookies are saved, you run the npm run collect script normally.
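The flow described above could look roughly like the sketch below. This is not the PR's code: the function names, the `cookies.json` path, and the prompt text are all assumptions, and `page` is assumed to be a Puppeteer `Page` that is already open on the Slack sign-in screen.

```javascript
const fs = require('fs')
const readline = require('readline')

// Hypothetical helper: resolves once the user presses Enter in the terminal.
function waitForEnter(prompt, input = process.stdin, output = process.stdout) {
  const rl = readline.createInterface({ input, output })
  return new Promise(resolve => {
    rl.question(prompt, () => {
      rl.close()
      resolve()
    })
  })
}

// Hypothetical helper: writes the cookie array to disk as JSON.
function saveCookies(cookies, filePath) {
  fs.writeFileSync(filePath, JSON.stringify(cookies, null, 2))
}

// Assumed flow: the user signs in by hand, then confirms in the terminal.
async function auth(page, cookiesPath = 'cookies.json') {
  await waitForEnter('Sign in with SSO in the browser, then press Enter to save cookies... ')
  // page.cookies() returns the cookies for the page's current URL.
  const cookies = await page.cookies()
  saveCookies(cookies, cookiesPath)
}

module.exports = { waitForEnter, saveCookies, auth }
```

A later run could then read the file back and pass the parsed array to `page.setCookie(...cookies)` before navigating, so the session is restored without signing in again.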