iulspop / slack-web-scraper

Puppeteer configured to scrape the posts and threads of any channel on Slack.
MIT License

Login with SSO #5

Open pathetiq opened 2 years ago

pathetiq commented 2 years ago

Hi all, thanks for this scraper!

I need to use it in an environment that uses SSO (Okta and others) to log in. Would there be a way to set some authentication cookies for your app to use instead of the user password?

Thanks for your time, I appreciate it!

iulspop commented 2 years ago

Hi @pathetiq!

In short, you can create a cookies folder in the project root folder, create a slack-session-cookies.json file and paste your cookies there as JSON. The collect script will try to read and set the cookies from that file. If that fails, it tries to log in to Slack with the password by controlling the browser. If you add the cookies first, you can log in with SSO instead of using the user password.
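
To make that flow concrete, here is a rough sketch of the "try cookies first, otherwise fall back to the password login" logic. It is not the project's actual code: `readSavedCookies`, `restoreSessionOrLogin` and the `loginWithPassword` callback are hypothetical names, and it assumes the file holds a JSON array of cookie objects that can be passed to Puppeteer's `page.setCookie`.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: returns the saved cookies, or null when the file is
// missing, is not valid JSON, or does not contain a non-empty array.
function readSavedCookies(filePath) {
  try {
    const parsed = JSON.parse(fs.readFileSync(filePath, 'utf8'));
    return Array.isArray(parsed) && parsed.length > 0 ? parsed : null;
  } catch (error) {
    return null;
  }
}

// Hypothetical wrapper: set the cookies on the page if they could be read,
// otherwise run the password login passed in by the caller.
async function restoreSessionOrLogin(page, filePath, loginWithPassword) {
  const cookies = readSavedCookies(filePath);
  if (cookies) {
    await page.setCookie(...cookies); // Puppeteer accepts one or more cookie objects
    return 'cookies';
  }
  await loginWithPassword(page);
  return 'password';
}

// Assumed location, following the folder and file names mentioned above.
const cookiesFile = path.join(process.cwd(), 'cookies', 'slack-session-cookies.json');
```

Note that in a sketch like this a malformed file is treated the same as a missing one, so a bad cookies file silently ends up on the password path.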

For getting the cookies in the first place, you could probably run some JS code in the dev tools when logged into Slack to output the cookies set on the page as JSON, and then you paste it into the cookies file.
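
Something along these lines might work for the dev tools step. It is only a sketch: `cookieStringToJson` is a made-up helper name, the `.slack.com` domain is an assumption, and `document.cookie` never exposes cookies flagged `HttpOnly`, so a session cookie may well be missing from the result. If it is, copying the cookies from the browser's storage panel or using a cookie export extension is probably the safer route.

```javascript
// Convert a `document.cookie`-style string ("a=1; b=2") into an array of
// cookie objects with the name/value/domain/path fields Puppeteer understands.
function cookieStringToJson(cookieString, domain) {
  return cookieString
    .split(';')
    .map(pair => pair.trim())
    .filter(pair => pair.includes('='))
    .map(pair => {
      const index = pair.indexOf('='); // values may themselves contain "="
      return {
        name: pair.slice(0, index),
        value: pair.slice(index + 1),
        domain, // assumption: the caller supplies the cookie domain
        path: '/',
      };
    });
}

// In the dev tools console, while logged into Slack (copy() is a console utility):
// copy(JSON.stringify(cookieStringToJson(document.cookie, '.slack.com'), null, 2));
```

The copied JSON can then be pasted into the cookies file.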

Does that answer help? :)

JamesDConley commented 1 year ago

Hi! I have the cookies pulled from Firefox, but am having trouble adding them. It looks like it's still trying to log in through the main page, and it gives the error below.

> collect
> node src/collectData/collectData.js

Scraping took 0 minutes and 7 seconds
/home/james/work/slack_comments/slack-web-scraper/node_modules/puppeteer/lib/cjs/puppeteer/common/assert.js:26
        throw new Error(message);
              ^

Error: No node found for selector: #email
    at assert (/home/james/work/slack_comments/slack-web-scraper/node_modules/puppeteer/lib/cjs/puppeteer/common/assert.js:26:15)
    at DOMWorld.type (/home/james/work/slack_comments/slack-web-scraper/node_modules/puppeteer/lib/cjs/puppeteer/common/DOMWorld.js:317:32)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async loginAndSaveCookies (/home/james/work/slack_comments/slack-web-scraper/src/collectData/utils/loginToSlack.js:22:3)
    at async loginToSlack (/home/james/work/slack_comments/slack-web-scraper/src/collectData/utils/loginToSlack.js:15:5)
    at async /home/james/work/slack_comments/slack-web-scraper/src/collectData/collectData.js:15:3

iulspop commented 1 year ago

Hi @JamesDConley. I'll take a look at this within the next three days.

iulspop commented 1 year ago

Hi @JamesDConley, I have a lot on my plate right now, but I haven't forgotten about this and will get to it. Sorry if this is blocking you.

janga1 commented 1 year ago

Hey @iulspop, I just wanted to let you know I've bumped into the same issue. I sign in to Slack via Google's SSO, but I cannot seem to get this working with your scraper. Perhaps I'm doing something wrong, I don't know. I've created a cookies directory with a slack-session-cookies.json file, filled with a JSON object containing all my Slack cookies, but it doesn't seem to help.

Any suggestions? Thanks

PS: If possible, maybe a solution would be to have the user authenticate themselves using the given puppeteer browser window, to get to the desired workspace. After that your software could take over. This could potentially lower the complexity of your code for you.
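
A minimal sketch of what that could look like: open a visible browser, let the user sign in however they like, and poll the page URL until it looks like the Slack client. `waitForManualLogin` and `isSlackClientUrl` are hypothetical names, and the `https://app.slack.com/client` prefix is an assumption about where Slack lands after login, not something confirmed in this thread.

```javascript
// Assumption: once signed in, the browser ends up on a URL with this prefix.
function isSlackClientUrl(url) {
  return url.startsWith('https://app.slack.com/client');
}

// Poll page.url() until the check passes or the timeout expires.
// Resolves to true when the user got in, false when time ran out.
async function waitForManualLogin(page, isLoggedIn, { timeoutMs = 5 * 60 * 1000, pollMs = 1000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (isLoggedIn(page.url())) return true;
    await new Promise(resolve => setTimeout(resolve, pollMs));
  }
  return false;
}

// Usage sketch (needs a visible window so the user can type their credentials):
// const browser = await require('puppeteer').launch({ headless: false });
// const page = await browser.newPage();
// await page.goto(workspaceUrl); // workspaceUrl: whatever URL the scraper is configured with
// if (await waitForManualLogin(page, isSlackClientUrl)) { /* start scraping */ }
```

Polling the URL keeps the scraper independent of which SSO provider is involved, at the cost of guessing what the post-login URL looks like.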

iulspop commented 1 year ago

Hey @janga1, I'm looking into this now. Thanks for the suggestion. It might in fact be the best option to let people authenticate themselves.

iulspop commented 1 year ago

Hi! I opened a draft PR #8 with a WIP script for signing in with SSO OAuth V2. However, it's not working when I test it with my accounts. See the PR for more details.

janga1 commented 1 year ago

Ah, that sucks. I just tested the branch and actually ran into an error which I don't get on main:

> collect
> node src/collectData/collectData.js

Scraping took 0 minutes and 0 seconds
node:internal/errors:477
    ErrorCaptureStackTrace(err);
    ^

Error: spawn Unknown system error -86
    at ChildProcess.spawn (node:internal/child_process:413:11)
    at Object.spawn (node:child_process:713:9)
    at BrowserRunner.start (/Users/janwieringa/Documents/slack-web-scraper/node_modules/puppeteer/lib/cjs/puppeteer/node/BrowserRunner.js:91:34)
    at ChromeLauncher.launch (/Users/janwieringa/Documents/slack-web-scraper/node_modules/puppeteer/lib/cjs/puppeteer/node/Launcher.js:109:16)
    at async launchBrowser (/Users/janwieringa/Documents/slack-web-scraper/src/collectData/utils/launchBrowser.js:22:19)
    at async /Users/janwieringa/Documents/slack-web-scraper/src/collectData/collectData.js:13:29 {
  errno: -86,
  code: 'Unknown system error -86',
  syscall: 'spawn'
}

I took a peek at the code, but I can't see what's causing this. I've tried both headless and visible browser options...

EDIT: Never mind, the issue is the Chromium browser that's installed with Puppeteer. It targets Intel Macs, while I'm on an M1. I tested main on a Linux machine. Let me see if I can test this tomorrow...
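
For anyone else hitting this: as far as I know, errno 86 on macOS is "bad CPU type in executable", which matches the diagnosis above. One possible workaround is to point Puppeteer at a locally installed browser through its `executablePath` launch option. The sketch below assumes a made-up `CHROME_PATH` environment variable and a hypothetical `buildLaunchOptions` helper; the project's own launchBrowser.js may be structured differently.

```javascript
// Build Puppeteer launch options, using a locally installed browser when the
// (hypothetical) CHROME_PATH environment variable is set.
function buildLaunchOptions(env = process.env) {
  const options = {};
  if (env.CHROME_PATH) {
    options.executablePath = env.CHROME_PATH; // real Puppeteer launch option
  }
  return options;
}

// Usage sketch:
// const browser = await require('puppeteer').launch(buildLaunchOptions());
```

With the variable unset, the options stay empty and Puppeteer falls back to its bundled browser.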

kristiankyvik commented 1 year ago

Hello! I would love to use this, but I also need to be able to bypass OAuth (I do not use a username and password to log in to my workspace).

iulspop commented 1 year ago

@kristiankyvik about a month and a half ago I gave this a shot but failed to implement it: https://github.com/iulspop/slack-web-scraper/pull/8

The issue is that OAuth authentication fails for some reason when I do it in an automated browser, and I haven't figured out how to get around it.

Happy to review and accept a contribution if someone can solve that! I don't know when I'll prioritize adding features to this project again. Though this is priority 1 if I get a chance.

Jeffxz commented 1 year ago

I used the "Export cookie JSON file for Puppeteer" extension in Chrome, saved the output as "slack-session-cookies.json" under the "cookies" folder, and it worked well for me. I guess the cookies need to be stored in a format that Puppeteer can consume. From the error message "No node found for selector: #email", my guess is that the JSON parsing in the loginToSlack function (inside loginToSlack.js) hit an error, so it fell back to looking for the email field and signing in with email again.
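
If you want to check your own file before running the scraper, a small validator like the one below might help. It is a guess at what "a format Puppeteer can consume" means in practice (a JSON array of objects with string `name` and `value` fields), and `findCookieFileProblem` is a hypothetical helper, not part of the project.

```javascript
// Return a human-readable problem with the cookie file contents,
// or null when the text looks like something page.setCookie could accept.
function findCookieFileProblem(text) {
  let parsed;
  try {
    parsed = JSON.parse(text);
  } catch (error) {
    return `not valid JSON: ${error.message}`;
  }
  if (!Array.isArray(parsed)) {
    return 'expected a JSON array of cookie objects';
  }
  const index = parsed.findIndex(
    cookie => !cookie || typeof cookie.name !== 'string' || typeof cookie.value !== 'string'
  );
  if (index !== -1) {
    return `cookie at index ${index} needs string "name" and "value" fields`;
  }
  return null;
}

// Usage sketch:
// const text = require('fs').readFileSync('cookies/slack-session-cookies.json', 'utf8');
// console.log(findCookieFileProblem(text) ?? 'cookie file looks usable');
```

A single JSON object keyed by cookie name, for example, would be reported as not being an array.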