A web scraper that navigates to a Slack workspace and saves the posts and threads of a given channel or DM.
It uses the Puppeteer headless browser to load and interact with Slack. It doesn't depend on installing an app in the Slack workspace or acquiring an API key. Instead, it logs in to your Slack account and uses that session to access the channel or DM.
It's helpful for saving information from a channel or DM without needing to ask a workspace administrator to export the data.
For example, if you're in the process of leaving your current company to join another, this tool is a great way to archive everything you've said and done on Slack.
Run `npm install` to install the dependencies.

Copy the `.example.env` file in the project root folder and rename it to `.env`. Then modify the following environment variables in `.env`:

`SLACK_WORKSPACE_URL`, `SLACK_EMAIL` and `SLACK_PASSWORD` are required.
`SLACK_WORKSPACE_URL` must be the URL you use to log in to the workspace, not app.slack.com. Example: `SLACK_WORKSPACE_URL=cloud-native.slack.com`. Note that environment variables are set without quotes.

`SLACK_EMAIL` and `SLACK_PASSWORD` are the credentials used to log in to the workspace.

You must set at least one of `CONVERSATION_NAMES` or `CHANNEL_NAMES`.
Both take the form of a JSON array of strings: `["element1", "element2"]`. The array elements are double-quoted and the last element doesn't have a trailing comma. You can escape a double quote inside a string in JSON like this: `["string\"hello"]`.
Set `CONVERSATION_NAMES` to scrape a DM or group chat. The value is the name tag of the person, or the group chat name, as written under "Direct Messages" in Slack. Example: `CONVERSATION_NAMES=["Iuliu Pop (Core Grad)", "John Doe"]`.

Set `CHANNEL_NAMES` to scrape public or private channels. The value is the name you see under the "Channels" side tab in Slack. Example: `CHANNEL_NAMES=["general", "random"]`.
`SCROLL_UP_TIMEOUT` is optional. Example: `SCROLL_UP_TIMEOUT=30`.
`HEADLESS_MODE` is optional. Set it to `true` to scrape with the browser in headless mode. Example: `HEADLESS_MODE=true`.
`SKIP_THREADS` is optional. Set it to `true` to disable scraping threads on messages in channels or conversations. Example: `SKIP_THREADS=true`.

Before starting the scrape, make sure the Slack app language is set to English. You can reset it once the scrape is finished.
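Putting the variables together, a complete `.env` might look like this (the workspace URL comes from the example above; the other values are placeholders you should replace with your own):

```
SLACK_WORKSPACE_URL=cloud-native.slack.com
SLACK_EMAIL=you@example.com
SLACK_PASSWORD=your-password
CONVERSATION_NAMES=["John Doe"]
CHANNEL_NAMES=["general", "random"]
SCROLL_UP_TIMEOUT=30
HEADLESS_MODE=true
SKIP_THREADS=true
```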
Run `npm run collect`. You will see the browser open and start scraping data unless you set `HEADLESS_MODE` to `true`. In headless mode you will see status updates on the scraping process in the console output.
If you're running under WSL, you need to configure WSL to connect to a GUI even if the browser launches in headless mode. Use this guide to configure WSL to connect to an X server installed on Windows. Before running the collect script, the X server must be running and WSL correctly configured to connect to it, or Puppeteer will fail to launch the browser.
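The exact steps depend on the guide you follow, but under WSL2's default networking a common pattern is to point `DISPLAY` at the X server running on the Windows host, whose IP appears as the nameserver in `/etc/resolv.conf`. A sketch, assuming WSL2 and an X server listening on display `:0`:

```shell
# WSL2: the Windows host is reachable at the nameserver IP in /etc/resolv.conf.
# Point DISPLAY at screen :0 of the X server running on Windows.
export DISPLAY="$(awk '/nameserver/ {print $2; exit}' /etc/resolv.conf):0"
echo "$DISPLAY"
```

You can add the `export` line to your shell profile so it is set in every WSL session.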
After running `npm run collect`, you can run `npm run parse`. You will be prompted to select the file to parse from the `slack-data/` folder. Once parsing is complete, a `slack-data/x.json` file with the same name as the source HTML file will be output with the parsed posts/threads.

Thanks goes to these wonderful people (emoji key):
- Iuliu Pop
- William Desportes
- NotEdwin
This project follows the all-contributors specification. Contributions of any kind welcome!
Very open to contributions to this project! If you have questions, bug reports or features you want to see, please open an issue. If you want to contribute code, open a pull request and I'll review ASAP.