grundleborg / slack-advanced-exporter

A tool for exporting additional data from Slack that is missing from the official data export.
MIT License

Support for fetching users' e-mail addresses #5

Closed WGH- closed 6 years ago

WGH- commented 6 years ago

For some time now, the Slack export zip has no longer included user e-mail addresses. However, they are still accessible through the Slack API.

This commit adds a new subcommand that fetches users' e-mail addresses and adds them to the users.json file inside the export zip.
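As a rough illustration of the merge step described above (not the code in this PR), the sketch below copies an export zip and fills in the e-mail field of each user record. It assumes the user list lives in a file named users.json at the archive root, that each record carries an "id" and an optional "profile" object, and that the e-mail addresses were already collected from the Slack API into a plain dict. The function name and parameters are hypothetical.

```python
import json
import zipfile


def add_emails_to_export(src_zip, dst_zip, emails_by_user_id,
                         users_file="users.json"):
    """Copy a Slack export zip, setting profile.email in the users file.

    src_zip / dst_zip may be paths or file-like objects.
    emails_by_user_id maps Slack user IDs to e-mail addresses
    (assumed to have been fetched from the Slack API beforehand).
    """
    with zipfile.ZipFile(src_zip) as src, \
            zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == users_file:
                users = json.loads(data)
                for user in users:
                    email = emails_by_user_id.get(user.get("id"))
                    if email:
                        # Create the profile object if the record lacks one.
                        user.setdefault("profile", {})["email"] = email
                data = json.dumps(users, indent=4).encode("utf-8")
            # Every other entry is copied through unchanged.
            dst.writestr(info.filename, data)
```

Writing to a new archive rather than patching the original in place keeps the unmodified export available for comparison.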

Fixes #4.

HOWEVER, for some unknown reason, importing such an archive with e-mails causes every message to be duplicated. This might be a bug in Mattermost itself. I suggest not merging this PR until it's investigated (that's what I'm doing right now).

grundleborg commented 6 years ago

@WGH- for the duplicate imports, does it import everything twice even on a blank team? I'm asking because Mattermost's Slack importer isn't idempotent, so if you import the same team twice you'll get two copies of every message.
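To make the idempotency point concrete, here is a toy model. It is not Mattermost's importer: the function names, the in-memory list standing in for the database, and the choice of (channel, ts) as a message key are all assumptions for illustration.

```python
def import_naive(store, messages):
    """Non-idempotent: every run appends all messages again."""
    store.extend(messages)


def import_idempotent(store, messages):
    """Idempotent: skip messages whose (channel, ts) key is already stored."""
    seen = {(m["channel"], m["ts"]) for m in store}
    for m in messages:
        key = (m["channel"], m["ts"])
        if key not in seen:
            store.append(m)
            seen.add(key)


export = [{"channel": "general", "ts": "1519223410.000100", "text": "hi"}]

naive_store = []
import_naive(naive_store, export)
import_naive(naive_store, export)       # second run duplicates the message

safe_store = []
import_idempotent(safe_store, export)
import_idempotent(safe_store, export)   # second run is a no-op
```

With the naive version, importing the same export into a team that already holds it yields two copies of every message, which is why the test on a blank team matters.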

WGH- commented 6 years ago

Yeah, this is really weird. I checked several times with a clean Mattermost setup (clean Docker image), and the zip with e-mails always causes duplicate messages. The one without never does. I'm still trying to figure it out; hopefully I will be able to soon.

WGH- commented 6 years ago

This is getting weirder.

I'm comparing the following zip archives: unmodified, with attachments, with emails, with both.

I tried the archive with e-mails again, and it imported without any problems.

Then I tried the archive with both attachments and e-mails. The import process took a long time (more than half an hour), and all messages were duplicated 9 (!) times.

As for the import log: whenever the duplicated message problem occurs (it doesn't matter whether messages are duplicated 2 or 9 times), the following lines appear once for every user and every channel:

Slack user merged with an existing Mattermost user with matching email <redacted> and username <redacted>
The Slack channel <redacted> already exists as an active Mattermost channel. Both channels have been merged.

WGH- commented 6 years ago

After examining the log even more carefully, I noticed that these two lines appear exactly 9 times:

[2018/02/21 14:30:10 UTC] [INFO] Purging all caches
[2018/02/21 14:30:10 UTC] [INFO] License key from https://mattermost.com required to unlock enterprise features.

After each occurrence, the warnings about missing user e-mails (and other import messages) repeat again. It's as though the cache purge is the last step of the Slack import, and for some reason the import is attempted 9 times.
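The counting described above can be sketched as a small script. It only tallies lines containing the substrings quoted in this thread. Treating each "Purging all caches" line as the end of one import pass is the hypothesis stated here, not a documented Mattermost guarantee, and the function and marker names are hypothetical.

```python
# Substrings taken from the log lines quoted in this thread.
MARKERS = {
    "cache_purge": "Purging all caches",
    "user_merged": "merged with an existing Mattermost user",
    "channel_merged": "already exists as an active Mattermost channel",
}


def summarize_import_log(lines):
    """Count how many log lines contain each marker substring."""
    counts = {name: 0 for name in MARKERS}
    for line in lines:
        for name, needle in MARKERS.items():
            if needle in line:
                counts[name] += 1
    return counts
```

If the hypothesis holds, a "cache_purge" count above 1 for a single upload suggests the import ran more than once, and non-zero merge counts on a clean server point the same way.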

WGH- commented 6 years ago

I made a couple more attempts, and the problem seems to appear only when importing through the web interface. platform import slack is not affected. So this PR is fine, I guess.

WGH- commented 6 years ago

@grundleborg any comments? I used this PR to migrate our rather large workspace, and it worked perfectly fine.

grundleborg commented 6 years ago

Sorry, haven't had a moment to look at this properly. Will try and review it later today.