pR0Ps / slack-to-discord

Import a Slack export into a Discord server
https://pypi.org/project/slack-to-discord/

Import failed with missing key #28

Closed oubiwann closed 10 months ago

oubiwann commented 1 year ago

Error message:

Traceback (most recent call last):
  File "/usr/local/bin/slack-to-discord", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/slack_to_discord/__main__.py", line 27, in main
    run_import(
  File "/usr/local/lib/python3.11/site-packages/slack_to_discord/importer.py", line 566, in run_import
    raise client._exception
  File "/usr/local/lib/python3.11/site-packages/slack_to_discord/importer.py", line 409, in on_ready
    await self._run_import(g)
  File "/usr/local/lib/python3.11/site-packages/slack_to_discord/importer.py", line 477, in _run_import
    for msg in slack_channel_messages(self._data_dir, chan_name, self._users, emoji_map, pins):
  File "/usr/local/lib/python3.11/site-packages/slack_to_discord/importer.py", line 172, in slack_channel_messages
    for d in sorted(data, key=lambda x: x["ts"]):
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/slack_to_discord/importer.py", line 172, in <lambda>
    for d in sorted(data, key=lambda x: x["ts"]):
                                        ~^^^^^^
KeyError: 'ts'

oubiwann commented 1 year ago

I made a quick hack to get around this, and then got another error:

Traceback (most recent call last):
  File "/usr/local/bin/slack-to-discord", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/slack_to_discord/__main__.py", line 27, in main
    run_import(
  File "/usr/local/lib/python3.11/site-packages/slack_to_discord/importer.py", line 569, in run_import
    raise client._exception
  File "/usr/local/lib/python3.11/site-packages/slack_to_discord/importer.py", line 412, in on_ready
    await self._run_import(g)
  File "/usr/local/lib/python3.11/site-packages/slack_to_discord/importer.py", line 480, in _run_import
    for msg in slack_channel_messages(self._data_dir, chan_name, self._users, emoji_map, pins):
  File "/usr/local/lib/python3.11/site-packages/slack_to_discord/importer.py", line 176, in slack_channel_messages
    text = d["text"]
           ~^^^^^^^^
KeyError: 'text'

After a second work-around, the import is back up and running. Here's the diff for the hacks:

diff --git a/slack_to_discord/importer.py b/slack_to_discord/importer.py
index b83dc74..80acea1 100644
--- a/slack_to_discord/importer.py
+++ b/slack_to_discord/importer.py
@@ -142,6 +142,10 @@ def slack_filedata(f):
     }

+def ts_fun(x):
+    if "ts" in x:
+        return x["ts"]
+
 def slack_channel_messages(d, channel_name, users, emoji_map, pins):
     def mention_repl(m):
         type_ = m.group(1)
@@ -168,7 +172,9 @@ def slack_channel_messages(d, channel_name, users, emoji_map, pins):
     for file in sorted(glob.glob(os.path.join(channel_dir, "*.json"))):
         with open(file, "rb") as fp:
             data = json.load(fp)
-        for d in sorted(data, key=lambda x: x["ts"]):
+        for d in sorted(data, key=ts_fun):
+            if not "text" in d:
+                continue
             text = d["text"]
             text = MENTION_RE.sub(mention_repl, text)
             text = LINK_RE.sub(lambda x: x.group(1), text)
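
One caveat with the hack above: ts_fun returns None when "ts" is missing, and Python 3 cannot compare None with a string, so sorted() raises a TypeError if such an entry has to be compared with a normal one. A minimal sketch of a sort key that avoids this is shown below. It assumes "ts" values are strings such as "1690000000.000100" and, like the hack, simply skips incomplete entries. The names ts_key and usable_messages are hypothetical and are not part of slack-to-discord.

```python
def ts_key(msg):
    """Sort key that tolerates a missing "ts"; such entries sort first."""
    ts = msg.get("ts")
    # A (bool, str) tuple keeps every key comparable, unlike a bare None.
    return (ts is not None, ts or "")


def usable_messages(data):
    """Yield (ts, text) for entries that have both fields, in "ts" order."""
    for msg in sorted(data, key=ts_key):
        # Assumption: entries without "ts" or "text" are skipped, as in the hack.
        if "ts" not in msg or msg.get("text") is None:
            continue
        yield msg["ts"], msg["text"]


sample = [
    {"ts": "1690000002.000200", "text": "second"},
    {"text": "no timestamp"},
    {"ts": "1690000001.000100", "text": "first"},
    {"ts": "1690000003.000300"},
]
print(list(usable_messages(sample)))
```

Skipping entries is the bluntest option; whether they should instead be imported with empty text depends on what those entries actually are, which is the open question in this thread.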

oubiwann commented 1 year ago

So far so good ... the importer is powering through YEARS of Slack messages (still). Haven't had to restart since the above hack was put in place.

oubiwann commented 1 year ago

Import completed successfully with ~30k messages.

pR0Ps commented 1 year ago

Glad everything worked for you!

Would it be possible for you to post or privately email me some (optionally redacted) JSON messages that don't have the ts/text fields in them? Since ts is a timestamp, I'm really curious about what sort of message wouldn't have it. For text, I've definitely seen events without any, but usually the field has been an empty string or null rather than missing entirely.

Essentially I'd like to be able to understand why these cases exist in order to figure out the best way of dealing with them.
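
For anyone who wants to share such messages without exposing their contents, here is a rough sketch of one way to redact them while keeping the structure (which keys exist and what types they hold). The redact function and the KEEP_VALUES set are hypothetical choices for illustration, and the sample message is made up rather than taken from a real export.

```python
import json

# Assumption: values of these keys are safe to share and useful for debugging.
KEEP_VALUES = {"type", "subtype", "ts"}


def redact(value, key=None):
    """Return a copy of value with string contents replaced by a placeholder."""
    if isinstance(value, dict):
        return {k: redact(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v, key) for v in value]
    if isinstance(value, str) and key not in KEEP_VALUES:
        return "<redacted>"
    # Numbers, booleans and null are left untouched.
    return value


message = {
    "type": "message",
    "subtype": "example_subtype",
    "user": "U123",
    "files": [{"name": "secret.pdf", "size": 1024}],
}
print(json.dumps(redact(message), indent=2))
```

Because keys are preserved, a message that lacks ts or text still visibly lacks them after redaction.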

oubiwann commented 1 year ago

Yeah, that's the right way to do it.

I'll invert the logic, add a short-circuit, and then re-run: it should spit out the culprits pretty quickly.

I'll paste/attach sanitised data here ...

oubiwann commented 1 year ago

Nice job, btw -- this project's code is far cleaner (and thus not off-putting to tweak) than other projects for doing similar things.

pR0Ps commented 1 year ago

Were you ever able to re-run with more logging enabled? In case the holdup was not wanting to deal with duplicate messages being imported, or figuring out how to stub the code out, here's a simple patch that just prints the offending messages without importing anything (applies to v1.1.5, the latest release).

It can be automatically applied by saving it to a file, then running patch <path to slack_to_discord/importer.py> <path to patch> in the terminal.

diff --git a/slack_to_discord/importer.py b/slack_to_discord/importer.py
index b83dc74..c6a73c9 100644
--- a/slack_to_discord/importer.py
+++ b/slack_to_discord/importer.py
@@ -168,6 +168,9 @@ def mention_repl(m):
     for file in sorted(glob.glob(os.path.join(channel_dir, "*.json"))):
         with open(file, "rb") as fp:
             data = json.load(fp)
+        for x in data:
+            if "ts" not in x or "text" not in x:
+                print(x)  # print out problematic message
         for d in sorted(data, key=lambda x: x["ts"]):
             text = d["text"]
             text = MENTION_RE.sub(mention_repl, text)
@@ -260,6 +263,7 @@ def mention_repl(m):
             else:
                 messages[ts] = msg

+    return  # prevent actually importing anything
     # Sort the dicts by timestamp and yield the messages
     for msg in (messages[x] for x in sorted(messages.keys())):
         msg["replies"] = [msg["replies"][x] for x in sorted(msg["replies"].keys())]
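
An alternative to patching the installed package is to scan the export directly. The sketch below applies the same check as the patch (an entry missing "ts" or "text") to every per-channel JSON file. It assumes the export has been extracted to a directory with one subdirectory per channel, each holding *.json files that contain a list of messages, which matches how the importer code above reads them. The function name find_problem_messages is hypothetical.

```python
import glob
import json
import os


def find_problem_messages(export_dir):
    """Yield (path, entry) for each entry missing "ts" or "text"."""
    # Assumption: per-channel files live one level down, at <export>/<channel>/*.json.
    pattern = os.path.join(export_dir, "*", "*.json")
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as fp:
            data = json.load(fp)
        if not isinstance(data, list):
            continue  # not a list of messages; ignore
        for entry in data:
            if not isinstance(entry, dict) or "ts" not in entry or "text" not in entry:
                yield path, entry
```

Looping over find_problem_messages("path/to/extracted/export") and printing each pair lists the offending entries together with the file they came from, without connecting to Discord at all.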

syedzainqadri commented 1 year ago

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/slack_to_discord/importer.py", line 408, in on_ready
    await self._run_import(g)
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/slack_to_discord/importer.py", line 476, in _run_import
    for msg in slack_channel_messages(self._data_dir, chan_name, self._users, emoji_map, pins):
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/slack_to_discord/importer.py", line 171, in slack_channel_messages
    for d in sorted(data, key=lambda x: x["ts"]):
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/slack_to_discord/importer.py", line 171, in <lambda>
    for d in sorted(data, key=lambda x: x["ts"]):
                                        ~^^^^^^
KeyError: 'ts'

Any solution to this? @oubiwann @pR0Ps @maur1th

pR0Ps commented 1 year ago

any solution to this

@syedzainqadri if you could apply the patch I posted above and report what it prints it would allow me to figure out why the issue is happening and how to fix it.

Alternatively, if you send me the export that's having issues (feel free to email it if it's sensitive, check my profile) I can figure it out from there.

damonseeley commented 10 months ago

Hi, I'm encountering the same error. Would be happy to email you the files in question.

Here's the output:

2024-01-17 23:05:26 INFO     slack_to_discord.importer Processing channel '#techtalk'...
2024-01-17 23:05:26 CRITICAL slack_to_discord.importer Failed to finish import!
Traceback (most recent call last):
  File "/Users/damon/.pyenv/versions/3.9.1/lib/python3.9/site-packages/slack_to_discord/importer.py", line 408, in on_ready
    await self._run_import(g)
  File "/Users/damon/.pyenv/versions/3.9.1/lib/python3.9/site-packages/slack_to_discord/importer.py", line 476, in _run_import
    for msg in slack_channel_messages(self._data_dir, chan_name, self._users, emoji_map, pins):
  File "/Users/damon/.pyenv/versions/3.9.1/lib/python3.9/site-packages/slack_to_discord/importer.py", line 171, in slack_channel_messages
    for d in sorted(data, key=lambda x: x["ts"]):
  File "/Users/damon/.pyenv/versions/3.9.1/lib/python3.9/site-packages/slack_to_discord/importer.py", line 171, in <lambda>
    for d in sorted(data, key=lambda x: x["ts"]):
KeyError: 'ts'
2024-01-17 23:05:26 INFO     slack_to_discord.importer Bot logging out

damonseeley commented 10 months ago

I'm using v1.1.5 via pip. I applied the patch from 8/4/23, but it still throws the error.

damonseeley commented 10 months ago

I ended up using oubiwann's workaround (the ts_fun and missing-text diff quoted from the comment above) and it fixed the problem.

pR0Ps commented 10 months ago

@damonseeley Thanks for sending me some sample data. The latest version (v1.1.6) has the fix in it.