pR0Ps / slack-to-discord

Import a Slack export into a Discord server
https://pypi.org/project/slack-to-discord/

Message formatting is lost when splitting large messages #25

Open zzzoom opened 1 year ago

zzzoom commented 1 year ago

When importing a large formatted message such as a code block, the message splitter doesn't close the formatting at the end of each resulting message and reopen it at the start of the next, so the formatting is lost. For example, something like:

long code block 1/2

long code block 2/2

Gets converted into:

``` long code block 1/2


long code block 2/2 ```

pR0Ps commented 4 months ago

I think handling this specific case is not too hard: use a regex to match ``` blocks and, if one intersects with a message boundary, upload the block as an attachment or something. The more general problem of "how to split text while preserving formatting" is a bit more complex.
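A minimal sketch of that check, assuming the splitter already knows the character offsets at which it intends to cut the message. The names `fences_crossing` and `split_points` are hypothetical and not part of slack-to-discord; unterminated blocks are not matched.

```python
import re

# Built from single backticks so the marker never appears literally in this listing.
FENCE = "`" * 3

# Non-greedy match of one fenced block, including both fence markers.
FENCE_RE = re.compile(FENCE + r".*?" + FENCE, re.DOTALL)


def fences_crossing(text, split_points):
    """Return the (start, end) spans of fenced blocks that contain a split point.

    A split exactly at the start or end of a block does not break it,
    so only offsets strictly inside the span count.
    """
    crossing = []
    for match in FENCE_RE.finditer(text):
        if any(match.start() < point < match.end() for point in split_points):
            crossing.append(match.span())
    return crossing
```

Any span returned here is a block that would be cut in two, so it could be pulled out and uploaded as an attachment instead of being sent inline.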

For example, if the text **this is some *strong* text** is split in the middle of "strong", it's the same kind of issue (it needs to be split like **this is some *str*** | ***ong* text**). Another example is links - they can't be split in the middle. Like parsing HTML, this isn't something that can be done generically with a regex. Properly handling these cases will depend on understanding the actual syntax of the text and which types of entities can be split. Slack does provide structured information in the export using its concept of "blocks", so the proper way to implement this will most likely involve parsing all that. It could then be used to intelligently pick a split point that avoids breaking non-splittable entities, inserting closing syntax markers before the split point and reopening them after it.
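To illustrate the "close before the split, reopen after it" idea, here is a rough sketch that works directly on the rendered text rather than on Slack's blocks. It is not how slack-to-discord works; `split_preserving_markers` and `tokenize` are hypothetical names. It only knows about four markers (code fence, `**`, `*` and single backtick), treats a repeated marker as closing the most recently opened one, and ignores links, escapes and non-splittable entities. It can also leave an empty marker pair when the cut lands right after an opening marker.

```python
import re

FENCE = "`" * 3  # code fence, built from single backticks
# Longest markers first so "**" is preferred over "*" and the fence over "`".
MARKERS = (FENCE, "**", "*", "`")
MARKER_RE = re.compile("|".join(re.escape(m) for m in MARKERS))


def tokenize(text):
    """Yield (is_marker, token): markers whole, everything else one char at a time."""
    pos = 0
    for match in MARKER_RE.finditer(text):
        for ch in text[pos:match.start()]:
            yield False, ch
        yield True, match.group()
        pos = match.end()
    for ch in text[pos:]:
        yield False, ch


def split_preserving_markers(text, limit):
    """Split text into chunks of at most `limit` characters, closing open
    markers at the end of each chunk and reopening them in the next."""
    chunks = []
    stack = []    # currently open markers, outermost first
    current = ""  # chunk being built, without its closing markers
    for is_marker, tok in tokenize(text):
        # Inside code, every marker except the matching closer is literal text.
        if is_marker and stack and stack[-1] in (FENCE, "`") and tok != stack[-1]:
            is_marker = False
        if not is_marker:
            new_stack = stack
        elif stack and stack[-1] == tok:
            new_stack = stack[:-1]      # closes the innermost marker
        else:
            new_stack = stack + [tok]   # opens a new marker
        closers = sum(len(m) for m in new_stack)
        if len(current) + len(tok) + closers > limit:
            # Close everything that is open, then reopen it in the next chunk.
            chunks.append(current + "".join(reversed(stack)))
            current = "".join(stack)
            if len(current) + len(tok) + closers > limit:
                raise ValueError("limit too small for the marker nesting")
        current += tok
        stack = new_stack
    if current:
        chunks.append(current)
    return chunks
```

With a limit of 22, the example above comes out as the two pieces described: `split_preserving_markers("**this is some *strong* text**", 22)` returns `["**this is some *str***", "***ong* text**"]`. A version built on the structured export would replace `tokenize` with a walk over Slack's blocks, which is what would make it possible to refuse to split inside a link.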