grundleborg / slack-advanced-exporter

A tool for exporting additional data from Slack that is missing from the official data export.
MIT License

file_share post has missing properties on its File object #30

Closed kugelblitz closed 1 year ago

kugelblitz commented 1 year ago

Hello,

I run slack-advanced-exporter as follows:

./slack-advanced-exporter --input-archive export-with-emails.zip --output-archive export-with-emails-and-attachments.zip fetch-attachments --api-token xoxb-.......

I believe I did all the necessary steps. The program works, but most of the files (not all, but most) are not downloaded. Here is one of the many error messages I am seeing:

2023/03/11 23:11:36 ++++++ file_share post has missing properties on its File object: 1634467808.036600

The previous step (extraction of emails) went without errors.

I tried both the 0.4 release and the current version built from HEAD.

Any clues will be much appreciated.

gergoradeczki commented 1 year ago

Are you on the free tier of Slack?

kugelblitz commented 1 year ago

Yes.

gergoradeczki commented 1 year ago

I believe Slack has disabled access to older file attachments if you are on the free tier. I remember that on Sept. 12, 2022, this tool was working as intended and downloaded all the attachments. But now I have run into the same problem as you.

A possible workaround could be that you request a trial to a paid tier and then do the export/import.

Or the tool just broke because they changed something on Slack's end.

kugelblitz commented 1 year ago

OK, thank you for getting in touch!

kugelblitz commented 1 year ago

I looked into this some more. It seems that the problem is not the "age" of the attachments: some old attachments were downloaded successfully. The problem is that most file links in the JSON files (url_private and url_private_download) do not have the token in them. Those that do have a token are downloaded fine. (It is actually the same token in all URLs.) If I manually append the token to a URL that does not have it, I can download the file via wget without trouble.

Given the circumstances, it would be extremely useful to have an option like --file-token in the program, so that the user could pass the token to use for URLs that, for some reason, do not have it.
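The manual workaround described above (appending the token to a URL that lacks it) can be sketched in Python. This is only an illustration, not part of slack-advanced-exporter: the function name `with_token` is hypothetical, and the query parameter name `t` is an assumption taken from the download link format quoted later in this thread.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_token(url: str, token: str) -> str:
    """Return url with a t=<token> query parameter added.

    If the URL already carries a t parameter, it is returned unchanged.
    The parameter name "t" is an assumption based on the links seen in the export.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if any(key == "t" for key, _ in query):
        return url
    query.append(("t", token))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The resulting URL could then be passed to wget or any other HTTP client, as in the manual test described above.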

kugelblitz commented 1 year ago

Oh, this has turned out to be more complicated than I thought. In fact, most of the files in the "files" blocks are not downloaded because their URLs are absent from the JSON and they have "mode": "hidden_by_limit". Those are "files" blocks that are children of message blocks.

The URLs without tokens that I saw are in the "files" blocks that are children of "attachments" blocks that are children of "files" blocks. They are simply ignored by the program. But they are still valid and useful URLs (and downloadable if one attaches the token to them).
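To see how many files in an export fall into each of these cases, a small script can walk the message JSON. The following is a rough sketch under stated assumptions: the sample message is invented for illustration, the helper names (`iter_file_objects`, `classify`) are hypothetical, and the walk deliberately looks for "files" lists at any depth because the exact nesting may differ between exports.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def iter_file_objects(node):
    """Yield every dict stored in a list under a "files" key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "files" and isinstance(value, list):
                for item in value:
                    if isinstance(item, dict):
                        yield item
            # Keep descending so nested "files" lists are found too.
            yield from iter_file_objects(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_file_objects(item)


def classify(file_obj):
    """Sort a file object into one of the cases described in this thread."""
    if file_obj.get("mode") == "hidden_by_limit":
        return "hidden"
    url = file_obj.get("url_private_download") or file_obj.get("url_private")
    if not url:
        return "no_url"
    # Assumption: the token travels in a query parameter named "t".
    has_token = "t" in parse_qs(urlsplit(url).query)
    return "has_token" if has_token else "missing_token"


# Invented sample message; real exports contain many more fields.
message = {
    "ts": "0000000000.000000",
    "files": [
        {"id": "F1", "mode": "hidden_by_limit"},
        {"id": "F2", "url_private": "https://files.slack.com/files-pri/T1-F2/a.png?t=TOKEN"},
    ],
    "attachments": [
        {"files": [{"id": "F3", "url_private": "https://files.slack.com/files-pri/T1-F3/b.png"}]}
    ],
}

counts = Counter(classify(f) for f in iter_file_objects(message))
print(dict(counts))
```

To run this over a real export, each channel's JSON files could be loaded with `json.load` and passed to `iter_file_objects` in place of the sample message.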

gergoradeczki commented 1 year ago

That's a good find. Sadly, I'm not a maintainer or anything. I just stumbled upon this error as well. In my export, many of the files have the same structure (older than 90 days):

"files": [
    {
        "id": "XXXXXXXXXXX",
        "mode": "hidden_by_limit"
    }
]

This led me to believe Slack must have changed something on their backend. I remember that this was not present on Sept. 12, 2022, and I can prove it because I still have access to that export file. This could have been an error on their end that they just patched later.

Uploads not older than 90 days have a more workable JSON:

"files": [
    {
        "id": "",
        ...
        "url_private": "",
        "url_private_download": "",
        ...
    }
]

Side note:

Pinned messages are treated differently: they are still accessible even after 90 days (or beyond the previous 10,000-message limit).

gergoradeczki commented 1 year ago

A download link looks like the following:

https://files.slack.com/files-pri/<TEAM_ID>-<FILE_ID>/download/<FILE_NAME>?t=<TOKEN>

Where:

If you have this information, you can reconstruct the download URL. At the moment, it is still possible to download a file older than 90 days this way IF you have access to its download link.
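Reconstructing a link from these parts can be sketched as follows. This is a minimal illustration that only mirrors the URL pattern quoted above; the function name `build_download_url` is hypothetical, and whether the file name needs percent-encoding, or matches the name Slack actually stored, is an assumption that has not been verified.

```python
from urllib.parse import quote


def build_download_url(team_id: str, file_id: str, file_name: str, token: str) -> str:
    """Assemble a download link following the pattern quoted in this thread."""
    # Percent-encode the file name and token so that spaces and other
    # special characters do not break the URL (an assumption, see above).
    path = f"/files-pri/{team_id}-{file_id}/download/{quote(file_name, safe='')}"
    return f"https://files.slack.com{path}?t={quote(token, safe='')}"


# Placeholder values only; substitute the real IDs, file name, and token.
print(build_download_url("TEAM_ID", "FILE_ID", "file name.txt", "TOKEN"))
```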

kugelblitz commented 1 year ago

Yes, not knowing the FILE_NAME for sure is a problem...

Anyway, in my case we managed to elevate the subscription plan for a short time and successfully downloaded all attachments.