rusq / slackdump

Save or export your private and public Slack messages, threads, files, and users locally without admin privileges.
GNU General Public License v3.0

Ignoring errors #109

Closed pawelgnatowski closed 1 year ago

pawelgnatowski commented 2 years ago

Describe the bug
When an error is encountered, the export process terminates.

To Reproduce
Steps to reproduce the behavior:

  1. Run slackdump like this: slackdump -f -export MyDump @channelList.txt

Expected behavior
Continue, or prompt whether to ignore the error.

Output
2022/08/16 13:53:39 application error: export error: failed to dump "xyz" (xxx): callback error: failed to dump channel xxx: strconv.Atoi: parsing "null": invalid syntax

Otherwise gj bro!

rusq commented 2 years ago

Hey @pawelgnatowski , thank you for your report and glad that you found this tool useful.

I'll see what I can do with this.

rusq commented 2 years ago
  1. It seems that this error is propagated from the slack-go/slack library (nothing in the slackdump code calls Atoi) and is related to the JSONTime type. That's bad news: if it were our code, it would be possible to omit the particular field value. With the slack library, we'll need to sacrifice the whole batch of messages (ConversationsPerReq is the maximum, but the API may return fewer, depending on its mood), unless we add complex iterative logic that decreases the batch size and retries the API call until it eventually succeeds, thereby identifying the failing message in the batch. Of course, losing, say, 100 messages in a batch is not a tragedy compared to losing all channel data.
  2. Can you confirm that the command line in the issue is correct? The command line refers to "export" mode, while this error can only be returned when running in "conversation dump" mode. I will implement the "ignore errors" flag globally either way; I'm asking out of interest.
pawelgnatowski commented 2 years ago

You may be right - i actually have tried full export first but due to another error and sheer size of 5k channels i wanted to limit the amount of channels, the only way i found is by providing channel list. BTW. Is there any way to get saved-items (kind of starred elements?) and mentions and reactions - this is how i build my MVP list which i do not see exported anywhere.

Command: slackdump.exe -f -export xxx @channels.txt

rusq commented 2 years ago

Ah, it makes sense now. Re starred items - no, for now Slackdump is quite simple - only gets channels, users, and conversations.

  1. Starred items (i just checked) is a separate api call. It's actually a very good feature suggestion.
  2. There's no dedicated API to get mentions, as they're just markup within the message.
  3. There is, however, a reactions.list API endpoint which, similar to starred items, can be used to get all items that the user has reacted to. Will place it in the TODO list as well :)
pawelgnatowski commented 2 years ago

sounds good, any ETA on the 1,3 - need to know how dirty i need to make my hands, as time window is closing fast. btw - i tried slack export viewer - i guess it is either full export or it does not work :( but that i guess could tackle at a later time... damn i love Slack.

rusq commented 2 years ago

Sorry, no ETA on this - i do it in my free time, features are plentiful, and I got only two hands 😂 But I'll see what I can do

rusq commented 2 years ago

When I released it open source I hoped that there'd be people contributing, as it seems to be helpful, but I guess the time hasn't come yet.

pawelgnatowski commented 2 years ago

wish i could - haven't picked up Go yet.

rusq commented 2 years ago

That's no problem, Pawel :) Feature suggestion or bug report are also great contributions, feedback loop is very important.

pawelgnatowski commented 2 years ago

i definitely must go for 1 & 3 which means i'll probably use node or python for it. Can share some lessons learned for the APIs you have mentioned. Thanks for doing this project. Kudos!

rusq commented 2 years ago

Thank you :)

rusq commented 2 years ago

@pawelgnatowski I have made a quick and not so dirty patch to my fork of the slack library that will treat "null" as zero Unix time (Jan 1, 1970), and built a Windows binary out of it (attached: slackdump-2.1.2-null-fix.zip). Could you please try it on the problematic channel and see if that works?

rusq commented 2 years ago

Also, I don't think that ignoring this kind of errors will work - each api call returns a next token, that might be nil in case that the error occurs, so it would not be possible to get the next "page". Let's see if the quickfix from my previous comment works.

Regarding 503 and the server-class errors in general, I think it would be possible to handle it along with 429 rate limit errors, but in this case we'd have to wait at increasing time intervals, i.e. 1st attempt fails, wait 30 seconds, next attempt fails, wait 60 seconds, etc.

pawelgnatowski commented 2 years ago

> @pawelgnatowski I have made a quick and not so dirty patch to my fork of the slack library that will treat "null" as zero Unix time (Jan 1, 1970), and built a Windows binary out of it (attached: slackdump-2.1.2-null-fix.zip). Could you please try it on the problematic channel and see if that works?

can't download it, blocked download - virus detected ;]

pawelgnatowski commented 2 years ago

> Also, I don't think that ignoring this kind of errors will work - each api call returns a next token, that might be nil in case that the error occurs, so it would not be possible to get the next "page". Let's see if the quickfix from my previous comment works.
>
> Regarding 503 and the server-class errors in general, I think it would be possible to handle it along with 429 rate limit errors, but in this case we'd have to wait at increasing time intervals, i.e. 1st attempt fails, wait 30 seconds, next attempt fails, wait 60 seconds, etc.

  1. ok - what if you change the pagination size to maybe omit the offending message and still get a token - or am I missing something?
  2. sounds reasonable.
rusq commented 2 years ago

> can't download it, blocked download - virus detected ;]

https://www.virustotal.com/gui/file/089be6d45ee681e8936d8d7b98c2e471d5e3bea887b509e0e7b332d592585388?nocache=1

Seems like a false positive? Anyway, I can understand the lack of trust.

Here are the changes in slackdump: https://github.com/rusq/slackdump/compare/master...i109-qf

and here are the changes in the slack lib fork: https://github.com/rusq/slack/compare/master...null-time

Would you be able to checkout and build branch i109-qf on your machine and check if that works?

pawelgnatowski commented 2 years ago

Hey, not about trust, just literally it was blocked by browsers. I will be back in a couple of days and will try then.

pawelgnatowski commented 2 years ago

gave the zip file another try and it works ^_^, guess M$ updated defender or smth. tried the faulty channel again:

2022/08/27 10:51:58 error saving "FGHGC2XFG-" to "xxx\attachments": callback error: download to "xxx\attachments\FGHGC2XFG-" failed, [src=]: received empty download URL

process continues though... <3

rusq commented 2 years ago

Thanks! Looks like there's some malformed file within that channel - there's an ID of this file ("FGHGC2XFG"), but no name, no URL etc. Very strange. But glad to hear that it works, I basically modified the slack library to ignore empty JSONTime. I'll submit the PR to upstream slack library. If that doesn't get through, i'll just maintain the change in the fork.

rusq commented 2 years ago

I have prepared a tool for #115 that shows the raw output of the API. Can I ask you to run it on that channel and copy/paste the JSON for the file object with "ID": "FGHGC2XFG"? It would be interesting to see what in the actual fuck is going on over there. (attached: rawoutput.zip)

It uses the same auth as the slackdump, so you could run it like this:

rawoutput.exe channel_id

it will generate the slackdump_raw.log file which is a dump of headers and JSON output from the API - could you please search for FGHGC2XFG, and paste the surrounding json object in this thread? Most likely it will be empty, but if it contains some identifiable information, i.e. slack workspace name, it would make sense to obfuscate it, or replace with meaningless strings. I'd be keen to see what fields of that malformed file are populated and which are not.

pawelgnatowski commented 2 years ago

{
  "type": "message",
  "text": "ZZZZ)\n\nYYY\n\nHHH?",
  "files": [
    { "id": "FGHGC2XFG", "mode": "tombstone" }
  ],
  "upload": true,
  "user": "XXX7CK4GK",
  "display_as_bot": false,
  "ts": "1551261378.012600",
  "thread_ts": "1551261378.012600",
  "reply_count": 2,
  "reply_users_count": 2,
  "latest_reply": "1551262603.013800",
  "reply_users": [ "XXX7CK4GK", "XXX1654PP" ],
  "is_locked": false,
  "subscribed": false
}

rusq commented 2 years ago

Very interesting - it looks like it's a "deleted remote file" according to this doc

Probably they are so rare, that no one ever had this special case with the slack lib. I searched through their issues and was unable to find anything on this.

Thank you!
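
A tombstone entry like the one above carries an ID and a mode but no download URL, which matches the "received empty download URL" message seen earlier. One way a downloader could skip such entries is sketched below; the `file` and `message` structs and the `downloadable` helper are hypothetical and only cover the fields visible in the pasted JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// file holds only the fields needed to decide whether a download is possible.
type file struct {
	ID                 string `json:"id"`
	Mode               string `json:"mode"`
	URLPrivateDownload string `json:"url_private_download"`
}

// message holds only the attached files of a message.
type message struct {
	Files []file `json:"files"`
}

// downloadable reports whether f looks like something that can be fetched:
// it is not a tombstone (deleted file) and it has a download URL.
func downloadable(f file) bool {
	return f.Mode != "tombstone" && f.URLPrivateDownload != ""
}

func main() {
	// Trimmed version of the message pasted in this thread.
	data := []byte(`{"type":"message","files":[{"id":"FGHGC2XFG","mode":"tombstone"}]}`)

	var m message
	if err := json.Unmarshal(data, &m); err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, f := range m.Files {
		if !downloadable(f) {
			fmt.Printf("skipping %s (mode %q, no download URL)\n", f.ID, f.Mode)
			continue
		}
		fmt.Println("would download", f.URLPrivateDownload)
	}
}
```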

rusq commented 2 years ago

TODO:

rusq commented 2 years ago

@pawelgnatowski I was trying to reproduce this the other day, the same way I did with #119 (the test code is in the issue I've opened with slack lib https://github.com/slack-go/slack/issues/1104), however I did not get the unmarshal error until I added a "timestamp":null piece to the file.

  1. If you still have the raw_output file that was generated, could you please search it for the string "null"?
  2. If it's there, could you please post it the way you did last time with the PII removed, so I could use it to open another issue with the slack lib?

Thank you!

pawelgnatowski commented 2 years ago

{
  "type": "message",
  "text": "We're starting a data science community ",
  "files": [
    {
      "id": "FSXXX79LN",
      "created": 1573757704,
      "timestamp": null,
      "name": "Data_Science_Community_of_Practice",
      "title": "Data Science Community",
      "mimetype": "application\/vnd.slack-docs",
      "filetype": "docs",
      "pretty_type": "Arugula",
      "user": "UXX617GTY",
      "editable": true,
      "size": 8886,
      "mode": "docs",
      "is_external": false,
      "external_type": "",
      "is_public": true,
      "public_url_shared": false,
      "display_as_bot": false,
      "username": "",
      "url_private": "https:\/\/files.slack.com\/files-pri\/T0XXX3EC-FSXXX79LN\/data_science_community_of_practice",
      "url_private_download": "https:\/\/files.slack.com\/files-pri\/T0XXX3EC-FSXXX79LN\/download\/data_science_community_of_practice",
      "permalink": "https:\/\/myteam.slack.com\/files\/T0XXX3EC\/FSXXX79LN",
      "permalink_public": "https:\/\/slack-files.com\/T0XXX3EC-FSXXX79LN-0733464a3f",
      "preview": "<p><br><br>We're staring a data science community of practicexxxxxxxxxxx<br><br><br><\/p>",
      "editor": null,
      "last_editor": null,
      "non_owner_editable": null,
      "updated": null,
      "is_starred": false,
      "has_rich_preview": false
    }
  ],
  "upload": true,
  "user": "UXX617GTY",
  "display_as_bot": false,
  "ts": "1561469761.006800",
  "thread_ts": "1561469761.006800",
  "reply_count": 12,
  "reply_users_count": 12,
  "latest_reply": "1562083633.018200",
  "reply_users": [
    "UDVYYY3CH", "UDQYYYHGB", "UEJYYYG5Q", "UDZYYYPJR", "UERYYYZ6H", "U20YYY4UB",
    "UF7YYYFG9", "UCRYYYLUB", "UDCYYYYV8", "UE4YYY3S7", "UCXYYYYR1", "UE2YYYYNA"
  ],
  "is_locked": false,
  "subscribed": false
}

rusq commented 2 years ago

Excellent, thank you! Reproduced straight away!

_experiments/slack/bug109$ go run .
2022/08/31 17:26:07 strconv.Atoi: parsing "null": invalid syntax
exit status 1
rusq commented 2 years ago

Created an issue https://github.com/slack-go/slack/issues/1107 and PR https://github.com/slack-go/slack/pull/1106 for the upstream library.

pawelgnatowski commented 2 years ago

Btw. The stars and reactions API is super straightforward. Added team and user IDs and got what i needed. Super easy! Thanks for the tip!

rusq commented 2 years ago

Hey @pawelgnatowski , sorry, I was too focused on the API issue, and the reactions and bookmarks completely slipped my mind. I'll create a separate issue for those, not to lose track.

pawelgnatowski commented 2 years ago

No prob, like you said, you do it when you do it. I used your suggestions and just went to Slack api pages and voila. Anyway, maybe you know of a good way to browse and search the dump? Would be awesome to also get full text search, also with docs, ppt etc. Any suggestions/ideas?

mootari commented 2 years ago

> Any suggestions/ideas?

@pawelgnatowski Have a look at this discussion: https://github.com/rusq/slackdump/discussions/127

rusq commented 1 year ago

Merged the upstream slack library.