Closed pawelgnatowski closed 1 year ago
Hey @pawelgnatowski , thank you for your report and glad that you found this tool useful.
I'll see what I can do with this.
ConversationsPerReq is the maximum, but the API may return fewer, depending on its mood) without complex iterative logic that would decrease the batch size and retry the API call until it eventually succeeds, thereby identifying the failing message in the batch. Of course, losing, say, 100 messages in a batch is not a tragedy compared to losing all channel data.
You may be right - I actually tried a full export first, but due to another error and the sheer size of 5k channels I wanted to limit the number of channels, and the only way I found is by providing a channel list. BTW, is there any way to get saved items (kind of like starred elements?), mentions and reactions? This is how I build my MVP list, which I do not see exported anywhere.
Command:
slackdump.exe -f -export xxx @channels.txt
Ah, it makes sense now. Re starred items - no, for now Slackdump is quite simple - only gets channels, users, and conversations.
sounds good, any ETA on the 1,3 - need to know how dirty i need to make my hands, as time window is closing fast. btw - i tried slack export viewer - i guess it is either full export or it does not work :( but that i guess could tackle at a later time... damn i love Slack.
Sorry, no ETA on this - i do it in my free time, features are plentiful, and I got only two hands 😂 But I'll see what I can do
When I released it open source I hoped that there'd be people contributing, as it seems to be helpful, but I guess the time hasn't come yet.
wish i could - haven't picked up Go yet.
That's no problem, Pawel :) Feature suggestion or bug report are also great contributions, feedback loop is very important.
i definitely must go for 1 & 3 which means i'll probably use node or python for it. Can share some lessons learned for the APIs you have mentioned. Thanks for doing this project. Kudos!
Thank you :)
@pawelgnatowski I have made a quick and not so dirty patch to my fork of the slack library that treats "null" as zero unix time (Jan 1, 1970), and built a Windows binary out of it (attached slackdump-2.1.2-null-fix.zip). Could you please try it on the problematic channel and see if that works?
Also, I don't think that ignoring this kind of error will work - each API call returns a next token, which might be nil if the error occurs, so it would not be possible to get the next "page". Let's see if the quickfix from my previous comment works.
Regarding 503 and the server-class errors in general, I think it would be possible to handle it along with 429 rate limit errors, but in this case we'd have to wait at increasing time intervals, i.e. 1st attempt fails, wait 30 seconds, next attempt fails, wait 60 seconds, etc.
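The waiting scheme described above could look roughly like the following. This is only a sketch under stated assumptions, not slackdump's actual retry code: statusError, retryable, backoff and withRetry are hypothetical names, and the wait is assumed to grow linearly (30s, 60s, 90s, ...), since the comment does not say whether it should grow linearly or double each time.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// statusError is a hypothetical error type that carries the HTTP status code
// of a failed API call.
type statusError struct{ Code int }

func (e *statusError) Error() string { return fmt.Sprintf("http status %d", e.Code) }

// retryable reports whether err is a 429 (rate limit) or a 5xx server error.
func retryable(err error) bool {
	var se *statusError
	if !errors.As(err, &se) {
		return false
	}
	return se.Code == http.StatusTooManyRequests || se.Code >= 500
}

// backoff returns the wait after the n-th failed attempt: n*step.
// Linear growth is an assumption; doubling would be an equally valid choice.
func backoff(attempt int, step time.Duration) time.Duration {
	return time.Duration(attempt) * step
}

// withRetry calls fn up to maxAttempts times (maxAttempts >= 1), sleeping for
// an increasing interval between retryable failures. Non-retryable errors are
// returned immediately.
func withRetry(maxAttempts int, step time.Duration, fn func() error) error {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = fn(); err == nil {
			return nil
		}
		if !retryable(err) {
			return err
		}
		if attempt < maxAttempts {
			time.Sleep(backoff(attempt, step))
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

func main() {
	calls := 0
	// A short step keeps the demo fast; the thread suggests 30 seconds.
	err := withRetry(3, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return &statusError{Code: http.StatusServiceUnavailable}
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

Retrying the same call keeps the current cursor, so a retry that eventually succeeds still yields the next token for the following page.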
can't download it, blocked download - virus detected ;] https://www.virustotal.com/gui/file/089be6d45ee681e8936d8d7b98c2e471d5e3bea887b509e0e7b332d592585388?nocache=1
Seems like a false positive? Anyway, I can understand the lack of trust.
Here are the changes in slackdump: https://github.com/rusq/slackdump/compare/master...i109-qf
and here are the changes in the slack lib fork: https://github.com/rusq/slack/compare/master...null-time
Would you be able to check out and build branch i109-qf on your machine and check if that works?
Hey, not about trust, just literally it was blocked by browsers. I will be back in a couple of days and will try then.
gave the zip file another try and it works ^_^, guess M$ updated defender or smth. tried the faulty channel again:
2022/08/27 10:51:58 error saving "FGHGC2XFG-" to "xxx\attachments": callback error: download to "xxx\attachments\FGHGC2XFG-" failed, [src=]: received empty download URL
process continues though... <3
Thanks! Looks like there's some malformed file within that channel - there's an ID of this file ("FGHGC2XFG"), but no name, no URL etc. Very strange. But glad to hear that it works, I basically modified the slack library to ignore empty JSONTime. I'll submit the PR to upstream slack library. If that doesn't get through, i'll just maintain the change in the fork.
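For illustration, a null-tolerant timestamp type might look like the sketch below. It is not the actual patch (that is in the linked branches): unixTime, file and decodeFile are hypothetical names, and the type only loosely imitates the library's JSONTime. It relies on the documented encoding/json behaviour that UnmarshalJSON is also called when the input is a JSON null.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strconv"
	"time"
)

// unixTime is a hypothetical stand-in for a JSON timestamp in Unix seconds.
type unixTime int64

// UnmarshalJSON accepts a number, a quoted number, or null.
// null is treated as zero unix time (Jan 1, 1970) instead of failing.
func (t *unixTime) UnmarshalJSON(buf []byte) error {
	s := bytes.Trim(buf, `"`)
	if bytes.Equal(s, []byte("null")) {
		*t = 0
		return nil
	}
	v, err := strconv.ParseInt(string(s), 10, 64)
	if err != nil {
		return err
	}
	*t = unixTime(v)
	return nil
}

// Time converts the value to a time.Time in UTC.
func (t unixTime) Time() time.Time { return time.Unix(int64(t), 0).UTC() }

// file holds only the fields needed for this demonstration.
type file struct {
	ID        string   `json:"id"`
	Created   unixTime `json:"created"`
	Timestamp unixTime `json:"timestamp"`
}

// decodeFile parses a single file object from JSON.
func decodeFile(payload string) (file, error) {
	var f file
	err := json.Unmarshal([]byte(payload), &f)
	return f, err
}

func main() {
	f, err := decodeFile(`{"id":"FSXXX79LN","created":1573757704,"timestamp":null}`)
	if err != nil {
		fmt.Println("unmarshal error:", err)
		return
	}
	fmt.Println(f.ID, f.Created.Time(), f.Timestamp.Time())
}
```

Without the null check, the parse step receives the literal text null and fails in the same way as the strconv.Atoi error in the report.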
I have prepared a tool for #115, that shows the RAW output of the API - can I ask you to run it on that channel, and copy/paste the JSON for that file object with "ID": "FGHGC2XFG". Would be interesting to see what in the actual fuck is going on over there? rawoutput.zip
It uses the same auth as the slackdump, so you could run it like this:
rawoutput.exe channel_id
it will generate the slackdump_raw.log file, which is a dump of headers and JSON output from the API - could you please search for FGHGC2XFG and paste the surrounding JSON object in this thread? Most likely it will be empty, but if it contains some identifiable information, e.g. the Slack workspace name, it would make sense to obfuscate it or replace it with meaningless strings. I'd be keen to see which fields of that malformed file are populated and which are not.
{
  "type": "message",
  "text": "ZZZZ)\n\nYYY\n\nHHH?",
  "files": [
    {
      "id": "FGHGC2XFG",
      "mode": "tombstone"
    }
  ],
  "upload": true,
  "user": "XXX7CK4GK",
  "display_as_bot": false,
  "ts": "1551261378.012600",
  "thread_ts": "1551261378.012600",
  "reply_count": 2,
  "reply_users_count": 2,
  "latest_reply": "1551262603.013800",
  "reply_users": [
    "XXX7CK4GK",
    "XXX1654PP"
  ],
  "is_locked": false,
  "subscribed": false
}
Very interesting - it looks like it's a "deleted remote file" according to this doc.
Probably they are so rare, that no one ever had this special case with the slack lib. I searched through their issues and was unable to find anything on this.
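The file object in that message carries only an id and "mode":"tombstone", which fits the "received empty download URL" error. One way a downloader could sidestep such entries is to filter them out before fetching. The sketch below assumes exactly that and is not how slackdump or the slack library necessarily handle it: file, downloadable and partition are hypothetical names, and the example URL is made up.

```go
package main

import "fmt"

// file holds only the fields relevant to the download decision (hypothetical).
type file struct {
	ID                 string
	Mode               string
	URLPrivateDownload string
}

// downloadable reports whether a file looks fetchable: it is not a tombstone
// and it has a download URL.
func downloadable(f file) bool {
	return f.Mode != "tombstone" && f.URLPrivateDownload != ""
}

// partition splits files into those to download and those to skip.
func partition(files []file) (keep, skip []file) {
	for _, f := range files {
		if downloadable(f) {
			keep = append(keep, f)
		} else {
			skip = append(skip, f)
		}
	}
	return keep, skip
}

func main() {
	files := []file{
		{ID: "FGHGC2XFG", Mode: "tombstone"},
		{ID: "F0000000A", Mode: "hosted", URLPrivateDownload: "https://files.example.invalid/a"},
	}
	keep, skip := partition(files)
	for _, f := range skip {
		fmt.Printf("skipping %s (mode=%q, url=%q)\n", f.ID, f.Mode, f.URLPrivateDownload)
	}
	fmt.Printf("%d file(s) left to download\n", len(keep))
}
```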
Thank you!
TODO:
@pawelgnatowski I was trying to reproduce this the other day, the same way I did with #119 (the test code is in the issue I've opened with the slack lib, https://github.com/slack-go/slack/issues/1104); however, I did not get the unmarshal error until I added a "timestamp":null piece to the file.
Thank you!
{
  "type": "message",
  "text": "We're starting a data science community ",
  "files": [
    {
      "id": "FSXXX79LN",
      "created": 1573757704,
      "timestamp": null,
      "name": "Data_Science_Community_of_Practice",
      "title": "Data Science Community",
      "mimetype": "application\/vnd.slack-docs",
      "filetype": "docs",
      "pretty_type": "Arugula",
      "user": "UXX617GTY",
      "editable": true,
      "size": 8886,
      "mode": "docs",
      "is_external": false,
      "external_type": "",
      "is_public": true,
      "public_url_shared": false,
      "display_as_bot": false,
      "username": "",
      "url_private": "https:\/\/files.slack.com\/files-pri\/T0XXX3EC-FSXXX79LN\/data_science_community_of_practice",
      "url_private_download": "https:\/\/files.slack.com\/files-pri\/T0XXX3EC-FSXXX79LN\/download\/data_science_community_of_practice",
      "permalink": "https:\/\/myteam.slack.com\/files\/T0XXX3EC\/FSXXX79LN",
      "permalink_public": "https:\/\/slack-files.com\/T0XXX3EC-FSXXX79LN-0733464a3f",
      "preview": "<p><br><br>We're staring a data science community of practicexxxxxxxxxxx<br><br><br><\/p>",
      "editor": null,
      "last_editor": null,
      "non_owner_editable": null,
      "updated": null,
      "is_starred": false,
      "has_rich_preview": false
    }
  ],
  "upload": true,
  "user": "UXX617GTY",
  "display_as_bot": false,
  "ts": "1561469761.006800",
  "thread_ts": "1561469761.006800",
  "reply_count": 12,
  "reply_users_count": 12,
  "latest_reply": "1562083633.018200",
  "reply_users": [
    "UDVYYY3CH",
    "UDQYYYHGB",
    "UEJYYYG5Q",
    "UDZYYYPJR",
    "UERYYYZ6H",
    "U20YYY4UB",
    "UF7YYYFG9",
    "UCRYYYLUB",
    "UDCYYYYV8",
    "UE4YYY3S7",
    "UCXYYYYR1",
    "UE2YYYYNA"
  ],
  "is_locked": false,
  "subscribed": false
}
Excellent, thank you! Reproduced straight away!
_experiments/slack/bug109$ go run .
2022/08/31 17:26:07 strconv.Atoi: parsing "null": invalid syntax
exit status 1
Created an issue https://github.com/slack-go/slack/issues/1107 and PR https://github.com/slack-go/slack/pull/1106 for the upstream library.
Btw, the stars and reactions API is super straightforward. Added team and user ids and got what I needed. Super easy! Thanks for the tip!
Hey @pawelgnatowski , sorry, I was too focused on the API issue, and the reactions and bookmarks completely slipped my mind. I'll create a separate issue for those, not to lose track.
No prob, like you said, you do it when you do it. I used your suggestions and just went to Slack api pages and voila. Anyway, maybe you know of a good way to browse and search the dump? Would be awesome to also get full text search, also with docs, ppt etc. Any suggestions/ideas?
@pawelgnatowski Have a look at this discussion: https://github.com/rusq/slackdump/discussions/127
Merged the upstream slack library.
Describe the bug
When bugs are encountered, the export process terminates.

To Reproduce
Steps to reproduce the behavior:
slackdump -f -export MyDump @channelList.txt

Expected behavior
Continue, or prompt whether to ignore the error.

Output
2022/08/16 13:53:39 application error: export error: failed to dump "xyz" (xxx): callback error: failed to dump channel xxx: strconv.Atoi: parsing "null": invalid syntax

Desktop (please complete the following information):
Otherwise gj bro!