slack-go / slack

Slack API in Go, originally by @nlopes; Maintainers needed, contact @parsley42
https://pkg.go.dev/github.com/slack-go/slack
BSD 2-Clause "Simplified" License
4.63k stars 1.12k forks source link

Safer socketmode #1150

Closed iaburton closed 1 year ago

iaburton commented 1 year ago

PR preparation

Run make pr-prep from the root of the repository to run formatting, linting and tests.

Done, though I only modified socketmode and I was running race tests with -count 1000 while making changes anyways.

Should this be an issue instead

Eh, one of the things it fixes was already an issue, #1125 as mentioned in the relevant commit message. I decided not to make a(nother) larger issue for my other changes, for reasons listed below.

API changes

As far as I can tell, there should be no breaking api changes (though a small behavior change in one location that used to have the potential to panic). There are however small API "additions" in terms of two methods that allow context use while previous versions did not.

The why

Hello!

I'm looking to use the socketmode package of this repo for a potentially largish app and while browsing the code & existing issues decided to fix some up myself as well as address some I found on my own. Below is a TLDR-ish summary of the changes:

#### DRAFT Status While I'm technically submitting this now, I don't consider it 100% yet as I'd like to verify some things in more real-world use, plus I'm considering performance and other changes, though those are likely to come in another PR. That, and it lets me get some feedback on this early if any maintainers have comments/questions/etc. :slightly_smiling_face: EDIT: Avoiding scope creep and stale MRs; performance and other changes can be followed up.

iaburton commented 1 year ago

Hi @kanata2

I know I haven't followed up with this as I intended. Unfortunately some other things came up and I haven't been able to add-to or otherwise refine this MR further.

However that doesn't mean it cannot go in as-is and follow up with another if needed. Have you or any other devs/maintainers had a chance to look it over?

kanata2 commented 1 year ago

@iaburton Sorry for slow response. I'll confirm later. 🙏

FYI: #1125 is fixed by #1170

iaburton commented 1 year ago

Hey @kanata2 , no problem on the slow response.

Btw, I've functionally reverted #1170 in favor of what I had already fixed here (including that panic), in unison with the other issues I've described above.

I would appreciate it if you, any other maintainers, and those involved with 1170 ( @lololozhkin & @parsley42 ) double check, as the scope of this MR is creeping which I would rather it not do :slightly_smiling_face:

Further, I noticed another race I've yet to (but can) fix on the deadman timer. Atm it has a 'fixme' if we want to merge this first and let someone else tackle it, or I can in a follow up (or here if desired). EDIT: Just went ahead and fixed the timer issue, look at relevant commit and OP for information.

parsley42 commented 1 year ago

@iaburton FYI, I'm running your branch in production Gopherbot right now. I've reached out to @kanata2 , but it may be that he's gotten too busy with other things to devote much time to this.

I've been reading your patch for a while now this morning, but can't really take the time to fully ingest it (I'm not a daily go programmer, just a hobbyist w/ Gopherbot, which helps me with my real day job, DevOps). Your thinking and explanations are clear and sensible, so my inclination is to approve/merge as this fix covers a potential deadlock and race, and has a nicer implementation for the timeout. I'm going to wait a few more days to hear back from Naoki, though.

My other inclination is to inquire if you'd be willing to serve as a maintainer. Looking at the commit history, Naoki has been doing most of the reviews and merges. Let me know if you'd be up for helping out.

iaburton commented 1 year ago

@iaburton FYI, I'm running your branch in production Gopherbot right now. I've reached out to @kanata2 , but it may be that he's gotten too busy with other things to devote much time to this.

I've been reading your patch for a while now this morning, but can't really take the time to fully ingest it (I'm not a daily go programmer, just a hobbyist w/ Gopherbot, which helps me with my real day job, DevOps). Your thinking and explanations are clear and sensible, so my inclination is to approve/merge as this fix covers a potential deadlock and race, and has a nicer implementation for the timeout. I'm going to wait a few more days to hear back from Naoki, though.

My other inclination is to inquire if you'd be willing to serve as a maintainer. Looking at the commit history, Naoki has been doing most of the reviews and merges. Let me know if you'd be up for helping out.

Hi @parsley42 thanks for running my branch! Pardon my absence as I've been dealing with a few things IRL :slightly_smiling_face:

Before I comment on helping with maintenance may I ask how the branch has fared w/ Gopherbot? Is there anything I can help explain w/ respect to you having looked over the patch?

parsley42 commented 1 year ago

Hi @iaburton yeah, work and RL are keeping me busy, too, as well as Naoki I guess.

The original issue (IIRC) was that when the library stopped getting a ping from slack that wasn't sufficient to generate a reconnect - instead nothing happened until the ReadJSON timed out, which took a couple of minutes, causing a long pause when a connection went bad. The first bad fix just stopped waiting on the ReadJSON, and errored out on missed ping. I didn't see how this could lead to a panic later.

Right now I'm heavily in AWS/Terraform land, and it would be a major context switch to fully grok your patch; I've been reading it for 20 minutes (about all I can allot) and still don't completely follow your changes in e.g. receiveMessagesInto - but since you asked, if you wouldn't mind expanding on how this patch reconnects when pings are missed, but avoids the panic related to the ReadJSON, that might help.

In any case, as far as Gopherbot goes, I can only tell you that two of my production robots haven't panicked in the last two weeks, and I'm not seeing any issues. When I looked at your GH profile, it appeared you spend a lot of time writing Go, so I thought maybe this would be a good fit - if not, I'll continue examining contributors to see who else might be able to pitch-in.

What we don't want is me trying to review and merge patches. 😬

parsley42 commented 1 year ago

@iaburton Ok, it looks like I'm going to need to fill in for a while. I'd like the checks to run, but I think you'll need to close and re-open the PR to get them out of the expired state.

parsley42 commented 1 year ago

Closing and re-opening for tests.

parsley42 commented 1 year ago

@lololozhkin Have you looked at this patch? It deals with the same issue as your patch, but with some additions and changes. Wanted to get your thoughts on it?

lololozhkin commented 1 year ago

@parsley42, hi, haven't seen your messages, thank you for mentioning. Cool patch, I haven't seen the opportunity to not to wait one of the goroutines, because I didn't know that this goroutine was the only producer, well done!

I am not maintainer of this repo, so I cannot approve it on github, so I can do it only in comments. Approve)

Yes, this code fixes my issue too. It's a pity, that my code goes to trash, but this project doesn't need it)

lololozhkin commented 1 year ago

I've read this discussion, and haven't understand, does my patch fixed your problem with slow reconnects? Cause, my patch fixed panics, It's running in my production and 0 panics occurred for more over than a month. And yes, reconnects are handled normally.

If ping is not received till deadman timer elapsed, context will be cancelled and propagated to json reading, the goroutine will return after ctx cancel.

Am I missing something?

parsley42 commented 1 year ago

@lololozhkin Sorry - more context; this patch is by @iaburton , not me. I am a maintainer and I could merge this patch, but wanted your opinion since it is an alternate fix. Based on the description, it may have value, even if only allowing the caller to pass in a context. What do you think?

lololozhkin commented 1 year ago

@parsley42 As I said, yes, from my point of view that patch fixes the problem from my PR. Moreover it fixes it in a better way, as I suppose. So, from my point of view, this PR should be merged fully, including allowing usage of contexts in Send and Ack, and fix of panics on reconnects :)

parsley42 commented 1 year ago

@lololozhkin Ok, thanks - you approved in your first message but then replied again with other questions, so I just wanted to clarify. Thanks for being responsive. I am in the process of trying to recruit more active maintainers for this library and have sent you an invite - I think more seasoned Go developers probably wouldn't pick me as "the" maintainer, but right now it seems others are busy, so I'm trying to get some things moved through. If you'd like to help out, that'd be great; if you know other devs you'd recommend I'd be happy to hear.

lololozhkin commented 1 year ago

@parsley42 TY! Accepted your invitation, I think I could be usefull for this project :)

I don't know other interested go devs, but if they appear in my life, I would tell it.