tools.target: don't clobber existing permissions in `Channel.add_user()`

dgw commented 7 months ago

Description

Here's a potential fix for #2562, including a slightly modified version of the test case @half-duplex wrote for that issue.

The race-condition feeling I get from the bug report is a strong one. I can only assume that the IRCd on which this was reported (Unreal 3.2, which is admittedly EOL) suffers from a race condition between its who and mode modules, something like:

New user joins #channel
IRCd notifies all #channel members, including Sopel and ChanServ
Sopel immediately sends MODE and WHO for the newly joined channel
- Near-simultaneously, ChanServ notices that the new user should be auto-opped and sends MODE +o
The who module finds the user, builds the status prefix, and sends the reply to Sopel
- Meanwhile, the mode module (possibly on a different server in the network, adding more delay) processes (and possibly has to relay) the MODE change sent by ChanServ
By the time the MODE change is processed/received, Sopel's server has already prepared the WHO response with an outdated status field, so Sopel gets a reply that doesn't indicate the new user's chanop privileges

Lots of educated guesses there based on skimming the C source for Unreal 3.2, but even if the exact sequence is wrong the result walks/quacks like a race condition, so we have to treat it like one.

I've also bundled a tweak to coretasks, moving the duplicate MODE and WHO invocations for a new channel to a single location (new helper function) and adding some more commentary on the join-queue logic, simply because those tweaks don't warrant a pull request of their own.

Checklist

[x] I have read CONTRIBUTING.md
[x] I can and do license this contribution under the EFLv2
[x] No issues are reported by make qa (runs make lint and make test)
[x] I have tested the functionality of the things this change touches

Notes

No milestone up front; I'm not positive that we should shoehorn this particular issue & patch into 8.0.0. Depends on the reaction from other maintainers whether it goes in 8.0.0, 8.0.1, or 8.1.0. 😉

dgw commented 7 months ago

Ah, well, maybe this isn't a good solution. It depends on whether the behavior of Channel.add_user() replacing any data that already exists is important.

half-duplex commented 7 months ago

add_user is called in three places: JOIN, RPL_NAMREPLY, and _record_who(). Could add a clear=False to add_user and call it with clear=True from JOIN? Idk, all of these feel gross when NAMREPLY and WHOREPLY should be absolute and trustworthy.

I guess whether we do something like this or consider it an Unreal bug depends on whether it's likely to happen elsewhere. The possibility of different threads — or even servers with link latency — answering things makes it feel not entirely unlikely. The network where I originally observed this has several servers.

dgw commented 7 months ago

Idk, all of these feel gross when NAMREPLY and WHOREPLY should be absolute and trustworthy.

And that's why I initially didn't even plan to work on this. The patch here was an experiment, and the only reasonable way to put it "out there" for discussion was to open a PR. 🤷‍♂️

add_user is called in three places: JOIN, RPL_NAMREPLY, and _record_who(). Could add a clear=False to add_user and call it with clear=True from JOIN?

That could be an interesting line of discussion, though I think it's less about whether to clear (clear what is not obvious from the kwarg name) and more about "was this user directly observed JOINing or not?"

If we can reasonably trust NAMREPLY/WHOREPLY for users previously in the channel when Sopel joins, and a race condition is only a concern for users who join after Sopel, we can possibly develop sensible logic. But it all still comes down to whether we can trust the server's NAMREPLY/WHOREPLY data, and none of this overthinking should be needed because ultimately, what the server says is supposed to be trustworthy.

(If I come off as frustrated, it's because I am. Inability to trust state expressed in the server's responses could break a lot more than privilege tracking.)

half-duplex commented 7 months ago

Set a flag for "we have a WHO out" on the User object, and if anything weird (MODE) happens before ENDOFWHO for it, re-WHO?

dgw commented 7 months ago

Hmm, I'm going to take this discussion back to the original issue. This doesn't seem to be the way forward.

sopel-irc / sopel