ircv3 / ircv3-ideas

46 stars 3 forks source link

store hints #68

Closed slingamn closed 3 years ago

slingamn commented 4 years ago

Right now, Oragono manually blacklists TAGMSG that carry only draft/typing from being stored in history, because otherwise they would fill up the history buffers to the exclusion of more useful messages.

I originally suggested a prefix that could identify all such "transient" client-only tags. But @jwheare suggested XMPP's message processing hints, in particular no-store and no-permanent-store, as a better and more flexible precedent. Basically, if the client is emitting a TAGMSG that doesn't need to be stored, it could attach an additional client-only tag advising the server not to store the message.

Thoughts?

jwheare commented 4 years ago

I don't think I suggested XMPP, but I have been experimenting with a prefixed @ephemeral tag for this purpose, mainly to support ephemeral messages in our slack gateway: https://api.slack.com/messaging/managing#ephemeral#ephemeral

They have the main property that they're only shown to a specific user, e.g. in responses to bot commands, but we use the tag to skip storage.

slingamn commented 4 years ago

Do we have any use for the XMPP distinction between "don't store on disk" and "don't store at all"?

dequis commented 4 years ago

Oh, it was my suggestion.

I don't think we have any use case for no-permanent-store, we don't realy have the same concept of offline users that the server temporarily stores messages for.

no-copy is rather xmpp specific, their multi-client support is significantly more complex/messy/powerful/confusing than ours, and even if we care about the use case of "having a conversation that is only delivered to only one client when using a bouncer" the implementation of that would take way more than just a message procesing hint.

So, just no-store?

slingamn commented 4 years ago

I don't think we have any use case for no-permanent-store, we don't realy have the same concept of offline users that the server temporarily stores messages for.

My implementation has this functionality, but I'm not sure the "no-store" vs. "no-permanent-store" distinction is relevant to anything I'm doing.

Let's go with no-store for now (meaning "don't store at all"), and we can add a weaker variant later if there's a need. Is the next step here to create a spec PR that registers +draft/no-store as a client-only tag?

jwheare commented 4 years ago

I saw mention of a concern in IRC. What's to stop people from marking all their own messages as no-store (or whatever name we use)? Is that a problem? Kind of breaks a lot of expectations.

dequis commented 4 years ago

Yeah, that would be annoying, but less annoying than handling GDPR requests asking for removal of one user's messages in chat logs.

I guess it should be clear that it's a hint, and implementations are free to do any of

jwheare commented 4 years ago

Right but if you accept that people might do it and would like to prevent that, you're forced to ignore it completely, rendering it useless.

At some point you will either have too much stuff in log storage or you'll have missing information (even TAGMSG with react can count as storable information)

For me, I'd prefer to store everything first, then identify which things I can filter out, and that involves just looking at the tags and deciding "right I'll filter our typing notifications" rather than trusting users to tell me what I can ignore.

I think this issue as raised is more about reducing log cruft than about respecting privacy or GDPR, which is a whole other can of worms.

slingamn commented 4 years ago

I think this issue as raised is more about reducing log cruft than about respecting privacy or GDPR, which is a whole other can of worms.

I think this is exactly right, these issues are coexisting uneasily in this spec.

Suggestions (leaving it until later to translate into the proper SHOULDs and MAYs):

  1. Rename the tag to +transient, avoiding the suggestion that this is an instruction from the client not to store the message
  2. Recommend to servers that they honor it for TAGMSG and not on any other commands
  3. Make it clear to clients that there is no guarantee that this will affect data retention
slingamn commented 4 years ago

I guess if we're treating privacy / retention as a completely separate issue, we get back to my original idea of giving these tags a prefix. The advantage of a prefix, relative to a separate +transient tag, is that client compliance is automatic. With +transient, I suspect some popular client implementation will forget to attach it and then we'll be back to square one (trying to maintain a blacklist of transient tags).

dequis commented 4 years ago

I'm not a fan of the prefix idea because "slightly modifying log retention" is not a use case that is relevant enough to have it present all the time.

I'd just discard this idea entirely and add special cases for specific tags to the servers that care about log retention. It's not worth the hassle for anyone else.

slingamn commented 4 years ago

Discard this proposal entirely, as opposed to submitting a draft for +transient?

I'd really prefer not to play whack-a-mole with these tags (or to have server operators playing whack-a-mole with them).

slingamn commented 4 years ago

Another use case that has come up for this is E2EE messages (since in many architectures, the ephemeral key material needed to decrypt them gets discarded by the clients and the stored history messages are useless). We're looking at multiple different proposals for E2EE (different versions of OTR, MLS, and Megolm); it would be nice if we could recommend +transient for all of them.

slingamn commented 4 years ago

Two more cases:

  1. +autocomplete-request, as proposed here: https://github.com/ircv3/ircv3-specifications/pull/415
  2. Read receipts: either between two clients, or between a client and itself (to clear unread notifications across two clients connected to the same bouncer)

I anticipate more of these as time goes on.

progval commented 4 years ago

What's to stop people from marking all their own messages as no-store (or whatever name we use)? Is that a problem? Kind of breaks a lot of expectations.

I think it would be a great feature of IRC to allow opt-out from logging, it would make it stand apart from some other chat services. (But as dequis mentioned, it's still up to the implementation to follow it or not)

Some IRC logging services already provide ad-hoc ways for users to prevent their messages from being logged, eg. by prefixing the content of their messages with a - (I think it was Echelog?); this feature would hide this visual overhead.

slingamn commented 4 years ago

Summarizing some discussion from IRC: for the most part, I think we've moved away from using this spec as a privacy feature (i.e., as a way for users to control data retention). This is for two reasons: (a) if this were allowed for messages other than TAGMSG, it would disrupt the usability of CHATHISTORY (b) server compliance is not guaranteed.

Another thing to note is that none of these proposals addresses the possibility of abuse: under any of these proposals, a malicious client can fill up history buffers with junk TAGMSG of some kind. Abuse prevention is probably out of scope here --- the goal is to stop legitimate, non-malicious TAGMSG traffic from using up storage unnecessarily.

There are four main proposals at this point:

  1. +transient: add a new tag, +transient, that indicates that a TAGMSG should not be stored. a. Pro: explicit b. Con: requires opt-in compliance from client implementations (future client-only tag specifications will include a recommendation of the form "clients SHOULD send +transient along with these messages", and then it's up to the implementation to comply) c. Con: +typing has to be special-cased (i.e., +typing will entail that the message should not be stored, even when sent without +transient)
  2. prefixing: add a prefix like ! to tags that should not be stored. If a message contains only ! tags, it should not be stored a. Pro: client compliance is automatic. If a client is using the agreed-upon tag name, the server will correctly detect that the message should not be stored. b. Con: ugly (someone said "bleeeeehhh") c. Con: +typing has to be special-cased (i.e., +typing will be treated as though it were +!typing)
  3. +do-store: add a new tag, +do-store, that indicates that a TAGMSG should be stored. All other TAGMSG should not be stored. (The precise name of this tag is TBD, +do-store is a placeholder.) a. Pro: explicit b. Con: +draft/react has to be either amended (to recommend sending +do-store) or special-cased (i.e., implementations should store react messages even without the +do-store tag) c. Con: may be counterintuitive for client developers and/or discourage experimentation with new client-only tags
  4. Status quo a. Pro: explicit b. Con: either the server vendor or the network operator has to play whack-a-mole with new tags, or possibly go to a whitelisting approach, which discourages experimentation with new client-only tags

I personally have no objections to special-casing any existing client tags (typing or react), whether ratified or draft.

slingamn commented 4 years ago

I think option 3 ("+do-store", i.e., making storage of TAGMSG opt-in via a new tag) has the best chance of consensus here. Anyone have strong feelings about it?

Although previous discussion has described "+do-store" as a client-only tag, it seems like it should not actually be relayed to other clients. Since the intent is that the tag will be processed by the server, not by other clients, it seems appropriate to use a non-client-only tag (i.e., without a + prefix). How about draft/persist (and then eventually persist) as the tag name?

slingamn commented 4 years ago

I wrote option 3 up as https://github.com/ircv3/ircv3-specifications/pull/416 ; feedback there (or here) would be very welcome.

jwheare commented 4 years ago

Keeping feedback here for now, I'm kind of -0 on this whole thing. I posted my thoughts on IRC a few weeks ago but forgot to add them to this thread.

Ignoring stuff for log purposes has been an issue since the start, even before client tags were a thing. It's something we've had to deal with as a client/bouncer with all numerics/commands ever introduced.

We log unhandled lines for users so they can at least see some output even if we don't specifically deal with the line, and if we deal with the line, we can then decide whether or not to log it. No real fix for that other than just playing whack a mole with numerics as and when we see them appear.

I appreciate this is a bit easier for us to manage as a centralised service, and there's perhaps the potential for TAGMSG things like typing to get noisier, but I've also seen noisy unrecognised numerics show up before.

Re: client tag or not, bouncers that do logging and aren't integrated with the server will need it to be a client tag.

slingamn commented 4 years ago

Re: client tag or not, bouncers that do logging and aren't integrated with the server will need it to be a client tag.

Good point; I'm incorporating this into the draft.

I appreciate this is a bit easier for us to manage as a centralised service, and there's perhaps the potential for TAGMSG things like typing to get noisier, but I've also seen noisy unrecognised numerics show up before.

I acknowledge this argument but I guess I don't find it very persuasive? It seems important to have a durable mitigation even if it only covers one case (the case of TAGMSG). And with respect to servers, as opposed to bouncers, it seems to cover everything.

jwheare commented 4 years ago

That's fair, and I guess I'm just not persuaded enough that the benefits sufficiently outweigh the cons so far. I'll let others comment if they feel more strongly about this.

SadieCat commented 4 years ago

I agree with @jwheare on this.

DanielOaks commented 4 years ago

I reckon having a +draft/persist hint for tagmsgs would be a good thing, but yeah we'll likely keep a list of c2ctags to store on seeing as well. I like the idea of something like +ephemeral for bot replies but as an explicit user-visible please-do-not-log-my-messages feature we can do better I reckon (and there's been more proper suggestions for that in the past in the issue history iirc).

progval commented 4 years ago

An alternative to option 2: add a prefix like ! to tags that should be stored. It spares special-casing +typing.

I also find this option ugly (in addition to being unflexible), but automatic compliance is really compelling.

But any of the options so far would be an improvement over the status quo for me. Limnoria keeps some message history in a ring buffer (so it's on a best effort basis), and it would be an easy way to keep messages out of that buffer, but nothing awful happens if a sender is misusing do-store/transient.

slingamn commented 4 years ago

I discussed +persist with @DarthGandalf, who doesn't like that it doesn't solve the abuse problem: a malicious client can just spam @+bazbat;+persist TAGMSG #channel. (This is a more pressing abuse concern than regular PRIVMSG/NOTICE spam because it is more likely to fly under the radar of a channel moderator.)

The alternative, then, would be server-side whitelisting of persist-eligible client-only tags (like +react). I have two objections: first, that this requires active cooperation from server vendors / operators, and second, that there will likely still be opportunities to abuse these tags. (For example, a client can send every available reaction to every message. There will probably be similar opportunities to abuse future persist-eligible client-only tags.)

DarthGandalf commented 4 years ago

@slingamn One important details you missed from our conversation: I didn't yet think much about how to deal with client tags. So I can very well be wrong on this matter, as you pointed out. For the same reason I didn't really participate in the discussion above.

jwheare commented 3 years ago

I think this concept needs more explicit support from a wider range of implementations before it can be moved further along.

slingamn commented 3 years ago

It has moved a bit further along --- the current status of this is https://github.com/ircv3/ircv3-specifications/pull/416.

jwheare commented 3 years ago

Yeah, I'm including that PR in my appraisal of the situation here.

slingamn commented 3 years ago

I'm closing this thread in favor of https://github.com/ircv3/ircv3-specifications/pull/416.