Use updated stream/sender names/avatars on messages

gnprice commented 2 years ago

When we show a message in the message list, we show its sender's name and avatar based on the sender_full_name and avatar_url properties on the message object itself. (See src/webview/html/message.js.) If we show a recipient header with a stream name (because it's a stream message and we're in an interleaved narrow: all messages, @-mentions, starred messages, or search), the stream name comes from the message object too. (*) (See src/webview/html/header.js; the streamNameOfStreamMessage accessor just reads the display_recipient property.)

(*) This isn't true right now in main, but it was until recently, and will be again after #5207.

And if the person who sent the message changes their name or avatar, or if the stream gets renamed, we don't go and update all the message objects. As a result, we'll keep showing the message with the old name, avatar, or stream name.

This bug is somewhat parallel to #3408. We do look at most of these events (users changing avatar since 7296551876e91674330624212dd3a41bb8c30128 / #4230; streams changing name since basically forever, with a gap fixed in 152b06e25 / #2451; users changing name not yet), and apply them to our central user and stream/subscription data structures respectively; we just don't apply those updates to the messages, and we don't consult that updated central data for this purpose. Like #3408, this issue gets largely papered over by our frequently refreshing the server data from scratch. As explained in #4841, we'd like to stop refreshing so frequently because it's a conspicuous drag on the user experience; when we do, this issue will become more visible.

There are two basic approaches we can choose from to solve this problem:

Normalized data: Each piece of information lives in one place. This would mean that for information about a user or stream, we would always consult the central users or streams/subscription data structure, rather than information somewhere else like on a message.

A major advantage of this strategy is simplicity: it eliminates a class of bugs where one updates the data in one place and forgets to update it in another

A pitfall of this strategy is completeness: if a message refers to a user or stream, and we rely on the normalized data about that user or stream, we need them to be present in the central data structure about users or streams. It's possible to fetch a Zulip message that refers to a stream (perhaps also a user) that we've never previously heard of: this was the cause of #5206, a regression caused by 9d05bbaf6fc9937ac9a776e28cbc988cc3b47b45 where we tried to switch to normalized data for the stream name in a message's recipient header. So in order for this strategy to work, we'd have to do something like:
- On receiving a message (either from a get-messages request, or a message event), if it's in a stream we don't already know about, go update the streams data with what the message tells us about the stream (i.e., its name.)
In the web app, this is the strategy taken for another piece of data with a similar flavor: the names of the participants in a PM conversation, for display in recipient headers.
- These come from a property display_reply_to, which is computed by message_store.get_pm_full_names, which reads only sender IDs from the message and looks those up in the central users ("people") data.
- Conversely, on receiving a message, the web app runs through any PM recipients and stows their details into that central data, if unknown. (See people.extract_people_from_message and its sole caller message_helper.process_new_message.) All other references to a message's list of PM recipients (the display_recipient property of a PM) look only at the IDs.
Denormalized data: Some data about objects of type A also lives redundantly on objects of type B where they refer to objects A. This is our status quo for this data: we get some information about users and streams from looking at the messages that refer to them.

The tricky thing needed for getting correctness with this strategy is that whenever the data changes, we need to update all the copies. This would mean something like:
- On updating a user's name or avatar, go through all the messages to find messages they sent, and update the information there.
- On updating a stream's name, go through all the messages in that stream, and update the stream name there.
In the web app, this is the strategy taken for sender names and stream names.
- For the updates, see message_live_update.js and its callee message_store.update_property.

For sender avatars, the web app takes a hybrid strategy:

On displaying a message, it first looks up the normalized data. If there's none there (in particular if we can't find the sender in the user/people data), it falls back to the denormalized data. See people.small_avatar_url.
On updating a user's avatar, message_live_update.js gets called, but I believe the useful effect of this is just that it causes the message list to rerender. (It sets a property small_avatar_url on the message, but nothing seems to consult that. Looks like zulip/zulip@131a1dd54 introduced that quirk.)
On receiving a message… I'm not finding any code to look at its avatar_url. So if it gets a message from a sender it doesn't know about, I believe it'll always fall back to the denormalized data when showing that message.

(References to the web app are as of https://github.com/zulip/zulip/tree/410281624 .)

gnprice commented 2 years ago

One annoying thing about the normalized approach, given the fact that we'll sometimes see messages in streams (and probably from senders) that we don't know about, is that it'd mean having two types of stream/user records in our central stream/user data: full records for the ones we can see in the usual way, and skeleton records for the ones we know about only through messages.

The web app goes ahead and puts skeleton user objects, with only a handful of the usual properties, into the same place as normal user objects go:

        _add_user({
            email: person.email,
            user_id,
            full_name: person.full_name,
            is_admin: person.is_realm_admin || false,
            is_bot: person.is_bot || false,
        });

(That's from extract_people_from_message; the local person is either an element of a PM's display_recipient, or synthesized from the sender_* properties.)

As that excerpt already demonstrates, that approach leads rapidly down the path of making up bogus data. On receiving a stream message from an unknown sender, that bit of code goes and runs, and because messages don't carry any information about whether the sender is an admin or a bot, it just assumes (really, lies) that they're neither.

If doing that with a type-checker, as in our codebase, the type-checker would immediately complain that the types don't match, and rightly so. This object is missing a bunch of the properties expected on a user; and if OTOH one made the data structure's type accept such skeleton records, then the type-checker would flag every place we looked up a user and then tried to use one of the properties not present here.

I believe in the web app there currently isn't a live bug caused by this. That's true only because of an invariant which isn't expressed:

Every time we look up a message's PM recipients in the users data, the only properties we look at are the ones that would have been set here.
Or said the other way around: whenever we look up a user and then look at some property that would be missing on one of these skeleton records, it's because we found the user ID somewhere other than a message display_recipient, and consequently we should have a full user record (or if we don't, at least it's not this code's fault, and instead the responsibility of whatever feature did bring that user ID to the app's attention.)

So if we were to pursue this strategy, we'd want to express a similar invariant in a way the type-checker can see and check for us. One way that could look is:

The central data structure would look like Map<UserId, User | PmRecipientUser>, storing either a skeleton or full record for each user.
The normal way of looking up a user would be by an accessor like getUser: (UsersState, UserId) => User | void. If for the given user ID we had only a skeleton record, it'd return undefined/void, just as if we had no data about the user at all. Then if it does return something, the type-checker knows that the caller gets a normal full user record.
When looking up a user as a PM recipient, we'd instead use an accessor like getPmRecipientUser: (UsersState, UserId) => User | PmRecipientUser | void, which would return whatever record we had. (Or perhaps a slightly better return type would be like: { ...PmRecipientUser, ... } | void, with an inexact object type.) Effectively the caller would be promising that it didn't try to use any of the properties that aren't on skeleton records; and the type-checker would readily check that that was right.

chrisbobbe commented 2 years ago

It may be simpler to discuss after #5188 is settled, but: should the choice between normalized/denormalized be influenced by how we want MessageListElement objects to be put together, e.g., questions like https://github.com/zulip/zulip-mobile/pull/5188#discussion_r793016697 ? If we normalize aggressively, can we still construct each MessageListElement with all the data belonging to the element, such that a function like doElementsDifferInterestingly (in that draft PR) can have the nice property of just taking two MessageListElement objects and no "background data"? That might be good if so.

gnprice commented 2 years ago

Yeah, we certainly could do that.

It might come at some performance cost: we'd be adding a pile of extra properties to each of these MessageListElement objects. We construct potentially a lot of those, in a pretty performance-sensitive path, so making them each have like 8 properties instead of 4 could be meaningful additional work.

That points toward one advantage of the denormalized strategy: it could help us more easily make use of the usual React-style "===, therefore nothing changed" short-circuit optimization.

zulip / zulip-mobile

Use updated stream/sender names/avatars on messages #5208