Open gnprice opened 2 years ago
One annoying thing about the normalized approach, given the fact that we'll sometimes see messages in streams (and probably from senders) that we don't know about, is that it'd mean having two types of stream/user records in our central stream/user data: full records for the ones we can see in the usual way, and skeleton records for the ones we know about only through messages.
The web app goes ahead and puts skeleton user objects, with only a handful of the usual properties, into the same place as normal user objects go:
_add_user({
email: person.email,
user_id,
full_name: person.full_name,
is_admin: person.is_realm_admin || false,
is_bot: person.is_bot || false,
});
(That's from extract_people_from_message
; the local person
is either an element of a PM's display_recipient
, or synthesized from the sender_*
properties.)
As that excerpt already demonstrates, that approach leads rapidly down the path of making up bogus data. On receiving a stream message from an unknown sender, that bit of code goes and runs, and because messages don't carry any information about whether the sender is an admin or a bot, it just assumes (really, lies) that they're neither.
If doing that with a type-checker, as in our codebase, the type-checker would immediately complain that the types don't match, and rightly so. This object is missing a bunch of the properties expected on a user; and if OTOH one made the data structure's type accept such skeleton records, then the type-checker would flag every place we looked up a user and then tried to use one of the properties not present here.
I believe in the web app there currently isn't a live bug caused by this. That's true only because of an invariant which isn't expressed:
display_recipient
, and consequently we should have a full user record (or if we don't, at least it's not this code's fault, and instead the responsibility of whatever feature did bring that user ID to the app's attention.)So if we were to pursue this strategy, we'd want to express a similar invariant in a way the type-checker can see and check for us. One way that could look is:
Map<UserId, User | PmRecipientUser>
, storing either a skeleton or full record for each user.getUser: (UsersState, UserId) => User | void
. If for the given user ID we had only a skeleton record, it'd return undefined/void, just as if we had no data about the user at all. Then if it does return something, the type-checker knows that the caller gets a normal full user record.getPmRecipientUser: (UsersState, UserId) => User | PmRecipientUser | void
, which would return whatever record we had. (Or perhaps a slightly better return type would be like: { ...PmRecipientUser, ... } | void
, with an inexact object type.) Effectively the caller would be promising that it didn't try to use any of the properties that aren't on skeleton records; and the type-checker would readily check that that was right.It may be simpler to discuss after #5188 is settled, but: should the choice between normalized/denormalized be influenced by how we want MessageListElement
objects to be put together, e.g., questions like https://github.com/zulip/zulip-mobile/pull/5188#discussion_r793016697 ? If we normalize aggressively, can we still construct each MessageListElement
with all the data belonging to the element, such that a function like doElementsDifferInterestingly
(in that draft PR) can have the nice property of just taking two MessageListElement
objects and no "background data"? That might be good if so.
Yeah, we certainly could do that.
It might come at some performance cost: we'd be adding a pile of extra properties to each of these MessageListElement
objects. We construct potentially a lot of those, in a pretty performance-sensitive path, so making them each have like 8 properties instead of 4 could be meaningful additional work.
That points toward one advantage of the denormalized strategy: it could help us more easily make use of the usual React-style "===
, therefore nothing changed" short-circuit optimization.
When we show a message in the message list, we show its sender's name and avatar based on the
sender_full_name
andavatar_url
properties on the message object itself. (See src/webview/html/message.js.) If we show a recipient header with a stream name (because it's a stream message and we're in an interleaved narrow: all messages, @-mentions, starred messages, or search), the stream name comes from the message object too. (*) (See src/webview/html/header.js; thestreamNameOfStreamMessage
accessor just reads thedisplay_recipient
property.)(*) This isn't true right now in main, but it was until recently, and will be again after #5207.
And if the person who sent the message changes their name or avatar, or if the stream gets renamed, we don't go and update all the message objects. As a result, we'll keep showing the message with the old name, avatar, or stream name.
This bug is somewhat parallel to #3408. We do look at most of these events (users changing avatar since 7296551876e91674330624212dd3a41bb8c30128 / #4230; streams changing name since basically forever, with a gap fixed in 152b06e25 / #2451; users changing name not yet), and apply them to our central user and stream/subscription data structures respectively; we just don't apply those updates to the messages, and we don't consult that updated central data for this purpose. Like #3408, this issue gets largely papered over by our frequently refreshing the server data from scratch. As explained in #4841, we'd like to stop refreshing so frequently because it's a conspicuous drag on the user experience; when we do, this issue will become more visible.
There are two basic approaches we can choose from to solve this problem:
Normalized data: Each piece of information lives in one place. This would mean that for information about a user or stream, we would always consult the central users or streams/subscription data structure, rather than information somewhere else like on a message.
A major advantage of this strategy is simplicity: it eliminates a class of bugs where one updates the data in one place and forgets to update it in another
A pitfall of this strategy is completeness: if a message refers to a user or stream, and we rely on the normalized data about that user or stream, we need them to be present in the central data structure about users or streams. It's possible to fetch a Zulip message that refers to a stream (perhaps also a user) that we've never previously heard of: this was the cause of #5206, a regression caused by 9d05bbaf6fc9937ac9a776e28cbc988cc3b47b45 where we tried to switch to normalized data for the stream name in a message's recipient header. So in order for this strategy to work, we'd have to do something like:
message
event), if it's in a stream we don't already know about, go update the streams data with what the message tells us about the stream (i.e., its name.)In the web app, this is the strategy taken for another piece of data with a similar flavor: the names of the participants in a PM conversation, for display in recipient headers.
display_reply_to
, which is computed bymessage_store.get_pm_full_names
, which reads only sender IDs from the message and looks those up in the central users ("people") data.people.extract_people_from_message
and its sole callermessage_helper.process_new_message
.) All other references to a message's list of PM recipients (thedisplay_recipient
property of a PM) look only at the IDs.Denormalized data: Some data about objects of type A also lives redundantly on objects of type B where they refer to objects A. This is our status quo for this data: we get some information about users and streams from looking at the messages that refer to them.
The tricky thing needed for getting correctness with this strategy is that whenever the data changes, we need to update all the copies. This would mean something like:
In the web app, this is the strategy taken for sender names and stream names.
message_store.update_property
.For sender avatars, the web app takes a hybrid strategy:
people.small_avatar_url
.small_avatar_url
on the message, but nothing seems to consult that. Looks like zulip/zulip@131a1dd54 introduced that quirk.)avatar_url
. So if it gets a message from a sender it doesn't know about, I believe it'll always fall back to the denormalized data when showing that message.(References to the web app are as of https://github.com/zulip/zulip/tree/410281624 .)