matterhorn-chat / matterhorn

A feature-rich Unix terminal client for the Mattermost chat system
BSD 3-Clause "New" or "Revised" License

Improve proactive data-fetching behavior for slow connections #804

Closed setharnold closed 11 months ago

setharnold commented 1 year ago

Hello, I'd like to suggest that matterhorn should optimistically start loading channel conversations, names, reactions, etc. shortly after connecting to the server.

My problem: every week or two I restart matterhorn. This means every time I load a channel for the first time, I must wait three to twelve seconds for the contents to load before I can read the channel. I'm in roughly seventy channels, so that comes to multiple minutes of waiting for contents to download.

Please consider downloading all this data in the background before I try to view it.

Thanks

jtdaugherty commented 1 year ago

Hi @setharnold - it sounds like you are not having a great experience, possibly due to a slow connection or server.

I'm in roughly seventy channels, that comes to multiple minutes of waiting for contents to download.

Are you going through them one by one, in order to load them all?

jtdaugherty commented 1 year ago

@setharnold I also wanted to follow up with a couple of things I need to mention.

First, doing an optimistic load under your network conditions might not actually help matters very much: if your network is slow enough that it takes minutes to make all the fetches, then it's likely that you'll go view a channel only to wait even more time, since that particular channel's fetches could be somewhere near the back of the queue. In the current implementation, Matterhorn only loads what it needs to show you the channel that you actively asked to see.

Second, Matterhorn already does the loading you're asking for, but only for the currently-viewed channel. It does this partly because it is polite at the API level, and partly on the assumption that the channel you're viewing is the one whose latest messages you're most interested in. But Matterhorn still fetches the metadata for all channels, so the sidebar shows the right unread and mention counts even if you haven't switched to those channels yet.

Third, doing what you ask requires a bigger change than I have time to implement, and I'm the only Matterhorn developer at the moment. I'm also not sure I want to implement it anyway, given that the consequences for usability are not necessarily entirely positive. I want to acknowledge that in your situation such a behavior could potentially give better results, but I'm not sure it would give better results in general, so I'd only implement it if I had the time and was convinced that it could be done efficiently. With that said, if someone wanted to look at the Mattermost API docs, or even the official client, to see whether there's a better way to fetch the initial content than we're currently doing, that would be helpful.

setharnold commented 1 year ago

Thanks for the thoughtful response (as always).

I did worry about this background loading perhaps making things worse for users with poor connectivity to their server, but I figured there would be a clever way to solve it.

Historically, you've had much better implementation ideas than me, so I've been trying to learn to stick to only a statement of the problem, with mixed success. Hehe.

Here's my general workflow:

Thus, I really only read channels that have had changes. I very rarely switch to a channel that hasn't had changes, whether to look up older information or to write something new.

Each of these demand-load operations can take anywhere from a few seconds to half a minute.

My implementation idea is two queues: one for demand-loaded contents (the user has switched to that channel and is waiting for the content to load) and one for readahead-loaded contents (my desired feature).

We poll the demand queue before we poll the readahead queue. This way, user channel switches are only ever blocked by one channel load. (Or, if we use two network connections, we might be blocked waiting for whichever one finishes loading first.)

If the channels with changes are added to the front of the readahead queue, before channels without changes, then they will have an increased chance of being loaded while I'm still reading the first channel's scrollback, before I try to switch to them.
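Roughly the shape I have in mind, as a minimal Haskell sketch (the FetchJob/Fetcher names here are made up for illustration, not anything in Matterhorn):

```haskell
{-# LANGUAGE LambdaCase #-}
module TwoQueueSketch where

import Control.Concurrent.STM
import Control.Monad (forever)

-- A job is just an IO action that performs one channel's fetches.
newtype FetchJob = FetchJob { runJob :: IO () }

data Fetcher = Fetcher
  { demandQueue    :: TQueue FetchJob  -- the user is waiting on these
  , readaheadQueue :: TQueue FetchJob  -- speculative background loads
  }

newFetcher :: IO Fetcher
newFetcher = Fetcher <$> newTQueueIO <*> newTQueueIO

-- Always drain the demand queue before touching the readahead queue,
-- so a channel switch is blocked by at most one in-flight load.
nextJob :: Fetcher -> STM FetchJob
nextJob f = tryReadTQueue (demandQueue f) >>= \case
  Just job -> pure job
  Nothing  -> readTQueue (readaheadQueue f)

-- Worker loop: a single network connection serviced from both queues.
runFetcher :: Fetcher -> IO ()
runFetcher f = forever $ do
  job <- atomically (nextJob f)
  runJob job
```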

Of course, I completely understand the scarcity of time. If this just isn't something you can get to, I get it.

Thanks

jtdaugherty commented 1 year ago

My implementation idea is two queues: one for demand-loaded contents (the user has switched to that channel and is waiting for the content to load) and one for readahead-loaded contents (my desired feature).

We poll the demand queue before we poll the readahead queue. This way, user channel switches are only ever blocked by one channel load. (Or, if we use two network connections, we might be blocked waiting for whichever one finishes loading first.)

Incidentally, Matterhorn effectively already does this internally with a single queue: some queue items can preempt others by being added to the front of the queue rather than the back. I'm happy to talk about implementation issues, but for the purposes of tickets, I find it's best if we focus on the user-facing problems, figure out what is usable and feasible, and leave the implementation details to me. :)
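For illustration only, a single queue with front-insertion preemption could look something like the following sketch (the names and types are invented here, not our actual internals):

```haskell
module PreemptQueueSketch where

import Control.Concurrent.STM
import Data.Sequence (Seq, ViewL (..), viewl, (<|), (|>))
import qualified Data.Sequence as Seq

data Priority = Preempt | Normal

newtype WorkQueue a = WorkQueue (TVar (Seq a))

newWorkQueue :: IO (WorkQueue a)
newWorkQueue = WorkQueue <$> newTVarIO Seq.empty

-- Preempting work goes to the front of the queue, normal work to the back.
enqueue :: Priority -> a -> WorkQueue a -> STM ()
enqueue prio item (WorkQueue var) = modifyTVar' var $ \s ->
  case prio of
    Preempt -> item <| s
    Normal  -> s |> item

-- Block until work is available, then take it from the front.
dequeue :: WorkQueue a -> STM a
dequeue (WorkQueue var) = do
  s <- readTVar var
  case viewl s of
    EmptyL    -> retry
    x :< rest -> writeTVar var rest >> pure x
```

A single queue like this gives the same "demand first" behavior as your two-queue design, provided demand loads are always enqueued with the preempting priority.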

But yes, as you acknowledged, time scarcity is one concern for me. API etiquette is another: piling those requests up in our internal queue runs the risk of a rate-limit violation for users with faster connections (it wouldn't in your case, obviously). So I'm left thinking that if this is going to change in Matterhorn, the best way to do it is to see whether there's a better way to make a bulk fetch from the server and avoid rate-limit issues.
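To make that concern concrete: background readahead would probably need some deliberate spacing between requests, e.g. a trivial throttle along these lines (purely hypothetical, not something Matterhorn does today):

```haskell
module ThrottleSketch where

import Control.Concurrent (threadDelay)
import Control.Monad (forM_)

-- Run a list of background fetch actions, leaving at least the given
-- number of microseconds between them so they don't arrive as a burst.
throttled :: Int -> [IO ()] -> IO ()
throttled gapMicros actions = forM_ actions $ \act -> do
  act
  threadDelay gapMicros
```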

As an aside, the suggestion to prefetch other data such as names and reactions might not be feasible, at least in bulk. Perhaps the server developers have added API endpoints to help with that, or perhaps we aren't looking at our API responses closely enough, but the way Matterhorn works right now is that when it gets a batch of messages (such as for a channel update), it scans through them all and gathers up the set of unrecognized usernames and mentions and then attempts to bulk-fetch those to whatever extent it can. Unless the server API provides an endpoint for "get me all the users mentioned in this channel" or "get me all the reactions for posts in channel X" somehow, I'm not sure how we can do better. But if I were to look into this, the first thing I would do is clear my browser cache and then go load the official web client and see what network requests it makes. That's been the most helpful way to understand how best to use the server API.
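That scan-and-bulk-fetch step is roughly this shape (a simplified sketch with placeholder types; Matterhorn's real code is structured differently):

```haskell
module ScanSketch where

import Data.Set (Set)
import qualified Data.Set as Set
import Data.Text (Text)

-- Placeholder post type: just the names a batch of messages refers to.
data Post = Post
  { postAuthor   :: Text
  , postMentions :: [Text]  -- usernames mentioned in the message body
  }

-- Every username referenced by the batch that is not already cached.
unknownUsernames :: Set Text -> [Post] -> Set Text
unknownUsernames known posts = referenced `Set.difference` known
  where
    referenced = Set.fromList $
      concatMap (\p -> postAuthor p : postMentions p) posts

-- Issue one bulk request for the whole batch instead of one per user.
-- The fetch function is passed in; its real-world counterpart would hit
-- something like a "get users by usernames" endpoint.
resolveUsers :: ([Text] -> IO ()) -> Set Text -> [Post] -> IO ()
resolveUsers fetchUsersByUsername known posts = do
  let missing = unknownUsernames known posts
  if Set.null missing
    then pure ()
    else fetchUsersByUsername (Set.toList missing)
```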

setharnold commented 1 year ago

Thanks for thinking through possible solutions here.

I'm just accustomed to an IRC client 8ms away, talking to IRC servers 10-25ms away.

Our Mattermost server is on the other side of the ocean, and I'm pretty sure we're pushing it way further than it was meant to go. It's just so sluggish and frustrating. I was hoping there would be some easy ways to hide all this latency from me.

But it sounds like you've already tried pretty hard to fit within the limits you've got.

Thanks again, it's always a pleasure.

jtdaugherty commented 12 months ago

I was just looking at the Mattermost API documentation and noticed that many API endpoints (such as the one to get a single post) return the post metadata -- attachments, reactions, etc. -- that we currently get only by making additional fetches. I don't know when that was added; it's possible it was there long ago and we just didn't notice, but at any rate Matterhorn doesn't make use of that included metadata at all. Looking at some of the API endpoints we use, if that metadata is prevalent, it would be possible to drastically reduce the number of API requests Matterhorn makes just by parsing additional data we may already be getting. I'll find out which server version added it to understand how backwards-compatible a change to use it would be.
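As a sketch of what consuming that embedded metadata could look like with aeson -- the field names follow my reading of the public API docs and should be treated as assumptions, not the parser Matterhorn actually uses:

```haskell
{-# LANGUAGE OverloadedStrings #-}
module PostMetadataSketch where

import Data.Aeson
import Data.Text (Text)

data Reaction = Reaction
  { reactionUserId    :: Text
  , reactionEmojiName :: Text
  } deriving Show

data FileInfo = FileInfo
  { fileId   :: Text
  , fileName :: Text
  } deriving Show

-- The embedded metadata object; only the parts relevant here.
data PostMetadata = PostMetadata
  { metaReactions :: [Reaction]
  , metaFiles     :: [FileInfo]
  } deriving Show

instance FromJSON Reaction where
  parseJSON = withObject "Reaction" $ \o ->
    Reaction <$> o .: "user_id" <*> o .: "emoji_name"

instance FromJSON FileInfo where
  parseJSON = withObject "FileInfo" $ \o ->
    FileInfo <$> o .: "id" <*> o .: "name"

instance FromJSON PostMetadata where
  parseJSON = withObject "PostMetadata" $ \o ->
    PostMetadata <$> o .:? "reactions" .!= []
                 <*> o .:? "files"     .!= []

-- A post carrying its embedded metadata, so no follow-up fetches are
-- needed for reactions or attachments when the server provides it.
data Post = Post
  { postMessage  :: Text
  , postMetadata :: Maybe PostMetadata
  } deriving Show

instance FromJSON Post where
  parseJSON = withObject "Post" $ \o ->
    Post <$> o .: "message" <*> o .:? "metadata"
```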

jtdaugherty commented 11 months ago

@setharnold I'm happy to announce that as of c30308756dee95631c2e556d0ff68410c6ddcbcd Matterhorn makes significantly fewer API requests by leveraging API response data that we were already getting but didn't know about. In one test, Matterhorn went from making 105 requests on startup to just 19! So despite my initial hesitation to dig into this, and even though I don't have a lot of time to jump on Matterhorn issues when they arise, I'm really happy with this result, and I'm glad you created this ticket since it got me to look into this more deeply.

If you want to try it out, you can build the develop branch, but if you'd rather wait for a release, I'll try to get one out in the next week or so.

setharnold commented 11 months ago

That's an awesome improvement! Very cool. I think the release would be easier all around. Thanks!

jtdaugherty commented 11 months ago

This is now released in 50200.19.0:

https://github.com/matterhorn-chat/matterhorn/releases/tag/50200.19.0

setharnold commented 11 months ago

Thanks for working on this! Channel loads are significantly faster now; they're all roughly two seconds. (That's between five and fifteen times faster than when I opened this report.)

Thanks

jtdaugherty commented 11 months ago

I'm so glad to hear that things improved a lot in your situation!