Open andymandias opened 1 month ago
@andymandias I'll look to review this very soon :eyes:
Thanks @tarkah! It's definitely still a work in progress; in particular, query buffers don't have `chathistory` support yet. There are also a number of smaller issues to work out (will try to add these to the list at the top of the PR). I am currently using it with soju (timestamp-based) and ircs://ergo.chat/#ergo (msgid-based) and it ~works.
Things I think particularly need reviewing:
- `load_history_now` and `make_partial_now` (starting in `main.rs`, going into `manager.rs`). I believe that functionality should instead be implemented by a chain of asynchronous commands (load history → get latest message), but I wasn't quite sure how to set up the chain. I set it aside to deal with later, but I suspect you may have an even better solution.
- […] `Full` history in order to have a clear reference message when adding messages to the history. I've been working on making all processes work with a `Partial` history, but in the meantime I've modified `make_partial` to allow converting from a `Full`-with-unread-messages to a `Partial` history. I've done my best to respect the existing structure, but it probably warrants review.

Brief addendum to note it looks like my next push will involve a moderate restructuring, so probably best to review after (hopefully tonight).
I'll be chipping away at this still, but it's probably in a good spot for a WIP review now.
I'm a bit unsure about all the API changes to `history` to support this feature. I need to internalize it all more, but it seems to me we shouldn't have so many new code paths. All we're doing is feeding messages to `history` like we would any other message that comes in; the only difference is we may want to splice it into place vs appending it.
So it seems to me the only API change of `history` needed would be the following:
It'd be great to sync up some time to discuss this over IRC so I can better understand the model at a high level, as there's a lot to internalize. I'm also seeing async functions getting called from `main` but w/out awaiting, which is a no-op.
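For readers less familiar with why an un-awaited call does nothing: Rust futures are lazy, so calling an `async fn` only builds a future, and its body runs only when something polls it. The sketch below illustrates this with the standard library alone. `load_history` is a hypothetical stand-in (not the PR's function), and `block_on` is a toy executor written for this example, not what Halloy uses.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for an async history loader; it only records that it ran.
async fn load_history(ran: Arc<AtomicBool>) {
    ran.store(true, Ordering::SeqCst);
}

/// Waker that does nothing; enough to poll a future that never suspends.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: polls the future in a loop until it completes.
fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut future = pin!(future);
    loop {
        if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
            return output;
        }
    }
}

/// Calls the async fn but drops the future without ever polling it.
fn call_without_await() -> bool {
    let ran = Arc::new(AtomicBool::new(false));
    let _ = load_history(ran.clone()); // future is created, then dropped
    ran.load(Ordering::SeqCst)
}

/// Calls the async fn and drives the resulting future to completion.
fn call_and_drive() -> bool {
    let ran = Arc::new(AtomicBool::new(false));
    block_on(load_history(ran.clone()));
    ran.load(Ordering::SeqCst)
}
```

`call_without_await()` reports that the body never ran, while `call_and_drive()` reports that it did. In an application built on an async runtime, the equivalent of `block_on` here would be awaiting the future or handing it to the runtime as a task.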
@andymandias how does this feature interact w/ bouncers that send a replay buffer on connect? Are the bouncers programmed to only send one or the other, or will both get sent? If both are sent, do we just rely on our dedupe strategy to eliminate dupes, or do we have some other mechanism to handle this?
@tarkah the ircv3 description of the specification says that the replay buffer should not be sent automatically when a client has negotiated `chathistory`. In my experience with soju that is the case (ZNC, which I've switched off of, does not support `chathistory` to my knowledge). I think we should be prepared to potentially get dupe messages around join time (even if they follow ircv3 specification, I think we could end up with some dupes if - for example - a message is sent to the channel right after we join but before the server receives our associated `chathistory` request). But, my feeling at the moment is that our dedupe strategy is sufficient for that purpose.
Another restructuring, to better allow for using `LATEST` then repeated `BETWEEN`s to update when joining a channel. The old scheme using repeated `AFTER` would fail to receive any messages when the reference message was no longer in the server's available message history. The restructuring should also make it easier to utilize the `TARGETS` subcommand, which I plan to work on next.
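As a rough illustration of the subcommands mentioned above, here is a minimal sketch that formats `LATEST` and `BETWEEN` requests following the parameter order in the IRCv3 `chathistory` draft. The `Reference` enum and the helper functions are hypothetical names for this example and are not taken from the PR.

```rust
/// Reference point for a CHATHISTORY request (hypothetical type for this sketch).
enum Reference {
    /// Server time, e.g. "2024-01-01T00:00:00.000Z".
    Timestamp(String),
    /// Message id assigned by the server.
    MsgId(String),
    /// "*", meaning no lower bound (only meaningful for LATEST).
    Unbounded,
}

impl Reference {
    /// Renders the reference as a CHATHISTORY parameter.
    fn to_param(&self) -> String {
        match self {
            Reference::Timestamp(ts) => format!("timestamp={ts}"),
            Reference::MsgId(id) => format!("msgid={id}"),
            Reference::Unbounded => "*".to_string(),
        }
    }
}

/// Requests up to `limit` of the most recent messages newer than `since`.
fn latest(target: &str, since: &Reference, limit: u16) -> String {
    format!("CHATHISTORY LATEST {target} {} {limit}", since.to_param())
}

/// Requests up to `limit` messages between two reference points.
fn between(target: &str, start: &Reference, end: &Reference, limit: u16) -> String {
    format!(
        "CHATHISTORY BETWEEN {target} {} {} {limit}",
        start.to_param(),
        end.to_param()
    )
}
```

One way to read the scheme described above: a first `LATEST` fetches the newest batch, and follow-up `BETWEEN` requests fill the gap between that batch and the newest locally stored message. Because the first request does not depend on an old reference message still existing on the server, it avoids the failure mode of repeated `AFTER`.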
Should be a bit clearer in intent and use than before, but it doesn't change any of the problem areas of the PR (`History` and asynchronicity).
Still testing, but at the moment this is what I consider feature complete. There are a couple additions that probably warrant explanation.
- […] `read_marker` to message histories to store the timestamp of the last read message in a channel/query. These operate very similarly to `opened_at`, but the intention is to have an RFC 3339 timestamp that can be saved and loaded separately from the messages in a history. Then, when messages arrive we can know whether they trigger unread state without loading the message history and looking for a duplicate. I'm not aiming to implement `draft/read-marker` here, but the intention is to be usable for that feature. I have been a bit lazy in reading these synchronously using `std::fs`; I'm hoping they are small enough reads that that's acceptable. (As a side benefit, `read_marker` allows unread state to persist across application close/open.)
- […] `read_marker` separate from the message history), I took this as an opportunity to tweak the message storage scheme. I've only done this because I recalled discussion about making the stored messages a bit easier for users to access; if it's out of scope for this I'm happy to revert it. (I would probably use a hash for the `read_marker` filename in that case.)
@andymandias @casperstorm Let's find some time to discuss the scope and desired UX of this feature. The PR is now ~2k LOC and makes a lot of API changes and I don't want to dive in and start making suggestions or changes until we are all aligned on scope & UX.
We almost needed an RFC for this PR 😅 Perhaps, @andymandias, you could do a small writeup of some of this PR including some decisions you have made along the way. I would love to read something like that before digging into this monster.
@casperstorm @tarkah I may not be able to commit much code to this PR for a bit, but I will do my best to explain the intended features along with an overview of the significant implementation decisions made in service of those features. I'm going to try and keep it relatively high level to avoid getting lost in the weeds, but will be available to answer any follow-up questions you may have.
As the PR currently stands, the main features are to use `chathistory` to do the following:

- […] `TARGETS` request is made when `chathistory` support is acknowledged by the server. `TARGETS` is used to discover any queries that were made while the client was not connected, and then `chathistory` requests for the latest messages in those queries are made.
- […] `chathistory` for the latest messages in that channel.
- […] `chathistory` requests). If there are no messages in the history, then a single `LATEST` `chathistory` request is made. The maximum number of messages of a request is set by the server, but I also set a client-side maximum at 500 messages (not attached to that particular number). `TARGETS` uses a `targets_marker` to set its request boundaries, more on that later.
- […] `chathistory` request for messages in the channel/query before the earliest message that exists in its client-side history.

I'm not opposed to adding further features, but I wanted to keep the scope of this PR as minimal as possible.
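To make the request-size rule above concrete, here is a small sketch that combines a server-advertised maximum with the client-side cap of 500 and builds the single `LATEST` request used for an empty history. The function names and the `server_limit` parameter are assumptions for illustration; how the PR actually obtains the server's limit is not shown in this thread.

```rust
/// Client-side cap from the comment above (the author is not attached to this number).
const CLIENT_MAX_MESSAGES: u16 = 500;

/// Picks the request size. `server_limit` is hypothetical: the maximum the
/// server advertised, or `None` if it advertised nothing usable.
fn request_limit(server_limit: Option<u16>) -> u16 {
    server_limit.map_or(CLIENT_MAX_MESSAGES, |limit| limit.min(CLIENT_MAX_MESSAGES))
}

/// Builds the single LATEST request used when the local history is empty.
/// "*" asks for the newest messages with no reference point.
fn latest_for_empty_history(target: &str, server_limit: Option<u16>) -> String {
    format!("CHATHISTORY LATEST {target} * {}", request_limit(server_limit))
}
```

With a server maximum of 100 the request asks for 100 messages; with a server maximum of 1000, or none advertised, it falls back to the client-side 500.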
There are two major changes to `History` to support these features:

- […] `server_time`, and new messages are inserted accordingly. Sorting is primarily done to enable deduplication, since duplicate messages are an expected part of requesting messages via `chathistory`. New messages are checked for duplicates against messages with similar `server_time`, based on the message's `id` (if available) or the message's `server_time` and contents (otherwise, excepting some special cases). @tarkah deserves essentially all of the credit for this, and none of the blame for any aspect that may be broken.
- `History` now has a `read_marker` instead of an `opened_at`. `read_marker` is essentially an `opened_at` that is written to the filesystem, and it serves a similar purpose. So, the `backlog` marker is placed via `read_marker` in a history in nearly the exact same manner as it was done via `opened_at`. But a `read_marker` allows for the determination of whether a message is "new" at the time it is inserted into a `History::Partial`. Since a `History::Partial` cannot be expected to have messages available for deduplication, duplication detection cannot be used as a proxy for message newness. (I don't think we want to load message history every time a new message is received, in order to check message newness.) Instead, the `read_marker` is checked against any arriving message's `server_time` in order to only `trigger_unread` when the `server_time` is newer than the `read_marker`. The `read_marker` is updated in the exact same manner that `opened_at` was, except that it is also saved to disk. (This has the side benefit of persisting that information across program close/open.)
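The two changes described above can be sketched roughly as follows. This is a simplified illustration and not the PR's code: `Message`, `FUZZ_MS`, and the function names are hypothetical, timestamps are plain milliseconds instead of a date-time type, the "special cases" mentioned above are ignored, and treating a missing `read_marker` as "everything is unread" is an assumption.

```rust
/// Simplified message (hypothetical; the real type carries much more).
#[derive(Debug, Clone, PartialEq)]
struct Message {
    /// Milliseconds since the Unix epoch (simplification for this sketch).
    server_time: u64,
    /// Message id, when the server provides one.
    id: Option<String>,
    content: String,
}

/// Duplicates match by id when both have one, otherwise by time and content.
fn is_duplicate(a: &Message, b: &Message) -> bool {
    match (&a.id, &b.id) {
        (Some(x), Some(y)) => x == y,
        _ => a.server_time == b.server_time && a.content == b.content,
    }
}

/// Hypothetical window for what counts as a "similar" server_time.
const FUZZ_MS: u64 = 1_000;

/// Inserts `new` into `messages`, which must already be sorted by server_time.
/// Returns false (and inserts nothing) if a duplicate is found nearby.
fn insert_sorted(messages: &mut Vec<Message>, new: Message) -> bool {
    let lo = new.server_time.saturating_sub(FUZZ_MS);
    let hi = new.server_time.saturating_add(FUZZ_MS);
    // Binary-search the slice of messages whose server_time is within the window.
    let start = messages.partition_point(|m| m.server_time < lo);
    let end = messages.partition_point(|m| m.server_time <= hi);
    if messages[start..end].iter().any(|m| is_duplicate(m, &new)) {
        return false;
    }
    // Insert after any messages with an equal server_time to keep arrival order.
    let at = messages.partition_point(|m| m.server_time <= new.server_time);
    messages.insert(at, new);
    true
}

/// Unread check that needs only the read marker, not the stored messages.
/// Assumption: with no marker at all, every message counts as unread.
fn triggers_unread(read_marker: Option<u64>, server_time: u64) -> bool {
    read_marker.map_or(true, |marker| server_time > marker)
}
```

Keeping the list sorted is what lets the duplicate search look at a narrow window instead of the whole history, and `triggers_unread` shows why a partial history can decide on unread state without loading anything from disk beyond the marker.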
[…] `read_marker` is written. I didn't want to store them with the message history, since their main purpose is to avoid loading that history and I wasn't sure how to partially load a compressed history (or partially compress a history file). I took this as an opportunity to make the stored history naming less opaque (i.e. to name history files based on `server/buffer.json.gz` rather than a hash). Renaming the histories does have one functional use: we can expect histories with the old naming schemes will not have been written sorted (so we should sort them on load), but histories with the new naming scheme will presumably be stored sorted (and won't need to be sorted on load). We could just sort everything though, and continue with hashes for all of the files.

Going back briefly to the first main feature: I mentioned `TARGETS` uses a `targets_marker` to build its request. That is, it requests queries/channels that have a new message since the time specified in the `targets_marker`. The `targets_marker` is the same data as a `read_marker`, but it is updated whenever:

- […] `TARGETS` response is received in full to the `server_time` of the last message in the batch
- […] `read_marker` is updated on disk to the same value as the `read_marker`.

The goal here is to be fairly conservative. Don't update the `targets_marker` too readily, in order to reduce the likelihood of missing a message. But also, don't be extremely conservative, otherwise we'll end up requesting messages for old queries (which results in reopening them in the client, even though nothing new has been sent). If no `targets_marker` exists, then the start of Unix epoch is used.
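A minimal sketch of the `targets_marker` rules as described above, assuming the marker is just an optional timestamp. The `TargetsMarker` type and its method names are hypothetical, and persistence to disk is left out.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical in-memory form of the targets marker.
#[derive(Default)]
struct TargetsMarker(Option<SystemTime>);

impl TargetsMarker {
    /// Lower bound for the next TARGETS request; the Unix epoch if no marker exists.
    fn request_start(&self) -> SystemTime {
        self.0.unwrap_or(UNIX_EPOCH)
    }

    /// Called once a TARGETS response has been received in full, with the
    /// server_time of the last message in the batch.
    fn on_targets_batch_complete(&mut self, last_message_time: SystemTime) {
        self.0 = Some(last_message_time);
    }

    /// Called when a read_marker is written to disk; the targets marker
    /// takes the same value.
    fn on_read_marker_saved(&mut self, read_marker: SystemTime) {
        self.0 = Some(read_marker);
    }
}
```

Only these two events move the marker in this sketch, which matches the conservative intent described above: the marker advances when there is evidence the client has seen messages up to that time, and not on every incoming message.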
That's everything that comes to mind at the moment, so I think it might be best to stop here and field questions.
Work in progress PR to add IRCv3 CHATHISTORY support (i.e. #206). Since it's not a widely available feature yet, the goal is to minimize any effects on Halloy's operation when CHATHISTORY is not available. Testing on ircs://irc.ergo.chat:6697/ to start.
Currently (more-or-less) implemented:
- Basic message deduplication.

Planned:

- […] `event-playback` to allow replay of channel events (`JOIN`/`PART`/`QUIT`/etc).
- `HistServ` appears to be non-standard, so all its messages are filtered out for now for parity of experience.
- Convert `HistServ` PRIVMSG messages into the appropriate `JOIN`/`PART`/`QUIT`/etc messages.
- […] `JOIN`.
- […] `JOIN` messages request(s)).
- […] `LATEST` functionality (artificially limit the size of batches).
- […] `indexmap` for message history?). And/or, avoid the need for deduplication.
- […] `TARGETS` to open Query buffers for direct messages sent while disconnected.
- […] `chathistory` request timeout.