'since' meaning in filters

mikedilger commented 1 year ago

I don't want a list of events with created_at after my 'since' filter. What I really want is all the events that the relay has become aware of since I last got events off of that relay. But we can only ask for one or the other (and I'm not clear which one relays are using).

Event 'created_at' stamps are created by the author at time A. Relays may become aware of these events quite a bit later on, at time C > A. If I last fetched events from the relay at time B < C, and I come back and ask for events since time B, will I get the event created at time A < B? My assumption is that I will not, and the only way to pick these up is to just have a huge overlap time in my request and hope I don't miss much.

If I'm correct and 'since' refers to 'created_at', I would offer to write a NIP that extends filters with another field. Relays would have to record when they became aware of the event (seems pretty easy, a timestamp default now field in a database).

eskema commented 1 year ago

I have a really hard time understanding this issue. since and until refer to the created_at, so if you request since X, all the events created_at > x will be returned

ok, I think I understand what you're referring

fiatjaf commented 1 year ago

How will the relay know you are you?

I think maybe we could use the same subscription ID and somehow the relay would just "continue" the subscription from where it stopped?

Doesn't this complicate the job of relays too much to be worth doing? I don't know.

eskema commented 1 year ago

this work should be done by the client, not the relay. the client is the one with the burden to reconstruct it's prefered timeline, the relay should not care. you need to query multiple relays and expect the info to be scatterred. If you depend on your info to be correct everytime, then use your own relay and make sure it has every post like you want it to. P.S. I also do think relays should have a received_at value, because we can leverage that and this would be solved without relays having to know who you are and what do you already have requested.. (and they probably already log that along with the ip)

fiatjaf commented 1 year ago

Is it really that bad to lose some backdated events? If that backdated event matters it will receive replies, be referenced or quoted by some other event, then the client should try to fetch it by ID and then everything should be sorted out.

unclebob commented 1 year ago

The solution I use in more-speech is to find the latest created_at date in the messages that I’ve saved locally. Then I ask the relays for all messages created after that date. I presume the relays are using the created_at field in the messages for that comparison.

On Nov 25, 2022, at 22:24 , Michael Dilger @.***> wrote:

I don't want a list of events with created_at after my 'since' filter. What I really want is all the events that the relay has become aware of since I last got events off of that relay. But we can only ask for one or the other (and I'm not clear which one relays are using).

Event 'created_at' stamps are created by the author at time A. Relays may become aware of these events quite a bit later on, at time C > A. If I last fetched events from the relay at time B < C, and I come back and ask for events since time B, will I get the event created at time A < B? My assumption is that I will not, and the only way to pick these up is to just have a huge overlap time in my request and hope I don't miss much.

If I'm correct and 'since' refers to 'created_at', I would offer to write a NIP that extends filters with another field. Relays would have to record when they became aware of the event (seems pretty easy, a timestamp default now field in a database).

— Reply to this email directly, view it on GitHub https://github.com/nostr-protocol/nips/issues/78, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAJAJKLJTHL7R3Q7YOTZY3WKGGHLANCNFSM6AAAAAASLZGBZQ. You are receiving this because you are subscribed to this thread.

Robert C. Martin (Uncle Bob) | @.*** Uncle Bob Consulting LLC. | @unclebobmartin 847.922.0563 | cleancoder.com

eskema commented 1 year ago

The solution I use in more-speech is to find the latest created_at date in the messages that I’ve saved locally. Then I ask the relays for all messages created after that date. I presume the relays are using the created_at field in the messages for that comparison. On Nov 25, 2022, at 22:24 , Michael Dilger @.> wrote: I don't want a list of events with created_at after my 'since' filter. What I really want is all the events that the relay has become aware of since I last got events off of that relay. But we can only ask for one or the other (and I'm not clear which one relays are using). Event 'created_at' stamps are created by the author at time A. Relays may become aware of these events quite a bit later on, at time C > A. If I last fetched events from the relay at time B < C, and I come back and ask for events since time B, will I get the event created at time A < B? My assumption is that I will not, and the only way to pick these up is to just have a huge overlap time in my request and hope I don't miss much. If I'm correct and 'since' refers to 'created_at', I would offer to write a NIP that extends filters with another field. Relays would have to record when they became aware of the event (seems pretty easy, a timestamp default now field in a database). — Reply to this email directly, view it on GitHub <#78>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAJAJKLJTHL7R3Q7YOTZY3WKGGHLANCNFSM6AAAAAASLZGBZQ. You are receiving this because you are subscribed to this thread. Robert C. Martin (Uncle Bob) | @. Uncle Bob Consulting LLC. | @unclebobmartin 847.922.0563 | cleancoder.com

The issue raised here is not very typical and only happens on certain conditions.. Alice subscribes to Bob's posts from some relay (or multiple, also applies). The client stores the last created_at it received so it can use that next time it needs to query. meanwhile, there was an event from Bob older than this last created_at stored value that for some reason, the relays didn't have before, but they do now. When Alice checks back on her feed, that new event will not come if she uses the since value because the event is older than that, thus, making the timeline incomplete.

If we had a received_at value to query from relays, then we could query every event received since our last saved date, and we would also receive backdated events that came after.

mikedilger commented 1 year ago

To be clear, the relays do not need to know who you are. All I'm asking is that the relays use THEIR time clock, rather then the event signer's clock. Each message on a relay would know when it was created by the signer, according to the signer, but also when the relay first became aware of it. Then I want to ask "give me the messages you became aware of after time X".

Like I say, this is really easy if relays store events in databases. Just add a "timestamp default now" field that automatically sets when the relay created the record, and filter on that field.

monlovesmango commented 1 year ago

are you proposing to add new filter fields for the relay timestamp or changing the current since and until fields to only query the relay timestamp?

eskema commented 1 year ago

this should be a new filter field, this should only extend, not replace existing behaviour

mikedilger commented 1 year ago

are you proposing to add new filter fields for the relay timestamp or changing the current since and until fields to only query the relay timestamp?

New field, maybe called 'received_at'

eskema commented 1 year ago

great, now someone needs to NIP it so some relay / client can implement. would love to hear what relay maintainers think. @Cameri @scsibug

cameri commented 1 year ago

Both nostr-rs-relay and nostr-ts-relay store the timestamp when an event is seen for the first time so implementing this would be trivial. I'd argue you'll eventually want to fetch ranges so we should implement the subscription filter as such. I propose seen_since and seen_until where seen_until is exclusive.

eskema commented 1 year ago

do we want this only for retrieving, so we don't actually need to know the value, or, do we also want relays to return this value with every event?

mikedilger commented 1 year ago

I only care to have it in the filter. If relays returned it with events, it would have to be outside of the event, augmenting this: ["EVENT", <subscription_id>, <event JSON as defined above>] with another field. But I vote to not do that.

@Cameri what do you mean by "seen_until is exclusive" I have no problem with adding seen_until. I won't be using it. I always want events right up until now.

cameri commented 1 year ago

I only care to have it in the filter. If relays returned it with events, it would have to be outside of the event, augmenting this: ["EVENT", <subscription_id>, <event JSON as defined above>] with another field. But I vote to not do that.

@Cameri what do you mean by "seen_until is exclusive" I have no problem with adding seen_until. I won't be using it. I always want events right up until now.

I mean like an open-closed interval [a, b), aka seen_since <= first_seen < seen_until so seen_until is not included in the result. If you don't pass seen_until you'll get events up until now.

scsibug commented 1 year ago

I think this feature exposes too much of the underlying relay state and complicates the subscriptions for clients (exposing two different timestamps). I'd much rather stick with a simpler model for querying events, even if we don't have a perfect way of knowing when a particular relay has received old events.

If one wants to ensure any old events get picked up, I'd just re-query without since/until filters, and get everything again. That is my preference at least.

It's pretty easy to implement for nostr-rs-relay though, pretty much just need another index and add an SQL clause, and some extra logic for the non-stored events.

mikedilger commented 1 year ago

I disagree with @scsibug but there is another way to solve this. Events could include an optional reference to the previous event by that author. I don't like that solution, but it would allow my client to know it was all caught up. Of course the status quo isn't too terrible either as events are not likely to be missed, or they aren't terribly important if they do get missed.

scsibug commented 1 year ago

I only have a mild dislike, mostly because the simplicity of nostr has a huge appeal for me. That said, we can implement this with no changes for existing clients, so that is very nice. I will work on implementing if/when the NIP is numbered and merged.

ok300 commented 1 year ago

I think seen_since and seen_until assume the relay is a sort of "append-only log".

AFAIK that's not the case, there are:

replaceable events: https://github.com/nostr-protocol/nips/blob/master/16.md#replaceable-events
expirable events (if #87 is merged) : https://github.com/0xtlt/nips/blob/master/40.md
kind 0 events, where a kind 0 with a newer timestamp (may? will?) imply deletion of earlier kind 0 of the same pubkey
deletion events: https://github.com/nostr-protocol/nips/blob/master/09.md

If a client queries the relay in a seen_since / seen_until paradigm, it could lead to inconsistencies, because the same query with the same seen_since can have or not have certain events, depending when it's called.

It's not really seen_[since|until], it's more like seen_[since|until]_and_not_yet_deleted_or_replaced_by_newer.

I suppose that's fine for a client that only filters by seen_since incrementally, but the moment seen_until is used, and it's a value in the past, the query becomes unreliable.

One way to go around this is to only add seen_since and always imply seen_until is now. That way, even if a newer event caused deletion or replacement of an older one, at least it's included in the response and the client can reconstruct what happened.

Another side-effect is it adds more responsibility to the relay.

Before, a misconfigured NTP setting or a temporary outage only meant some relay log timestamps are out of order.

With this new filter, it can lead to certain clients missing certain events. It's not just a relay anymore, it's a "relay with an accurate and always well-synchronized local time". An NTP issue on a relay can lead to clients missing events, and it would be very hard to even notice there's an issue.

It looks deceptively simple and straightforward, but IMO complicates the lives of both clients and relays.

Maybe seen_since and seen_until, instead of timestamps, could be relay-specific incremental UUIDs? Would achieve the same thing, but get rid of the risks above.

mikedilger commented 1 year ago

As I understand it, replaceable/deletable events are replaced/deleted by other events. If replacement or deletion events have shown up at the relay after I last checked, I want those.

If a relay interprets a replacement event as replacing an event in the past, and uses that past date of the original event in filters, but supplies the replaced event instead of the original event, then yes my logic won't get all events I have missed. I thought relays were dumb and clients were doing the interpretation.

I guess I really just don't understand what people are doing in regard to this issue. I've deleted the pull request because the majority of responses to it were negative. But I still don't know what my client is supposed to do -- ask for every event from the relay every time it is started? Or every event within the last N days (where users set N to whatever they like), even if they just closed and reopened the client?

ok300 commented 1 year ago

I've also been looking for an efficient way to stay in sync with a relay.

Solutions I'm aware of:

always query all events
always query all events with a relatively recent since (e.g. 1d, 1w, etc) and only go further back in time on-demand (client clicks "load more")
on first start query all events (until EOSE), then store timestamp, then query events with since = timestamp - 1h (or some amount of time to account for client time variance)
same as above, but instead of timestamp use the highest created_at of received events
- assumes relay uses NIP-22 to reject events with an unreasonable created_at

Also curious to hear how other apps handle this (cc @jb55 @unclebob @monlovesmango @vishalxl @KoalaSat @brilliancebitcoin @balas @Giszmo)

As to seen_since, I see two variations that look promising:

your original idea above (seen_since filter that uses relay local time when events were recorded), but without seen_until
- would support calling it with incremental seen_since
- client gets all event deletions, replacements, etc and has a consistent view
- no complications due to seen_until in the past and events getting replaced or deleted after
seen_since / seen_until, but where they point to incremental event UUID instead of timestamp
- relay NTP issues no longer have an effect on query results

What do you think?

mikedilger commented 1 year ago

Gossip code currently loads events back to either the last timestamp plus a small overlap period, and at most by one 'chunk size' which is configurable, so very much as you describe.

That last point with an incremental UUID is interesting. I know some feedback was that seen_since complicates things. I would think adding an incremental UUID also would get the same feedback. But if people liked it, this suits.

KoalaSat commented 1 year ago

@ok300 it depends of the scenario, but when loading the main timeline, nostros applies the 3rd option, it checks the last created_at, as side note, careful mixing this with limits, you might loose some events if the gap from your last createt_at is bigger

monlovesmango commented 1 year ago

on first start query all events (until EOSE), then store timestamp, then query events with since = timestamp - 1h (or some amount of time to account for client time variance)

I use this approximately. nostr-tools npm package doesn't yet have EOSE so I just store the current timestamp 3 min after queries have opened for a user (I am making an assumption that after 3 min everything is synced). once nostr-tools does have EOSE I will switch to using that instead. I also go back 7 days from sync timestamp. seems like overkill but would rather not miss things that took a couple days to propagate.

this isn't related to query completeness but once nostr-tools has EOSE I will use a different strategy for feed. I will query in intervals, starting with the latest interval and with each EOSE I will progress to the next interval until I hit the earliest interval (determined by last sync time). will likely write something for nostr-tools relay pool to handle this kind of stepped query. seen_since and seen_until would probably be more apt for this kind of query than since and until

weex commented 1 year ago

Is this issue satisfied by either of the two merged PRs above?

nostr-protocol / nips

'since' meaning in filters #78