waku-org / research

Waku Protocol Research
MIT License
3 stars 0 forks source link

Adding `messageHash` atribute in DB, focusing synchronization #46

Open ABresting opened 8 months ago

ABresting commented 8 months ago

Waku is the family of decentralized protocols. The Waku network consists of independent nodes running the corresponding protocols.

Waku store/archive uses id attribute as the primary key (in conjunction to storedAt and pubsubTopic), indexing and cursor related functions. A deterministic way of addressing a message is referred to in RFC. We require this deterministic addressing in synchronization of messages across the nodes operating in the decentralized space. For legacy reasons we aim not to break the existing id attribute, in parallel there should be a messageHash attribute driving from the above-mentioned RFC. There seems to be a wider need for clarity on the nitty gritty details of such a proposal/change.

After multiple deliberations with Hanno and Ivan, following is the direction/approach we can take:

Introduce a messageHash attribute in DB (SQLite and Postgres) while keeping id attribute to support the legacy application, in future id will be deprecated, and messageHash will take over. Breakdown of notable action items below:

This also aims to reverse the already made change to computeDigest calculation in PR, whose revert is already in progress on PR. This will spill over to goWaku who have already made this change so they also need to reverse this. And will abandon/cancel the ongoing issue and PR in jsWaku.

Seeking input from all the interested parties: @jm-clius @Ivansete-status @alrevuelta @vpavlin @fryorcraken @richard-ramos @weboko @waku-org/waku

jm-clius commented 8 months ago

This seems reasonable. Since the RFC uses the term message_hash I think it's fine to continue renaming the waku core module from digest to computeMessageHash.

where messageHash is used as a part of put driver code/func/proc

Do you mean once we add the messageHash attribute? Afaik existing tests shouldn't be affected, unless there are already some tests related to this attribute.

DB migration script for the introduction of new messageHash attribute

This will just add the new column, but since it may become the primary key and should be both unique and NOT NULL value, we should come up with something simple to backfill messageHash for historical entries - perhaps a simple counter will do? This won't be an addressable key, but since these entries will become obsolete within max 30 days for our longest store that's not really a concern. We should just not break the uniqueness and not null constraint.

richard-ramos commented 8 months ago

This will spill over to goWaku who have already made this change so they also need to reverse this.

Change reverted in this commit https://github.com/waku-org/go-waku/commit/cf82f66d12ed84550722f8da156a9fb39a85114b

ABresting commented 8 months ago

This seems reasonable. Since the RFC uses the term message_hash I think it's fine to continue renaming the waku core module from digest to computeMessageHash.

where messageHash is used as a part of put driver code/func/proc

Do you mean once we add the messageHash attribute? Afaik existing tests shouldn't be affected, unless there are already some tests related to this attribute.

Yes upon adding the messageHash attribute we need to change some test cases where there is a put driver trying to inject a message in the database. The nature of the test cases will not change just some input parameter related changes.

DB migration script for the introduction of new messageHash attribute

This will just add the new column, but since it may become the primary key and should be both unique and NOT NULL value, we should come up with something simple to backfill messageHash for historical entries - perhaps a simple counter will do? This won't be an addressable key, but since these entries will become obsolete within max 30 days for our longest store that's not really a concern. We should just not break the uniqueness and not null constraint.

Indeed we should use a counter and not leave messageHash as NULL value.

Ivansete-status commented 8 months ago

Thanks for the detailed description @ABresting :100: !

Instead of renaming things in digest.nim maybe it is better to create a new message_hash.nim file, and add a comment in the digest.nim file stating that it will get deprecated soon. Then, we will remove the digest.nim once we deprecate the id field.

Let me also add an interesting point, extracted from @jm-clius's comments in a 1:1, about the message_hash attribute (maybe already considered somewhere:)

The new Sync protocol, or a new version of the Store protocol, will allow Store peers to ask among them <<Please, give me the message_hashes from a certain period of time>>. Then, the nodes could retrieve the missing messages based on the missing message_hash set.

This idea of key-value storage connects very well with the aim of bringing Redis as an intermediate layer between the Waku node and Postgres.

jm-clius commented 8 months ago

Instead of renaming things in digest.nim

In this case, if we have decided to change the terminology I would just go for it. It's good to keep backwards compatibility, but only if some external clients depend on this or if we extensively reference this elsewhere in the code and changing it will obscure other changes. digest.nim is not currently used anywhere (except for some logging), so we'll just create dead code if we keep the old module (the id and digest we're planning to deprecate is part of the archive module, not core).