RFC: Declarative Indexes

aboodman commented 3 years ago

This was @arv's idea originally, but I like it...

Right now we allow you to create an index anytime you want with createIndex and drop them with dropIndex. Notably we provide no way to interrogate which indexes exist.

What if instead, you had to specify indexes as parameters to the constructor:

const rep = new Replicache({
  pushURL: ...,
  pullURL: ...,
  mutators: {
    ...
  },
  indexes: {
    byTitle: {
      jsonPointer: '/title',
      prefix: 'foo/',
    },
    ...
  }
}

Replicache would ensure those indexes (and only those) exist at startup.

I think in many cases this might be easier for developers because what they really want is a Replicache instance that guarantees certain indexes exist. Right now if they want that they have to keep track somewhere of what indexes currently exist, drop any unnecessary ones, and create new ones.

aboodman commented 3 years ago

This would solve #601 as a side-effect I think.

KeKs0r commented 3 years ago

I think that API makes sense, since it allows RC to internally manage the index creation and ensure they exist when the first pull data is loaded. From a user perspecctive, it would be very similar, since I call create index right after creating the replicache instance.

I have not yet dropped indexes, but if also that part would be taken care from replicache, it would make index management much easier.

aboodman commented 3 years ago

This brings up the question though, of what should happen during the delay when an index is getting built with this API. Should reads and writes be queued until the index is created, or should there be an event that says when the index is ready.

It seems like generally, developers won't want the app to boot until the index is ready, but OTOH, I'm a little weirded out forcing that on them.

Perhaps, instead, it should go something like:

await rep.initIndexes({
  byTitle: {
    jsonPointer: '/title',
    prefix: 'books/',
  },
  byAuthor: {
    jsonPointer: /author',
    prefix: 'books/',
  },
});

Then developers can await the indexes being ready, or not.

There is also the problem of tabs. Always tabs. We can't just delete indexes not required by one instance of Replicache because another instance might still be using them. This is a problem even for developers using indexes now, which argues more strongly for doing something first-class here.

KeKs0r commented 3 years ago

There are good arguments for both. Exposing it allows the developer to handle the loading state manually, whereas queueing would end up being more fine grained, and allows the overall app load faster and then have only loading state longer visible for particular queries that use the index being built. I tend to prefer option 2, since it allows more granular control and speeds up the bootup time, for the overall app. How long does an index build take in different setting? If it is an acceptable delay option 2 would just seem like a single slow query which would be fine. If it generally can take some longer time, the developer would want to handle it globally to indicate something "bigger" is going on.

arv commented 2 years ago

We can probably just have the index scans await the indexes

aboodman commented 2 years ago

There is also the problem of tabs. Always tabs.

For the record, this was fixed with the introduction of realtime storage in v9. The current API works correctly in the presence of tabs: index mutations, including drops, work on the client-specific fork of storage just like any other mutation. They get carried into the snapshot commits, and then later when a new client forks, it forks starting with those indexes too.

aboodman commented 2 years ago

This was implemented and will be in next release 🎉. It was required for https://github.com/orgs/rocicorp/projects/4.

rocicorp / replicache

RFC: Declarative Indexes #602