This is super cool @Frando. I'm still thinking through everything, but here are some initial questions:
It looks like state handling is left to each source's `pull()` implementation. However, this is a bit tricksy, since the source would only be able to store state during the part of the control flow where the view has not yet indexed the messages. This makes it hard for a source to update its state right after the view processes a batch. I think it'd be cool to see an example of a source that uses disk storage, to look at together & think through the implications of this.

If a source keeps its own state on disk anyway (e.g. with `bitfield-db`), maybe the `pull()` api can just be `pull(next)`, and we assume it manages itself. For example:

```js
var kappa = require('kappa-core')
var hsource = require('kappa-source-hypercore')
var tinybox = require('tinybox')
var raf = require('random-access-file')
var bkdview = require('kappa-view-bkd')
var level = require('level')

var core = kappa()
var src = hsource(tinybox(raf)) // stores `version` and `state` in a random-access-* store
var view = bkdview(level('./foo')) // stores just the spatial database details

core.use('spatial', src, view) // hooks up the source and view instances
core.api.spatial.query([-40, 40, -80, 80], (err, res) => { /* ... */ })
```
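One way to handle the "state on disk" case: if `pull`'s callback also carried an acknowledgement that fires after the view has indexed the batch, the source could persist its cursor at exactly the right moment. A minimal sketch, assuming such a callback shape and a small key/value `stateStore` with `get(key, cb)`/`put(key, value, cb)` (neither of which this PR pins down):

```js
// Sketch only: the `next(err, messages, onindexed)` shape and the
// `stateStore` interface (callback-based get/put, e.g. tinybox over
// random-access-file) are assumptions for illustration.
function createDiskStateSource (feed, stateStore) {
  return {
    pull (next) {
      stateStore.get('seq', (err, value) => {
        if (err) return next(err)
        var start = value ? Number(value) : 0
        if (start >= feed.length) return next()
        feed.getBatch(start, feed.length, (err, messages) => {
          if (err) return next(err)
          // Persist the new cursor only once the view confirms it has
          // indexed this batch (the assumed third argument to next()).
          next(null, messages, (done) => {
            stateStore.put('seq', String(feed.length), done)
          })
        })
      })
    }
  }
}

module.exports = createDiskStateSource
```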
btw, kappa-core is on `hypercore-protocol@7` now! :guitar:
So, now in some more words: I agree with noffle's remarks!
I started to update the kappa5 branch based on these observations. Before I continue, I'd like for us to agree on the end result, so that it's not too much work to rewrite things again.
Currently, a very simple example would look like this:
https://gist.github.com/Frando/21bc9e796544692b51de7e85edd1983a
Things to note:
- Each `use` call creates a `Flow`, which is the combination of a source and a view. This makes things explicit, which is good. It's up to the consumer to create a source multiple times if it needs the same source for many views (that's how it always was, it just happened inside kappa-core). I think I like this, and it also opens the door to possibly optimizing for "one source for many views" scenarios.
- Both views and sources can expose an api. The view's api is mounted on `kappa.api`, the source's api on `kappa.api.source`. Is this good, or should it be structured differently?
- One thing I'm still not totally sure about is how the source can talk to its flow to request that `pull` be called again. Right now I pass the flow object into the `open` method, where it can be stored, so that when the source has incoming messages it can call `flow.update` to signal that its `pull` method should be called. Before (in the current kappa5 branch), it was passed into the constructor (there, the `createSource` constructor is called by kappa-core; now a constructed source is passed in by the consumer, which is nice because it's the same as with views). See the sketch right after this list.
I started updating the API after the discussions.
See https://github.com/Frando/kappa-core/tree/kappa5-new for now. Most tests are updated and pass.
This is continued in #14.
This PR pulls in the current state of my kappa5 branch. It has been talked about a bit already: it changes kappa-core to be dependency-free (it just connects sources to views), which should make it much easier to support flexible indexing flows and scenarios. It also includes sources for hypercore and multifeed (and hyperdrives!).
This is not completely ready yet, I think, but I wanted to put it up for review and discussion, and to agree on the best way forward.
A README for the new API is still missing.

Open questions and missing features:
- State handling. In the current kappa, the views need to provide `storeState`/`fetchState` handlers if they want to persist the indexer's state. I don't think this should stay with the view, as a view can now have many sources. There are two parts of state here: 1) the view version, to trigger a rebuild on a version change, and 2) the indexing progress. For 2), a more complex (sparse) indexer/source would track the progress on its own (e.g. in bitfields), while a simple source could still make use of a buffer to store its state. So my current thinking is: have the kappa track the state for view versions, plus a buffer per flow (source instance + view combo), in-memory by default, and allow supplying `storeState (key, state, cb)` and `fetchState (key, cb)` opts to the kappa-core. We could then also ship e.g. a simple implementation with tinybox, so that only a random-access-storage instance would have to be passed into the kappa for persistence. (A sketch of such opts follows after this list.)
- Naming: do people like the `source` term here? I was wondering whether `indexer` would be better, but am not sure. The `createSource` function does create a source for the kappa, which usually is a function that indexes a set of feeds or other data structures. So creating a source for the kappa usually does not create data structures, but only the function that indexes them.
- Backwards compatibility: currently there is the `kappaClassic` function in `index.js` that wires the new kappa-core together in a way that is API-compatible with the current kappa-core. I mostly did this for testing; it passes the `cabal-core` tests. However, this is based on current multifeed, which means hypercore 7. So actually, I'd propose to not have that, and make a backwards-incompatible change.
- What to include in kappa-core? Should `kappa-core` be just the kappa core, or also include a set of useful sources (the modules in `/sources`)?
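As a companion to the state-handling point above, here is a minimal sketch of what `storeState`/`fetchState` opts could look like. It uses `level` for persistence rather than the tinybox-backed default suggested above, and the way the opts are passed to the kappa constructor is an assumption, not the final API:

```js
var kappa = require('kappa-core')
var level = require('level')

// One leveldb for the kappa's own bookkeeping (view versions, per-flow state).
var stateDb = level('./state', { valueEncoding: 'binary' })

var core = kappa({
  // Assumed opts, following the signatures mentioned above.
  storeState (key, state, cb) {
    stateDb.put(key, state, cb)
  },
  fetchState (key, cb) {
    stateDb.get(key, (err, state) => {
      if (err && err.notFound) return cb(null, null) // no state stored yet
      cb(err, state)
    })
  }
})
```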