Closed yusefnapora closed 7 years ago
Does the indexer have rocksdb so far?
This needs corresponding requirements.txt change
the testnet indexer has rocksdb installed, yes.
should probably put it in there, then?
yeah, I haven't so far since we need to coordinate with @autoencoder - it's only needed for the blockchain stream, and I didn't want to break any other deployments he's got :) But it would be cool to have it as a hard requirement, since the blockchain catchup will be memory hungry and transient without rocksdb installed.
Ok deferring to @autoencoder then, would also be ok to drop it into a ~/.mediachain or w/e
yeah, the rocksdb blockchain cache gets stored in ~/.mediachain anyway, so we could just write the block ref to a file there.
I'm down to just drop a file then. Do we need any locking semantics? I guess not given that only 1 thread would write, though now that I think about it we probably need a pidfile/lock for the whole thing
yeah, at the moment we shouldn't need locking, since we'll just read once before we start tailing the blockchain, and only write from one thread. but it might get more complicated as time goes on if we end up doing some kind of parallelization, etc
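For reference, a minimal sketch of the "just drop a file" idea, assuming Python 3 and a single writer thread as described above. The file name `last-known-block` and both function names are hypothetical; the thread only establishes that the cache lives under ~/.mediachain. Writing to a temp file and renaming means a crash mid-write should not leave a truncated ref behind.

```python
import os
import tempfile

# Hypothetical location: the thread only says the cache lives in ~/.mediachain.
DEFAULT_PATH = os.path.expanduser("~/.mediachain/last-known-block")


def write_last_known_block(ref, path=DEFAULT_PATH):
    """Persist the last processed block ref, replacing any previous value."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(ref + "\n")
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def read_last_known_block(path=DEFAULT_PATH):
    """Return the stored block ref, or None if nothing has been saved yet."""
    try:
        with open(path) as f:
            ref = f.read().strip()
    except FileNotFoundError:
        return None
    return ref or None
```

On startup the indexer would call `read_last_known_block()` once before tailing the chain, then call `write_last_known_block(ref)` after each block is fully processed.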
Yeah, we should probably drop a pidfile and refuse to run if another process is present for now
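One possible shape for the pidfile, sketched under the assumption that we only care about POSIX hosts (it uses `fcntl.flock`, which is not available on Windows). `acquire_pidfile` and `AlreadyRunning` are made-up names, not anything in the codebase. Using an advisory lock rather than only checking whether the file exists means a stale pidfile from a crashed process does not block the next start, since the OS drops the lock when the process exits.

```python
import fcntl
import os


class AlreadyRunning(Exception):
    """Raised when another process already holds the pidfile lock."""


def acquire_pidfile(path):
    """Lock `path` and write our pid into it.

    Returns the open file descriptor; keep it open for the lifetime of the
    process, because closing it releases the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Non-blocking exclusive lock: fails immediately if someone holds it.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise AlreadyRunning("another process holds %s" % path)
    os.ftruncate(fd, 0)
    os.write(fd, (str(os.getpid()) + "\n").encode("ascii"))
    return fd
```

The indexer entry point would call this first and exit with an error message if `AlreadyRunning` is raised.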
For the current block count, may end up sticking that counter (edit: probably more of a transaction ID) in ES, since the contents of ES would be the reason for the Indexer to be tracking this number.
Sounding good. Will take a closer look / merge in the AM.
@autoencoder the thing is, we can't easily do a simple block height counter without changing the transactor RPC api to include that. At the moment we just have a ref to the block, which isn't ordered at all.
We could keep our own count on the client, but that would get tricky with the partial catchup, etc. Adding the block height or some kind of sequence number to the API probably makes sense
actually, I guess we could pull the index of the first entry in the block, and use that as a sequence number... will think about that some more
Ideally it'd be an ID which could later be used to identify exactly what position of which fork the chain was on... and then if the API caller later tries to resume from a position that was on an abandoned fork, the client API would then replay the necessary inserts / updates / deletes into the Indexer to get it back in sync with the proper fork. Something like that.
Maybe not needed yet.
I echo @parkan for having a lock/pid file for the local block cache. Btw, is rocksdb safe for concurrent processes?
Ok, took a look. Looks good. Noting some of the parts still WIP:
yeah, it probably does make more sense for us to track the current block in rocksdb in the client code. I can set that up and see about exposing the index so we can use it as our "block height".
We could do multi-process catchup by opening the block cache in read-only mode; that's a good idea. It would need a little bit of extension to the current BlockCache api. Right now the BlockCache is just a read-through cache that doesn't keep track of block ordering, etc. So there's no way to "seek" to a particular block; instead we're always walking back through the chain from the current block. But I think if we track the block index numbers we should be able to spawn multiple processes that each take a range of blocks.
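To make the "each process takes a range of blocks" idea concrete, here is a rough sketch of how a span of block indices could be divided among workers, assuming we do end up with an ordered integer index per block. `split_block_range` is a hypothetical helper; nothing like it exists in BlockCache today.

```python
def split_block_range(start, end, workers):
    """Split the half-open index range [start, end) into contiguous chunks.

    Returns a list of (lo, hi) tuples, at most `workers` long, whose sizes
    differ by at most one. Empty chunks are dropped.
    """
    total = end - start
    if total <= 0 or workers <= 0:
        return []
    size, extra = divmod(total, workers)
    ranges = []
    lo = start
    for i in range(workers):
        # The first `extra` workers each take one additional block.
        hi = lo + size + (1 if i < extra else 0)
        if hi > lo:
            ranges.append((lo, hi))
        lo = hi
    return ranges
```

For example, `split_block_range(0, 10, 3)` gives `[(0, 4), (4, 7), (7, 10)]`; each tuple could be handed to a separate catchup process that opens the cache read-only.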
We'll also need to track the particular blockchain that the blocks are part of; it would be nice if each chain had a unique id or genesis block ref or something that we could identify the chain with. Right now the block cache will store blocks from any chain, since it's just a K/V store. But if we want to keep track of the structure / sequence of blocks, we also need to differentiate between different chains and either store them in separate rocksdb instances or prefix the keys, etc.
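A sketch of the key-prefix option, using a plain dict as a stand-in for the rocksdb handle so it runs anywhere. `ChainScopedCache` and the choice of the genesis block ref as the chain id are assumptions for illustration, and it assumes chain ids never contain the `/` separator.

```python
class ChainScopedCache:
    """Wraps a K/V store so every key is namespaced by a chain id."""

    def __init__(self, store, chain_id):
        # `store` is any dict-like object; a real version would wrap rocksdb.
        self._store = store
        self._prefix = chain_id.encode("utf-8") + b"/"

    def _key(self, block_ref):
        return self._prefix + block_ref.encode("utf-8")

    def put(self, block_ref, data):
        self._store[self._key(block_ref)] = data

    def get(self, block_ref):
        # Returns None when the block is not cached for this chain.
        return self._store.get(self._key(block_ref))


# Two chains sharing one underlying store without colliding.
shared = {}
chain_a = ChainScopedCache(shared, "QmGenesisA")  # hypothetical genesis refs
chain_b = ChainScopedCache(shared, "QmGenesisB")
chain_a.put("QmBlock1", b"block from chain A")
```

Here `chain_b.get("QmBlock1")` returns `None` even though both caches write into the same store. Separate rocksdb instances per chain would give the same isolation without the prefix bookkeeping, at the cost of more open databases.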
This updates the SimpleClient and receive_blockchain_into_indexer functions to use the new canonical_stream output format from https://github.com/mediachain/mediachain-client/pull/88

I added a cli flag so I could test catching up to a known block (--last-known-block=QmF00...), but plan to remove that soon. It seems like we should be writing out the last known block ref somewhere so we can read it in after a restart.

@autoencoder, where do you think that value should get stored? Could just spit it out to a file somewhere...