Closed abourget closed 9 months ago
> Any errors, sent to stderr

Most of the time `stderr` is used for logging purposes; how do you see errors being conveyed there? Or maybe that's exactly how to see it: print/log errors to `stderr`?
`aptos` was one that had the inverse (logs through `stdout`); this created some issues, FYI.
So we ask them to OUTPUT that stream through `stdout`, length-prefixed or line-based in B64.

One thing I noted when doing the Ethereum refactoring to a single-line output: having at least the block number somewhere on the line helped (a bit) with debugging, since you can more easily see which block number a given line belongs to.
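To make that convention concrete, here is a minimal sketch in Go of rendering one block per line on `stdout`, Base64-encoded, with the block number kept as a plain token for debuggability. The `FIRE BLOCK` token layout here is an illustrative assumption, not the agreed spec.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"os"
)

// fireLine renders one block as a single line:
//
//	FIRE BLOCK <number> <base64-payload>
//
// Keeping the block number as a plain token on the line makes it easy to
// see at a glance which block a given line belongs to when debugging.
// NOTE: the exact "FIRE BLOCK" token layout is illustrative, not the spec.
func fireLine(blockNum uint64, payload []byte) string {
	return fmt.Sprintf("FIRE BLOCK %d %s",
		blockNum, base64.StdEncoding.EncodeToString(payload))
}

func main() {
	// Block data goes to stdout; logs go to stderr.
	fmt.Fprintln(os.Stdout, fireLine(123455, []byte("serialized-protobuf-bytes")))
	fmt.Fprintln(os.Stderr, "emitted block 123455")
}
```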
The order doesn't matter, but all the specs apply:

- A block cannot be <= the LIB.
> Implement the node-manager layer and deal with those chains as to how to do that layer. A few scripts? A small program?

I do not understand this one; aren't we defining the spec needed for a program to be read by an agnostic reader-node app?
> What about the necessary kill cycle when our node wants to make a backup?

That can be specified in the backup spec directly, and probably makes even more sense there: one spec could need a stop while another one might not, so the fact that a restart is needed feels to me like it should be part of the backup spec.
OK, I addressed your comments above and updated the post. Can you review? Thanks for the input!
We should add:

- For the `reader`: the meaning of a `reader_version` value should increase when the reader code has changed some meanings in the values, or improved its data extraction in a way that breaks with previous runs of that reader.
- A `chain_id` field at the top of our blocks, when there might be different networks.
- A `chain_id` field within the blocks. (`reader_version` :P)

Add?

```
FIRE INIT [READER_PROTOCOL_VERSION] [DATA_VERSION] sf.ethereum.type.v2.Block
```
Hmm.. the `[DATA_VERSION]` ought to be inserted within the `type.v2.Block`, and have the payload carry the data version within, so we don't need to have it on the `INIT` line..
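For illustration, here is a hedged Go sketch of how a consumer could parse that first line once `DATA_VERSION` moves inside the block payload, leaving `FIRE INIT <READER_PROTOCOL_VERSION> <fully.qualified.BlockType>`. The exact grammar is an assumption drawn from this discussion, not a finalized spec.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// InitHeader holds what a consumer learns from the reader's first line.
type InitHeader struct {
	ProtocolVersion int    // READER_PROTOCOL_VERSION
	BlockType       string // e.g. "sf.ethereum.type.v2.Block"
}

// parseInitLine parses a line of the assumed form:
//
//	FIRE INIT <READER_PROTOCOL_VERSION> <fully.qualified.BlockType>
//
// (DATA_VERSION is assumed to live inside the block payload, per the
// discussion above.)
func parseInitLine(line string) (InitHeader, error) {
	fields := strings.Fields(line)
	if len(fields) != 4 || fields[0] != "FIRE" || fields[1] != "INIT" {
		return InitHeader{}, fmt.Errorf("not a FIRE INIT line: %q", line)
	}
	version, err := strconv.Atoi(fields[2])
	if err != nil {
		return InitHeader{}, fmt.Errorf("invalid protocol version: %w", err)
	}
	return InitHeader{ProtocolVersion: version, BlockType: fields[3]}, nil
}

func main() {
	h, err := parseInitLine("FIRE INIT 3 sf.ethereum.type.v2.Block")
	if err != nil {
		panic(err)
	}
	fmt.Printf("protocol=%d type=%s\n", h.ProtocolVersion, h.BlockType)
}
```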
Do we have an expectation that upon restart, the `reader` program would continue where it left off? That's the behavior with `geth` right now.. it depends on the state under the reader, and it is not piloted by the `node-manager` stack.
> Do we have an expectation that upon restart, the `reader` program would continue where it left off? That's the behavior with `geth` right now.. it depends on the state under the reader, and it is not piloted by the `node-manager` stack.
That is correct, the "reader-node" program is expected to start back at the very next block.
Reader-node starts at the very next block --> for readers that don't do this by design (ex: the poller sucker), a recommendation would be to have a small "cursor" file on disk that includes the last block and the LIB (so that the poller sucker knows if it needs to go back a few blocks because the last read block got reorged). That cursor file could be replaced by the user to contain a single block number, ex:

```
echo 123455 > cursor
```

and it would restart from there. Simple: the user controls his "node".
Regarding backups and node-manager expectations:

backups:

- … (ex: `localhost:8080/v1/last_block_number` on the manager HTTP server)

node-manager expectations:
Note: things that we "lose" by going down this road:
> That cursor file could be replaced by the user to contain a single block number, ex: `echo 123455 > cursor`, and it would restart from there. Simple: the user controls his "node".
The arweave firehose poller-sucker works like this already with a small cursor file.
- If the user needs some logic pre-start or pre-stop, he can provide a "script" to run instead of binary+args, and handle whatever he wants in there; that's his option.
We do this all the time.
For those last two notes, here are strategies that allow an expandable, blockchain-agnostic core to Substreams and Firehose:
The tool `download-from-firehose` needs a ToProto, because it extracts some values from in there to reconstruct a `bstream.Block` out of the data in the `...ethereum.type.v2.Block`. So this will pose a problem for genericization of that particular tool, as it needs to understand that block.
To handle `download-from-firehose`, we need to add the `parent_num` (or `parent_block_num`) to `bstream.Block`.
Also, in `sf.firehose.v2.Response`, add a few of the fields contained within the `bstream.Block` as top-level fields. This way, `download-from-firehose` can reconstruct the `bstream.Block` on the other end, in a generic way.
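The shape of that generic reconstruction could look like the following Go sketch. The struct and field names (`response`, `ParentNum`, etc.) are illustrative stand-ins for `sf.firehose.v2.Response` and `bstream.Block`, not the actual types; the point is that the tool never decodes the chain-specific payload.

```go
package main

import "fmt"

// response mirrors the proposal for sf.firehose.v2.Response: lift the fields
// needed to rebuild a generic block to the top level, next to the opaque
// chain-specific payload. Field names are assumptions for illustration.
type response struct {
	Number     uint64
	ID         string
	ParentNum  uint64 // the parent_num this discussion proposes adding
	ParentID   string
	LibNum     uint64
	PayloadURL string // fully qualified type, e.g. "sf.ethereum.type.v2.Block"
	Payload    []byte // opaque bytes; never decoded by the generic tool
}

// genericBlock stands in for bstream.Block: everything the tool needs,
// with the chain payload kept opaque.
type genericBlock struct {
	Number    uint64
	ID        string
	ParentNum uint64
	ParentID  string
	LibNum    uint64
	Payload   []byte
}

// fromResponse rebuilds the block without understanding the payload,
// which is exactly what makes the tool chain-agnostic.
func fromResponse(r response) genericBlock {
	return genericBlock{
		Number:    r.Number,
		ID:        r.ID,
		ParentNum: r.ParentNum,
		ParentID:  r.ParentID,
		LibNum:    r.LibNum,
		Payload:   r.Payload,
	}
}

func main() {
	b := fromResponse(response{Number: 2, ID: "bb", ParentNum: 1, ParentID: "aa", LibNum: 1})
	fmt.Println(b.Number, b.ParentNum)
}
```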
Layout of a chain-specific repo, ex: `streamingfast/firehose-bitcoin`:

```
README.md
  - Reader in `nodeos --firehose-enabled`
  - Proto def in solana/source-code/confirmed_blocks.proto
  - Configuration to boot: firecore start --reader-node fireeth
firehose-bitcoin-btc-reader/main.go VERSION
firehose-bitcoin-tools/main.go
  pkg/polling-implementation.go
wasm-extensions/[eth_calls] VERSION
substreams-crates/src/bitcoin-decoding-stuff.rs VERSION
  tools/[non-generic-tools]
substreams-explorer/
substreams.yaml -> `extract_firehose_blocks` produces an `.spkg` VERSION
proto/sf/protocol/type/v1/block.proto VERSION
```
Releases can be in sync with the tag of the repo. Some releases might not include all of the pieces in there (maybe you don't release the `spkg` if the `tools` are updated with a new tag..).
Some work done to genericize:

- `bstream.Block` takes in the `proto` change.
- The `develop` branch has some reverted commits that implemented the protobuf fully qualified message name instead of some random three-letter word. Is this necessary?
- We can remove the `payload_version` from the `bstream.Block`, and remove the checks for acceptance of that version.
We have decided to go with `type.v1` and `type.v2` when largely bumping the versions of the Ethereum Block, for instance. And we have a `Ver` within that Ethereum block for knowing the content revision, taken from the `FIRE INIT` and interpreted by the Reader, or simply produced with a certain revision of the reader node (hard-coded in the reader version when it acts differently).
This will allow stats to be gathered at the level of the READER layer, and bring that into `firecore`. We can extract the throughput metrics from `firehose-ethereum` and bring them back into a generic `firecore`.
These are specifications for enabling a chain to be supported by the Firehose and Substreams stack by StreamingFast.

It is a description of the `reader` node or program in the architecture diagram of Firehose (alternatively called `extractors` or `firehose-enabled nodes`). This `reader` implementation does not presuppose any extraction method detailed here: https://firehose.streamingfast.io/integrate-new-chains/integration-overview

Once this is implemented for a chain, the Firehose history should be processable and a real-time Firehose can run live. Substreams can also then be served on such a network.

This does not include any Substreams-specific extensions (like the Ethereum `eth_call` support), but allows for a generic use of `firehose-core` without modifications.

Program behavior
Input flags

This program is free to adopt any flags necessary.

- … (ex: `--firehose-enabled`).
- … (ex: `--rpc-endpoint http://localhost:8545`)

Output streams
The reader program should output data through standard UNIX output streams. It is conventional to use `stdout` for the block data, and `stderr` for any logs.

- A program that outputs a stream of blocks, in order, or out of order, encoded as `bstream.Block`s.
- … `bstream.Block` so it uses an `any.Any` field instead of our payload thing.
- … `bstream`, and us to bump `bstream` inside `firehose-core` each time a new integration comes.
- … `dbin` tweak, with backwards compatibility? So files could convey their contents duly, with the fully specified …
- … `stdout`, with length prefixed, or line-based in B64, or with a defined method that would work across all protocols.
- … `FIRE`, followed by a block number to help debugging, or even have the fields from `bstream.Block` on the line.
- … `stderr` and `stdout`, but the reader can be booted with either `stderr` or `stdout` to pick up the `FIRE` lines; anything not starting with `FIRE` would be ignored anyway.
- `block_id` and `previous_id` …
- `block_id`/`previous_id` in the end …
- `any.Any` fields must be deterministic across execution clients
- `gettimeofday()` …
The underlying program can be written in any language. Our `reader` would then run that program, and it does whatever it wants. `node-manager` now handles the above spec.

What about the necessary kill cycle when our node wants to make a backup?
… `firehose-chain` altogether, do your thing, and then restart the whole process, and boot the program again?

`READER_PROTOCOL_VERSION` is the spec of those two lines here ^^. This version is `3`.