xloem / openrealrecord

Streams binary data: immutable, censorship-resistant, provable, authenticated, decentralized, logged

Break functionality into Modules #5

Open · xloem opened this issue 6 years ago

xloem commented 6 years ago

I propose that each kind of stream is its own module, with some metadata that describes it, parameters that can be passed, and a way to run a process producing it. Use cases to consider for modules:

• audio/video feed

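To make that concrete, here is a minimal sketch of the module contract I have in mind; every name in it is hypothetical, just to indicate the shape:

```js
// Hypothetical sketch of the proposed module contract: metadata that
// describes the feed, parameters that can be passed, and a way to run
// the process that produces the stream's data.
class Module {
  constructor (stream, metadata, parameters) {
    this.stream = stream         // where produced data gets stored
    this.metadata = metadata     // e.g. { name: 'camera', mime: 'video/webm' }
    this.parameters = parameters // e.g. { device: '/dev/video0' }
  }

  // subclasses launch their long-running producer here
  async start () {
    throw new Error('not implemented')
  }

  async stop () {}
}
```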
yuriy-yarosh commented 6 years ago

• audio/video feed

@xloem you'll need some dynamic network topology and congestion control for real-time streaming. Look through the RTMFP proto RFC details. Hypercore's proto might be insufficient at some point, and I'm a bit sceptical about its security measures.

RTMFP by itself is not a good option because it has some Adobe royalties involved regarding Flash, although it could be a good design reference.

I've run into a lot of p2p technologies kicking around, and it looked to me like gnunet had the most mature tools.

Haven't seen any viable P2P streaming stuff there...

Feel free to message me if you've got any questions regarding reliable real-time streaming.

https://keybase.io/yarosh http://t.me/YuriyYarosh Skype: void.nugget void.nugget@gmail.com

xloem commented 6 years ago

@yuriy-yarosh my highest priorities are to make it easy to produce incredible amounts of data that cannot be forged, altered, or censored; to give high confidence that what is recorded actually happened; and to make it easy to access the data and verify all of those things. Lag-free real-time streaming on a massive scale wasn't something I had considered yet, although I can see a lot of value in that too. I feel hypercore's structure is pretty close to allowing for such streaming.

Note that secushare is built on gnunet and uses multicast to provide one-to-many pubsub streams.

I'd love to hear your concerns regarding hypercore's security! My biggest concern is the language JavaScript itself; but I feel it is the protocol and data representation structures here that are important, and those are language-agnostic.

My interest in providing a gnunet transport stems from noticing that gnunet has a lot of options for handling funny network situations. For example, it can bootstrap without any hard-coded peers or contact with central servers, and its transport architecture allows for sneakernet.

If there's something specific this project needs that you'd enjoy spending the time to make high quality, I'd be happy to put another bounty on some issue you post.

yuriy-yarosh commented 6 years ago

@xloem what do you think about moving to MsgPack and/or Node Streams, and some modern JS syntax?

I'll review hypercore's proto soon(ish).

My biggest concern is the language JavaScript itself

With Node 10, performance and FFI stuff should be simpler, although I'd implement projects like Hypercore / OpenRealRecord using Rust or Swift.

One-to-many is the simplest topology; for a lag-free experience you have to introduce a completely dynamic and redundant one. Most existing streaming companies have failed at this task.

xloem commented 6 years ago

MsgPack and/or Node Streams, and some modern JS syntax?

Sounds great. I keep meaning to figure out the newest Node release that is supported out-of-the-box by current operating systems and pick a style that matches whatever that is. I started using classes because bcoin became a dependency.

I threw protobuf in there because hypercore/hyperdb use protobuf. The consistency seems valuable, and reuse reduces the number of dependency modules. I haven't evaluated the choice otherwise and am open to msgpack being better (and am curious why).

implement projects like Hypercore / OpenRealRecord using Rust or Swift.

I'm inexperienced with Rust but it looks great to me. There's some work to port hypercore to Rust, but it doesn't look like it's moving very fast. Ideally I'd prefer C or barebones C++ myself, for the smaller microcontrollers like Arduinos that all have working gcc toolchains but no OS. Obviously this project uses Node just because hyperdb does.

yuriy-yarosh commented 6 years ago

open to msgpack being better (and am curious why).

From a performance standpoint there are no benefits - msgpack has some overhead due to embedding the schema directly. From a long-term support perspective there are some benefits - embedding the schema and working with plain old JSON objects simplifies testing, development, and proto versioning. There might be less boilerplate code involved with msgpack; it depends on the developer, of course.
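For example, a msgpack round trip needs no schema or codegen at all, which is what makes ad-hoc testing easy; a quick sketch using the msgpack-lite package:

```js
const msgpack = require('msgpack-lite')

// encode any plain object - no .proto definition or generated code needed
const buf = msgpack.encode({ version: 2, name: 'temperature' })

// decoding yields a plain object again, keys included
console.log(msgpack.decode(buf)) // { version: 2, name: 'temperature' }
```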

I'd prefer C or barebones C++ myself

C is fine. C++ has way too many pointless standards nowadays, which causes some long-term development hassle. I've been supporting some C++ nuclear reactor management software for a while; it's not pretty.

@xloem I'll create another issue regarding project cleanup and restructuring; feel free to post a small bounty ($200-300) if you'd like.

xloem commented 6 years ago

msgpack has some overhead due to embedding the schema directly

That sounds a little concerning -- does msgpack have a way to enforce a tight schema, the way protobuf does? Also note that every update contains a protobuf-encoded message; including the schema in every update would make them significantly heavier bytewise.

C++ has way too many pointless standards nowadays

Rust is a relief. Wish I had the time to learn it well.

I'll create another issue regarding project cleanup and restructuring; feel free to post a small bounty ($200-300) if you'd like.

Sounds great. It would be nice if you could work off the wip branch, which was moving towards adding metadata and modularity.

yuriy-yarosh commented 6 years ago

That sounds a little concerning -- does msgpack have a way to enforce a tight schema

Usually people use typed JavaScript for that.

significantly heavier bytewise

Plus 1-2 bytes per struct field.

could work off the wip branch

Roger.

xloem commented 6 years ago

I tried encoding a checkpoint {"rootsHash":"1234567890123456789012345678901234567890123456789012345678901234", "timestamp":1526904421, "length":1023, "byteLength":16777215} using both msgpack and protobuf, and found the msgpack result to be 47.5% larger (38 bytes more across the 4 fields).
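A comparison along these lines can be reproduced with something like the sketch below (assuming the msgpack-lite and protobufjs packages; this Checkpoint schema is my reconstruction from the four fields above, not necessarily the project's actual definition):

```js
const msgpack = require('msgpack-lite')
const protobuf = require('protobufjs')

const checkpoint = {
  rootsHash: '1234567890123456789012345678901234567890123456789012345678901234',
  timestamp: 1526904421,
  length: 1023,
  byteLength: 16777215
}

// msgpack is self-describing, so all four key names travel in every message
const packed = msgpack.encode(checkpoint)

// protobuf keeps the schema out of band and replaces each key with a
// 1-byte field tag (reflection API, as in the protobufjs examples)
const Checkpoint = new protobuf.Type('Checkpoint')
  .add(new protobuf.Field('rootsHash', 1, 'string'))
  .add(new protobuf.Field('timestamp', 2, 'uint64'))
  .add(new protobuf.Field('length', 3, 'uint64'))
  .add(new protobuf.Field('byteLength', 4, 'uint64'))
new protobuf.Root().define('orr').add(Checkpoint)
const encoded = Checkpoint.encode(checkpoint).finish()

console.log(packed.length, encoded.length) // 118 vs 80 bytes with these types
```

protobufjs also exposes Checkpoint.verify(object), which returns an error message for missing or mistyped fields; that is the kind of automatic failure on misuse I mean below.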

I'm not convinced regarding msgpack; I think it might be better to version the database formats and allow for upgrading than to use a serialization format that allows for dynamic content. I feel sticking with protobuf would create a more well-defined database that is easier to review and verify, and that fails more automatically in the face of misuse (e.g. if somebody slaps together a custom client).

I hear you that typed JavaScript would help these goals. I haven't seen it used a lot, but it sounds like a wiser choice if support is widespread nowadays.

Could you create that issue? I don't want to step on your toes by jumping ahead, but this conversation might belong there better =)

yuriy-yarosh commented 6 years ago

msgpack result to be 47.5% larger

That's because it treats JavaScript objects as maps and encodes the keys too, which actually helps with versioning a bit: you don't have to hardcode Version value handling, just handle simple edge cases like what should be done for each combination of fields present.
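In fact the keys account for your difference exactly: rootsHash + timestamp + length + byteLength is 34 characters, and msgpack's fixstr header adds 1 byte per key, so 34 + 4 = 38 extra bytes, matching the 38 you measured.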

I've seen some companies, and their respective Ukrainian outsource/outstaff counterparts, move from gRPC to HTTP/2 with msgpack for exactly this reason.

Of course, at a larger scale 38 bytes would matter. @xloem it's up to you, but I'm fine with sticking with protobuf.

xloem commented 6 years ago

I'm still leaning towards sticking with protobuf. I don't want to stay JavaScript-only forever, and I appreciate how protobuf already enforces the schema in all languages. Additionally, I feel including the whole schema in every message would make lightweight streams and messages unnecessarily heavy: a 400 GB long-term log of household temperature history would become almost 600 GB with msgpack (1.475 × 400 GB ≈ 590 GB). Choices with that much impact can add up.

bookmoons commented 5 years ago

Amazing project. The world needs this.

Opened pull request #11 as a preliminary step toward this.

bookmoons commented 5 years ago

Reposting from the pull request comment since this is really the place to discuss modules.

We'll need to see whether I've understood modules correctly in the modules PR, but just to indicate where I'm heading: I have a general Module class that can be subclassed to define module types, and a proof-of-concept IterableModule that feeds an Iterable to a stream:

```js
const { IterableModule } = require('openrealrecord/module')

// createStream() stands in for however the surrounding code obtains a stream
const stream = createStream()
const data = [
  Buffer.from('1234', 'hex'),
  Buffer.from('5678', 'hex'),
  Buffer.from('ddff', 'hex')
]
// named iterableModule because `module` is already bound in CommonJS files
const iterableModule = new IterableModule(stream, data)
iterableModule.start()
```
xloem commented 5 years ago

@bookmoons thanks so much for your involvement! So you know, I started implementing a module class long ago in the wip branch, which I will likely be trying to merge with your pull request eventually.

xloem commented 5 years ago

@bookmoons to clarify our different views on what a module is: I was thinking the module abstraction would be visible to the user of a UI, so my focus was on modules that manage different external things, such as video or a system log. A connecting point between the modules was the concept of metadata that would allow the user and the system to determine the meaning of each feed.

Of course, backend modules are required for frontend modules. A good first module would specialize the behavior of the CLI client's --pipe mode, which reads from stdin and stores it as a long-running process.
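To illustrate, such a module might look roughly like the sketch below; it follows the hypothetical module shape sketched in my opening comment, and it assumes the stream exposes a write method, which may not match the real API:

```js
// Hypothetical sketch: wrap the --pipe behavior in a module that reads
// stdin as a long-running process and stores every chunk it receives.
class StdinModule extends Module {
  constructor (stream) {
    super(stream, { name: 'stdin', mime: 'application/octet-stream' }, {})
  }

  async start () {
    process.stdin.on('data', chunk => {
      this.stream.write(chunk) // assumed storage call
    })
    // resolve once stdin closes, so callers can await completion
    await new Promise(resolve => process.stdin.once('end', resolve))
  }
}
```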

bookmoons commented 5 years ago

I see what you mean. Modules of functionality accessible in the interface.

Maybe I'll try implementing a stdin module to see what comes up along the way.

bookmoons commented 5 years ago

There's an experiment with a module that has an interface. It reads from stdin and splits the input into lines: maybe the kind of thing you'd do if you were streaming a server log.

I have it printing lines for testing:

```
[user@bookmoons bin]$ cat test
test
test2
test3
[user@bookmoons bin]$ cat test | ./hyperstream --pipelines
Generated an in-memory database.
Database: jOkqoPexyo7XALJOeTBsWxsmu8HWXBmyQH5LMoT26iM
User: jOkqoPexyo7XALJOeTBsWxsmu8HWXBmyQH5LMoT26iM
1 Streams:
  jOkqoPexyo7XALJOeTBsWxsmu8HWXBmyQH5LMoT26iM { ERR stream has no name } 0B
<Buffer 74 65 73 74>
<Buffer 74 65 73 74 32>
<Buffer 74 65 73 74 33>
```