marceloboeira / voik

♒︎ [WIP] An experimental ~distributed~ commit-log
MIT License

Record Transport & Serialisation #6

Open marceloboeira opened 5 years ago

marceloboeira commented 5 years ago

We should start looking into the trade-offs between different types of network implementation.

I see two main roads we can take: we can either (1) develop our own "protocol" on top of TCP, or (2) use something similar to what Kinesis has, a JSON REST API on top of HTTP.

1 - Pure TCP - probably more performant and efficient than the alternatives. The downside is that it is a lower-level implementation: it would most likely require a binary protocol for communication, and it would place more demands on client libraries.
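To make the "binary protocol" cost concrete, here is a minimal Rust sketch of about the simplest framing possible: a 4-byte big-endian length followed by the raw record bytes. The frame layout, the `write_record`/`read_record` names and the 1 MiB cap are assumptions for illustration, not a decided voik format.

```rust
use std::io::{self, Read, Write};

/// Hypothetical upper bound on a single record, so a corrupt or hostile
/// length header cannot make the reader allocate gigabytes.
const MAX_RECORD_LEN: u32 = 1 << 20; // 1 MiB (assumption)

/// Writes one record as [4-byte big-endian length][payload bytes].
fn write_record<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_RECORD_LEN as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "record too large"));
    }
    let len = payload.len() as u32; // fits: checked against MAX_RECORD_LEN above
    w.write_all(&len.to_be_bytes())?;
    w.write_all(payload)
}

/// Reads one record written by `write_record`. Returns an `UnexpectedEof`
/// error if the stream ends before a full frame has arrived.
fn read_record<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut header = [0u8; 4];
    r.read_exact(&mut header)?;
    let len = u32::from_be_bytes(header);
    if len > MAX_RECORD_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "record too large"));
    }
    let mut payload = vec![0u8; len as usize];
    r.read_exact(&mut payload)?;
    Ok(payload)
}
```

Every client library would have to reimplement exactly this header logic, which is the extra burden on clients; in exchange, the server reads a fixed 4 bytes and then copies the payload without parsing it.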

2 - HTTP - easier to implement and to integrate with client libraries, but probably slower and less efficient because of the serialisation/deserialisation overhead.

There are hybrids that could be a good option too, such as binary protocols on top of HTTP...
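As one example of such a hybrid, the sketch below builds an ordinary HTTP/1.1 POST whose body is the record's raw bytes (`Content-Type: application/octet-stream`) rather than JSON. The function name, host and path are hypothetical; the point is that standard HTTP tooling still applies while the payload skips JSON encoding entirely.

```rust
/// Builds a raw HTTP/1.1 POST request whose body is the record bytes as-is.
/// `host` and `path` are hypothetical, e.g. "localhost:7001" and "/commit-log/".
fn build_binary_post(host: &str, path: &str, record: &[u8]) -> Vec<u8> {
    let head = format!(
        "POST {} HTTP/1.1\r\nHost: {}\r\nContent-Type: application/octet-stream\r\nContent-Length: {}\r\n\r\n",
        path,
        host,
        record.len()
    );
    let mut req = head.into_bytes();
    // The payload is appended verbatim: no JSON escaping, no base64.
    req.extend_from_slice(record);
    req
}
```

With this shape, metadata such as a cursor or latency figure could travel in HTTP response headers instead of being packed into the binary body.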

Important - things we should also take into account:

  1. zero-copy - if we want to use the sendfile system call for zero-copy disk-to-network transfers, we might have to use pure TCP with a binary protocol. I'm not certain of that, though.
  2. Pure TCP might be more useful when the stream carries multimedia data. If you are streaming video, say, you probably don't want the data embedded in JSON and pushed through a slow JSON encoder/decoder.
  3. Binary protocols make it harder to change the API and to include metadata in responses, e.g. cursor/latency information...
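Regarding point 1, the sketch below streams a stored segment file to a TCP peer as raw, unmodified bytes using `std::io::copy`. The `send_segment` helper and the idea of one file per segment are hypothetical. The `std::io::copy` documentation notes that on Linux it may use `copy_file_range`, `sendfile` or `splice` to move data directly between file descriptors; that is platform-specific behaviour rather than a guarantee, and elsewhere it falls back to a buffered copy. Either way, it only applies because nothing has to be re-encoded on the way out.

```rust
use std::fs::File;
use std::io;
use std::net::TcpStream;
use std::path::Path;

/// Sends a whole on-disk segment (hypothetical layout) to a connected peer.
/// The bytes leave exactly as stored, so no user-space encoding step is
/// needed and the standard library is free to use a kernel-side copy.
fn send_segment(segment_path: &Path, stream: &mut TcpStream) -> io::Result<u64> {
    let mut file = File::open(segment_path)?;
    // Returns the number of bytes copied from the file to the socket.
    io::copy(&mut file, stream)
}
```

Wrapping the same bytes in JSON would force every record through user space for encoding, which rules this path out.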

Acceptance criteria

At the end of this task, we should be able to write and read messages over the network:

cat record.json | http post :7001/commit-log/

or even

telnet localhost 7001
... binary content
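For the telnet half of the acceptance criteria, a minimal server sketch could accept a connection and append whatever arrives to the log. Everything here is hypothetical: `MemLog` is an in-memory stand-in for the real commit log, and the newline-per-record framing is an assumption chosen so that typing into `telnet` works by hand (it cannot carry payloads that themselves contain `\n`, which a length-prefixed binary framing would).

```rust
use std::io::{self, BufRead, BufReader};
use std::net::TcpListener;

/// In-memory stand-in for the commit log (hypothetical).
#[derive(Default)]
struct MemLog {
    records: Vec<Vec<u8>>,
}

impl MemLog {
    /// Appends one record and returns its offset in the log.
    fn append(&mut self, record: Vec<u8>) -> u64 {
        self.records.push(record);
        (self.records.len() - 1) as u64
    }
}

/// Treats every newline-terminated line from `input` as one record and
/// returns how many records were appended before EOF.
fn ingest<R: BufRead>(mut input: R, log: &mut MemLog) -> io::Result<usize> {
    let mut count = 0;
    let mut line = Vec::new();
    loop {
        line.clear();
        if input.read_until(b'\n', &mut line)? == 0 {
            return Ok(count); // EOF: the client closed the connection
        }
        // Strip the trailing "\n" or "\r\n" (telnet sends CRLF).
        while matches!(line.last(), Some(&b'\n') | Some(&b'\r')) {
            line.pop();
        }
        log.append(std::mem::take(&mut line));
        count += 1;
    }
}

/// Accepts a single client on `listener` and ingests everything it sends.
fn serve_one(listener: &TcpListener, log: &mut MemLog) -> io::Result<usize> {
    let (stream, _peer) = listener.accept()?;
    ingest(BufReader::new(stream), log)
}
```

A real server would bind `TcpListener::bind("127.0.0.1:7001")` and call `serve_one` in a loop (or spawn a thread per connection).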
boaz-codota commented 3 years ago

Let's do this!

Maybe we should split the responsibilities? Leave all of the serialization/encryption/security concerns to a separate service that runs alongside the log service, and just let the log service use sendfile on anything coming in?

This would let the log service be limited only by the filesystem, while the other service (a gateway, I guess) would be limited by CPU.

It could also allow different "flavors" of gateways to run next to a single log service, meaning a gateway could be Kinesis-compatible, Kafka-compatible, or anything else.
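A rough sketch of that split, under stated assumptions: a hypothetical `Gateway` trait in which each flavour turns its own wire format into the opaque bytes the log service stores, so the log never decodes anything. The two flavours are illustrative only. `Base64Gateway` mimics just the decoding step a Kinesis-style front end would need (Kinesis's `PutRecord` sends `Data` base64-encoded); it is not a Kinesis-compatible API.

```rust
/// Hypothetical interface: each flavour decodes its own request format into
/// the raw record bytes that get forwarded to the log service.
trait Gateway {
    fn flavour(&self) -> &'static str;
    fn decode(&self, body: &[u8]) -> Result<Vec<u8>, String>;
}

/// Passes bytes through untouched, for clients that already speak raw bytes.
struct RawGateway;

impl Gateway for RawGateway {
    fn flavour(&self) -> &'static str { "raw" }
    fn decode(&self, body: &[u8]) -> Result<Vec<u8>, String> { Ok(body.to_vec()) }
}

/// Accepts a standard-alphabet base64 payload (the CPU-bound work lives here).
struct Base64Gateway;

fn b64_value(c: u8) -> Option<u32> {
    match c {
        b'A'..=b'Z' => Some((c - b'A') as u32),
        b'a'..=b'z' => Some((c - b'a' + 26) as u32),
        b'0'..=b'9' => Some((c - b'0' + 52) as u32),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

impl Gateway for Base64Gateway {
    fn flavour(&self) -> &'static str { "base64" }
    /// Lenient decoder: '=' padding is skipped and leftover bits are dropped.
    fn decode(&self, body: &[u8]) -> Result<Vec<u8>, String> {
        let mut out = Vec::with_capacity(body.len() * 3 / 4);
        let (mut acc, mut bits) = (0u32, 0u32);
        for &c in body.iter().filter(|&&c| c != b'=') {
            let v = b64_value(c).ok_or_else(|| format!("invalid base64 byte {:?}", c as char))?;
            acc = (acc << 6) | v; // append 6 more bits
            bits += 6;
            if bits >= 8 {
                bits -= 8;
                out.push((acc >> bits) as u8); // emit the top 8 bits
                acc &= (1 << bits) - 1; // keep only the unread bits
            }
        }
        Ok(out)
    }
}

/// Picks a gateway flavour by name; several can sit in front of one log.
fn gateway_for(flavour: &str) -> Option<Box<dyn Gateway>> {
    match flavour {
        "raw" => Some(Box::new(RawGateway)),
        "base64" => Some(Box::new(Base64Gateway)),
        _ => None,
    }
}
```

Since every flavour outputs plain bytes, adding a Kafka-style or Kinesis-style front end would not touch the log service itself.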

I'd love to hear what you think, so we can get the ball rolling.