ssbc / ssb2-discussion-forum


BIPF #3

Open arj03 opened 1 year ago

arj03 commented 1 year ago

From #2:

given the current light of re-evaluating things, have you reconsidered dom's ltp? mentioning cause simpler deps means more likely to be (correctly) implemented in other languages.

and maybe a naive question but why does the feed format have to specify the storage format? so: why not do bipf on messages after receiving, rather than before sending. gives you flexibility for changing it later on, and imposes fewer constraints on implementations / database choices. so i guess my point is "specify the bytes to send, not how to store them" but then again i've got cable on my brain

gpicron commented 1 year ago

You can add FlatBuffer for schema-based formats, and CBOR (which is very close to BIPF) and FlexBuffer for schemaless ones.

Your question is not naïve and I can give you a partial answer. I did this analysis for a private company, for an app based on React Native and a server distributing mainly textual content. Initially it was a NoSQL Mongo DB, Node.js for the server, gRPC (Protobuf) on the wire and Electron/React for the app.

When they asked for my help, they were struggling with scalability and the apps were draining energy on mobiles. After an initial analysis, this did not appear to be a DB I/O problem, as it is in most cases. They had a cache layer on top of the DB that was used efficiently.

Tracing the path end to end and profiling the code, I found that most of the CPU time was spent transcoding data. In order of importance:

  1. UTF8 <-> UTF16

    • In the DB, the strings are in UTF8.
    • In the JS code of the server they are in UTF16.
    • In the Protobuf, UTF8.
    • In the JS code of the app, UTF16.
    • In the generated HTML, UTF8 (and maybe more conversions depending on the WebView implementations on some mobiles).
    • note: their context was very international, so non-ASCII characters were the rule rather than the exception, which is probably why this was so significant. UTF8/UTF16 conversion implementations are usually very efficient, but they imply branching; CPU branch prediction handles ASCII very well but much less so with text that often uses higher Unicode pages.
  2. Model conversions:

    • MongoDB BSON <-> JS objects <-> Protobuf <-> JS objects <-> HTML views.
    • The garbage collectors were spending most of their time collecting the garbage produced by those conversions.

In the end, we used FlatBuffer. Data is stored in a MySQL DB as blobs, plus some functional indexes over fields of the buffer. Some filtering (authorization) is done on the server side on string properties of the buffer, directly in UTF8 using memcmp-like comparisons (instead of transcoding to JS strings and comparing strings). FlatBuffer is used on the wire (about 5% additional bandwidth). On the app side, some sorting is done based on user parameters, so it reads a few specific field values to sort the arrays of buffers, and the bytes of the strings are written directly from the FlatBuffer buffer to the HTML view buffer.

Result: on the app side, CPU and memory usage were divided by 8. The app is much more responsive and no longer drains the battery. On the server side, CPU and memory usage were divided by 15... No more need for an elaborate distributed system architecture for their use case to scale.

I don't know how much our stacks are affected, without measurements, but as a matter of fact we do a lot of serialization/deserialization in JS, which is CPU consuming (garbage collection and UTF conversions). So, more than trying to have the same wire format and DB format, I would try to use direct serialized-format-to-serialized-format conversions, to avoid UTF string conversions and to rely mostly on memcopy. That should have a meaningful impact on CPU usage. (Note: some languages/platforms are less affected by the UTF8 issue. For instance, V8 will keep ASCII-only strings encoded as OneByteStrings, which is the same as UTF8, and everything else in UTF16 (called TwoByteStrings); but deciding which variant to use to save memory implies tests and branches. Other languages like Go use UTF8 as the internal encoding, so deserializing UTF8 is a simple memcopy.)

Ultimately, the zero-copy approach, if you have a layered network protocol and no transcoding, is to aim at just instructing the OS I/O stack to transfer some byte segments from file to socket (sendfile), so that everything is done in the kernel without a copy into the app's memory space. Another option is to use Linux Direct I/O to access the disks and something like UCX for direct access to the network cards, so that you bypass the kernel. But that's rather complex, and as you bypass the kernel you lose the caches and the kernel thread scheduler and have to take care of that yourself. That is only useful for implementing things like virtual network switches or virtual machine drivers, I think. In between, the newer io_uring (Linux), kqueue (macOS) and IOCP (Windows) offer very interesting perspectives but less portability (meanwhile there are some interesting abstractions bindable in Node.js, Java, C, etc., like the one of TigerBeetle).

arj03 commented 1 year ago

Very interesting story. Thanks for sharing.

Some filtering (authorization) is done in the server side on some string properties in the buffer directly in UTF8 using memcmp like comparisons (instead of transcoding to JS String and comparing strings).

This is similar to what we do with BIPF where we operate directly on the buffer values without parsing.
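
For illustration, a minimal sketch of that in-place pattern (assuming the bipf npm module's allocAndEncode/seekKey/decode helpers; the exact signatures should be double-checked against the package):

```js
const bipf = require('bipf')

// Encode a message once; afterwards we only ever peek into the buffer
const msg = { author: '@abc', content: { type: 'contact', contact: '@xyz' } }
const buf = bipf.allocAndEncode(msg)

// seekKey returns the offset of the value for a key (or -1 if absent),
// so nested fields can be reached by chaining seeks without a full decode
const pContent = bipf.seekKey(buf, 0, Buffer.from('content'))
const pType = bipf.seekKey(buf, pContent, Buffer.from('type'))

// Only the small 'type' value is materialized as a JS value
console.log(bipf.decode(buf, pType)) // 'contact'
```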

FlatBuffer on the wire (about 5 % additional bandwidth)

I did an experiment where I encoded the metadata (including signature) of minibutt in bipf and protobuf. Bipf had a minor overhead compared to raw, mainly because of the types, but protobuf was significantly smaller. Some of that might have been because it was the first message, so previous was null, but still that is only 16 bytes. From 147 bytes to 87 bytes, so protobuf must use some kind of compression as well. I'm still surprised they can get it that far down on 1 message.

staltz commented 1 year ago

By the way, somewhat unrelated, somewhat related, when working on ssb-memdb I realized that since I was loading up all messages from disk (BIPF) to memory (JS objects), I was doing a full bipf.decode which misses the point of BIPF, which is in-place reads. And I remembered that full decode speed of JSON.parse is 3x that of bipf.decode, so I swapped the log format for JSON, and indeed the startup time for ssb-memdb got about 2x–3x faster, because it just does JSON.parse.

gpicron commented 1 year ago

I did an experiment where I encoded the metadata (including signature) of minibutt in bipf and protobuf. Bipf had a minor overhead compared to raw, mainly because of the types, but protobuf was significantly smaller. Some of that might have been because it was the first message, so previous was null, but still that is only 16 bytes. From 147 bytes to 87 bytes, so protobuf must use some kind of compression as well. I'm still surprised they can get it that far down on 1 message.

Protobuf is really the state of the art in terms of compressed message size. The drawback is that you have to deserialize the whole buffer to validate/read it. That's why more and more projects are moving to FlatBuffer, and also why Google started to support gRPC APIs with FlatBuffer. It costs a few more bytes on the wire, but the benefit on CPU is important. In most cases, the small win in bytes that protobuf provides is lost to packet padding and/or encryption padding anyway.

gpicron commented 1 year ago

By the way, somewhat unrelated, somewhat related, when working on ssb-memdb I realized that since I was loading up all messages from disk (BIPF) to memory (JS objects), I was doing a full bipf.decode which misses the point of BIPF, which is in-place reads. And I remembered that full decode speed of JSON.parse is 3x that of bipf.decode, so I swapped the log format for JSON, and indeed the startup time for ssb-memdb got about 2x–3x faster, because it just does JSON.parse.

Another option is to keep it encoded in BIPF (or FlatBuffer/FlexBuffer) in memory. JS objects' memory segments are spread across the heap. Since you traverse the array linearly and filter when querying, it is more efficient to have a contiguous segment of memory. The CPU preloads the next memory page into its caches in parallel with performing operations on the "current" one. It can do that when the stream of data is in contiguous memory, but not when it is more or less randomly spread in memory by the VM's allocator.

Note: you probably already benefit partially from that, because you load everything at start, so there is a good chance the allocator places new segments next to each other with limited garbage in between, and your JS objects end up near each other.
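
To make the layout argument concrete, here is a small sketch (my own illustration, not code from the thread) of keeping records back-to-back in one contiguous Buffer plus an offsets array, so a linear scan walks memory the CPU can prefetch instead of chasing pointers to objects scattered over the heap:

```js
// Build: encode each record (JSON here for simplicity; BIPF/FlatBuffer would
// work the same way) and remember where it starts inside one big arena buffer.
const msgs = [{ type: 'contact' }, { type: 'post', text: 'hi' }] // stand-in data
const parts = msgs.map((m) => Buffer.from(JSON.stringify(m), 'utf8'))
const offsets = []
let pos = 0
for (const p of parts) { offsets.push(pos); pos += p.length }
const arena = Buffer.concat(parts)

// Scan: each record is the slice between consecutive offsets (zero-copy views),
// decoded only when it actually needs to be looked at.
for (let i = 0; i < offsets.length; i++) {
  const end = i + 1 < offsets.length ? offsets[i + 1] : arena.length
  const record = arena.subarray(offsets[i], end)
  // filter on raw bytes, or JSON.parse(record) lazily when needed
}
```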

staltz commented 1 year ago

Interesting! Thanks for the insight Geoffrey.

Based on what you two are saying, I'd like to try FlatBuffers in ssb-memdb too, because a smaller size on disk (per msg) means storing more messages within that 100MB budget.

You are right that I could load the messages as buffers (BIPF / FlatBuffer / Protobuf), but I'm also looking for a good developer experience of reading the fields of messages. In an advanced scenario, we could just use JS Proxy as a wrapper around the Buffer, such that you can do msg.value.content.text and under the hood it's going to translate that to a sequence of buffer offsetting and decoding. But that's fancy stuff for the future, for now I'm fine with the naive approach of loading it all as JS objects in memory.
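
A rough sketch of what that Proxy idea could look like (hypothetical code, not an existing module; the bipf helper names seekKey/getEncodedType/types/decode are assumptions that would need verifying):

```js
const bipf = require('bipf')

// Wrap a BIPF buffer so property access lazily seeks and decodes only that field
function lazy(buffer, start = 0) {
  return new Proxy({}, {
    get(_, prop) {
      const p = bipf.seekKey(buffer, start, Buffer.from(String(prop)))
      if (p === -1) return undefined
      const t = bipf.getEncodedType(buffer, p)
      // keep nested objects/arrays wrapped; decode only leaf values
      return t === bipf.types.object || t === bipf.types.array
        ? lazy(buffer, p)
        : bipf.decode(buffer, p)
    },
  })
}

// const msg = lazy(bipf.allocAndEncode(someMsg))
// msg.value.content.text  // only the 'text' string gets decoded
```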

erikmav commented 1 year ago

so protobuf must use some kind of compression as well. I'm still surprised they can get it that far down on 1 message.

Protobuf is really the state of art in term of compressed message format in term of size. The drawback is that you have to deserialize the whole buffer to validate/read it.

There's no explicit bzip-style compression in Protobuf. Part of the win is that there are no field length values, and integers are always variable-length, which together save quite a bit, at the cost of having to traverse various unordered field tag+value variable-length pairs to get a pointer to a string or byte region within the encoded message. In a couple of projects I've seen custom serializers and deserializers for Protobuf to gain performance by bypassing the general-purpose generated code, but that's relatively rare, and costly to write and maintain.
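
As a rough illustration of that wire format (my sketch, not Protobuf library code): each field is a varint tag (field_number << 3 | wire_type) followed by the value, and integers themselves are varints, so small values cost only a byte or two:

```js
// Encode an unsigned integer as a protobuf varint (7 bits per byte, MSB = continue)
function varint(n) {
  const out = []
  while (n > 0x7f) { out.push((n & 0x7f) | 0x80); n >>>= 7 }
  out.push(n)
  return out
}

// Hypothetical schema: message Meta { int32 sequence = 1; string author = 2; }
const author = Buffer.from('@abc', 'utf8')
const encoded = Buffer.from([
  ...varint((1 << 3) | 0), ...varint(42),            // field 1, varint value 42
  ...varint((2 << 3) | 2), ...varint(author.length), // field 2, length-delimited
  ...author,
])
console.log(encoded.length) // 8 bytes: 2 for the int, 2 + 4 for the string
```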

It is possible to write partial and incremental deserializers on top of Protobuf to get at specific fields, keeping a symbol table of partial information for possible reuse, but it would be a lot harder to write. IOW, if Protobuf's space savings on the wire and in a DB are important, reading in-place byte-array or string data from the encoding is possible, but it would take time to get it working, tested and performant in the general case. Making that cross-platform would be even more of a chore.

arj03 commented 1 year ago

Yeah, I made a mistake in the calculation. I didn't include the signature, so the size is the same actually.

gpicron commented 1 year ago

I made 2 proposals to improve BIPF capabilities:

https://github.com/ssbc/bipf-spec/issues/1 https://github.com/ssbc/bipf-spec/issues/2

gpicron commented 1 year ago

I'm implementing a BIPF library in Nim. I wanted to take it as an opportunity to test the interoperability of Nim with the NodeJS world. Nim compiles to both JS and C out of the box. BIPF is not a format that requires a lot of math; it's mainly memory copies and branching. Still, in the JS world there is the overhead of the UTF8 <-> UTF16 string conversion in every conversion...

I made 2 'bindings' of the library to compare it side-by-side with the reference implementation in JS: 1) I compiled it to JS and exported it as a NodeJS module with a compatibility layer matching the ref implementation; 2) I compiled it to C and bound it as a native NodeJS module using the NAPI API.

Benchmarks

Then I created a benchmark. Mainly 4 types of data are used:

ssbMessages: a sample of 16026 messages
Estimated size of ssbMessages in JS:   23.014.862 bytes
Estimated size of ssbMessages in BIPF: 12.811.882 bytes

Note: as you can see, the overhead of refs and strings makes the SSB message set occupy about twice the memory of the default BIPF equivalent. (I write "default" because I plan to implement a dictionary-encoding extension for keys in objects, and also because I don't convert Base64 strings to Buffers in that test.)

In the benchmark:
- bipf is the JS reference impl
- nim_bipf is my lib compiled to JS
- nim_bipf_node is the NodeJS module (so the lib is compiled to C and bound to JS using the NAPI API)
- JSON refers to the built-in NodeJS JSON.parse/stringify
- (string) means working on the JS string version of JSON (so UTF16), while (buffer) means working on the NodeJS Buffer of this string (so UTF8)

 ### Encoding

Suite: Encoding data large data
✔ bipf#encode/large data                          2,334 rps
✔ nim_bipf#serialize/large data                   4,206 rps
✔ nim_bipf_node#serialize/large data              3,887 rps
✔ json#stringify/large data                       6,237 rps

   bipf#encode/large data (#)                          0%          (2,334 rps)   (avg: 428μs)
   nim_bipf#serialize/large data                  +80.24%          (4,206 rps)   (avg: 237μs)
   nim_bipf_node#serialize/large data             +66.56%          (3,887 rps)   (avg: 257μs)
   json#stringify/large data                     +167.28%          (6,237 rps)   (avg: 160μs)

Suite: Encoding data medium data
✔ bipf#encode/medium data                        19,192 rps
✔ nim_bipf#serialize/medium data                 33,031 rps
✔ nim_bipf_node#serialize/medium data            26,657 rps
✔ json#stringify/medium data                     42,645 rps

   bipf#encode/medium data (#)                         0%         (19,192 rps)   (avg: 52μs)
   nim_bipf#serialize/medium data                 +72.11%         (33,031 rps)   (avg: 30μs)
   nim_bipf_node#serialize/medium data             +38.9%         (26,657 rps)   (avg: 37μs)
   json#stringify/medium data                     +122.2%         (42,645 rps)   (avg: 23μs)

Suite: Encoding data small data (always same)
✔ bipf#encode/small data (always same)                 43,571 rps
✔ nim_bipf#serialize/small data (always same)          78,355 rps
✔ nim_bipf_node#serialize/small data (always same)     58,570 rps
✔ json#stringify/small data (always same)             334,248 rps

   bipf#encode/small data (always same) (#)                 0%         (43,571 rps)   (avg: 22μs)
   nim_bipf#serialize/small data (always same)         +79.83%         (78,355 rps)   (avg: 12μs)
   nim_bipf_node#serialize/small data (always same)    +34.43%         (58,570 rps)   (avg: 17μs)
   json#stringify/small data (always same)            +667.13%        (334,248 rps)   (avg: 2μs)

Suite: Encoding data ssb messages from arj,cel,andre
✔ bipf#encode/ssb messages from arj,cel,andre                  68,717 rps
✔ nim_bipf#serialize/ssb messages from arj,cel,andre          113,992 rps
✔ nim_bipf_node#serialize/ssb messages from arj,cel,andre      75,182 rps
✔ json#stringify/ssb messages from arj,cel,andre              256,313 rps

   bipf#encode/ssb messages from arj,cel,andre (#)                  0%         (68,717 rps)   (avg: 14μs)
   nim_bipf#serialize/ssb messages from arj,cel,andre          +65.89%        (113,992 rps)   (avg: 8μs)
   nim_bipf_node#serialize/ssb messages from arj,cel,andre      +9.41%         (75,182 rps)   (avg: 13μs)
   json#stringify/ssb messages from arj,cel,andre                +273%        (256,313 rps)   (avg: 3μs)

Observations:
- in the ref impl, 2 traversals of the object tree are done, while in the nim version I make a single pass with stack accumulation. That is the reason for the large improvement in the nim version.
- for the node module, given that the overhead is larger on small objects than on large ones, it suggests that the JS-to-module call accounts for most of the overhead, not the calls inside the module to the NAPI API to traverse the object tree.
- JSON.stringify is about 4 times faster on SSB messages than the ref impl and 2 times faster than nim_bipf#serialize.

### Decoding

Suite: Decoding data large data
✔ bipf#decode/large data                          7,666 rps
✔ nim_bipf#deserialize/large data                 8,369 rps
✔ nim_bipf_node#deserialize/large data            3,879 rps
✔ json#parse(string)/large data                   7,568 rps
✔ json#parse(buffer)/large data                   7,532 rps

   bipf#decode/large data (#)                          0%          (7,666 rps)   (avg: 130μs)
   nim_bipf#deserialize/large data                 +9.17%          (8,369 rps)   (avg: 119μs)
   nim_bipf_node#deserialize/large data            -49.4%          (3,879 rps)   (avg: 257μs)
   json#parse(string)/large data                   -1.28%          (7,568 rps)   (avg: 132μs)
   json#parse(buffer)/large data                   -1.74%          (7,532 rps)   (avg: 132μs)

Suite: Decoding data medium data
✔ bipf#decode/medium data                        56,087 rps
✔ nim_bipf#deserialize/medium data               59,784 rps
✔ nim_bipf_node#deserialize/medium data          28,034 rps
✔ json#parse(string)/medium data                 52,374 rps
✔ json#parse(buffer)/medium data                 49,368 rps

   bipf#decode/medium data (#)                         0%         (56,087 rps)   (avg: 17μs)
   nim_bipf#deserialize/medium data                +6.59%         (59,784 rps)   (avg: 16μs)
   nim_bipf_node#deserialize/medium data          -50.02%         (28,034 rps)   (avg: 35μs)
   json#parse(string)/medium data                  -6.62%         (52,374 rps)   (avg: 19μs)
   json#parse(buffer)/medium data                 -11.98%         (49,368 rps)   (avg: 20μs)

Suite: Decoding data small data (always same)
✔ bipf#decode/small data (always same)                 115,147 rps
✔ nim_bipf#deserialize/small data (always same)        104,980 rps
✔ nim_bipf_node#deserialize/small data (always same)    83,148 rps
✔ json#parse(string)/small data (always same)          231,763 rps
✔ json#parse(buffer)/small data (always same)          241,729 rps

   bipf#decode/small data (always same) (#)                  0%        (115,147 rps)   (avg: 8μs)
   nim_bipf#deserialize/small data (always same)         -8.83%        (104,980 rps)   (avg: 9μs)
   nim_bipf_node#deserialize/small data (always same)   -27.79%         (83,148 rps)   (avg: 12μs)
   json#parse(string)/small data (always same)         +101.28%        (231,763 rps)   (avg: 4μs)
   json#parse(buffer)/small data (always same)         +109.93%        (241,729 rps)   (avg: 4μs)

Suite: Decoding data ssb messages from arj,cel,andre
✔ bipf#decode/ssb messages from arj,cel,andre                  197,361 rps
✔ nim_bipf#deserialize/ssb messages from arj,cel,andre         217,829 rps
✔ nim_bipf_node#deserialize/ssb messages from arj,cel,andre    143,930 rps
✔ json#parse(string)/ssb messages from arj,cel,andre           253,209 rps
✔ json#parse(buffer)/ssb messages from arj,cel,andre           283,425 rps

   bipf#decode/ssb messages from arj,cel,andre (#)                   0%        (197,361 rps)   (avg: 5μs)
   nim_bipf#deserialize/ssb messages from arj,cel,andre          +10.37%        (217,829 rps)   (avg: 4μs)
   nim_bipf_node#deserialize/ssb messages from arj,cel,andre     -27.07%        (143,930 rps)   (avg: 6μs)
   json#parse(string)/ssb messages from arj,cel,andre             +28.3%        (253,209 rps)   (avg: 3μs)
   json#parse(buffer)/ssb messages from arj,cel,andre            +43.61%        (283,425 rps)   (avg: 3μs)

Observations:
- nim_bipf uses the same algorithm as the reference impl. While it performs many more runtime checks, it is a little bit faster. I think it comes from more aggressive inlining in my code (in Nim it is simple to decide to inline a function by converting it into a template).
- JSON.parse on SSB messages is not that much faster than bipf#decode and nim_bipf#deserialize, which seems to contradict the previous experience of @staltz. I still wonder why.

### Transcoding
In practice, we have messages as JSON in a NodeJS Buffer and we convert them to BIPF. The following benchmark simulates that sequence. The aim is to determine the interest of passing the JSON buffer directly to a module that parses and transcodes it to the BIPF format.

Suite: JSON 2 Bipf large data
✔ json#parse(string)/bipf#allocAndEncodelarge data          810 rps
✔ json#parse(buffer)/bipf#allocAndEncodelarge data          841 rps
✔ json#parse(string)/nim_bipf#serializelarge data         1,196 rps
✔ json#parse(buffer)/nim_bipf#serializelarge data         1,438 rps
✔ nim_bipf_node#parseJson2Bipf(string)large data          3,797 rps
✔ nim_bipf_node#parseJson2Bipf(buffer)large data          3,907 rps

   json#parse(string)/bipf#allocAndEncodelarge data (#)        0%            (810 rps)   (avg: 1ms)
   json#parse(buffer)/bipf#allocAndEncodelarge data        +3.87%            (841 rps)   (avg: 1ms)
   json#parse(string)/nim_bipf#serializelarge data        +47.76%          (1,196 rps)   (avg: 835μs)
   json#parse(buffer)/nim_bipf#serializelarge data        +77.66%          (1,438 rps)   (avg: 695μs)
   nim_bipf_node#parseJson2Bipf(string)large data        +368.98%          (3,797 rps)   (avg: 263μs)
   nim_bipf_node#parseJson2Bipf(buffer)large data         +382.5%          (3,907 rps)   (avg: 255μs)

Suite: JSON 2 Bipf medium data
✔ json#parse(string)/bipf#allocAndEncodemedium data        5,973 rps
✔ json#parse(buffer)/bipf#allocAndEncodemedium data        5,940 rps
✔ json#parse(string)/nim_bipf#serializemedium data         9,061 rps
✔ json#parse(buffer)/nim_bipf#serializemedium data         9,313 rps
✔ nim_bipf_node#parseJson2Bipf(string)medium data         21,600 rps
✔ nim_bipf_node#parseJson2Bipf(buffer)medium data         25,896 rps

   json#parse(string)/bipf#allocAndEncodemedium data (#)        0%          (5,973 rps)   (avg: 167μs)
   json#parse(buffer)/bipf#allocAndEncodemedium data        -0.55%          (5,940 rps)   (avg: 168μs)
   json#parse(string)/nim_bipf#serializemedium data         +51.72%         (9,061 rps)   (avg: 110μs)
   json#parse(buffer)/nim_bipf#serializemedium data         +55.94%         (9,313 rps)   (avg: 107μs)
   nim_bipf_node#parseJson2Bipf(string)medium data         +261.65%        (21,600 rps)   (avg: 46μs)
   nim_bipf_node#parseJson2Bipf(buffer)medium data         +333.57%        (25,896 rps)   (avg: 38μs)

Suite: JSON 2 Bipf small data (always same)
✔ json#parse(string)/bipf#allocAndEncodesmall data (always same)         38,431 rps
✔ json#parse(buffer)/bipf#allocAndEncodesmall data (always same)         39,756 rps
✔ json#parse(string)/nim_bipf#serializesmall data (always same)          50,714 rps
✔ json#parse(buffer)/nim_bipf#serializesmall data (always same)          55,768 rps
✔ nim_bipf_node#parseJson2Bipf(string)small data (always same)          104,622 rps
✔ nim_bipf_node#parseJson2Bipf(buffer)small data (always same)          120,522 rps

   json#parse(string)/bipf#allocAndEncodesmall data (always same) (#)        0%         (38,431 rps)   (avg: 26μs)
   json#parse(buffer)/bipf#allocAndEncodesmall data (always same)         +3.45%         (39,756 rps)   (avg: 25μs)
   json#parse(string)/nim_bipf#serializesmall data (always same)         +31.96%         (50,714 rps)   (avg: 19μs)
   json#parse(buffer)/nim_bipf#serializesmall data (always same)         +45.11%         (55,768 rps)   (avg: 17μs)
   nim_bipf_node#parseJson2Bipf(string)small data (always same)         +172.23%        (104,622 rps)   (avg: 9μs)
   nim_bipf_node#parseJson2Bipf(buffer)small data (always same)         +213.61%        (120,522 rps)   (avg: 8μs)

Suite: JSON 2 Bipf ssb messages from arj,cel,andre
✔ json#parse(string)/bipf#allocAndEncodessb messages from arj,cel,andre          51,876 rps
✔ json#parse(buffer)/bipf#allocAndEncodessb messages from arj,cel,andre          53,646 rps
✔ json#parse(string)/nim_bipf#serializessb messages from arj,cel,andre           72,350 rps
✔ json#parse(buffer)/nim_bipf#serializessb messages from arj,cel,andre           71,123 rps
✔ nim_bipf_node#parseJson2Bipf(string)ssb messages from arj,cel,andre            90,887 rps
✔ nim_bipf_node#parseJson2Bipf(buffer)ssb messages from arj,cel,andre           126,781 rps

   json#parse(string)/bipf#allocAndEncodessb messages from arj,cel,andre (#)        0%         (51,876 rps)   (avg: 19μs)
   json#parse(buffer)/bipf#allocAndEncodessb messages from arj,cel,andre         +3.41%         (53,646 rps)   (avg: 18μs)
   json#parse(string)/nim_bipf#serializessb messages from arj,cel,andre         +39.47%         (72,350 rps)   (avg: 13μs)
   json#parse(buffer)/nim_bipf#serializessb messages from arj,cel,andre          +37.1%         (71,123 rps)   (avg: 14μs)
   nim_bipf_node#parseJson2Bipf(string)ssb messages from arj,cel,andre           +75.2%         (90,887 rps)   (avg: 11μs)
   nim_bipf_node#parseJson2Bipf(buffer)ssb messages from arj,cel,andre         +144.39%        (126,781 rps)   (avg: 7μs)

Observations:
- Avoiding the intermediate JS string, which implies UTF8 -> UTF16 -> UTF8 conversions, is a must-have in all cases.
- It becomes interesting to use a NodeJS module when you accumulate transformations.

### Seeking path
Note: The nim_bipf implementation of seekPath is very different from the reference implementation. I started implementing a JSONPath compiler and runner.

This one is similar to the one in the bench.js file of the reference impl: seek for "devDependencies/faker", which exists in my package.json.
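
For reference, a sketch of what the two benchmarked variants look like when calling the reference JS implementation (API names as I recall the bipf module; treat the exact signatures as assumptions):

```js
const bipf = require('bipf')
const doc = bipf.allocAndEncode(require('./package.json'))

// "encoded": the path itself is BIPF-encoded and interpreted on every call
const path = bipf.allocAndEncode(['devDependencies', 'faker'])
const p1 = bipf.seekPath(doc, 0, path, 0)

// "compiled": the path is turned once into a specialized JS function,
// which the JIT can then optimize across calls
const seekFaker = bipf.createSeekPath(['devDependencies', 'faker'])
const p2 = seekFaker(doc, 0)

console.log(bipf.decode(doc, p2)) // the faker version string, if present
```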

Suite: Seeking path
✔ bipf#seekPath(encoded)               862,583 rps
✔ bipf#seekPath(compiled)            4,339,012 rps
✔ nim_bipf#seekPath(encoded)         1,672,374 rps
✔ nim_bipf#seekPath(compiled)        4,710,096 rps
✔ nim_bipf_node#seekPath(compiled)   4,344,866 rps

   bipf#seekPath(encoded) (#)                0%        (862,583 rps)   (avg: 1μs)
   bipf#seekPath(compiled)             +403.03%      (4,339,012 rps)   (avg: 230ns)
   nim_bipf#seekPath(encoded)           +93.88%      (1,672,374 rps)   (avg: 597ns)
   nim_bipf#seekPath(compiled)         +446.05%      (4,710,096 rps)   (avg: 212ns)
   nim_bipf_node#seekPath(compiled)     +403.7%      (4,344,866 rps)   (avg: 230ns)

Observations:
- bipf#seekPath(compiled) generates a JS function from the path and relies on the JIT to make it efficient. And that works pretty well. This is not something that I can/want to do in Nim.
- nim_bipf#seekPath(compiled) uses the compiled JSONPath, while nim_bipf#seekPath(encoded) compiles it at each call.
- the nim_bipf impl, while more complex (as I prepared it to support JSONPath filtering too), is more performant. I think this is mainly because it does not use recursive calls.

### Scanning in memory db
This one simulates an in-memory DB and investigates the cost of keeping it in BIPF format. Remember that, by default, BIPF already occupies about half the memory of the JS-object equivalent. I plan to further reduce the memory footprint by converting base64 to buffers and dictionary-encoding object keys. Of course, scanning operations are probably faster on an array of JS objects: there is no decoding step at all, and V8 is heavily optimized for that use case. But by how much, and how far does it impact the user experience?

Note: the test is not doing any sorting of the results (in practice such an in-memory DB array should already be sorted by the sort key most often used in the app). The algorithm is the same for all cases: seek the path in the message and, if it exists, compare the value to 'contact' (pre-encoded in the BIPF cases).

Suite: Scanning in memory db (first 100 message of type 'contact')
✔ bipf#jsArray[js objects]/scan and match            32,045 rps
✔ bipf#jsArray[bipf]/seekPath(compiled)                 969 rps
✔ nim_bipf#jsArray[bipf]/seekPath(compiled)           1,283 rps
✔ nim_bipf_node#jsArray[bipf]/seekPath(compiled)        327 rps
✔ nim_bipf_node#inModuleMemory                       11,858 rps

   bipf#jsArray[js objects]/scan and match (#)            0%         (32,045 rps)   (avg: 31μs)
   bipf#jsArray[bipf]/seekPath(compiled)              -96.98%            (969 rps)   (avg: 1ms)
   nim_bipf#jsArray[bipf]/seekPath(compiled)             -96%          (1,283 rps)   (avg: 779μs)
   nim_bipf_node#jsArray[bipf]/seekPath(compiled)     -98.98%            (327 rps)   (avg: 3ms)
   nim_bipf_node#inModuleMemory                          -63%         (11,858 rps)   (avg: 84μs)



Observations:
- As expected, traversing the array of JS objects is clearly faster.
- Using the seekPath in the node module is not a good idea: too much call overhead.
- The numbers don't seem huge in any case, so I'm not sure this would impact the user experience much. I think the most important thing would be to improve the algorithm towards a "project multiple paths and filter" approach, to avoid multiple scans for a complex filter and to decode only the data needed for display.
- nim_bipf_node#inModuleMemory is the most interesting situation to me. The array is loaded in the node module's memory before the test, so the "in-memory db" actually lives in the node module and the search is a single call to the module. There the overhead of scanning the BIPF-encoded messages becomes very small...
staltz commented 1 year ago

Extremely interesting results! Thanks for building it and reporting back.

  • JSON.parse on SSB messages is not that much faster than bipf#decode and nim_bipf#deserialize, which seems to contradict the previous experience of @staltz. I still wonder why.

I suspect this is because I ran benchmarks on a 64MB-large log file with random simulated messages that come from https://github.com/ssbc/ssb-fixtures while you ran the benchmarks on real-world messages from only 3 people, and these people are kind of "biased" (active users). The majority of SSB messages are votes, which are small. See https://github.com/ssbc/ssb-fixtures/blob/main/src/frequencies.ts

But it still seems reasonable to me that in your benchmarks you got JSON.parse ~1.5x faster, while I got ~2.5x faster. These numbers can swing quite a lot, maybe by a margin of 15% or so, depending on the computer, the data and the method.

gpicron commented 1 year ago

I generated a set of 80000 msg from 500 authors with the ssb-fixtures.

Suite: Decoding data ssb messages from arj,cel,andre
✔ bipf#decode/ssb messages from arj,cel,andre                         202,311 rps
✔ nim_bipf#deserialize/ssb messages from arj,cel,andre                208,494 rps
✔ nim_bipf_node#deserialize/ssb messages from arj,cel,andre           145,200 rps
✔ json#parse(string)/ssb messages from arj,cel,andre                  273,795 rps
✔ json#parse(buffer)/ssb messages from arj,cel,andre                  308,429 rps

   bipf#decode/ssb messages from arj,cel,andre (#)                      0%        (202,311 rps)   (avg: 4μs)
   nim_bipf#deserialize/ssb messages from arj,cel,andre             +3.06%        (208,494 rps)   (avg: 4μs)
   nim_bipf_node#deserialize/ssb messages from arj,cel,andre       -28.23%        (145,200 rps)   (avg: 6μs)
   json#parse(string)/ssb messages from arj,cel,andre              +35.33%        (273,795 rps)   (avg: 3μs)
   json#parse(buffer)/ssb messages from arj,cel,andre              +52.45%        (308,429 rps)   (avg: 3μs)
-----------------------------------------------------------------------

Suite: Decoding data ssb messages from ssb-fixture
✔ bipf#decode/ssb messages from ssb-fixture                         231,407 rps
✔ nim_bipf#deserialize/ssb messages from ssb-fixture                247,000 rps
✔ nim_bipf_node#deserialize/ssb messages from ssb-fixture           170,065 rps
✔ json#parse(string)/ssb messages from ssb-fixture                  285,518 rps
✔ json#parse(buffer)/ssb messages from ssb-fixture                  315,961 rps

   bipf#decode/ssb messages from ssb-fixture (#)                      0%        (231,407 rps)   (avg: 4μs)
   nim_bipf#deserialize/ssb messages from ssb-fixture             +6.74%        (247,000 rps)   (avg: 4μs)
   nim_bipf_node#deserialize/ssb messages from ssb-fixture       -26.51%        (170,065 rps)   (avg: 5μs)
   json#parse(string)/ssb messages from ssb-fixture              +23.38%        (285,518 rps)   (avg: 3μs)
   json#parse(buffer)/ssb messages from ssb-fixture              +36.54%        (315,961 rps)   (avg: 3μs)

Actually the difference between json#parse and nim_bipf#deserialize seems even smaller...

Same for encoding.

Suite: Encoding data ssb messages from arj,cel,andre
✔ bipf#encode/ssb messages from arj,cel,andre                        64,733 rps
✔ nim_bipf#serialize/ssb messages from arj,cel,andre                115,123 rps
✔ nim_bipf_node#serialize/ssb messages from arj,cel,andre            77,770 rps
✔ json#stringify/ssb messages from arj,cel,andre                    258,077 rps

   bipf#encode/ssb messages from arj,cel,andre (#)                    0%         (64,733 rps)   (avg: 15μs)
   nim_bipf#serialize/ssb messages from arj,cel,andre            +77.84%        (115,123 rps)   (avg: 8μs)
   nim_bipf_node#serialize/ssb messages from arj,cel,andre       +20.14%         (77,770 rps)   (avg: 12μs)
   json#stringify/ssb messages from arj,cel,andre               +298.68%        (258,077 rps)   (avg: 3μs)
-----------------------------------------------------------------------

Suite: Encoding data ssb messages from ssb-fixture
✔ bipf#encode/ssb messages from ssb-fixture                        75,542 rps
✔ nim_bipf#serialize/ssb messages from ssb-fixture                127,442 rps
✔ nim_bipf_node#serialize/ssb messages from ssb-fixture            84,779 rps
✔ json#stringify/ssb messages from ssb-fixture                    274,696 rps

   bipf#encode/ssb messages from ssb-fixture (#)                    0%         (75,542 rps)   (avg: 13μs)
   nim_bipf#serialize/ssb messages from ssb-fixture             +68.7%        (127,442 rps)   (avg: 7μs)
   nim_bipf_node#serialize/ssb messages from ssb-fixture       +12.23%         (84,779 rps)   (avg: 11μs)
   json#stringify/ssb messages from ssb-fixture               +263.63%        (274,696 rps)   (avg: 3μs)

And when using that fixture set in the "in memory" db simulation... the diff between the array of JS objects and the array of BIPF in Node module memory shrinks too.

Suite: Scanning in memory db (first 100 message of type 'contact') (real users sample, 16000 messages)
✔ bipf#jsArray[js objects]/scan and match                   32,045 rps
✔ bipf#jsArray[bipf]/seekPath(compiled)                        969 rps
✔ nim_bipf#jsArray[bipf]/seekPath(compiled)                  1,283 rps
✔ nim_bipf_node#jsArray[bipf]/seekPath(compiled)               327 rps
✔ nim_bipf_node#inModuleMemory                              11,858 rps

   bipf#jsArray[js objects]/scan and match (#)               0%         (32,045 rps)   (avg: 31μs)
   bipf#jsArray[bipf]/seekPath(compiled)                -96.98%            (969 rps)   (avg: 1ms)
   nim_bipf#jsArray[bipf]/seekPath(compiled)               -96%          (1,283 rps)   (avg: 779μs)
   nim_bipf_node#jsArray[bipf]/seekPath(compiled)       -98.98%            (327 rps)   (avg: 3ms)
   nim_bipf_node#inModuleMemory                            -63%         (11,858 rps)   (avg: 84μs)
-----------------------------------------------------------------------
Suite: Scanning in memory db (first 100 message of type 'contact') (fixture generated, 80000 messages)
✔ bipf#jsArray[js objects]/scan and match                   25,472 rps
✔ bipf#jsArray[bipf]/seekPath(compiled)                      1,740 rps
✔ nim_bipf#jsArray[bipf]/seekPath(compiled)                  2,124 rps
✔ nim_bipf_node#jsArray[bipf]/seekPath(compiled)                58 rps
✔ nim_bipf_node#inModuleMemory                              17,436 rps

   bipf#jsArray[js objects]/scan and match (#)               0%         (25,472 rps)   (avg: 39μs)
   bipf#jsArray[bipf]/seekPath(compiled)                -93.17%          (1,740 rps)   (avg: 574μs)
   nim_bipf#jsArray[bipf]/seekPath(compiled)            -91.66%          (2,124 rps)   (avg: 470μs)
   nim_bipf_node#jsArray[bipf]/seekPath(compiled)       -99.77%             (58 rps)   (avg: 17ms)
   nim_bipf_node#inModuleMemory                         -31.55%         (17,436 rps)   (avg: 57μs)
-----------------------------------------------------------------------

Probably because there are many more small messages (votes), which are fast to encode/decode/scan.

So the diff with your benchmarks is not coming from the dataset... Maybe it's the library used to benchmark; I use "benchmarkify".

staltz commented 1 year ago

So the diff with your benchmarks is not coming from the dataset... Maybe it's the library used to benchmark; I use "benchmarkify".

It could be the measurement method. What I measured was startup time, which may include more things than just decoding.

gpicron commented 1 year ago

Looking at the code, if it differed only in the deserialization part, you should not have had more than 33% more time for the bipf... Moreover there is about 50% less to load from disk... And on a real feed log, startup time should be dominated by decryptions.
Maybe V8 is finding some internal optimization while JITting to shortcut "IO then parse"...

Note: with the Atom keys in https://github.com/ssbc/bipf-spec/pull/3#issuecomment-1488253855, I further reduced the gap in deserialization performance vs JSON.parse to a point where it is barely noticeable.

gpicron commented 1 year ago

Some additional experiments:

I compiled the same code to WASM using emscripten. Quite simple, much less complex than playing with NAPI to build a NodeJS module. And of course much more portable...

converting JSON to BIPF

Suite: JSON 2 Bipf ssb messages from arj,cel,andre

   json#parse(string)/bipf#allocAndEncodessb messages from arj,cel,andre                            -29,96%         (49 391 rps)   (avg: 20μs)
   json#parse(buffer)/bipf#allocAndEncodessb messages from arj,cel,andre                            -28,37%         (50 517 rps)   (avg: 19μs)
   json#parse(string)/nim_bipf#serializessb messages from arj,cel,andre                               -1,5%         (69 467 rps)   (avg: 14μs)
   json#parse(buffer)/nim_bipf#serializessb messages from arj,cel,andre (#)                              0%         (70 523 rps)   (avg: 14μs)
   nim_bipf_node#parseJson2Bipf(string)ssb messages from arj,cel,andre                              +11,01%         (78 288 rps)   (avg: 12μs)
   nim_bipf_node#parseJson2Bipf(buffer)ssb messages from arj,cel,andre                              +43,37%        (101 107 rps)   (avg: 9μs)
   nim_bipf_wasm#parseJsonToBipf(buffer)ssb messages from arj,cel,andre                             +37,53%         (96 990 rps)   (avg: 10μs)
   nim_bipf_wasm#parseJsonToBipf(buffer in WASM Shared Memory)ssb messages from arj,cel,andre       +46,33%        (103 195 rps)   (avg: 9μs)
-----------------------------------------------------------------------

Suite: JSON 2 Bipf ssb messages from ssb-fixture

   json#parse(string)/bipf#allocAndEncodessb messages from ssb-fixture                            -27,86%         (53 312 rps)   (avg: 18μs)
   json#parse(buffer)/bipf#allocAndEncodessb messages from ssb-fixture                            -22,29%         (57 428 rps)   (avg: 17μs)
   json#parse(string)/nim_bipf#serializessb messages from ssb-fixture                              +1,99%         (75 371 rps)   (avg: 13μs)
   json#parse(buffer)/nim_bipf#serializessb messages from ssb-fixture (#)                              0%         (73 899 rps)   (avg: 13μs)
   nim_bipf_node#parseJson2Bipf(string)ssb messages from ssb-fixture                              +12,86%         (83 406 rps)   (avg: 11μs)
   nim_bipf_node#parseJson2Bipf(buffer)ssb messages from ssb-fixture                              +54,68%        (114 310 rps)   (avg: 8μs)
   nim_bipf_wasm#parseJsonToBipf(buffer)ssb messages from ssb-fixture                             +39,79%        (103 301 rps)   (avg: 9μs)
   nim_bipf_wasm#parseJsonToBipf(buffer in WASM Shared Memory)ssb messages from ssb-fixture       +30,89%         (96 727 rps)   (avg: 10μs)
-----------------------------------------------------------------------

Observations:

Scanning in memory DB (simple query : search first 100 contacts message )

Given the size reduction from encoding in BIPF, I'm looking at whether it is a viable solution vs an array of JS objects. Nothing is optimized.

Suite: Scanning in memory db (first 100 message of type 'contact')
   bipf#jsArray[js objects]/scan and match (#)               0%         (24 313 rps)   (avg: 41μs)
   bipf#jsArray[bipf]/seekPath(compiled)                -94,11%          (1 431 rps)   (avg: 698μs)
   nim_bipf#jsArray[bipf]/seekPath(compiled)            -92,09%          (1 923 rps)   (avg: 519μs)
   nim_bipf_node#jsArray[bipf]/seekPath(compiled)       -88,63%          (2 765 rps)   (avg: 361μs)
   nim_bipf_node#inModuleMemory                         -35,89%         (15 586 rps)   (avg: 64μs)
   nim_bipf_wasm#inModuleMemory                         -54,78%         (10 995 rps)   (avg: 90μs)

Observations:

staltz commented 1 year ago

I compiled the same code to WASM using emscripten. Quite simple, much less complex than playing with NAPI to build a NodeJS module.

Interesting! I didn't know this about WASM.

Scanning in memory DB (simple query : search first 100 contacts message )

bipf#jsArray[js objects]/scan and match

Could you clarify what this means? Is it a js array of js objects or is there BIPF involved?

By the way, I tried to store a JSON array of JSON objects, and it's incredibly slow to JSON.parse such a large array. Instead, I use async-append-only-log, which is a log of length-prefixed records (bytes of any encoding you want), and each of those records is encoded as JSON. Scanning this length-prefixed log and doing a JSON.parse on each of the records is very fast.
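
A minimal sketch of that scan (my assumption of the record shape, not async-append-only-log's actual on-disk format, and the file name is hypothetical):

```js
const fs = require('fs')

// Walk a buffer of length-prefixed records (assumed 32-bit LE length prefixes)
function* records(buf) {
  let pos = 0
  while (pos + 4 <= buf.length) {
    const len = buf.readUInt32LE(pos)
    yield buf.subarray(pos + 4, pos + 4 + len)
    pos += 4 + len
  }
}

const log = fs.readFileSync('ssb.log') // hypothetical log file
let contacts = 0
for (const rec of records(log)) {
  const msg = JSON.parse(rec) // one fast full parse per record
  if (msg.value && msg.value.content && msg.value.content.type === 'contact') contacts++
}
console.log(contacts)
```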

Right now I don't have a use for BIPF, and JSON is good enough. BIPF was always about "in-place reads", which isn't necessary if we can fully load-and-parse all records into memory in a few seconds.

gpicron commented 1 year ago

Could you clarify what this means? Is it a js array of js objects or is there BIPF involved?

This is a simple javascript array containing the 80000 messages parsed into JS objects. It is of course the fastest to scan in JS (at least for a very simple query like the one tested), but at the cost of a higher memory footprint.

Keeping the data in a serialized format in memory reduces the footprint a lot, but at the cost of increased time to locate data during scans. A reduced footprint means more messages in memory. And using a serialized format allows for a shorter loading time in your case (actually, you could even mmap the file, which means the OS will take care of keeping it in memory if there are enough resources).

I'm now looking at making an extension for sqlite3 with BIPF support, similar to the sqlite3 JSON extension. That could offer the benefits of BIPF with the query engine of SQLite, while remaining portable to the browser, node (using emscripten) and anything else that can bind a C API.
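
To illustrate the analogy: with the existing SQLite JSON1 extension the query engine can already filter on fields of serialized rows, and a BIPF extension would expose similar functions over BIPF blobs (the bipf_extract name below is purely hypothetical; the sketch uses the better-sqlite3 package):

```js
const Database = require('better-sqlite3')
const db = new Database(':memory:')

db.exec('CREATE TABLE messages (data TEXT)')
const insert = db.prepare('INSERT INTO messages (data) VALUES (?)')
insert.run(JSON.stringify({ value: { content: { type: 'contact' } } }))
insert.run(JSON.stringify({ value: { content: { type: 'post', text: 'hi' } } }))

// json_extract lets SQLite filter without decoding anything in JS;
// a BIPF extension would offer e.g. bipf_extract(data, '$.value.content.type')
const contacts = db
  .prepare("SELECT data FROM messages WHERE json_extract(data, '$.value.content.type') = 'contact'")
  .all()
console.log(contacts.length) // 1
```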