namecoin / proposals

Research non-JSON encodings #22

Open JeremyRand opened 8 years ago

JeremyRand commented 8 years ago

JSON may be suboptimal in terms of encoding efficiency. It would be useful to research whether msgpack, compression, or msgpack with compression would be a better fit. This research might be informed by evaluating real-world usage in the blockchain, both in terms of space savings and in terms of decoding cost, and taking into account both desktop and mobile clients.
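For anyone picking this up, here's a minimal sketch (not a rigorous benchmark) of the kind of comparison meant here, assuming the third-party `msgpack` package (`pip install msgpack`); the record below is invented for illustration:

```python
# Compare encoded sizes for a hypothetical Namecoin d/ value.
# The record is made up for illustration; real measurements should
# use actual on-chain values.
import json
import zlib

import msgpack  # third-party: pip install msgpack

record = {
    "ip": ["203.0.113.7"],
    "map": {"www": {"alias": ""}},
    "email": "hostmaster@example.bit",
}

encodings = {
    "json": json.dumps(record, separators=(",", ":")).encode(),
    "msgpack": msgpack.packb(record),
}
# Also try generic compression on top of each encoding.
for name, blob in list(encodings.items()):
    encodings[name + "+zlib"] = zlib.compress(blob, 9)

for name, blob in sorted(encodings.items(), key=lambda kv: len(kv[1])):
    print(f"{name:14} {len(blob):4} bytes")
```

Since decoding cost matters too (especially on mobile), a real evaluation would also time the unpack step rather than only measuring sizes.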

hlandau commented 8 years ago

There's also CBOR, which, as I understand it, is like MsgPack but is probably preferable from a standards perspective.

JeremyRand commented 8 years ago

Indeed, CBOR does look quite nice, probably a bit better than MsgPack.
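A quick sketch of how CBOR could slot into the same size comparison, assuming the third-party `cbor2` package (`pip install cbor2`); the record is the same invented example as above:

```python
# Compare JSON vs. CBOR encoded sizes for the same hypothetical record.
import json

import cbor2  # third-party: pip install cbor2

record = {"ip": ["203.0.113.7"], "map": {"www": {"alias": ""}}}

json_blob = json.dumps(record, separators=(",", ":")).encode()
cbor_blob = cbor2.dumps(record)
print(f"json: {len(json_blob)} bytes, cbor: {len(cbor_blob)} bytes")
```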

indolering commented 8 years ago

This has been covered before, and there is an extensive wiki article on the subject. MsgPack only saved ~35% (~25-50KB) over generic compression techniques.

Whatever encoding technique is used, we should compare it to gzip, bzip2, or brotli.
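A sketch of that generic-compression baseline, assuming the third-party `brotli` package (`pip install Brotli`; `gzip` and `bz2` are in the standard library):

```python
# Compress the same JSON blob with gzip, bzip2, and brotli and compare sizes.
# Note: on records this small, compression overhead can exceed the savings;
# a meaningful test needs realistically sized batches of values.
import bz2
import gzip
import json

import brotli  # third-party: pip install Brotli

record = {"ip": ["203.0.113.7"], "map": {"www": {"alias": ""}}}
blob = json.dumps(record, separators=(",", ":")).encode()

print("raw   ", len(blob))
print("gzip  ", len(gzip.compress(blob, compresslevel=9)))
print("bzip2 ", len(bz2.compress(blob, 9)))
print("brotli", len(brotli.compress(blob, quality=11)))
```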

JeremyRand commented 8 years ago

On 06/15/2016 12:21 PM, Zach Lym wrote:

> This has been covered before, and there is an extensive wiki article on the subject. MsgPack only saved ~35% (~25-50KB) over generic compression techniques.
>
> Whatever encoding technique is used, we should compare it to gzip, bzip2, or brotli.

I had never seen that wiki article, but at first glance it mostly looks like unsourced bullshit. (Yet another reason why scrapping the wiki in favor of Jekyll was a good move.)

If you're alleging that non-JSON encoding constitutes premature optimization, it would be great if you could point to where I recommended doing it anytime soon. In fact, if you look at my dehydrated certificate PR, you'll note that I specifically say there that I don't want to optimize too much at the moment; I even cite non-JSON encoding as a reason for this. This ticket is primarily here so that (1) we don't forget about it later, and (2) anyone who's interested in doing this work is aware of the proposal.

That's not to say that I oppose doing research on this; I just don't plan on spending much of my time on it right now. Others are more than welcome to hack around with it on whatever schedule they like.

indolering commented 8 years ago

Mostly unsourced bullshit? So Ryan and I sitting around playing with compression on real-world samples is unsourced bullshit?

JeremyRand commented 8 years ago

@indolering It is completely impossible to reproduce the results claimed in that wiki article. In fact, the claims are so vague that it is difficult to even tell what they mean, let alone how they could be reproduced. From the minimal amount of data presented, it is clear to me that the accuracy of that data is highly suspect. (I would give examples of why the data is probably inaccurate, but I have more productive things to do with my time than debunking claims that have made zero effort to attain credibility.)

Also, the authorship of those claims is not disclosed; grepping for "indolering", "zach", or "ryan" on that page yields zero results. To a casual observer, the page looks like an official Namecoin statement rather than something that you and Ryan unilaterally claimed. This is not the first time that I've found low-quality wiki pages with misleading authorship statements (or the lack thereof), and unfortunately I'm usually the one who ends up with the job of cleaning that shit off the wiki when people report it to me. Jekyll forces things to go through some basic level of sanity review, which is why the wiki is being scrapped in favor of Jekyll.

So yes, given the above, it's pretty clear that that wiki page qualifies as "mostly unsourced bullshit" and that it's a reason why the wiki is being scrapped.

I am unwilling to hijack this GitHub thread any further. The Namecoin scene has been very drama-free for many months (I've been enjoying the productivity greatly, as I think others have), and I have zero interest in participating in arguments here. If you want to contribute reproducible data about non-JSON encodings in this thread, you're welcome to do so. Anything else is off-topic; please take it somewhere else.

JeremyRand commented 8 years ago

Back on topic: it's probably fairly difficult (or perhaps impossible) to get reliable data to test with, given that the current level of Namecoin usage is very different from the usage level where compression actually matters, and that both of those usage levels differ substantially from publicly available ICANN data. I'd be open to hearing ideas on getting useful test data, but I have a feeling that if we try to tackle this anytime soon, we're going to have to use artificial data (i.e. bullshit data). That's possibly a good reason to avoid worrying about this topic anytime soon; it might be more productive to spend time on other scalability work, such as SegVal.
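For what it's worth, here's a rough sketch of how the real values that do exist could be dumped for experiments, assuming a local Namecoin Core node and its name_scan RPC (the output field names are an assumption; check your node's help text). Per the caveat above, the current blockchain's contents may not be representative of the usage level where compression matters:

```python
# Dump on-chain name values via namecoin-cli for encoding experiments.
# Assumes a running, RPC-enabled Namecoin Core node on the local machine;
# the "value" field name is an assumption based on typical name_* output.
import json
import subprocess

def scan_values(start="", count=500):
    """Return the value strings of up to `count` names starting at `start`."""
    out = subprocess.check_output(
        ["namecoin-cli", "name_scan", start, str(count)])
    return [entry.get("value", "") for entry in json.loads(out)]

values = scan_values()
print(f"fetched {len(values)} values, "
      f"{sum(len(v) for v in values)} bytes of JSON total")
```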