vbakke / trytes

Converting between trytes and bytes

Multi-tryte codecs #8

Open todofixthis opened 6 years ago

todofixthis commented 6 years ago

One idea that's been batted around is creating "multi-trit" codecs, where a binary value can be represented using a variable number of trits (in the same way that UTF-8 may use 1, 2, 3 or 4 bytes to represent a particular character).

Here's an example, put forward (as a thought experiment) by @paulhandy:

I commented here iotaledger/iota.lib.js#130 regarding a unicode encoding scheme, in which I suggested that perhaps the first trit of a sequence could give the length. For example, we could have [0+-] defined as [end,3,6] trits such that any byte up to 3^4 could be represented in 4 trits, with a maximum of 7 trits. I'm not sure this is best or anything, but something like this could be considered.

I feel like this might be an interesting approach to the "square peg / round hole" problem.
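To make the quoted idea concrete, here is a minimal sketch of one possible reading of it: a prefix trit announces how many payload trits follow (`+` for 3, `-` for 6), and `0` ends the sequence. The digit mapping (`0`, `+`, `-` as 0, 1, 2), the function names, and the choice to put values below 27 in the short form are all assumptions made for illustration, not something specified in the thread.

```python
# Hypothetical length-prefix codec: '+' = 3 payload trits, '-' = 6, '0' = end.
PAYLOAD_LEN = {"+": 3, "-": 6}
DIGITS = "0+-"  # assumption: treated as plain base-3 digits 0, 1, 2


def encode_value(value: int) -> str:
    """Encode one value (0..728) as a prefix trit plus 3 or 6 payload trits."""
    if not 0 <= value < 3 ** 6:
        raise ValueError("value does not fit in 6 payload trits")
    prefix = "+" if value < 3 ** 3 else "-"
    payload = []
    for _ in range(PAYLOAD_LEN[prefix]):
        value, digit = divmod(value, 3)
        payload.append(DIGITS[digit])
    return prefix + "".join(reversed(payload))


def encode(data: bytes) -> str:
    """Encode every byte, then append the '0' end marker."""
    return "".join(encode_value(b) for b in data) + "0"


def decode(trits: str) -> bytes:
    """Read prefix/payload pairs until the '0' end marker."""
    out = bytearray()
    pos = 0
    while trits[pos] != "0":
        width = PAYLOAD_LEN[trits[pos]]
        value = 0
        for ch in trits[pos + 1 : pos + 1 + width]:
            value = value * 3 + DIGITS.index(ch)
        out.append(value)
        pos += 1 + width
    return bytes(out)
```

Under this reading, values below 27 take 4 trits in total and everything else takes 7, so the scheme only pays off when small values dominate the input.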

vbakke commented 6 years ago

I've been playing a bit more with the numbers now. I still stand by my comment on B) in #9. To minimize the "waste", we need to convert a whole stream to a huge number before making bits out of the trits.
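A minimal sketch of the "whole stream as one huge number" idea, assuming trits are given as plain digits 0..2 with the most significant first. The function names are made up for illustration, and the trit count has to be carried separately so that leading zero trits survive the round trip.

```python
def trits_to_bytes(trits: list[int]) -> bytes:
    """Treat the whole trit stream as a single base-3 integer and emit its bytes."""
    number = 0
    for t in trits:
        number = number * 3 + t
    # Smallest byte count that can hold any value with this many trits.
    size = ((3 ** len(trits) - 1).bit_length() + 7) // 8
    return number.to_bytes(size, "big")


def bytes_to_trits(data: bytes, count: int) -> list[int]:
    """Inverse of trits_to_bytes; `count` is the original number of trits."""
    number = int.from_bytes(data, "big")
    trits = []
    for _ in range(count):
        number, t = divmod(number, 3)
        trits.append(t)
    return trits[::-1]
```

With this sketch, 1000 trits fit in 199 bytes, against 200 bytes when every group of 5 trits gets its own byte, so the saving is real but small.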

My current implementation is based on asciiToTrytes.js, which encodes an 8-bit byte into 6 trits. In other words, a 0-255 value into a 0-728 value (written as two tryte3 chars).

A 100 char ANSI message ends up as 200 tryte chars.
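For reference, a minimal sketch of that byte-to-two-trytes mapping. The alphabet and the low-digit-first ordering are assumptions modelled on the usual IOTA tryte alphabet, and the function names are illustrative only.

```python
TRYTE_ALPHABET = "9ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed 27-symbol tryte alphabet


def bytes_to_trytes(data: bytes) -> str:
    """Write each byte (0..255) as two base-27 digits, low digit first."""
    out = []
    for value in data:
        low, high = value % 27, value // 27
        out.append(TRYTE_ALPHABET[low] + TRYTE_ALPHABET[high])
    return "".join(out)


def trytes_to_bytes(trytes: str) -> bytes:
    """Inverse mapping; raises ValueError if a pair encodes a value above 255."""
    out = bytearray()
    for i in range(0, len(trytes), 2):
        low = TRYTE_ALPHABET.index(trytes[i])
        high = TRYTE_ALPHABET.index(trytes[i + 1])
        out.append(low + 27 * high)
    return bytes(out)
```

Two trytes can hold 729 values but only 256 are ever used, which is where the waste comes from.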

 

The best fit I've found (which is also a small and handy one) is fitting every 3 bits (0-7) into 2 trits (0-8). (Or 6 bits into 4 trits, as @paulhandy mentioned above.)

Then a 100 char ANSI message will end up as 134 tryte chars. (Plus some padding, or the length codes suggested above.)
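A minimal sketch of the 3-bits-into-2-trits grouping, assuming trits are plain digits 0..2 and that the bit stream is zero-padded to a multiple of 3. The function names and the padding rule are assumptions for illustration; the decoder needs the original byte count to drop the padding.

```python
def pack_3bits_to_2trits(data: bytes) -> list[int]:
    """Split the bit stream into 3-bit groups (0..7) and write each as 2 trits."""
    bits = "".join(f"{b:08b}" for b in data)
    bits += "0" * (-len(bits) % 3)  # assumption: pad with zero bits
    trits = []
    for i in range(0, len(bits), 3):
        value = int(bits[i : i + 3], 2)  # 0..7, so the pair value 8 is never used
        trits.extend(divmod(value, 3))   # high trit, low trit
    return trits


def unpack_2trits_to_3bits(trits: list[int], byte_count: int) -> bytes:
    """Inverse mapping; `byte_count` tells the decoder where the padding starts."""
    bits = ""
    for i in range(0, len(trits), 2):
        value = trits[i] * 3 + trits[i + 1]
        bits += f"{value:03b}"
    return bytes(int(bits[i : i + 8], 2) for i in range(0, byte_count * 8, 8))
```

Each pair of trits has 9 possible values and 8 are used, so only one value per pair goes to waste.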

 

We still need two (x2) methods, though.
A) Grouping 5 trits into 8 bits (and back), i.e. trytes encoded to bytes
B) Grouping 3 bits into 2 trits (and back), i.e. bytes encoded to trytes
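Method A could be sketched as follows, assuming trits are plain digits 0..2 and that a short final group is padded with zero trits. The function names and the padding rule are assumptions for illustration.

```python
def trits_to_bytes_by_5(trits: list[int]) -> bytes:
    """Pack each group of 5 trits (0..242) into one byte."""
    padded = trits + [0] * (-len(trits) % 5)  # assumption: pad with zero trits
    out = bytearray()
    for i in range(0, len(padded), 5):
        value = 0
        for t in padded[i : i + 5]:
            value = value * 3 + t
        out.append(value)
    return bytes(out)


def bytes_to_trits_by_5(data: bytes, count: int) -> list[int]:
    """Inverse mapping; `count` is the original number of trits."""
    trits = []
    for value in data:
        if value > 242:
            raise ValueError("byte value is not a valid 5-trit group")
        group = []
        for _ in range(5):
            value, t = divmod(value, 3)
            group.append(t)
        trits.extend(reversed(group))
    return trits[:count]
```

Byte values 243 to 255 never appear in the output, which is the part of the peg that does not fill the hole in this direction.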

(I've tried several options, but have not yet found a square peg, that fits just perfectly to that round hole. ; )

vbakke commented 6 years ago

Okay, @todofixthis and @paulhandy, I've uploaded a web version of the tryte encoding to https://vbakke.github.io/trytes, for encoding both Unicode text to trytes and proper tryte strings to bytes.

For the text to tryte, I have included two versions:

Surprisingly, it does not save much when it comes to the length of the tryte string. A little bit, when the text becomes longer, but not as much as expected. Not sure if it is worth compressing, really.... :-/

todofixthis commented 6 years ago

Awesome! I'm excited to check this out!

In my opinion, we don't have to get it perfect on the first try, as long as what we end up with is useful and reasonably efficient — I expect that multiple codecs will emerge for byte <-> tryte conversions, in the same way that developers can use UTF-16, UTF-8, ISO-8859-1, etc. for character <-> byte conversions (in fact PyOTA is counting on it).

In particular, I don't think we have to get the square peg to fit the round hole exactly — UTF-8 and UTF-16 both have byte sequences that are impossible to decode, for instance, and they are exceptionally useful/successful regardless.

vbakke commented 6 years ago

Hi @chrisdukakis.

I'd like to invite you to this discussion about trytes and bytes, since you are looking at asciiToTrytes.js, and in case you start looking into other issues with converting between bytes and trytes. :)

A few months back, I did some work, with input from a few other people, on wrestling Unicode text into trytes. Some people wanted to send more than just English A-Z in the message part. I wanted to encrypt an IOTA seed.

I wrote up the discussion at https://github.com/vbakke/trytes and put a test page at https://vbakke.github.io/trytes/.

If you have any input, it would be most appreciated. :)

Cheers, vbakke