vbakke / trytes

Converting between trytes and bytes

Why convert ASCII to Trytes? #6

Open · sketch34 opened this issue 6 years ago

sketch34 commented 6 years ago

I'm trying to understand why the asciiToTrytes functions exist. Please don't take this as criticism; I'm just a programmer interested in understanding the tech :)

//      RESULT:
//        The ASCII char "Z" is represented as "IC" in trytes.

I want some help understanding this decision. In the above code, all that is happening is a remapping from ASCII to a different encoding. Essentially it's just encoding ASCII into an arbitrarily chosen tryte alphabet (which is itself ASCII), but it has the effect of doubling the amount of memory needed to store "Z", or indeed any char. To me it looks like this ASCII <-> tryte conversion step simply produces a more memory-hungry representation that is still ASCII. Why is this done?
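For reference, the kind of per-character remapping being described looks roughly like this. This is a sketch based on the commonly published IOTA tryte alphabet, not the library's actual source, but it reproduces the "Z" -> "IC" result above:

```js
const TRYTE_ALPHABET = '9ABCDEFGHIJKLMNOPQRSTUVWXYZ';

function asciiToTrytes(ascii) {
  let trytes = '';
  for (let i = 0; i < ascii.length; i++) {
    const dec = ascii.charCodeAt(i);     // 'Z' -> 90
    const first = dec % 27;              // 90 % 27 = 9, which maps to 'I'
    const second = (dec - first) / 27;   // (90 - 9) / 27 = 3, which maps to 'C'
    trytes += TRYTE_ALPHABET[first] + TRYTE_ALPHABET[second];
  }
  return trytes;
}

asciiToTrytes('Z');  // "IC": every input byte becomes two tryte characters
```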

I believe the decision to use trytes came from a desire to anticipate future ternary hardware. But if the hardware natively supports ternary / trytes / trits, then we still don't need the asciiToTrytes functions, right? ASCII chars would be natively represented as trytes by the underlying compiler / interpreter / JIT; this doesn't happen at the software level, it is dictated by the hardware. On binary hardware, all you can do is choose some arbitrary representation of a tryte. But why are we even doing this, when it will happen automatically once ternary hardware comes along? You can't force a byte to be a tryte. It feels like we're inventing a problem that doesn't need solving and making things less efficient in the process?

As a final thought, the best way I can think of to achieve this representation in JavaScript is to use the DataView / ArrayBuffer interfaces to encode trytes across byte boundaries, e.g. representing the tryte sequence as a sequence of bits of arbitrary length using bitwise ops. That way you waste a maximum of 7 bits at the end of the sequence. Obviously this would still be more CPU-heavy than simply using the original ASCII encoding, plus you're exposed to endianness issues across platforms / languages. If it's data compression we're after, there are far better ways to achieve it.
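A rough sketch of that packing idea, using a fixed 5 bits per tryte (simpler, though slightly less dense than the arbitrary-length bit sequence described above); packTrytes is an illustrative name, not an existing API:

```js
const TRYTE_ALPHABET = '9ABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Pack each tryte character into 5 bits (27 values fit in 2^5 = 32),
// writing across byte boundaries. Working byte-by-byte through a Uint8Array
// sidesteps the endianness concerns that multi-byte DataView access raises.
function packTrytes(trytes) {
  const packed = new Uint8Array(Math.ceil((trytes.length * 5) / 8));
  let bitPos = 0;
  for (const t of trytes) {
    const value = TRYTE_ALPHABET.indexOf(t);   // 0..26
    for (let b = 4; b >= 0; b--) {             // most significant bit first
      if ((value >> b) & 1) {
        packed[bitPos >> 3] |= 0x80 >> (bitPos & 7);
      }
      bitPos++;
    }
  }
  return packed;
}

packTrytes('IC');  // 'I' = 9, 'C' = 3 -> 10 bits -> 2 bytes
// An 81-tryte address would pack into ceil(81 * 5 / 8) = 51 bytes instead of 81 ASCII bytes.
```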

vbakke commented 6 years ago

I'm not part of the IOTA team, so I cannot answer why they chose trinary over binary.

But now that they have, we have to accept that all IOTA transaction data is trinary: seeds, addresses, tags and messages.

Since they also chose to represent trinary values with A-Z, we can "write" messages using trinary values. (In the same way you could write messages using only A-F in hex notation.)

However, if you want to write any other character, say a non-English letter, you cannot use the IOTA tryte notation directly.

That is when you need to encode binary data as trytes. And yes, by doing this, you do waste a lot of bits/trits. Unfortunately.
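To put rough numbers on that waste (my own back-of-envelope figures, not anything taken from the library):

```js
// One ASCII byte carries 8 bits of information.
// asciiToTrytes turns it into 2 trytes = 6 trits, and 6 trits can hold 3^6 = 729 values,
// i.e. log2(729) ≈ 9.5 bits of capacity, so roughly 16% of the trit space goes unused.
// Stored back as ASCII characters on binary hardware, those 2 trytes occupy 2 bytes:
// a straight 100% size overhead compared with the original byte.
const capacityBits  = Math.log2(3 ** 6);     // ≈ 9.51
const tritWaste     = 1 - 8 / capacityBits;  // ≈ 0.16
const asciiOverhead = 2 / 1 - 1;             // = 1, i.e. +100%
```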

todofixthis commented 6 years ago

For more information about why IOTA uses balanced ternary instead of binary, see this Stack Exchange post.

As for why we bother to encode ASCII data into trytes in the first place, when those trytes are themselves represented using ASCII characters anyway...

Think of it like Base64 for trits (or perhaps Base58 is more apt here). There are more efficient ways to transmit/store bits, but Base64 is among the more "user friendly": the result contains only printable characters, making it safe for contexts where users will need to copy/paste values, where the receiving software only understands ASCII-encoded strings, etc.
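A quick side-by-side of the analogy, using Node's Buffer for the Base64 half and reusing the "Z" -> "IC" example from earlier in the thread:

```js
Buffer.from('Z').toString('base64');  // "Wg==": printable and copy/paste-safe, but larger than the input
// asciiToTrytes('Z')                 // "IC":   same trade-off, printable trytes at twice the size
```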

Likewise, the so-called trytesToAscii method will probably persist because of how easy it is for humans to grok the resulting bytes. But, for use cases where "user friendliness" is not a concern, more space-efficient codecs will be developed instead.