OP_RETURN 80-byte limit restored, enable prunable outputs

CraigSellars commented 9 years ago

Referencing https://github.com/bitcoin/bitcoin/pull/5286

Several of the larger Omni transaction types using bare multisig can now be fit into OP_RETURN @ 80-bytes, which (while possibly slower to confirm due to mining pools refusing to mine them) is preferable to current methods in terms of transaction encoding and size. This is still preferable even if the cost is slightly higher than bare multisig for the obvious (data storage) reasons.

In order to be better stewards of blockchain transactions, this Issue is for discussion on which transactions can and should be shifted to leverage 80-byte OP_RETURN when it is available in Bitcoin Core.

dexX7 commented 9 years ago

This is still preferable even if the cost is slightly higher than bare multisig for the obvious (data storage) reasons.

transaction size vs payload size

80 byte OP_RETURN is cheaper in terms of transaction size (= fee) and superior to the current mutlisig encoding, especially because dust is not redeemed in one step at the moment. A more efficient approach would look like this. I created the chart based on the assumptions that all multisig outputs are respent in one step and change also becomes part of a multisig package. A very wild guess is that the size related to data packages can be cut by 1/2 on average.

It would make sense to use the old encoding scheme as fallback for transactions that don't fit into 80 byte though.

I'm wondering the most about potential implications related to obfuscation, and it might sound crazy, but I suggest to mirror Counterparty, which uses ARC4, initialized with the transaction id of the first input. It's fast and can be applied on variable length payloads, whereby the current obfuscation fits well into the bare multisig model, but would require significant tweaking for something else such as OP_RETURN.

zathras-crypto commented 9 years ago

I'll dig out my old Class C stuff and take another look - IIRC SP creation would be the only message we couldn't fit in 80 bytes currently.

Regarding Counterparty - their method of looking for the CNTRPRTY bytes at the beginning of an OP_RETURN output requires analysis of every transaction - this sort of folds into the discussion on losing the Exodus address (ie do we want to provide future lightweight clients with an easy ability to retrieve a list of all Omni transactions by watching Exodus). Most concerning to me though (and I have absolutely no desire to rip on any others peoples/projects work here, just sharing a legitimate concern) is that I've been told counterpartyd can take as long as 200 seconds to parse one block (where as conversely we will process 200 blocks in one second) - I don't know if this is the encoding, python, lack of optimization or an inherent limitation with their encoding methodology but it's critical to me that we maintain our high performance and do not have our integrators criticizing us for bringing their servers to a crawl and such other commentary that's been publicly levied at other projects.

To be clear, that's not to say we shouldn't go OP_RETURN - we absolutely should - I'm just saying that before we go 'borrowing' from any other projects, we need to ensure we don't borrow problems too.

Regarding obfuscation entirely though, my same point stands as always - obfuscation is entirely pointless when we hang an Exodus marker on the front door.

Thanks Z

dexX7 commented 9 years ago

their method of looking for the CNTRPRTY

I was solely referring to the obfuscation, getting rid of Exodus is a different topic, but it could actually be combined.

I've been told counterpartyd can take as long as 200 seconds to parse one block

Haha that sounds crazy, but I wouldn't be surprised. I recently scanned the whole chain, including deobfuscating XCP transactions for identification for another one of those charts and it took roughly 15 minutes on a rather loaded VPS, with a native Bitcoin Core based build though.

Using ARC4 was just an idea, another one could be used as well of course. I believe the trickiest part is to break down parseTransaction. The encoding on the other hand is basically almost ready, as it was one of the things I tested some weeks ago.

zathras-crypto commented 9 years ago

Haha that sounds crazy, but I wouldn't be surprised.

Yeah I only mention it as there has been a lot of vocal recently around performance of the various softwares for the 2.0 projects and the perception of our project (for a change haha) is we have a good stack so I'm just keen to keep that momentum. The XCP guys probably simply suffer from the same architectural issues as we did in the early days. MSC-tools, MasterChest, Bitoy's stuff - they all had the same fundamental flaw - they required an external bitcoin daemon and had to pull every single transaction over RPC to evaluate it - a sloooow process. The way to go was always to get inside bitcoin core so we could get at everything directly (I said so at the start even when chest was a competitor) but I just didn't have the skills to do it solo so @m21 was a godsend in my book haha.

EDIT: sorry not quite true - msc-tools used obelisk IIRC

I was solely referring to the obfuscation, getting rid of Exodus is a different topic, but it could actually be combined.

I do see them as part of the same discussion, simply because I don't see the value to obfuscation unless we dispose of the Exodus marker.

Using ARC4 was just an idea, another one could be used as well of course. I believe the trickiest part is to break down parseTransaction. The encoding on the other hand is basically almost ready, as it was one of the things I tested some weeks ago.

This actually also feeds into a discussion I was having with Craig on pending txs and the work needed to do it - there are a bunch of things that making parsing more atomic (encoding/tx class changes being one of them) would help with :)

[3/02/2015 5:18:10 PM] zathrasc: tl:dr; it's:
stage 1 - seperate block handling and parsing to make parsing atomic
stage 2 - seperate processing from parsing to allow parsing without changing state
stage 3 - provide a model to parse txs in mempool
stage 4 - provide a analytics engine to create a map of all txs in mempool and ascertain risk on pending (eg valid: false, all the way through to valid: true (99% certainty) we should never do a 100% for pending

dexX7 commented 9 years ago

I do see them as part of the same discussion, simply because I don't see the value to obfuscation unless we dispose of the Exodus marker.

Very, very good point. The only argument I can think of against it would be to deploy it in smaller steps, e.g. obfuscation earlier, then removing the marker.

... pending txs and the work needed to do it

Speaking of ..: https://github.com/dexX7/bitcoin/compare/dexX7:291715c6bbe285200f019c6b8c6467b87b79c74b...experiment-plain-data

foundation.omni.test.rpc.misc.DecodePacketSpec.html foundation.omni.test.rpc.misc.GetPayloadSpec.html

Check the "standard output" for some samples. It decodes transactions without verification, whether confirmed or not.

Not in the test specs, but you probably see the RPC calls: it's easy to create basically any transaction, or let's rather say: create the payload which can then be used with sendrawtx_MP. Pretty experimental - it works, but I'm not satisfied with those transaction templates and it's basically testing things..

rawraw

Sorry, off topic! :)

Bitoy commented 9 years ago

For op return 80 (class c) clear text then apply some compression techniques. (Ala zip compression) No obsfucation needed since it's allowed by Bitcoin core.

Instead of the exodus marker, would it be faster if the marker for class c is the dust value? Ie tx with dust with 5888 satoshi is probable class c omni tx.

Good idea to separate block handling and parsing.

Parse incoming omni transactions ( with 0 confirmations). Once a block is confirmed, post the omni transactions of the confirmed block.

zathras-crypto commented 9 years ago

Speaking of ..: dexX7@dexX7:291715c6bbe285200f019c6b8c6467b87b79c74b...experiment-plain-data

Wow! Had no idea you'd got this far - I was literally just today raising technical debt again and that we had a bunch of TX types we couldn't yet broadcast from Core and I disagreed (you know me - the ever persistent PITA haha) we should be expending resources writing more python scripts to create transactions instead of just putting the functionality straight into Core - seems like you will be able to clear some of those broadcast debt items very quickly </huge grin>

CraigSellars commented 9 years ago

Speaking of ..: dexX7@dexX7:291715c6bbe285200f019c6b8c6467b87b79c74b...experiment-plain-data

Agreed. Nicely done thus far.

I also agree with doing obfuscation and un-exodus as separate steps.

I’m not familiar with the ARC4 method, but one of the concerns Zathras raised (in conversation and above) would be transaction parsing and limiting the need to scan every single transaction. I’m curious where the best middle ground would be to move in the Class C direction but not sacrifice parsing speed.

dexX7 commented 9 years ago

For op return 80 (class c) clear text then apply some compression techniques.

I love you! :) This is indeed the alternative. According to Shannon's source coding theorem there is a direct relation between compression and entropy, basically saying maximal compressed message can't be more random. I have tested a few schemes, inspired by https://github.com/mastercoin-MSC/spec/issues/273, but no final conclusion, but there is indeed some space to safe for non-text parts. In general there are two options:

Work on a field by field basis, potential candidates: LEB128, Elias gamma encoding
Work on the whole payload, potential candidate: ?

No obsfucation needed since it's allowed by Bitcoin core.

This is not the point. It's that pools can or can't censor Omni transactions. Eligius is one to name who does it.

I’m not familiar with the ARC4 method, but one of the concerns Zathras raised

I think your mixing things here. This is completely unrelated to obfuscation: not ARC4 is the reason to deobfuscate every transaction, but XCP's marker is within the obfuscated packet, so you need to deobfuscate every transaction, then check, if it's a XCP transaction.

Wow! Had no idea you'd got this far

I have several private repos and like 100+ branches. >_< (most of it being duplicates though)

we should be expending resources writing more python scripts

I don't understand?

seems like you will be able to clear some of those broadcast debt items very quickly

I'm going to continue on the branch above, getting closer to the serialization model of Bitcoin Core, but then it could need someone with better skills for the final touch, I think. As first step of integration it would probably be only used for transaction construction, and you already have done a lot of work with gettransaction_MP which works fine, so I'm not sure, if it would make sense to use it there, except maybe for unconfirmed transactions.

To continue on OP-RETURN, I envision basically the following layered route:

Construct raw Omni transaction
Obfuscate via whatever method
Construct Bitcoin transaction
Add marker
Add change
Broadcast

And the other way around:

Get Bitcoin transaction
Check for marker
Extract sender
Deobfuscate
Extract reference
Go on with interpretation

Now the construction to broadcast is not that difficult, but parseTransaction is a mega huge chuck which currently mixes class A, class B and it also triggers some effects (Exodus purchase, DEx payment). In the end, each step should be one "module", so to speak, so we can easily switch obfuscation, or the transport layer (e.g. bare multisig vs. OP_RETURN), or the way how the sender is identified.

Something to think about: right now, if a transaction has a multisig output, then it's processed as class B, but what if...

It has an OP_RETURN output, but is not a valid class C transaction
It has a bare multisig output, but is not a valid class B transaction
...

I actually suggest to check for class C, then fall back to class B, then fall back to class A. Does this make sense?

Bitoy commented 9 years ago

Check the Huffman coding for compressing raw omni tx. http://en.m.wikipedia.org/wiki/Huffman_coding

Compressing raw omni tx may allow long omni tx (more than 80 bytes) to use return op and maybe lower miners fee.

I agree on check for class c, fall back to class b, then class a.

dexX7 commented 9 years ago

@Bitoy: do you have any test data or results?

Compressing raw omni tx may allow long omni tx (more than 80 bytes) to use return op and maybe lower miners fee.

A "text" field can have up to 255 characters, so there is unfortunally no way to stay below 80 bytes in every case.

zathras-crypto commented 9 years ago

Ideally I'd like to see us use a dynamic engine - one that creates the plaintext packet, compresses it (using whatever algo of choice) and looks at the size - if >80 bytes send via Class B else send Class C.

The alternative is obviously to simply tell Core (hard code) to encode tx types n to m via Class C and types x to y via Class B which is far simpler, but would result in some txs (example creations) that could be sent Class C to be sent Class B.

I do think OP_RETURN with 80 bytes is going to be our Class C, and I see much more value in compression across OP_RETURN (where the packet length can be variable), where as I saw very little value across multisig (because our packet length was always gonna be 30 bytes each regardless of compression so they could be masked into pubkeys)

However, just to play another angle here - it's going to be N amount of time before these transactions can be mined at all (it's merged to master but has not been included in any releases, and many pools set their own policy regardless) and N+M amount of time before 80 byte OP_RETURN transactions can be mined expediently enough to become our default class. As such I think we should have a plan, and flesh out a design, but I don't think we should charge head first into it, there is plenty of other stuff in need of attention :)

Something to think about: right now, if a transaction has a multisig output, then it's processed as class B, but what if...

Yep that's accurate - I think because we only allow outputs for change in Class A (as opposed to Class B where we permit other outputs in general) so if we see a multisig we already know it's not a valid Class A and can just look at in the context of Class B. I can see your point re Class C however, and if we permit other output types there we will have to have an order of precedence defined for decoding attempts (eg what if you create a transaction with both a valid Class C OP_RETURN packet and a valid Class B multisig packet with conflicting instructions and so on).

Thanks Z

Bitoy commented 9 years ago

If the user creates a very long text it will definitely not fit in ret_op 80 char. Zathras suggestion of a dynamic engine is the solution.

Some compression numbers (please verify if correct) using Elias with Hex Level compression

Simple Send Standard Bits 128 Compressed Bits 58 Compressed 45% of Original

Crowdsale ("Companies","Bitcoin Mining","Quantum Miner","tinyurl.com/kwejgoig") Standard Bits 752 Compressed Bits 778 Compressed 103% of Original

Crowdsale ("Companies","Bitcoin Mining","Quantum Miner") Standard Bits 592 Compressed Bits 538 (68 bytes) Compressed 91% of Original

Crowdsale ("Quantum Miner") Standard Bits 408 Compressed Bits 284 Compressed 70% of Original

dexX7 commented 9 years ago

Ideally I'd like to see us use a dynamic engine - one that creates the plaintext packet, compresses it (using whatever algo of choice) and looks at the size - if >80 bytes send via Class B else send Class C.

Yes, this is what I had in mind as well. Completely decouple encoding from transport layer.

I need to correct my statement above. Elias Gamma encoding works on a field-by-field basis as well. It can also be called "variable length integer encoding". The idea behind it: to send an amount of 100 willets, we still need the whole 64 bit, even though the number 100 could be expressed by 7 bit. The issue is that it's not known "when a number ends and when the next begins", but there are different approaches, though no perfect solution, because it depends on the numbers that are being processed. One works better, if the numbers to compress are expected to be smaller, others work better, if the numbers are expected to be larger, etc.

Another approach would be to compress the whole data stream, instead of individual fields. Would need to run some benchmarks on a larger number of test data with different techniques.

@Bitoy: to my understanding, but I haven't went into the rabbit hole yet, Huffman encoding requires symbol tables or probabilities known beforehand. For example, when applying it on English text, the characters "e", "t", "a" are much more often used than "x", "q", "z", so they get shorter symbols, which are used as substitute. Since we have no small alphabet, I'm not sure, if it's usable in this context. There is also the option to prepend a symbol table at the beginning of each message, but my limited testing showed the introduced overhead usually results in larger transactions than smaller ones.

Bitoy commented 9 years ago

@dexX7 yes Huffman table requires a table and therefore not workable.

How about hex level compression using the Elias gamma encoding http://en.wikipedia.org/wiki/Elias_gamma_coding

Ex. Simple Send Hex tx: 00000000000000010000000005f5e100 There are only 16 possible value 0 to 9, a to f

Elias Encoded bits 1111111111111110101111111110011000001000000110000111101011 (58)

Standard Bits 128 Compressed Bits 58 Compressed 45% of Original

zathras-crypto commented 9 years ago

Speaking of ..: dexX7@dexX7:291715c6bbe285200f019c6b8c6467b87b79c74b...experiment-plain-data

@dexX7 I've been evaluating what you did here - love it - am going to be using a lot of this :)

zathras-crypto commented 9 years ago

Hmm this is where my GitHub-fu shows weakness - want to take a good chunk of your work here (but not quite all) but want to ensure commit history shows it was mostly your contribution.

zathras-crypto commented 9 years ago

Also when looking at your tx https://blockchain.info/tx/099c2bc61e9d45e7746e2eb63f6fb5400de7622a88230f5d402c29f275718db5 this wouldn't be accepted by OmniCore (the multisig inputs would trigger an invalidation)

dexX7 commented 9 years ago

[I] am going to be using a lot of this

I have a few variations of this, including some improvements, and it would probably the easiest, if you go ahead and I'll contribute. What parts are you looking for?

Also when looking at your tx

That's not mine. Chancecoin just adopted some ideas and uses http://api.bitwatch.co/ (basically my address indexed branch).

mastercoin-MSC / mastercore

OP_RETURN 80-byte limit restored, enable prunable outputs #283