WIP: modularization, minimal encoding

dexX7 commented 9 years ago

Most of this branch are concept tests and experiments, but since @zathras-crypto is already on class C encoding, it's probably better to discuss the general direction, and in particular here the aspect of data embedding and encoding.

The main goal I had in mind was greater modularization of Omni Core, maybe even in parts which go beyond Omni Core as project, but tackle on-chain data embedding in general. For the future I envision Omni Core as a rich package of core and utility functions, where some parts might be converted into a library, which could then be used in other projects like OmniJ, OmniPy, or whatever..

So back to the actual topic: the essence I'd like to focus on now is an early stage of the data embedding layer:

experimental-minimal-encoding/src/extensions/core/encoding.cpp

I consider and handle data as a bunch of bytes, not strictly as data packages, or something that is embedded via class B or class C encoding. Data (generally) could be embedded in bare multisig outputs, via op-return, ... and data could furthermore be transformed and modified, whether that is class B obfuscation, or by inserting sequence numbers.

There are basically three parts, starting with the actual embedding of plain data:

Helpers used for bare multisig encoding:

As well as some Omni Core specific functions related to the current data embedding schemes:

In like 5 lines of code EncodeBareMultisigObfuscated() (class B) was created, simply by combining the insertion of sequence numbers, obfuscation and bare multisig data embedding.

To demonstrate this in action, there are three RPC calls:

experimental-minimal-encoding/src/extensions/rpc/encoding.cpp

rpc_encoding

Keeping the dependencies low is a goal, so Bitcoin Core may ultimately only be used for convenience, and of course as data provider and network connector, when considering the greater picture beyond data embedding.

Right now I'm looking for early feedback about the general idea, the long term goal of modularization, and in particular an experimental stage of the data encoding aspect.

zathras-crypto commented 9 years ago

First off I think there is a lot of great stuff here :)

Modularization - truth be told not having a long background in SW dev (being an infra guy) I don't really know what benefits come with modularization. I assume code portability to other projects is a main focus which is definitely an admirable prospect. From my own perspective I guess I've been prioritizing a stable reference daemon and haven't really been thinking outside that proverbial box.

I've got some basic Class C parsing going on (how do you feel about the plain data instead of obfuscated/compressed/encoded?) and am going to move onto encoding now and it's been a while since I looked at that code, but I do see using some of what you've done here along with the encoding stuff you did for the various tx types - how that'll all fit together I'll make some suggestions over the coming days as I look at it further.

TL:DR; I like the ideas you have for data encoding, making the various stages more atomic would give us greater flexibility over tx creation etc - have you given much consideration into how these extensions would be implemented against the current codebase?

dexX7 commented 9 years ago

I'm mostly experimenting at the moment, and this is probably going to be restructured again, but I wanted to get a feeling for the sentiment. Thanks! :)

... what benefits come with modularization. I assume code portability to other projects is a main focus which is definitely an admirable prospect.

This is a nice bonus, but I see the real value in:

more atomic would give us greater flexibility

Maybe I tend to overthink this too much, given that the encoding (or other parts) shouldn't be changed all that often etc., but right now it looks pretty complicated to do so, because there are quite a few things mixed together. Another aspect is risk: not that I doubt the current class C implementation or alike, but it's given, even if it's minimal, that modifying consensus critical code may introduce unexpected behavior. Greater modularity likely reduces the scope or impact of changes.

I'm not sure, if it's crazy to suggest this, but what I was even thinking about: in addition to the checks in isAllowedOutputType(), where OP_RETURN is enabled at some height, the whole parseTransaction(), or other parts, could be switched based on height, e.g.:

int parseTransaction(bool bRPConly, const CTransaction &wtx, int nBlock, ...)
{
    if (nBlock < nOpReturnEnableBlock) {
        return parseTransactionOld(bRPConly, wtx, nBlock, ...);
    } else {
        return parseTransactionNew(bRPConly, wtx, nBlock, ...);
    }
}

And once it's live, and it turned out that no consensus bug was introduced accidently (in the time between release and actual switchover), the old one could be fully deprechiated.

have you given much consideration into how these extensions would be implemented against the current codebase

It depends, at this stage it's more experimenting than actual integration, but I'd think that it would feasible to replace parts - or parts of - mastercore::ClassB_send() for example.

how do you feel about the plain data instead of obfuscated/compressed/encoded?

Let me say it this way: I'd love to have fully obfuscated, compressed transactions, with class fallback mechanisms and more, but you're totally right that all those things are actually seperated topics, which can be introduced step by step. And in general I think releases should be done more often, instead of holding back something, until something else is done, so the current direction is perfectly fine in my opinion. :)

From all this, I'd consider a class C to class B fallback as most important, as this feels like a significant restriction. Ideally, but this is also sort of a different topic, there would also be a fallback for DEx payments, mostly because it's possible to have more than one DEx payment (even to different receivers) within one transaction, but only if it's a Master transaction, but one which is not class B (or C). Consider the case where one might send a DEx payment, and attach a message for the seller or some other meta data.

zathras-crypto commented 9 years ago

...From all this, I'd consider a class C to class B fallback as most important, as this feels like a significant restriction... ...Consider the case where one might send a DEx payment, and attach a message for the seller or some other meta data...

I did consider this but I couldn't think of a way to do it in a simplistic fashion - parseTransaction is only responsible for retrieving the packet, and it's a dumb process (it doesn't care about the content).

Currently if there is a OP_RETURN output we only know we have a data packet (whether that be Omni or some extra metadata the sender has attached), and it's not until later on in interpretPacket that we'll know whether it's valid Omni stuff and at that point going back to call parseTransaction again as Class B is a break from usual logic flow so would need to tread very carefully.

I appreciate advanced use cases could perhaps merge Omni encoding and their own addition of extra metadata but I guess my view there goes to at this stage, what resourcing should we expend to support these potential advanced use cases? - if the answer was "an hour" lol I'd be all for it, but class fallback in transaction parsing seemed like a chunk of changes would be needed or alternatively a marker in the OP_RETURN output to mark it as Omni data which if you use enough bytes to avoid potential collisions (let's say 8) we've just cut 10% off our available packet space - neither seemed particularly viable. Basically tl:dr; on this one - I'm open to suggestions, just don't have a (good enough) method for it :)

zathras-crypto commented 9 years ago

Also do you think this https://github.com/zathras-crypto/mastercore/commit/802ea702330f9f503f16027d0e9147e5c841a2c7 should be submitted upstream?

dexX7 commented 9 years ago

... and it's not until later on in interpretPacket that we'll know whether it's valid Omni stuff and at that point going back to call parseTransaction again as Class B is a break from usual logic flow so would need to tread very carefully.

Aahh, now I see what you were referring to in the other comment. I think it's a mistake on my part due to the words I used: I'm actually not referring to a fallback in that sense, but more to dual-use or multi-purpose transactions, where a transaction could be a class C transaction and a DEx payment. If it turns out it's not a class C transaction, it seems seperated from the DEx payment.

what resourcing should we expend to support these potential advanced use cases?

Yes, agreed. I mentioned it in the context of "this and that would be nice to have", given that obfuscation is a much more complex topic.

But since we're on the general progress: would you mind to post some quick notes about a transition to Bitcoin Core 0.10? Around 33 % of users, according to https://getaddr.bitnodes.io/dashboard/?days=90, switched already.

zathras-crypto commented 9 years ago

but more to dual-use or multi-purpose transactions, where a transaction could be a class C transaction and a DEx payment

Ahh I see - I'd been going down a different path for that in my head - I'd like us be able to (with the addition of something as simple as a separator) daisy chain many messages together in the payload. I think addresses can be encoded within the payload too to support many actions in the same transaction. On my nice to have list too :)

And FYI quick and dirty encoding is done here - https://github.com/zathras-crypto/mastercore/commit/d0e05518de567b1f60ca4efcba6ac6e3ebea7414

It's actually very simple to clone ClassB_send and adapt it for Class C. At the moment since we don't send any large packets I've simply flipped send_INTERNAL_1packet to use ClassC_send instead of ClassB_send while testing to force all my outbound tx's go Class C. A very simplistic implementation could just use payload size to pipe the tx into ClassC_send/ClassB_send accordingly (though ClassB_send still needs work for bigger payloads anyway).

Anywho just a starting point via path of least resistance, can decode and encode Class C without much work it seems. Will obviously need work to solidify (checks and such) but seems good news :)

dexX7 commented 9 years ago

I'd like us be able to (with the addition of something as simple as a separator) daisy chain many messages together in the payload. ... I think addresses can be encoded within the payload too to support many actions in the same transaction.

Are you kidding? Haha, I always had the impression you hated the idea of "virtual" references. This is very good news! This is what my very first issue on the spec was about:

Without the limitation of one "reference field" per Mastercoin transaction, it could be possible to chain transactions within one transaction. This allows in particular multiple sends without creating new transaction types. [...] Some say address reuse should be avoided, so one could attach another send of the remaining balance to the next wallet. [...] [Use a] seperator [or] terminator at the end of a chain.

:)

And FYI quick and dirty encoding is done here - zathras-crypto@d0e0551

Wow, even if this might be quick and dirty, this basically means class C is almost done?

zathras-crypto commented 9 years ago

Are you kidding?

Nope - I think it was you who swayed me IIRC lol - a lot of it had to do with my misunderstanding the bytes required to send the address - once you pointed out we could send the HASH160 in the packet instead of the address directly, the extra 'weight' on the payload from including a reference went down substantially :)

Wow, even if this might be quick and dirty, this basically means class C is almost done?

In a scrappy, first draft fashion, sure. If you clone my 0.0.9.2-Z-ClassC branch and build - that will send Class C and then parse them when they confirm (ping a couple of sends to yourself to test). As I say I still have to add things like validation and checks and clean up, but it seems simple enough :)

dexX7 commented 9 years ago

payload

This just as a side note and in the general context of class A, B and C encoding. It's actually interesting, because there is almost no difference on a script level between pay-to-pubkey-hash, pay-to-script-hash, pay-to-bare-multisig, op-return, [...](edit: oh, and not to forget redeeming-script-hashes) in that context, because what they all have in common: it's simply a "push" operation of "data", whether it's a hash, pubkey or nulldata.

The fun starts when you take a look at scriptPubKey.mscore_parse() (or in a more expressive form: https://github.com/dexX7/bitcoin/commit/770749d7d6833c0abca85ff349514deedd27ea00), which does nothing else than "collecting" the "data" that was pushed. There is quite some work involved in evaluating output types and whatsoever, which almost seems unnecessary, given that "payload" basically equals "pushed data", unrelated to the actual output type. But say we'd simply collect all pushes, then the more tricky part would be to make sense of this "bunch of data", namely determining which data is actually part of the payload, what might be considered as "reference", or how all this can be combined into a model which also involves "bitcoin transfers".

Where I'm getting: it looks like there is an incredible difference between class A, B or C encoding, actual "bitcoin addresses", the concept of "virtual references" or other constructs, but all this is much, much more similar and related than it might appear to be. :)

Edit: wrong button.

mastercoin-MSC / mastercore

WIP: modularization, minimal encoding #302