[Investigate] Use output id instead of `utxo_pos`

boolafish commented 5 years ago

Background

We had some previous discussion about the idea of transaction input directly pointing to some output id instead of using utxo_pos.

This would make it easier to chain transactions and would solve one problem with using SNARKs for settlement.

Proposed Solution

Concatenate the transaction hash and the output index to form the output id. The transaction hash should be generated without witness (proof) data. (edit: thanks to Pepesza, this is actually the same as Bitcoin's SegWit)

[edit] a proposal of the tx hash construction:

small tx: hash(hash(inputs), hash(outputs))
large tx: hash(merkle_proof(inputs), merkle_proof(outputs)). For a large tx we might not be able to generate the hash on chain, so we use a Merkle proof instead.
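A minimal sketch of the "small tx" construction and the resulting output id, to make the proposal concrete. SHA-256 stands in for whatever hash the contracts would use, and `SignedTx`, the 2-byte output index and the fixed-size item encoding are assumptions made for illustration only.

```python
import hashlib
from typing import List, NamedTuple


def h(*parts: bytes) -> bytes:
    """Stand-in hash (SHA-256) over the concatenation of the given parts."""
    return hashlib.sha256(b"".join(parts)).digest()


class SignedTx(NamedTuple):
    inputs: List[bytes]   # encoded input pointers (assumed fixed-size items)
    outputs: List[bytes]  # encoded outputs (assumed fixed-size items)
    witness: List[bytes]  # signatures / proofs, deliberately NOT hashed


def tx_hash(tx: SignedTx) -> bytes:
    # "small tx" construction from the proposal: hash(hash(inputs), hash(outputs)).
    # The witness is left out, so re-signing does not change the hash.
    return h(h(*tx.inputs), h(*tx.outputs))


def output_id(tx: SignedTx, oindex: int) -> bytes:
    # Output id = transaction hash concatenated with the output index.
    # The 2-byte index width is an assumption of this sketch.
    return tx_hash(tx) + oindex.to_bytes(2, "big")


tx_a = SignedTx([b"\x01" * 32], [b"\x0a" * 32, b"\x0b" * 32], [b"sig-from-alice"])
tx_b = tx_a._replace(witness=[b"a-different-signature"])

# Same inputs and outputs, different witness: the output ids still match.
print(output_id(tx_a, 1) == output_id(tx_b, 1))
```

Because the witness never enters the hash, the id of an output is fixed as soon as the inputs and outputs are fixed, which is the property the SegWit comparison refers to.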

Some previous thread:

boolafish commented 5 years ago

[Moving thread from slack for visibility]

pepesza [5:16 AM] @boolafish I just understood that what we were doing (the getStandardExitId thing) is basically segwit (as in Bitcoin's SegWit). Stable transaction IDs allow us to chain by pointing to UTXOs that have not been assigned a position yet. We just need to double check that this does not break anything.

pepesza [5:18 AM] We applied it first to exit IDs. The idea to drop utxopos was dismissed because the priority queue depends on utxopos. But utxopos can be derived from inclusion proofs, and we are doing those anyway. This generalizes chaining to situations where MVP/MoreVP allow for chaining. Awesome! So signatures and the predicate-related "data" field should all just be called 'witness'.

paulperegud commented 5 years ago

We have two options on the table: 1) Using output id instead of utxo_pos just for trade settlement chaining. 2) Using it everywhere.

Output id (computed as a hash of tx concatenated with oindex) relies on probability to avoid collisions, so it has to be a full-size hash. Thus it will occupy more space in transactions and in call data.
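To illustrate the size difference, here is a rough sketch comparing the two pointers. The `utxo_pos` encoding and its offsets are assumptions of this sketch (they are not stated in this thread), and SHA-256 is only a stand-in for the real hash function.

```python
import hashlib

# Assumed position encoding: blknum * BLOCK_OFFSET + txindex * TX_OFFSET + oindex.
BLOCK_OFFSET = 1_000_000_000
TX_OFFSET = 10_000


def encode_utxo_pos(blknum: int, txindex: int, oindex: int) -> int:
    return blknum * BLOCK_OFFSET + txindex * TX_OFFSET + oindex


def decode_utxo_pos(pos: int) -> tuple:
    blknum, rest = divmod(pos, BLOCK_OFFSET)
    txindex, oindex = divmod(rest, TX_OFFSET)
    return blknum, txindex, oindex


def output_id(tx_bytes: bytes, oindex: int) -> bytes:
    # Hash of the tx concatenated with the output index, as described above.
    return hashlib.sha256(tx_bytes + oindex.to_bytes(32, "big")).digest()


def min_bytes(n: int) -> int:
    """Smallest number of bytes that can hold the non-negative integer n."""
    return max(1, (n.bit_length() + 7) // 8)


pos = encode_utxo_pos(blknum=1000, txindex=5, oindex=1)
oid = output_id(b"some encoded tx", 1)

print(min_bytes(pos), "bytes for utxo_pos vs", len(oid), "bytes for output id")
```

A position is a small structured integer that fits in a handful of bytes when encoded compactly, while an output id always needs the full digest length. How much this matters in practice depends on the actual transaction encoding.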

Will it break things? @pdobacz pointed out that this might break mass exit mechanisms relying on bitmaps. So we might potentially close off some avenues of further development for ourselves.

boolafish commented 5 years ago

> mass exit mechanisms relying on bitmaps.

Would you mind elaborating on this, @paulperegud?

boolafish commented 5 years ago

BTW, personally I would vote for either sticking with utxo_pos for everything or moving everything to output_id. IMHO the impact of this is global and not local to a single tx type. For instance, we might rely on this to decide whether an output is finalized/exited or not (??)

boolafish commented 5 years ago

[Note] In the migration plan meeting, in order to minimize breaking changes at first, we decided to move forward with keeping utxo_pos for existing tx types. We can decide later whether we want to upgrade all txs to the same schema for the sake of simplicity.

boolafish commented 5 years ago

[Note from discussion with Pepesza] Chaining works fine without output_id, except for:

- PoA --> PoS. It might then be hard to find the "operator" who can give you the utxo_pos in advance.
- zk-SNARKs. Assuming proof generation takes time, chaining with utxo_pos becomes hard: you need to keep up with the block finalization speed, and you must provide a proof to the operator whenever you need the utxo_pos to chain.
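The difference for chaining can be sketched as follows. This is only an illustration: the `Tx` class, the SHA-256 stand-in hash and the `utxo_pos` offsets are assumptions and do not come from the contracts.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Tx:
    inputs: Tuple[bytes, ...]
    outputs: Tuple[bytes, ...]

    def tx_hash(self) -> bytes:
        # Witness-free hash over inputs and outputs (stand-in construction).
        body = b"".join(self.inputs) + b"|" + b"".join(self.outputs)
        return hashlib.sha256(body).digest()


def output_id(tx: Tx, oindex: int) -> bytes:
    # Known as soon as the tx is built; no block inclusion needed.
    return hashlib.sha256(tx.tx_hash() + oindex.to_bytes(32, "big")).digest()


def utxo_pos(blknum: Optional[int], txindex: Optional[int], oindex: int) -> Optional[int]:
    # Only known once the operator has placed the tx in a block.
    # The offsets below are an assumed encoding.
    if blknum is None or txindex is None:
        return None
    return blknum * 1_000_000_000 + txindex * 10_000 + oindex


tx1 = Tx(inputs=(b"deposit-output",), outputs=(b"alice:10",))

# tx1 is not in any block yet, so there is no position to point at ...
print(utxo_pos(None, None, 0))

# ... but tx2 can already spend tx1's first output via its output id.
tx2 = Tx(inputs=(output_id(tx1, 0),), outputs=(b"bob:10",))
print(tx2.inputs[0].hex())
```

With output ids, the whole chain `tx1 -> tx2` can be prepared (and, in the SNARK case, proven) offline, without asking the operator for a position first.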

boolafish commented 5 years ago

Closing this as it seems like it works. For implementation details we can always open a new issue when the DEX PoC kicks in.

boolafish commented 5 years ago

[NOTE] In the meeting with @Pongch, @paulperegud and @Nikodemek18, it seemed actually simpler to drop utxo_pos directly. As a result, we are changing direction to just drop utxo_pos.

Pongch commented 5 years ago

Keep in mind that this decision is not final yet. We will have to go through some technical costing discussion to see which way is easier to implement from the network side 😄

slavamirovsky commented 5 years ago

Yo, @boolafish @Pongch I think we should decide on one way of communication. We discussed the same in the Confluence, here and over Slack :)))))

boolafish commented 5 years ago

@Pongch curious: what are the steps / required info to make the decision here? The contract implementation is actually touching this now.

paulperegud commented 5 years ago

So, it's using output_id everywhere vs keeping output_id for payments only. I'm pretty much happy with just using output_id everywhere. Potential savings are pretty small, considering how much space signatures and addresses take up in a tx.

boolafish commented 5 years ago

[Note on meeting with @pdobacz @Pongch @achiurizo] We need to answer whether moving to output_id would close the door on mass exits. My intuition is no, as utxo_pos still exists and our exits still rely on it to get priority. This needs clearer thinking and a response back though.

edit: our meeting notes link

boolafish commented 5 years ago

I am not sure which "mass exit" mechanism we are talking about, but I just finished the "start standard exit" implementation and it does not need output_id to start at all. Assuming that a "mass exit" is a compact way to start many "standard exits" at once, I believe that changing the input to output_id or keeping utxo_pos makes no difference to it.

pdobacz commented 5 years ago

> I am not sure which "mass exit" mechanism we are talking

Sorry if I haven't been specific enough. It is about the coordinated mass exit mechanisms that were hinted at in the plasma whitepaper.

BTW, cf. above:

> mass exit mechanisms relying on bitmaps

But as I've delved into those descriptions, I'm not entirely sure I got that part right.

pdobacz commented 5 years ago

OK, lemme revive the thread and give some recap from my PoV and some suggestions to move forward.

tl;dr I think the impact of output_id on elixir-omg/contracts/protocol will be contained, if we choose option 3/ below. However, I'd like to challenge this option, in case something serious is missed here

I see 4 options now: (3 actually, option 0/ is not an option)

0/ fix only utxo_pos forever (not really considered, but is what we have now)
1/ fix only output_id forever
2/ allow utxo_pos and output_id, but without input_pointer_type 1:1 binding (there can be 2 distinct txs representing same state transition)
3/ allow utxo_pos and output_id, with the input_pointer_type 1:1 binding (output type predetermines way such output can be referenced/spent)

(for background on the 1:1 binding, see end of comment)

Also note that utxo_pos might just be an example of a "future", non-output_id input pointer type. The choice between 1/-3/ is relevant, when one would think of a_new_input_pointer_type_we_invent_in_the_future in lieu of utxo_pos.

Problems with said approaches:

1/ (only output_id)
- utxo_pos is ~5 times cheaper
- (tentative) utxo_pos conveys information about age/depositness of the UTXO referenced. As such is slightly more useful, and may be useful for utxo-age-based protocols like account exit

2/ (choice but without 1:1)
- harder to reason about tx identity/distinctness (e.g. when proving competitors, input double spends)
- much harder to detect DoubleSpends in the Watcher

3/ (choice with 1:1)
- (Andy) can only have one “implicit output type” (Currently coded as output type 0 as the preserved output type for payment)

Also NOTE that we can also choose option 1/ for now, but say that 1:1 binding makes sense, in case we ever introduce new input pointer types.

More on the 1:1 binding

(notes from Slack)

Q: Can we/should we have a 1:1 relationship between the input_pointer_type (so output_id vs utxo_pos) and the output_type?

The relationship reads: for any output type, there's only one way one can "reference" it in an input to tx (that being what "input pointer type" is).

A: I think it is an okay assumption to add. It removes the possibility of one output type being referenced in two ways, but we can easily have two output types for those cases anyway. The downside I can think of is that we can only have one “implicit output type” (currently coded as output type 0, the output type reserved for payment). If one day we would like to use outputId for payment, it would need to be a different output type and would need to use outputGuard to hash the output type.
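One way to picture the 1:1 binding is as a lookup table from output type to the single pointer type that may reference it. The sketch below is hypothetical: the output type numbers and the names `PointerType`, `POINTER_FOR_OUTPUT_TYPE` and `check_input_pointer` are invented for illustration and are not taken from the contracts or from elixir-omg.

```python
from enum import Enum


class PointerType(Enum):
    UTXO_POS = "utxo_pos"
    OUTPUT_ID = "output_id"


# Hypothetical output type numbers; each maps to exactly one pointer type.
POINTER_FOR_OUTPUT_TYPE = {
    1: PointerType.UTXO_POS,   # e.g. a payment output referenced by position
    2: PointerType.OUTPUT_ID,  # e.g. a later output type referenced by id
}


def check_input_pointer(output_type: int, pointer_type: PointerType) -> bool:
    """True only if pointer_type is the one way allowed for this output type."""
    return POINTER_FOR_OUTPUT_TYPE.get(output_type) is pointer_type


print(check_input_pointer(1, PointerType.UTXO_POS))   # allowed
print(check_input_pointer(1, PointerType.OUTPUT_ID))  # rejected under 1:1
```

Supporting a second way of referencing the same kind of output then means registering a new output type rather than adding a second entry for an existing one.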

Q: why does the 1:1 binding matter?

If you can reference a particular output in more than one way, the following problem surfaces: you can have bitwise distinct `tx1` and `tx1*` that are actually the same transaction (because they're referencing the same output in different ways). This heavily complects the way transaction equality is calculated (e.g. for competitor challenges), the way we detect double spends and the way we manage the state of the ledger.
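A small sketch of that problem, under stated assumptions: SHA-256 is a stand-in hash, the `utxo_pos` value uses an assumed encoding, and the `CANONICAL` index is a hypothetical lookup that a Watcher-like component would have to maintain.

```python
import hashlib


def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


# One and the same output, named in two different ways.
FUNDING_TX_HASH = sha(b"funding tx")
POS_PTR = (1000 * 1_000_000_000).to_bytes(32, "big")       # utxo_pos (assumed encoding)
ID_PTR = sha(FUNDING_TX_HASH + (0).to_bytes(32, "big"))    # output id

# Hypothetical index needed to map every pointer to one canonical form.
CANONICAL = {ID_PTR: POS_PTR}


def encode_tx(input_ptr: bytes, payload: bytes) -> bytes:
    return input_ptr + payload


tx1 = encode_tx(POS_PTR, b"pay bob 10")
tx1_star = encode_tx(ID_PTR, b"pay bob 10")


def naive_same_input(a: bytes, b: bytes) -> bool:
    # Bitwise comparison of input pointers.
    return a == b


def canonical_same_input(a: bytes, b: bytes) -> bool:
    # Resolve both pointers to a canonical form before comparing.
    return CANONICAL.get(a, a) == CANONICAL.get(b, b)


print(tx1 == tx1_star)                        # bitwise distinct transactions
print(naive_same_input(POS_PTR, ID_PTR))      # double spend goes unnoticed
print(canonical_same_input(POS_PTR, ID_PTR))  # found only with the extra index
```

Every equality or double-spend check would need such a resolution step. With the 1:1 binding only one of the two pointers is valid for a given output type, so a plain bitwise comparison is enough.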

boolafish commented 5 years ago

I think 1) and 3) are the practical options on the table. Do you think we can make a decision on this soon, @pdobacz, or what data do we need to make a decision? The contract code is starting to diverge as different implementers take different default paths (lol), and I think this is becoming a bigger problem now.

boolafish commented 5 years ago

Note from a slack discussion with @pnowosie, will also be stated in this PR: https://github.com/omisego/plasma-contracts/pull/212#event-2555964506.

Seems like the goal is to eventually move toward outputId as the input pointer. So the question for me would be: what is the plan for moving toward outputId? One thing we need to take care of is the audit process, so we need a decision on which mechanism we are choosing for this round of audit and what flexibility we need to keep for future use.

pdobacz commented 5 years ago

The path that I'm intending to take with elixir-omg is to have the choice of the input pointer as abstracted out as possible.

If path 3/ is chosen, then this seems possible from my PoV.

Similarly, if 3/ is chosen, modifications on elixir-omg shouldn't be too big.

What I'd advocate is to:

- pick path 3/
- keep the particular implementation of the input_pointer abstracted out as much as possible on contract end too
- postpone the migration to output_id till after we ensure the correct behavior of the refactored contracts and integrate them to elixir-omg

boolafish commented 5 years ago

Okay, given this, I think a practical goal for the contract implementation would be: our Payment V1, which is going to audit soon, uses utxoPos, while we make sure it can be upgraded to a Payment V2 that potentially uses outputId instead of utxoPos.

boolafish commented 5 years ago

Closing this issue as we are clear on what our aim is. We take the 1:1 assumption, meaning that each output type has one single way of pointing to it. However, we want to gradually move toward using outputId everywhere for simplicity.

omgnetwork / research