Obfuscate UTXO-Based outputs inside transfer proofs

inaltoasinistra commented 6 years ago

The UTXO serialization protects the privacy of the payee since the payer (and others if/when the proof is published) does not know the receiving utxos, because they are hashed.

By the way it is easy and cheap to discover hidden utxos. It is possible to build a rainbow table that maps serialized utxos to plain utxos. The cost is O(blockchain size). It is possible to keep a synchronized rainbow table to resolve all the RgbOutPoint in constant time.

To prevent this behaviour we can add a key to the serialized data to build RgbOutPoints: HMAC_SHA256(KEY, TX_HASH || OUTPUT_INDEX_AS_U32)

RgbOptPoints are hashed as before but they require keys in order to be decoded. The keys will be included into spending transactions.

Without the key an RgbOptPoint will not provide any information about the receiving UTXO.

afilini commented 6 years ago

I like the idea of obfuscating the outputs, and I have a couple of questions for you (not rhetorical questions, even if they sound kind of aggressive. I'm really trying to understand this):

Do you think that deriving the key from the data you are actually trying to hide is really effective?
Why do we need to list the keys if the future verifiers could just brute-force the j and check if any of the HMACs matches the output spent on the blockchain
If a future verifier can really brute-force the j, is this system effective in any way?
If you cannot brute-force the j, then the recovery process from seed is different (more complex) from what you described, right?

inaltoasinistra commented 6 years ago

Questions are welcome. They may evidence gaps in my explanation.

Do you think that deriving the key from the data you are actually trying to hide is really effective?

Every BIP32 derivation depends from the seed of the wallet, it provides security, not the derivation path. The path only deterministically binds the seed to the generated secret. BIP32 hardened derivation guarantees that the discosure of keys does not reveal anything else about the wallet. I used utxos as BIP32 uses integer to enumerate keys.

Why do we need to list the keys if the future verifiers could just brute-force the j and check if any of the HMACs matches the output spent on the blockchain

If a future verifier can really brute-force the j, is this system effective in any way?

Because the wallet seed is required to generate keys, and so also to brute force j.

afilini commented 6 years ago

Okay, forget all of what I wrote before: I don't know why but initially I thought that your idea was to use the UTXO as seed for the derivation, and this is why I was wondering what advantages would we get by using like HMAC(F(UTXO), UTXO) instead of just HASH(UTXO).

Now, this makes sense, but I still don't really like the idea of adding 32 bytes for each output. Do you think we could find a way to add only a fixed number of bytes (which would be just a random nonce) and then deterministically derivate the keys from this nonce, so that only once an RGB output is spent everyone can verify the proof? (kind of like in your example: only once an output is spent and the corresponding keys are included in the proof, then everyone can verify it)

inaltoasinistra commented 6 years ago

I haven't found a way to construct keys with a nonce included into the transfer proof yet.

In order to verify a proof we need for every spent output:

the utxo;
the key;
the RgbOutPoint.

Since we must generate keys when we receive tokens we don't know the keys subset we will use in every future transaction. If we use a nonce to generate all the keys we expose them after the first tx, and we lose the privacy we are trying to provide with this system. We must expose only the keys we are actually using in the tx, or keys we'll never use. We also cannot expose used keys, because this would link different transactions of the same wallet together. So we should have a different nonces that generate every subset of keys. I cannot find a way to derive the same key from different nonces.

I elaborated a strategy to reduce the number of nonces (|nonces| ≤ |keys|), more keys inside a tx increase the probability to aggregate some of them. A smart coin selection could improve the performance.

The wallet generates a finite set of keys using a binary tree HD derivation. Example:

        k11     a
      /
   k1 – k12     b
 /    
k
 \
   k2 – k21     c
      \
        k'22    d

kxy represents the derivation m/some-constant'/x'/y' Every k can generate all the subtree of which it is the root. Examples of keys (a, b, c, d) subsets and the sets of nonces (k) that generate them:

Keys	Nonces
{a, b}	{k1}
{a, b, c}	{k1, k21}
{a, b, c, d}	{k}
{a, d}	{k11, k22}

Do you think something like that would be worth it? This would add complexity in keys management. Which problems do you see in listing all the keys inside proofs?

afilini commented 6 years ago

The only problem I see is the size of the proof, I don't want it to grow too much.

I'll think about your tree idea, I think it's a good starting point!

afilini commented 6 years ago

Okay, I might have an idea but I have to make a big disclaimer before: I don't know anything about elliptic curve operations/cryptography, so this might be completely broken (and yes, in this case you all are allowed to have fun of me), but I think it's still a good idea to write it down and get a feedback.

So, here it is: it's kind of similar to your first idea of using an HMAC, but instead I use two elliptic curve operations, a sum and a multiplication: if you take (UTXO + Factor) * G, the multiplication by G is kind of an hash function, and the Factor you add is like the key in an HMAC: it adds some "randomness", so that rainbow tables cannot be calculated.

What's different is that with elliptic curve points you should (I guess) be able to aggregate them, so that only one "blob" of data ends up in the proof instead of the list of keys as you proposed.

Basically the first part is completely copied from your idea: for each UTXO you have, you generate a corresponding key using the derivation path you described. This key is what we are going to use as Factor.

Then, when you want to get paid you send (H(UTXO) + key) * G to whoever is going to pay you: building a rainbow table of H(UTXO) * G would be easy, but the key in there breaks everything.

Then in your proof you add one single value, which is the sum of all the keys used.

So, to recap: in the previous proofs you have (H(U_1) + K_1) * G, (H(U_2) + K_2) * G, ... (let's call them "addresses", or A_n, since they are kind of an address) and in this proof you are creating you add the sum K_1 + K_2 + ..., which I'm going to call S. Note that a verifier also knows all the UTXOs U_1, U_2, ... once you create a proof, because you are spending them. Oh, and obviously G is also known.

What a verifier has to check now is that (H(U_1) + H(U_2) + ...) * G + S * G = A_1 + A_2 + ....

If you substitute all the variables in there you get:

(H(U_1) + H(U_2))G + SG = A_1 + A_2
(H(U_1) + H(U_2))G + SG = (H(U_1) + K_1)G + (H(U_2) + K_2)G
(H(U_1) + H(U_2))G + SG = H(U_1)G + K_1G + H(U_2G) + K_2G
(H(U_1) + H(U_2))G + SG = (H(U_1) + H(U_2))G + (K_1 + K_2)G
(H(U_1) + H(U_2))G + SG = (H(U_1) + H(U_2))G + SG

Honestly, as I said before, I don't know if this is strong enough, but I feel like calculating a "fake" S so that you can spend proofs that were not originally linked to one of your UTXO is pretty hard.

Does this make any sense?

StefanoPellegrini commented 5 years ago

I am sorry for entering this discussion which I didn't completely understand. My feeling is, for what I know about elliptic curves, that the strength of such a construction deeply depends on the elliptic curve used, but there should be at least one that suits your purpose. Elliptic curves are of so much interest for making inverse calculations hard and since the operation involved in your construction are not that strange there should be an elliptic curve that behaves the way you want, obviously if the set of possible values for S is large enough. On the other hand I know nothing about rainbow tables so I could be completely wrong.

dr-orlovsky commented 5 years ago

Why not just to use EC math – or even HMAC – with the Factor being the actual public key, corresponding to the UTXO address? It is not known before the actual UTXO is spend — and than it is public, so no reason to add it to the proof and no size increase.

dr-orlovsky commented 5 years ago

This PR will close the issue #80

dr-orlovsky commented 5 years ago

After the team meeting at @InBitcoin it was decided to abandon this change, since a) it does provide very limited temporally privacy trading on a lot of complexity b) if the privacy is the concern, its better to adapt some best practices from zero-knowledge proofs, like in confidential assets.

rgb-archive / spec

Obfuscate UTXO-Based outputs inside transfer proofs #59