monero-project / monero

Monero: the secure, private, untraceable cryptocurrency
https://getmonero.org

[Discussion] Consider removing the tx_extra field #6668

Open tevador opened 4 years ago

tevador commented 4 years ago

First discussed here: https://github.com/monero-project/meta/issues/356

We should consider removing the tx_extra field from all non-coinbase transactions.

Main reasons:

  1. Enhanced fungibility due to a more uniform transaction format.
  2. Protection from the risks of arbitrary data on the blockchain, e.g. copyrighted material, privacy violations, politically sensitive or illegal content etc.

Required data that is currently stored in tx_extra (e.g. transaction public key) could be moved to a dedicated field.
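A minimal sketch of the validation rule proposed here, under stated assumptions: the `Tx` class and `check_extra_rule` function are hypothetical and heavily simplified (not Monero code), and coinbase transactions are assumed to keep tx_extra as suggested in the next paragraph:

```python
from dataclasses import dataclass


@dataclass
class Tx:
    # Hypothetical, heavily simplified transaction model for illustration;
    # this is NOT Monero's real transaction structure.
    is_coinbase: bool
    tx_pubkey: bytes    # dedicated fixed-size field replacing the tx_extra pubkey entry
    extra: bytes = b""  # free-form data


def check_extra_rule(tx: Tx) -> bool:
    """Return True if tx satisfies the proposed (hypothetical) tx_extra rule."""
    # Required data lives in its own fixed-size field instead of tx_extra.
    if len(tx.tx_pubkey) != 32:
        return False
    # Coinbase txs may keep tx_extra (e.g. for merged mining).
    if tx.is_coinbase:
        return True
    # All other txs must carry no extra data at all.
    return len(tx.extra) == 0
```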

Miner (coinbase) transactions could still allow the tx_extra field for the following reasons:

Disadvantages of removing the tx_extra field:

fluffypony commented 4 years ago

I support this. I'm perpetually worried about someone wanting to pack stuff into tx_extra unnecessarily.

ghost commented 4 years ago

Are you suggesting giving mining pools the privilege to store arbitrary data into the blockchain?

sumogr commented 4 years ago

AFAIK there is already a proposal (an open PR) from mooo to use extra to encrypt the recipient's private data, which is quite interesting: https://github.com/monero-project/monero/pull/6410

tevador commented 4 years ago

> Are you suggesting giving mining pools the privilege to store arbitrary data into the blockchain?

No. Currently anyone can store arbitrary data on the blockchain. We could either completely remove this option or keep it just for miners (e.g. for merged mining).

> afaik there is a proposal (an open pr) already from mooo to make use of extra to encrypt recipient's private data into it which is quite interesting #6410

The problem is that the consensus mechanism cannot force the data in tx_extra to be encrypted. A malicious sender can still include arbitrary data and claim that it's the ciphertext. Additionally, even the mere presence of encrypted data is a distinguishing factor for transactions (unless all transactions include data of the same length).

sumogr commented 4 years ago

> The problem is that the consensus mechanism cannot force the data in tx_extra to be encrypted. A malicious sender can still include arbitrary data and claim that it's the ciphertext. Additionally, even the mere presence of encrypted data is a distinguishing factor for transactions (unless all transactions include data of the same length).

There's supposed to be an off-chain agreement (trust) between sender and receiver already for this to work; I agree with that, that's the point (assuming I understood mooo's intentions correctly). Padding arbitrary data up to a pre-specified size was already suggested for uniformity, indeed.

tevador commented 4 years ago

> There's supposed to be an off-chain agreement-trust between sender and receiver

A malicious sender doesn't need an agreement with anyone. As an extreme example, imagine a KYC exchange started sending the recipient's name and amount in tx_extra.

sumogr commented 4 years ago

> > There's supposed to be an off-chain agreement-trust between sender and receiver

> A malicious sender doesn't need an agreement with anyone. As an extreme example, imagine a KYC exchange started sending the recipient's name and amount in tx_extra.

There I agree, but I can already encode my name, or yours if I knew it, in the tx_extra of any tx. Anyway, it will be an interesting discussion; let's see.

ghost commented 4 years ago

@tevador

> We could either completely remove this option

I like this.

hyc commented 4 years ago
> losing the ability to soft-fork the transaction format

This bothers me a bit, but really, when have we soft-forked anything? So far all changes have been hard forks.

SomaticFanatic commented 4 years ago

Also support removing tx_extra. Increasing fungibility is always a good thing. Is this a holdover from Bitcoin's way of doing things? What do people imagine was the motivation for including it in the first place?

tevador commented 4 years ago

This proposal by @UkoeHB is also relevant: https://github.com/monero-project/monero/issues/6456

It suggests removing all non-optional parts of a transaction from the tx_extra field.

UkoeHB commented 4 years ago

My argument against removing the extra field completely (copy pasted from #6456):

The Monero core team cannot see the future nor evaluate all possible use cases of Monero. To a large extent, it is up to users how Monero actually gets used. If there is a feature which only a subset of Monero users find valuable, it requires adding data to transactions, and the core team either isn't interested or does not have the resources to implement it, then the only way that feature can exist without a fork is with something like the extra field. Moreover, if for some reason periodic hard forks become no longer feasible, then without an extra field the Monero transaction structure will be frozen for eternity. Just as Monero is changing today, who knows how it will change in the future. An extra field permits changes that don't depend on hard forks.

tevador commented 4 years ago

> If there is a feature which only a subset of Monero users find valuable, it requires adding data to transactions, and the core team either isn't interested or does not have the resources to implement it, then the only way that feature can exist without a fork is with something like the extra field

If such a feature is only used by a subset of transactions, it will affect the privacy of everyone using Monero. In theory, there could be dozens of these extensions in the future, which could be enough to tag users based on the specific set of extensions they use. Do we want this?

UkoeHB commented 4 years ago

> If such a feature is only used by a subset of transactions, it will affect the privacy of everyone using Monero. In theory, there could be dozens of these extensions in the future, which could be enough to tag users based on the specific set of extensions they use. Do we want this?

The biggest problem is if a feature clearly improves the Monero user experience in some way, but for a reason we don't know about today a hard fork isn't possible, then that feature can't be implemented without the extra field. It's painful from a privacy perspective, but I feel we shouldn't underestimate the danger of backing ourselves into a corner by mistake. IMO the extra field is an insurance policy that acknowledges our fallibility as protocol designers.

fluffypony commented 4 years ago

You don’t need tx_extra for that, you can use the range proofs for data storage and all sorts. If we’re going to go down the road of ossifying our tx format then the current one is woefully unsuitable, with or without tx_extra.

moneromooo-monero commented 4 years ago

That's not a good argument, since it's relying on extra removal not preventing the thing that was sought to be prevented in the first place.

ghost commented 4 years ago

> you can use the range proofs for data storage and all sorts

Can you explain what you mean or give some reference?

UkoeHB commented 4 years ago

> That's not a good argument, since it's relying on extra removal not preventing the thing that was sought to be prevented in the first place.

@moneromooo-monero can you clarify this statement?

moneromooo-monero commented 4 years ago

"You don’t need tx_extra for [embedding extra data for future use]" implies removing extra will not prevent people putting custom data in a tx, which was the intent of the issue. The comment was used in support of extra removal though, so it relies on the intent of the issue being made moot.

Gingeropolous commented 4 years ago

I favor removing tx_extra for tx uniformity. A potential hybrid approach that would allow opt-in tx_data is a secondary chain/database that is linked to the main monero chain.

Basically, you have the tx, and then you have a data packet that sticks onto the tx by referencing its tx_hash. Thus, if a node wants to participate in relaying these data packets, they can signal that they offer this service. Otherwise, the node just relays the tx without the data packet. The data packet isn't mined into the chain, instead it exists as a separate database linked to the chain.

Well, this might be tangential.

SamsungGalaxyPlayer commented 4 years ago

While I really like removing tx_extra for uniformity, I strongly recommend that we take a cautious approach here. We should aggressively solicit feedback from services to make sure they have no intended use for tx_extra. Sadly I am aware of at least one service that plans to use tx_extra in some capacity as a stopgap for Travel Rule compliance until industry tools are available and adopted. To any outside observer reading this, services should really prefer to use off-chain solutions. However, we may see creative (undesired) use of tx_extra to aid compliance before the industry gets its shit together.

tevador commented 4 years ago

I suggest at least making an announcement that Monero is planning to discontinue the tx_extra field in the near future to discourage new implementations. We can then discuss the details, e.g. if we allow it for coinbase transactions and if we phase out integrated addresses at the same time.

Keeping tx_extra for coinbase txs could alleviate some of the concerns regarding the ability to soft-fork. Future extensions could be placed there by miners similar to how SegWit works in Bitcoin.

UkoeHB commented 4 years ago

> and if we phase out integrated addresses at the same time.

@knaccc expressed concern during discussion of #6456 that moving encrypted payment IDs out of the extra field would make them harder to deprecate.

SamsungGalaxyPlayer commented 4 years ago

It's my recommendation that we announce a plan to phase out tx_extra by late 2021, and solicit feedback like we did for address types.

Mitchellpkt commented 4 years ago

An arbitrary plaintext data payload in a system whose privacy relies on indistinguishability is like a screen door on a submarine. 😂 ❤️

Mitchellpkt commented 4 years ago

Hahah, Neptune and I analyzed tx_extra use and found some interesting on-chain data 😆

High-level overview:

Examples

Dates

Multiple formats observed, including:

These dates and PIDs are often repeated, probably for convenient transaction linkability.

Email addresses

There are a large number of email addresses, including personal domains, and several widely-known cryptocurrency ecosystem contributors.

URLs

There are a variety of URLs including:

X is the best X

There are boatloads of transactions with variations on "X is the best X", a few examples including:

Messages

There are hundreds of messages, ranging from jokes to vulgarity. MANY include PII such as names, handles, transaction amounts, credit card info, and contact information (not included below):

TheCharlatan commented 4 years ago

This is great! I'm surprised there are not more malicious payloads, which I guess is what <*> Joins [#xmrchain] ->Guest1 tried to achieve, targeted at whatever is indexing the transactions.

tevador commented 3 years ago

Any progress on this issue? Is there consensus to put this on the roadmap for a future protocol update?

Gingeropolous commented 3 years ago

random ping on this to see if there's any further decisions / ideas

Gingeropolous commented 3 years ago

@SamsungGalaxyPlayer , you had mentioned above:

> It's my recommendation that we announce a plan to phase out tx_extra by late 2021, and solicit feedback like we did for address types.

Is it time to put these wheels in motion? I'm scratching my head remembering exactly how it all happens. Do we start with a mailing list update? Or just a "press release" and hope folks come across it in time? Or do we announce a dev meeting focused on the topic first?

I feel like this could / should get wrapped in with the next major release, which seems like it's going to be late 2021 anyway.

ping @dEBRUYNE-1 as well.

edited to add dev meeting idea

Gingeropolous commented 3 years ago

from @SamsungGalaxyPlayer

It's worth noting that Thorchain wishes to use tx_extra to pass along commands to the Thorchain nodes. Messages will be of the two formats:

Format 1: Adding XMR liquidity

ADD:XMR:

example: ADD:XMR:tthor1zpa4c6zpa4cyz9s93xuje2pwkswsqzn2zpa4c

Note: relies on https://gitlab.com/thorchain/thornode/-/issues/917, or else replace XMR with XMR.XMR

Format 2: Swapping XMR for another asset

SWAP:CHAIN.ASSET:DESTINATION:LIMIT:AFFILIATE:FEE

example: SWAP:THOR.RUNE:tthor1zpa4c6zpa4cyz9s93xuje2pwkswsqzn2zpa4c:3141441780:tthor1ql2tcqyrqsgnql2tcqyj2n8kfdmt9lh0yzql2tcqy:10
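To make the two formats concrete, here is a hypothetical Python sketch that builds and parses these colon-separated memos. The helper names are made up for illustration, calling the third ADD field `address` is an assumption, and nothing here reflects Thorchain's own code:

```python
# Hypothetical helpers for the memo formats quoted above; field names mirror
# the format string, and no Thorchain library is used or implied.
SWAP_FIELDS = ("asset", "destination", "limit", "affiliate", "fee")


def build_swap_memo(asset: str, destination: str, limit: int,
                    affiliate: str = "", fee: str = "") -> str:
    """Assemble SWAP:CHAIN.ASSET:DESTINATION:LIMIT:AFFILIATE:FEE."""
    return ":".join(["SWAP", asset, destination, str(limit), affiliate, fee])


def parse_memo(memo: str) -> dict:
    """Split a memo into named fields; raise ValueError on unknown shapes."""
    action, *rest = memo.split(":")
    if action == "SWAP" and len(rest) == len(SWAP_FIELDS):
        return {"action": "SWAP", **dict(zip(SWAP_FIELDS, rest))}
    if action == "ADD" and len(rest) == 2:
        # ADD:XMR:<address>; naming the third field "address" is an assumption
        return {"action": "ADD", "asset": rest[0], "address": rest[1]}
    raise ValueError(f"unrecognized memo: {memo!r}")
```

Parsing the SWAP example above and rebuilding it round-trips exactly; that memo is about 120 bytes of ASCII, which is the kind of payload size the rest of this thread argues over.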

kayabaNerve commented 2 years ago

1) TX extra, under Seraphis, will become solely for arbitrary data already. This would be the perfect time to remove it. 2) Removing it may be one of the worst ideas out there.

While I agree removing all wallet data is a great idea, as Seraphis does, that does not change the fact there are L2-esque platforms relying on it for short memos. My simple statement on the matter is as follows:

If I can not place arbitrary data in TX extra, I'll place it elsewhere.

We can remove TX extra and celebrate it. Great. Except now, when I need 128 bytes, I'm adding 2 fake outputs worth 0 to get 1 TX key, 1 R, and 1 commitment. It's trivial to find a valid point with the first two bytes and get 30 bytes per point, of which I have 2 * 3. I now have a less efficient scheme using more bytes than needed (both due to imperfect packing and the multiples used) while also adding processing requirements to all parties just for the same base penalty.

My proposal, as TX extra becomes for arbitrary data only, would be to cap it at 256 bytes. While ideally we'd only allow about 128 bytes, so the length fits into a 1-byte VarInt (which holds values up to 127), JAMTIS certified addresses are 168 bytes raw. We can't use a 1-byte VarInt to denote that length, though a single raw byte would work; I feel that's another discussion.

This means we'd have 88 bytes after a full JAMTIS certified address. Without certification, we have 48 bytes after two JAMTIS addresses, enough for a key and a couple of words. I believe 256 should accordingly be plenty.
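The numbers in this proposal can be checked with a short sketch. It assumes a LEB128-style VarInt (7 data bits per byte, high bit as a continuation flag), which is how Monero's VarInt is understood to work here; under that encoding 127 is the largest length that fits in one byte, so a 128-byte payload would need a 2-byte prefix. The 104-byte uncertified address size is inferred from the 48-byte figure and is not stated in the thread:

```python
def encode_varint(n: int) -> bytes:
    """LEB128-style varint: 7 data bits per byte, high bit set when more bytes follow."""
    if n < 0:
        raise ValueError("varints encode non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def remaining(cap: int, *items: int) -> int:
    """Bytes left in a capped tx_extra after the given items."""
    return cap - sum(items)


CAP = 256               # proposed cap
CERTIFIED_ADDR = 168    # raw JAMTIS certified address, per the comment
UNCERTIFIED_ADDR = 104  # inferred: (256 - 48) / 2, not stated in the thread

print(len(encode_varint(127)), len(encode_varint(128)))   # -> 1 2
print(remaining(CAP, CERTIFIED_ADDR))                      # -> 88
print(remaining(CAP, UNCERTIFIED_ADDR, UNCERTIFIED_ADDR))  # -> 48
```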

We should also discuss an increased economic fee, perhaps 2x per byte, for such TXs, yet we have to be careful we don't encourage steganography (unless we want to for privacy reasons? Where's that one issue to give every TX 16 outputs?).

I talked with a few people at MoneroKon about this, and we (our small group talking at a dinner) did seem to agree that TX extra should stay within reason, such as the above proposal.

Gingeropolous commented 2 years ago

> the fact there are L2-esque platforms relying on it for short memos. My simple statement on the matter is as follows:

Can these be ephemeral? I.e., can the payload (the stuff in tx_extra) be pruned almost immediately by nodes that don't care? So it ends up just being memos in the txpool, or memos for those that care?

kayabaNerve commented 2 years ago

No. They have to be signed, tied to a transaction, and practical flow operates on the 10th block, not the mempool. While signed memos could be separate, increasing bandwidth and processing power while decreasing storage, we're then building a distinct messaging layer. That messaging layer, which is more resource intensive and has an ordering problem (as it's no longer part of the transaction), could be built by L2 services, but there are design problems (from chicken-and-egg to a requirement on permanence for verifiability), which is why they don't simply do it in the first place and instead rely on the base layer.

If asked whether it'd be easier to use steganography or build such a system, I'd say steganography. I believe that says enough about the practical choice developers will make in such cases. While I don't mean to be an antagonistic asshole, and wouldn't flood any chain solely to lower my own requirements, I actually did comment on the ability to achieve needed functionality via steganography on another network over a single transaction. This wasn't an 'option', but a proper comment on the way to do it, if done. Here, +3 Monero outputs would absolutely be a far easier way for me to solve this, and I don't consider it negative enough for Monero to refrain from using 5-output* transactions. I believe such TXs would have an almost identical privacy impact, though.

*Change, output, +3 data. In my specific case, a Monero -> Monero swap, if they existed, would need 3 outputs for data, hence that number. For Monero -> BTC (20-byte address)/ETH, it'd only need 1. 3 output TXs may exist much more frequently and have less of an impact on privacy? And then for Monero -> BTC (32-byte address), it'd be 2 (4 outputs). This ignores the fact this network is already doxxing Monero in, just as any exchange does, which is why we are trying to minimize TXs on XMR itself.

tevador commented 2 years ago

> My proposal, as TX extra becomes for arbitrary data only, would be to cap it at 256 bytes.

Except this doesn't fix the main issue, which is splitting the anonymity pool. Apart from removal, the second best option would be to mandate a tx_extra field of a fixed size in all transactions.
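A rough sketch of the fixed-size alternative, assuming the sender has already encrypted any payload so it looks uniformly random (encryption itself is out of scope) and using a hypothetical 32-byte size. Absent payloads and padding are both filled with random bytes so every transaction's field looks alike:

```python
import os
from typing import Optional

FIELD_SIZE = 32  # hypothetical fixed size, e.g. the 32-byte field floated in this thread


def make_fixed_field(ciphertext: Optional[bytes], size: int = FIELD_SIZE) -> bytes:
    """Return exactly `size` bytes for a mandatory, fixed-size data field.

    `ciphertext` is assumed to be already encrypted (uniform-looking) by the
    sender; how the recipient learns its true length (e.g. a length byte inside
    the encrypted payload) is left out of this sketch.
    """
    if ciphertext is None:
        # No memo: fill with random bytes so "no data" looks like "data".
        return os.urandom(size)
    if len(ciphertext) > size:
        raise ValueError("payload does not fit in the fixed-size field")
    # Pad with random bytes, not zeros, so the padding length is not visible.
    return ciphertext + os.urandom(size - len(ciphertext))
```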

> Except now, when I need 128 bytes, I'm adding 2 fake outputs worth 0 to get 1 TX key, 1 R, and 1 commitment. It's trivial to find a valid point with the first two bytes and get 30 bytes per point, of which I have 2 * 3. I now have a less efficient scheme using more bytes than needed (both due to imperfect packing and the multiples used) while also adding processing requirements to all parties just for the same base penalty.

You are correct. It is possible to put arbitrary (even plaintext) data in various other parts of the transaction and still have a transaction that passes the consensus rules. With Seraphis/Jamtis, that would be about 87 bytes of data per output (Ke, v, t~, Ko and a~ can contain arbitrary data; you cannot put arbitrary data in the commitment C, because you won't be able to make a valid range proof).

If we wanted to make it harder to include plaintext data in a transaction, there could be a non-consensus rule (enforced by nodes when relaying a tx) that the supplementary tx data must pass some quick statistical test of randomness.
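One possible relay-side check, sketched under assumptions: instead of a formal statistical test, this uses a simple printable-ASCII ratio heuristic (a swapped-in, much weaker technique), and the 0.6 threshold and 16-byte minimum are hypothetical values:

```python
# Printable ASCII plus tab, LF, CR: 98 of the 256 possible byte values.
PRINTABLE = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def looks_like_plaintext(data: bytes, max_ratio: float = 0.6, min_len: int = 16) -> bool:
    """Cheap relay-side heuristic: flag data that is mostly printable ASCII.

    Uniformly random bytes are printable about 98/256 (~38%) of the time, so a
    ratio well above that suggests plaintext. Thresholds are hypothetical.
    """
    if len(data) < min_len:
        return False  # too short to judge; accepted by this heuristic
    printable = sum(1 for b in data if b in PRINTABLE)
    return printable / len(data) > max_ratio


def relay_accepts(extra: bytes) -> bool:
    # Non-consensus policy: a node simply declines to relay flagged txs.
    return not looks_like_plaintext(extra)
```

Anything encrypted or otherwise encoded to look random passes, so a check like this only raises the cost of casual plaintext, in line with the "make it harder" framing above.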

UkoeHB commented 2 years ago

> With Seraphis/Jamtis, that would be about 87 bytes of data per output (Ke, v, t~, Ko, a~ can contain arbitrary data.

I recently added a rule that all K_e and K_o must successfully deserialize as EC points. K_o deserializing is required for the squashed enote model, and K_e deserializing reduces exception safety uncertainties in scanning (and reduces fingerprintability).

tevador commented 2 years ago

As noted by @kayabaNerve, you can take 30 arbitrary bytes and bruteforce the remaining 16 bits until you get a valid EC point (the chance of failure is only 2^-16).

For example:
Hex: 5468697320697320612076616c6964206564323535313920706f696e742e2e2e
ASCII: This is a valid ed25519 point...
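The brute-force trick can be sketched in Python using the point-decoding rules from RFC 8032 (Monero's own decoding routine may differ in edge cases, so treat this as an approximation). `embed_30_bytes` is a hypothetical helper name:

```python
P = 2**255 - 19
D = (-121665 * pow(121666, P - 2, P)) % P  # Edwards curve constant d (RFC 8032)


def is_valid_point(enc: bytes) -> bool:
    """Check whether 32 bytes decode to an ed25519 point (RFC 8032 rules)."""
    if len(enc) != 32:
        return False
    y = int.from_bytes(enc, "little")
    sign = y >> 255              # top bit selects the sign of x
    y &= (1 << 255) - 1
    if y >= P:
        return False             # non-canonical y
    u = (y * y - 1) % P
    v = (D * y * y + 1) % P
    x2 = u * pow(v, P - 2, P) % P  # x^2 = (y^2 - 1) / (d*y^2 + 1)
    if x2 == 0:
        return sign == 0         # x = 0 has no negative twin
    # Euler's criterion: x2 must be a quadratic residue mod P.
    return pow(x2, (P - 1) // 2, P) == 1


def embed_30_bytes(payload: bytes) -> bytes:
    """Append 2 brute-forced bytes so the 32-byte result is a valid point."""
    if len(payload) != 30:
        raise ValueError("payload must be exactly 30 bytes")
    for suffix in range(1 << 16):
        candidate = payload + suffix.to_bytes(2, "little")
        if is_valid_point(candidate):
            return candidate
    raise RuntimeError("no valid encoding found")
```

Since roughly half of all y-coordinates decode to a point, the loop typically stops after a few tries, which is why reserving 2 bytes is more than enough.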

kayabaNerve commented 2 years ago

Thanks for the corrections, @tevador. I also only assumed 30-bytes per using my above 3-output example, yet earlier assumed 96 (which you noted was wrong. I was thinking because of the mask... but that requires solving the DL problem).

... do we just want to encourage steganography? It'd take +2 outputs for a Monero swap (origin address for failure, destination address, that leaves 44 bytes when my metadata is ~13). It may increase the amount of 3-4 output TXs, yet I'm not sure that's directly negative. It's that, or a fixed 256 byte payload AFAICT. These messages should still be a fraction of TXs though.

Relation to https://github.com/monero-project/research-lab/issues/96 and its cited comment. If this is ever implemented, then it'd cover the "everyone does it" case already, while solving other considerations.

UkoeHB commented 2 years ago

I am not a big fan of restricting the tx extra beyond stricter semantics (sorted TLV) because it's a field that's literally 'for anything we can't know in advance or are unable to pass judgement on'. At the very least, if a byte restriction is imposed, it should be a per-output limit since memos are generally aimed at a single recipient.
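For illustration, a hypothetical "sorted TLV" validator: each entry is a varint tag, a varint length and the value, with tags required to be strictly increasing (sorted, no duplicates). This generic layout is an assumption for the sketch, not Monero's current tx_extra encoding:

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a minimal LEB128-style varint at buf[pos]; return (value, new_pos)."""
    value = shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if not b & 0x80:
            if b == 0 and shift:
                raise ValueError("non-minimal varint")  # trailing zero byte
            return value, pos
        shift += 7


def parse_sorted_tlv(extra: bytes) -> list[tuple[int, bytes]]:
    """Parse tag-length-value entries, requiring strictly increasing tags."""
    fields, pos, last_tag = [], 0, -1
    while pos < len(extra):
        tag, pos = read_varint(extra, pos)
        length, pos = read_varint(extra, pos)
        if tag <= last_tag:
            raise ValueError("tags must be sorted and unique")
        if pos + length > len(extra):
            raise ValueError("value runs past end of field")
        fields.append((tag, extra[pos:pos + length]))
        pos += length
        last_tag = tag
    return fields


def is_canonical_tlv(extra: bytes) -> bool:
    try:
        parse_sorted_tlv(extra)
        return True
    except ValueError:
        return False
```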

kayabaNerve commented 2 years ago

Any thoughts on removing it for steganography or a mandatory inclusion?

UkoeHB commented 2 years ago

Steganography does not excite me (typically you want tx output + memo; steganography means adding additional outputs, which is just a DDoS on scanning), and mandatory inclusion implies adding way too many bytes.

tevador commented 2 years ago

> Any thoughts on removing it for steganography or a mandatory inclusion?

I don't understand the urge to use precious blockchain space as a communication channel. With just 32 bytes, you can commit to arbitrary data and share that off-chain.
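A minimal sketch of the 32-byte commitment idea, assuming SHA-256 as a stand-in hash (Monero itself uses Keccak-based hashing) and adding a random 32-byte nonce so a short, guessable memo can't be recovered by hashing candidates. The function names are hypothetical:

```python
import hashlib
import hmac
import os


def commit(memo: bytes) -> tuple[bytes, bytes]:
    """Return (32-byte commitment, nonce). Only the commitment goes on chain."""
    nonce = os.urandom(32)  # hides low-entropy memos from guessing
    return hashlib.sha256(nonce + memo).digest(), nonce


def open_commitment(commitment: bytes, memo: bytes, nonce: bytes) -> bool:
    """Check an off-chain (memo, nonce) pair against the on-chain commitment."""
    expected = hashlib.sha256(nonce + memo).digest()
    return hmac.compare_digest(commitment, expected)
```

Because the nonce has a fixed length, `nonce + memo` is unambiguous; the memo and nonce travel off-chain, and anyone holding them can check them against the 32 bytes in the transaction.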

If you want to know my personal opinion, I'm for going all in on privacy. That means removing the tx_extra field and mandating all transactions to have 2 inputs and 2 outputs.

kayabaNerve commented 2 years ago

You actually can't, easily, commit to arbitrary data shared off chain. To discuss my specific use case:

That's why underlying networks are preferred, not to mention the simplicity of doing so. It's the one model which doesn't open additional attack vectors and problems.

The computational cost of steganography is why I advocated for a 256-byte TX extra, justifying that specific size. While steganography would be a valid replacement, largely preserving privacy, it has that trade-off.

... you can also just encode data bytes in CLSAGs? It gives you 15 * 252 bits? 0-additional bandwidth, just reduces ring size and still enables creating a separate privacy pool. By just using the newly added 5 decoys though, you can safely get ~150 bytes, which is sufficient. I'd rather do that than work on a new IPFS setup which is likely to be DoSed while notably increasing resource requirements.
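The capacity figures above work out as follows, assuming each decoy's CLSAG response scalar can carry about 252 bits of chosen data (the Ed25519 group order is slightly above 2^252) and 15 decoys as in the comment:

```latex
15 \times 252\ \text{bits} = 3780\ \text{bits} = 472.5\ \text{bytes}
\qquad
5 \times 252\ \text{bits} = 1260\ \text{bits} = 157.5\ \text{bytes} \approx 150\ \text{bytes}
```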

And yes, this is a discussion I believe currently regarding Seraphis/other future protocols. For now, TX extra is still here and this hasn't evolved into antagonistic cat and mouse. I'm trying to highlight the point of view which depends on TX extra, and explain the thought process and reasoning which will occur. While I completely understand Monero potentially not wanting to cater to this use case, that will force developers to find solutions which do work. We have to ask if an optimal TX extra use-case is more damaging than the sub-optimal TX-extra-equivalent use case. Since the next optimal solution is steganography, either in the inputs or outputs, we have to consider if we prefer steganography or if we prefer offering TX extra.

tevador commented 2 years ago

> It can't be ephemeral and therefore unverifiable in the future.

Why would an atomic swap need to be verifiable forever? Once the swap is completed, the metadata become irrelevant. To future blockchain verifiers, it should look like any other transfer.

I still don't understand why the atomic swap parties would have to communicate using the blockchain. There needs to be at least one round of off-chain communication prior to the swap (to agree on the amount and the price). From a security standpoint, there is no difference between an encrypted on-chain memo field and an off-chain message (e.g. an e-mail attachment). Both are completely irrelevant to 3rd parties.

For example, the following Bitcoin-Monero atomic swaps protocol doesn't need any on-chain memos: https://eprint.iacr.org/2020/1126

kayabaNerve commented 2 years ago

"Swaps" wasn't referring to atomic swaps, but to multisig-based DEXs which have a large threshold multisig (so also not like Bisq). Funds are sent to the multisig, trusting it to execute, with the memo saying what to do. I am working on one and there is another who has announced their intention to list Monero, with an (incomplete) integration candidate. We both have similar requirements here.

Atomic swaps should solely use ephemeral messaging though, yes.

tevador commented 2 years ago

The DEX can have its own P2P network for passing the memos. You could submit a transaction with a 32-byte hash of the memo on the Monero network and then submit the TXID and the memo on the DEX P2P network. The DEX would look up the TX and check that the hash there matches the internal memo.

It would take more development effort, but it's a much better solution from a privacy and scalability perspective.
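The lookup tevador describes can be sketched as follows. `DexMemoPool` and the `confirmed_commitments` mapping are hypothetical stand-ins for a DEX node and a Monero chain lookup, and SHA-256 is an assumed choice of hash:

```python
import hashlib
from typing import Dict, Optional


def memo_hash(memo: bytes) -> bytes:
    return hashlib.sha256(memo).digest()


class DexMemoPool:
    """Hypothetical DEX-side store that accepts memos only if a confirmed
    Monero tx commits to them."""

    def __init__(self, confirmed_commitments: Dict[str, bytes]):
        # Stand-in for a Monero lookup: txid -> 32-byte hash found in that tx.
        self.confirmed = confirmed_commitments
        self.accepted: Dict[str, bytes] = {}

    def submit(self, txid: str, memo: bytes) -> bool:
        expected: Optional[bytes] = self.confirmed.get(txid)
        if expected is None:
            return False  # tx unknown or unconfirmed: nothing to check against yet
        if memo_hash(memo) != expected:
            return False  # memo does not match what the tx committed to
        self.accepted[txid] = memo
        return True
```

This sketch simply rejects memos whose transaction is not yet confirmed; deciding whether to buffer them instead is where the DoS and waiting-time concerns raised later in the thread come in.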

kayabaNerve commented 2 years ago

You actually can't, easily, commit to arbitrary data shared off chain. To discuss my specific use case:

  • We can't have people post it on the L2 as it would cost additional fees. They can't pay fees if they can't enter the ecosystem because they can't post messages because they can't pay fees. It's a chicken/egg.
  • It can't be ephemeral and therefore unverifiable in the future.
  • There's a variety of DoS concerns with accepting arbitrary data which is promised to be committed to in a TX, before that TX exists in a confirmed state. I did consider an IPFS node, which could run only accepting allowed hashes, but the issue is it has to wait 20 minutes for the IPFS hash on Monero, which isn't a feasible UX.

While yes, parties could work out an additional solution, it overall increases complexity dramatically and isn't desirable. Even the best solutions still increase the attack surface while decreasing UX. While it would be better, regarding Monero's privacy, it's only a theoretical advantage since any competent party will already know all these transactions and be able to create the differing pools accordingly. While yes, they must know of the network and sync it, any competent party will. So while yes, there's a theoretical advantage to Monero privacy here, there's not a practical advantage.

There is potentially a practical transformation dependent on other schemes which don't require publicly acknowledging data long term. The distinction is anything which doesn't need the data long term, such as swaps, already isn't discussing using TX extra.

tevador commented 2 years ago

Forcing the whole Monero network to sync and store your DEX data forever is clearly not the best solution.

kayabaNerve commented 2 years ago

... from my perspective, I'd disagree. While yes, it is an additional burden on Monero, we're discussing ~100 bytes on relevant TXs. While I'll agree there are better solutions for Monero, none of them offer the necessary security, guarantees, and user experience desired for connecting projects of this type.

I'd also cite how Monero has payment IDs in its code, when we could've had off-chain solutions for that. While yes, this is longer than even original payment IDs, it does make the comment Monero needs to consider UX. This is one of those discussions.

I'd personally be fine with either <= 256-byte optional OR steganography, yet steganography doesn't practically help with privacy since those points likely won't be uniform and that likely will be enough to flag steganographed TXs. These discussions are all about theoretical improvements which we're unfortunately not mapping to actual practical benefit at this time (though I'm sure we can create a contrived scheme which may).

Also, to clarify, it does sound like you'd endorse a 32-byte TX extra? I assume that'd be mandatory and random bytes (hashed) upon non-inclusion?

kayabaNerve commented 2 years ago

*Steganographied data can be encrypted so it has uniform bytes and appears indistinguishable. The sole disadvantages are the less efficient encoding and the processing costs. Considering we're likely only discussing +1/2 outputs on relevant TXs (which are planned to be infrequent), this is my currently preferred solution. Considering we're not discussing moving to 2/2-only anytime soon (AFAIK), which would seem to be very difficult to manage with the 10-block lock, I'm happy to leave it at that for now as what will likely happen if TX extra is removed (though I do understand the theoretical privacy benefits of removing it).