monero-project / monero

Monero: the secure, private, untraceable cryptocurrency
https://getmonero.org

[Discussion] Consider removing the tx_extra field #6668

Open tevador opened 4 years ago

tevador commented 4 years ago

First discussed here: https://github.com/monero-project/meta/issues/356

We should consider removing the tx_extra field from all non-coinbase transactions.

Main reasons:

  1. Enhanced fungibility due to a more uniform transaction format.
  2. Protection from the risks of arbitrary data on the blockchain, e.g. copyrighted material, privacy violations, politically sensitive or illegal content etc.

Required data that is currently stored in tx_extra (e.g. transaction public key) could be moved to a dedicated field.

Miner (coinbase) transactions could still allow the tx_extra field for the following reasons:

Disadvantages of removing the tx_extra field:

tevador commented 1 year ago

Actually, the extra blockchain data would probably be a minor issue compared to the fact that the DEX most likely has to publish its private view key, which would allow anyone to determine when an output owned by the DEX is spent. This reduces the effective ring size for everyone using the DEX outputs as decoys.

kayabaNerve commented 1 year ago

Correct. That was what I was trying to highlight with the distinction between theoretical and practical. I still believe there is value in theoretical improvements though. I could say that because DEX inputs are known, and TXs out are known, there's no value in using randomness for TXs out. TXs out appear identical to any other TX though, which I worked hard on, and there is a chance that even with knowing TXs out, they won't be perfectly linkable (beyond statistical analysis thanks to the known inputs, which could be solvable with a circuit membership proof). This is because ephemeral data is used as part of the signing process.

Even if the DEX wanted to keep its view keys private, there'd still be 100 distinct individuals (the multisig holders) who could dox it though :/ It's why we're not solely acting as an instant exchanger but are also enabling long-term balances, in order to reduce the amount we 'poison' the Monero TX pool. While that has the issue of being custodial, and not just for a moment, it's been accepted by the markets and the larger crypto community. And yes, I'll immediately agree it's technically inferior... but that's not what we're discussing.

I will note this makes it identical to a CEX with regards to governments. With regards to firms, it enables more firms (even ones without the relevant partners) to dig in. There's also no level of oversight with data usage. There's a variety of discussions on the impact of this available.

As one final note, though it is off topic, I'm not building a DEX like this to harm Monero, despite the discussions here being about the harm to Monero and me taking the side of the harmer. I decided to build a DEX like this because that other integration was moved into testing. This is the future we're faced with. The other integration however, in my opinion, has several damaging aspects to the community. I could let them move forward, obtaining whatever market share they will, or compete for the same market share while doing less harm. I'm here in these discussions to comment on this same reality. I also believe there is a legitimate service here, which I hope to advocate for.

I believe steganographied data is the best path, and if the computational cost isn't preferred, an optional extra of < 256 bytes. It maintains the offering to legitimate services, which Monero may not want to encourage, which I'd understand. The former, however, can't practically be stopped, as it'll appear random if encrypted, with the sole note being that it has 3-4 outputs instead of 2. While yes, that was another discussion raised, I don't see it as possible for as long as we have the 10-block lock. I also don't see how that's removable. Even with a circuit, we need a sufficiently cemented reference point... (though they would be much faster to check the consistency of if we use a 32-byte long form instead of a VarInt short form).

I'll also note without the 10-block lock, you can solely have a carry output and then use the intended change out for steganography. Now we have to ask if developers will use a chain of 3 transactions or tackle infrastructure problems making more things their problem while simultaneously opening up the attack surface. While it'd make me review and prefer an IPFS-esque structure, I'd understand if other developers just chained 3 TXs.

TL;DR Everything sucks. How can we make things suck the least given our reality so maybe things don't suck?

Alternative TL;DR:

With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody. -- Hyrum's Law

SamsungGalaxyPlayer commented 1 year ago

I think, as a matter of being pragmatic, it's a weird balance, and it's easy to land on one of two extremes:

  1. Monero is pure, keep your crap off it
  2. Monero should be usable for anything so long as people pay for the space

We saw a similar discussion in the past with the Bitcoin developers considering certain types of transactions as spam, including Counterparty transactions. Whether these transactions are reasonable or spam depends heavily on one's opinion.

While I think a level-headed approach is necessary, the general attitude of "keep your spam off" is a dead-end. Spam is mitigated with fees. If fees aren't enough, that's the main problem. Hike the fees.

The main attitude I care about is mitigating privacy issues.

The Monero community should make an effort to standardize common formats to mitigate privacy leaks.

Some hard truths:

  1. We can't prevent sharing of view keys
  2. We can't prevent people from trying to pad information in somehow

In my view, having a standardized way of storing information in a sensible way is ideal at mitigating the damage from 2. If there's no standard way to store info, you'll have 1 person doing A, 1 person doing B, etc., and they'll all suck. Once we place most activity into this standard format, we can more easily improve network privacy.

Efficiency is another consideration. If only 1 transaction wants to store data a certain way, then it's clearly not worth padding every other transaction to account for this. It's about picking reasonable standards, and eating an efficiency cost if the privacy benefit is worth the cost.

We can keep playing a game of whack-a-mole to try to discourage development and storing of arbitrary data in specific ways, but it ultimately won't lead anywhere.

spirobel commented 1 year ago
  1. We should try to avoid making data look random when it is not. It is good if transactions of DeFi protocols are non-uniform and easy to identify, because what these transactions are is known to the public anyway. If they are easy to identify, the decoy selection process could take into account that there is a public dataset out there that marks these transaction hashes as part of a certain DeFi protocol.
  2. If we want to improve the privacy of tx_extra and its usability for new use cases, we should make its length random instead of fixed (for example, letting a browser wallet save the URL of the website where the transaction was made instead of the (dummy) payment ID that is currently the default; I want to implement this because I want users to recover their wallet from just the seed phrase without losing information). How would this work in practice? Currently tx_extra is always padded with a dummy payment ID, so all transactions look like they are made to integrated addresses (i.e. transactions are as uniform as possible). So the norm is this fixed length, but the norm should be a random length (possibly probabilistic based on the current circumstances in the network, with considerations similar to decoy selection).

hyc commented 1 year ago

Given the recent attention to the topic, some additional thoughts:

1) We could keep tx_extra, and continue to mandate the use of tag-length-value for each element stored there.
2) We can also mandate that anything over a threshold (e.g. 32 bytes) must be encrypted. This will prevent liability issues if people decide to attach large blobs of controversial data. We'd need something like an Authenticated Encryption cipher that allows a check to see if data was encrypted correctly.

The only downside here is a potential DoS vector if it costs a lot to verify encryption of the data.
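For concreteness, here is a minimal sketch of the kind of tag-length-value framing described in point 1. Single-byte tags and lengths are an assumption made purely for illustration; this is not the encoding Monero actually uses for tx_extra sub-fields.

```python
# Illustrative tag-length-value (TLV) framing: one tag byte, one length byte,
# then the value. Hypothetical sketch only, not Monero's actual tx_extra format.
def tlv_encode(entries):
    """entries: list of (tag, value) pairs with tag in 0..255 and len(value) <= 255."""
    out = bytearray()
    for tag, value in entries:
        if not 0 <= tag <= 0xFF or len(value) > 0xFF:
            raise ValueError("tag and length must each fit in one byte")
        out += bytes([tag, len(value)]) + value
    return bytes(out)

def tlv_parse(blob):
    """Inverse of tlv_encode; raises on truncated input."""
    entries, i = [], 0
    while i < len(blob):
        if i + 2 > len(blob):
            raise ValueError("truncated TLV header")
        tag, length = blob[i], blob[i + 1]
        value = blob[i + 2:i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        entries.append((tag, value))
        i += 2 + length
    return entries
```

The two bytes of framing per entry are the overhead kayabaNerve objects to further down.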

kayabaNerve commented 1 year ago

Two comments.

1) I am against TLV. If we have TX extra, it should only be for arbitrary data. It only makes sense to enforce an encoding if we then place wallet protocol items back in there. If that's a legitimate consideration, I'd rather:
A) Have a second TX extra for wallet protocol usage
B) Just hard fork in explicit fields as needed

2) If there is an encryption primitive that allows verifying it's well-formed without knowing the decryption key, I'm all for it. AFAIK, we cannot determine if data is encrypted except by testing how uniform it is. If the authentication was layered, then while we could verify its authenticity, we'd run into the same problem of not knowing if it's actually encrypted.

As for a uniformity test, I am generally against statistical tests on transactions, as they lead to potentially random failures. For any uniformity test, with a set encryption algorithm and set key, there will be messages which naturally are unsendable. While they may be few and far between, or even 2^64 unlikely, I despise systems which can randomly fail. I do acknowledge that prefixing an 8-byte nonce practically solves this for any deployed system.

If we want to move forward on this discussion, I'd first ask which branch we're pursuing:
A) Fully uniform TXs, removing arbitrary data for steganography (technically non-uniform by output quantity), increasing global scan time and chain state size
B) An explicit arbitrary data field

If A, we no longer need any further discussion IMO. If B, my advocacy is for Monero to do absolutely nothing with it. It shouldn't check it, no wallets should put data there, nada. The only bounds should be on max size and weight calculation. Due to liability concerns I'd tolerate a uniformity requirement (not that my tolerance decides what's done), yet I'd note the requirement shouldn't apply only to payloads > 32 bytes. Stream ciphers can be applied to an arbitrary number of bytes trivially, and if you're concerned about liability, a URL can be just a few bytes (4?).

I also believe requiring the data to be encrypted would increase privacy. I'd personally just rather yell at people to encrypt their data before placing it on chain than make a consensus rule out of it.

tevador commented 1 year ago

We'd need something like an Authenticated Encryption cipher that allows a check to see if data was encrypted correctly.

Authenticated encryption doesn't allow you to check that the data was encrypted correctly unless you know the secret key. Proving that something is encrypted without revealing the secret key is hard and expensive as it requires proving statements over general ZKP circuits.

It would be much simpler and cheaper to do a quick statistical test when relaying a transaction (it would be a non-consensus rule).

As for a uniformity test, I am generally against statistical tests on transactions, as they lead to potentially random failures. For any uniformity test, with a set encryption algorithm and set key, there will be messages which naturally are unsendable. While they may be few and far between, or even 2^64 unlikely, I despise systems which can randomly fail. I do acknowledge that prefixing an 8-byte nonce practically solves this for any deployed system.

For any randomized encryption (and you need randomized encryption to achieve IND-CPA), you can make the chance of failure arbitrarily low by rerandomizing until the test is passed. Moreover, the statistical check would be done when relaying a transaction, so it would not affect blockchain consensus.

kayabaNerve commented 1 year ago

If it's under TX sanity, I have no objections at all. If it's under relay, I still have my concerns.

Regardless, it's trivial to just append a trailer to the encrypted data to achieve uniformity on any uniform message not detected as uniform (which should already be incredibly unlikely). I will caveat that just as a near-uniform message can have a trailer appended to achieve statistical uniformity, so can a non-uniform message. It may be proper to:

1) Check for near-uniformity under sanity
2) Ban ASCII messages entirely

I caveat 2 in case there remain some ASCII strings which would pass uniformity (I assume we'd do it on a bit level, and ASCII just has the first bit never set; that only makes it non-uniform by 12.5%). If the concern is perception around malicious messages, being able to say we banned plaintext, both with a statistical distribution check preventing long alphanumeric strings AND by literally banning ASCII, would likely be solid.

I do acknowledge that actually achieving the claim that we don't allow public messages requires this to be under relay. While I have my irks, I can ack the benefit.
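As a rough illustration of how cheap the ASCII ban discussed above could be at the relay level, here is a sketch of a printable-text detector; the 90% threshold is an assumption for illustration, not a proposed rule.

```python
def looks_like_ascii_text(data: bytes, threshold: float = 0.9) -> bool:
    """Heuristic check: flag payloads that are mostly printable ASCII.
    The 0.9 threshold is illustrative, not an agreed relay parameter."""
    if not data:
        return False
    printable = sum(1 for b in data
                    if 0x20 <= b <= 0x7E or b in (0x09, 0x0A, 0x0D))
    return printable / len(data) >= threshold
```

Note that base32/58/64-encoded data is printable ASCII and would also be flagged by a check like this, which is the point dan-da raises further down.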

spirobel commented 1 year ago

It would be much simpler and cheaper to do a quick statistical test when relaying a transaction (it would be a non-consensus rule).

All of these attempts will forever be snake oil. The only clean solution is to get rid of the transaction uniformity issue entirely (by building a better protocol where transaction uniformity has no effect on privacy).

The tx_extra debate is a strawman. The fundamental trade-off is between transaction fees and "blockspace filled with stuff somebody considers spam". A simple solution like getting rid of tx_extra won't get rid of this trade-off. You can't have cheap transactions and a JPEG-free blockchain at the same time.

tevador commented 1 year ago

a better protocol where transaction uniformity has no effect on privacy

A blockchain with non-uniform transactions will always contain some extractable information. That's a basic information-theory fact that cannot be fixed by any protocol.

transaction fees

Transaction fees exist to limit the volume of on-chain spam. They don't do anything against harmful content.

tx_extra is an arbitrary-size field that's completely ignored by consensus, so it's a very efficient way to stuff data onto the blockchain. For example, you can pay fees for 100 KB of blockchain space and get about 98 KB space for arbitrary data (~2 KB is the consensus "overhead", such as signatures and range proofs).

If there were no tx_extra, you could stuff, let's say, 30 bytes of data into each output key. 100 KB of blockchain space will get you 50 2-in/2-out transactions, for a total of ~3 KB of arbitrary data. "Uploading" to the blockchain has just become about 30x more expensive without affecting the fees for ordinary transfers.
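A back-of-the-envelope check of the ~30x figure, just restating the numbers quoted above:

```python
# Arbitrary-data density with vs. without tx_extra, using the figures above.
with_tx_extra = 98_000                    # ~98 KB of payload per 100 KB of chain space today
without_tx_extra = 50 * 2 * 30            # 50 txs x 2 output keys x ~30 hidden bytes = 3 KB
print(with_tx_extra / without_tx_extra)   # ~32.7, i.e. roughly 30x more expensive per byte
```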

hyc commented 1 year ago

I am against TLV. If we have TX extra, it should only be for arbitrary data.

If you don't use tags, then two different apps that both use arbitrary data will be mixing up their usage, with no way to distinguish them. That would be an incredibly shortsighted design.

It would be much simpler and cheaper to do a quick statistical test when relaying a transaction

That would be fine. You could improve the test by doing one pass of encryption (with an arbitrary key) and comparing the statistical result of the input and encrypted output. If the input is already encrypted then it should be nearly equally distributed in both cases.

kayabaNerve commented 1 year ago

@hyc Considering I don't care for Monero to become an application layer, I don't care for multiple apps to co-exist in TX extra, which is the only way that'd be an issue.

I'd also note that apps are welcome to define whatever formats they want, including ones with magic bytes, and if one app wants to work off another, it can build a complementary format. All of that is in the realm of the apps, not in the realm of Monero. Monero enforcing TLV, in almost every case, will just add 2 bytes to the arbitrary data, increasing its size by >2% (assuming most messages are <=100 bytes; for my desired messages, it's actually 4-6%).

With regards to detecting uniformity, I don't see value in doing another pass and checking uniformity against that. Encryption should be uniform. Why should we generate a theoretically uniform value (which isn't exactly cheap, at best it's an extra hash round) to check against when we can just check uniformity in general?

With regards to checking uniformity, I'd suggest checking the distribution of nibbles. They only have 16 possibilities, and even just 8 bytes will provide that many instances. As for how to check that distribution, this isn't my field of expertise. I'll ack that another encryption pass would demonstrate a practical distribution for the given length, yet I hope we can find a cheaper check.
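One possible shape for such a nibble-distribution check is a chi-squared statistic against the uniform distribution; the sketch below and its rejection threshold are illustrative assumptions, not a proposed rule.

```python
# Chi-squared statistic of the nibble distribution against uniform.
# For 15 degrees of freedom, a statistic above ~25 corresponds to p < ~0.05.
def nibble_uniformity_stat(data: bytes) -> float:
    if not data:
        return 0.0
    counts = [0] * 16
    for b in data:
        counts[b >> 4] += 1       # high nibble
        counts[b & 0x0F] += 1     # low nibble
    expected = (2 * len(data)) / 16
    return sum((c - expected) ** 2 / expected for c in counts)
```

A relay rule could reject payloads whose statistic exceeds the chosen threshold. As discussed later in the thread, such a test only makes plaintext inconvenient; it cannot prove the payload is actually encrypted.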

spirobel commented 1 year ago

@tevador

A blockchain with non-uniform transactions will always contain some extractable information. That's a basic information-theory fact that cannot be fixed by any protocol.

that is an irrelevant technicality. The issue is that Monero depends on statistics for its privacy guarantees. That is a major weakness and the reason why this debate exists in this form.

The only solution to the transaction uniformity problem is to use a protocol where spent notes are completely disconnected from transactions.

tx_extra is an arbitrary-size field that's completely ignored by consensus,

it is more complex than that.

If there was no tx_extra, you could stuff let's say 30 bytes of data into the output key. 100 KB of blockchain space will get you 50 2-in/2-out transactions for a total of ~3 KB of arbitrary data.

what about multi destination transactions for example? what about transactions with more outputs?

@hyc

That would be fine. You could improve the test by doing one pass of encryption (with an arbitrary key) and comparing the statistical result of the input and encrypted output. If the input is already encrypted then it should be nearly equally distributed in both cases.

You could also just calculate the entropy of the string. Easier to implement and does the same thing.
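For reference, a byte-level Shannon entropy estimate along those lines might look like this (the sketch and any cutoff are illustrative; for short payloads the estimate is biased low):

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (maximum 8.0).
    Encrypted or well-compressed data of non-trivial length sits near 8.0;
    plain ASCII text is typically well below that."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())
```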

It is generally not a smart idea to try to prove that something is random (because it is not possible). But this is what we are talking about here.

It would be much better to stop these futile attempts and focus on solving the root cause. Why is transaction uniformity even an issue? Because there is a relationship between utxos and transactions.

dan-da commented 1 year ago

Why is transaction uniformity even an issue? Because there is a relationship between utxos and transactions.

@spirobel Are you suggesting to somehow separate and unlink utxos from tx? How would that work?

From what little I've gathered from the sidelines, people advocating for uniformity have suggested allowing only 2 in 2 out tx and eliminating tx_extra. Perhaps get rid of multisig also, I'm not sure. I'm unsure of all the implications and tradeoffs here, but I do find it interesting to think about ways to make every tx look exactly like every other, so we get a single anon pool...

Are you hinting at another way to go about this?

tevador commented 1 year ago

The only solution to the transaction uniformity problem is to use a protocol where spent notes are completely disconnected from transactions.

What you are proposing is already listed in Open Research Questions as one of the highest priority topics. The closest concrete proposal is https://github.com/monero-project/research-lab/issues/100 which does not even have a PoC that would have a chance of working with Monero. It will take many years before something like that can be deployed.

Reducing tx non-uniformity caused by tx_extra has a positive impact on privacy and can be implemented now.

that is an irrelevant technicality

It is very relevant. Even if we have a protocol that offers a global anonymity set, tx non-uniformity will cause anonymity puddles to form, significantly reducing the actual privacy properties of the protocol.

For example, refer to this zcash issue, which shows how tx non-uniformity can leak information even with a global anonymity set.

it is more complex than that

Can you elaborate?

what about multi destination transactions for example? what about transactions with more outputs?

Ideally, we would only have 2-in/2-out transactions for maximum uniformity. Using multiple outputs per transaction would somewhat improve the efficiency of steganography, but it's still much less efficient than overtly putting the data in tx_extra.

kayabaNerve commented 1 year ago

1) Not to sidetrack, yet a SNARKs PoC would take weeks and an actual impl months. Not many years. There's just a lot of discussions on its benefit/desirability/practicality. It's more bureaucratic bs/developer availability than actual technical issues IMO.

2) 2-out, for as long as we have a 20-minute lock, sounds horrifically infeasible. I do understand that grows exponentially. It's just still hell for every single integration.

3) If there's no TX extra, and only 2-out TXs, I'm cutting every corner I can.

There are two considerations here. 1) I don't want to be an asshole. Obviously, we're all here for Monero, to do our best to produce the best protocol we can. While we may disagree on what that is/how, we're here to work together. 2) I'm supposed to be an asshole. If we're discussing removing TX extra, then I, as the theoretical developer needing TX extra (and also a practical example), am required to discuss how I would surpass the limitations placed, so we can discuss how effective they are.

With that out of the way, here's all the corners I know of which can be cut.

1) The TX ephemeral key. If you're fine with a fully public wallet, which deterministically derives the r off public data, it can offer ~30 bytes by using the public key.
2) JAMTIS enc tag (18 bytes).
3) JAMTIS hint (2 bytes).
4) View tag (1 byte).

So far, we're at ~51 bytes.

5) Balance proof (32 bytes). This has the side effect of doxxing the balance of the change output. This can only be prevented by not exposing this scalar and returning to a zero-sum check (not to suggest that's possible).
6) Selected group members. This could potentially offer ~100 bytes, albeit by potentially eliminating most decoys in the transaction. This can only be prevented by deterministic group selection.
7) Possibly some nonce abuse for <=32 bytes each? But breaking the privacy of the proofs underneath them.

So ~51 bytes safely (if calling published view keys safe), though with a good amount of pain, and then ~200 bytes if you disregard safety.

I'm actually fine with just 51 bytes. It's manageable for my needs. While Monero isn't about me, I argue in this discussion based on real world implications of TX extra, and my needs are real world considerations. The concern is people who want more than 51 bytes. They will then be forced to endanger their users. While that's... arguably fine? They're 'consenting' to it? It raises the discussion on user knowledge of privacy protocols and if it's informed consent, and if forcing these protocols to such drastic measures is best for Monero and its users.

As one final note on 2-2 TXs, if the 20 minute lock is maintained, handling a 1 TPS in/out flow, aggregating, then with the full balance funding outputs (to enable handling one really large output with further logic), would take 7 hours. This is definitely a side track, yet I hope to point out how much of a complexity it is in hopes to largely drop it from this discussion, enabling re-focusing it.

AFAICT:
A) Remove TX extra. +uniformity, +pain for arbitrary data
B) Keep TX extra, likely requiring some statistical uniformity/non-ASCII. There's also advocacy for forcing TLV, which I cannot say I believe should be in scope for Monero, which I believe should leave this field as arbitrary.

I do truly believe consensus should be reached on one of these two options before we lose ourselves in nits in every single direction, if we want to accomplish practical change by Seraphis. In that spirit, cutting through the debate here: we have TX extra now, and koe implemented it into their Seraphis work (along with a TLV encoder/decoder, whose exact relation to consensus/relay I'm unsure of). If this discussion doesn't have a resolution by then, we'll at least maintain the status quo, and just be at risk for spam/have uniformity concerns.

If this is to remain a discussion for the next two years, as it has the past two, then I'm fine dropping my pressure on having a directed discussion. The truth is Monero is an evolving protocol. We don't need to make decisions now, and we can give ourselves the time to evaluate all options to be fully informed. I'm solely concerned that we'll lose our opportunity to make this change with Seraphis, getting implicitly locked into whatever's written there while this discussion continues ad infinitum.

kayabaNerve commented 1 year ago

*There's a few more things you can do for a few more bytes. Steg'ing a 2-2 would prob still get you up to 60+ bytes without compromising privacy.

tevador commented 1 year ago

Not to sidetrack, yet a SNARKs PoC would take weeks and an actual impl months. Not many years.

I was talking about mainnet deployment. Seraphis has been in development since 2020 and will probably hit mainnet in 2024 or 2025. I would be very surprised if SNARKs can be done much faster than that, including all audits etc.

tevador commented 1 year ago

As one final note on 2-2 TXs, if the 20 minute lock is maintained, handling a 1 TPS in/out flow, aggregating, then with the full balance funding outputs (to enable handling one really large output with further logic), would take 7 hours. This is definitely a side track, yet I hope to point out how much of a complexity it is in hopes to largely drop it from this discussion, enabling re-focusing it.

Limiting transactions to 2/2 was just an example of how to achieve perfect tx uniformity. I don't actually think it's a good idea to implement it in practice, at least not for now.

This discussion is about tx_extra. Even if we don't remove it entirely, any restrictions placed on tx_extra would be an improvement since it's completely unmitigated at the moment.

At the very least, we should:

  1. Remove all mandatory transaction data (such as public keys) from tx_extra. This is already being implemented with Seraphis.
  2. Place a reasonable upper limit on the size of tx_extra -OR- make tx_extra prunable.

I'm also in favor of putting additional uniformity requirements on tx_extra, but it seems that there is no consensus about that.

kayabaNerve commented 1 year ago

That is my current advocacy.

dan-da commented 1 year ago

An ASCII ban would prevent e.g. base 32, 58, 64 encoding, no? (Which could pass a uniformity check if the input is encrypted.) Is that desired?

kayabaNerve commented 1 year ago

It's an ack of hyc's thoughts. While I don't personally have a concern Monero will be criticized for having garbage, I can agree it's undesirable. Banning the entire TX extra from being a valid string would force users to encrypt it in some way or turn it into some data object. Accordingly, we could no longer be accused of having messages nor enabling messages. Solely payloads.

Then, the statistical uniformity check turns payloads into encrypted payloads, fully absolving us of concerns.

The ASCII string check is cheap, and for some short ASCII strings, they'll likely pass uniformity (again, a URL can be just 4 bytes), hence the benefit in their explicit ban.

jeffro256 commented 1 year ago

If we want to move forward on this discussion, I'd first ask which branch we're pursuing. A) Fully uniform TXs, removing arbitrary data for steganography (technically non-uniform by output quantity), increasing global scan time and chain state size B) An explicit arbitrary data field

I'll propose something slightly different which I think will make a good compromise between usability for DEXes and other use cases, and keeping malicious content off-chain. Non-malicious use cases of arbitrary data in transactions fall into three main categories: 1) "receipt stuff" for bookkeeping between counterparties (descriptions, real-world IDs, refund addresses, etc.), 2) "off-chain consensus stuff" which is inherently tied to the success/failure of a certain transaction becoming confirmed (atomic swaps, Uniswap, Serai, etc.), and 3) "toy features" like encrypted messaging and storing source code. All of these uses for arbitrary data are NOT relevant to the consensus of a Monero transaction. Monero has no scripts, and it is otherwise generally hard to enforce encryption without a lot of computation. It is hard to enforce randomness without causing weird, hard-to-debug issues like @kayabaNerve mentioned.

My proposal is as follows: move everything consensus-related outside of tx_extra, and change tx_extra to a fixed-length 32-byte hash field. If it ends up not being used in a transaction, fill it with a dummy hash. BUT create relay rules on the daemon side which will store arbitrary blobs (up to a certain reasonable size, say 1024 bytes) which match tx_extra hashes for a medium block-time (say 512 blocks), just for wallets' convenience. We can also formulate many relay rules regarding these blobs using ideas already mentioned here (encryption enforcement, randomness tests, no ASCII, other length tests, TLV, etc.) WITHOUT having to affect consensus (hard or soft forking) or relay rules for the rest of the transaction. Daemon operators will get to choose which arbitrary blobs they host and for how long, creating a "blob pool" alongside the transaction pool.
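A toy model of that commit-and-relay idea, just to make the shape of the proposal concrete; the hash function, size limit, and retention window below are illustrative stand-ins, not assumptions about Monero's actual primitives:

```python
import hashlib

MAX_BLOB_SIZE = 1024     # illustrative relay limit from the proposal above
BLOB_TTL_BLOCKS = 512    # illustrative retention window from the proposal above

class BlobPool:
    """Toy 'blob pool': nodes keep arbitrary blobs for a limited number of blocks,
    keyed by the 32-byte hash that would be committed in the transaction."""
    def __init__(self):
        self._blobs = {}  # digest -> (blob, expiry_height)

    def add(self, blob: bytes, current_height: int) -> bytes:
        if len(blob) > MAX_BLOB_SIZE:
            raise ValueError("blob exceeds relay limit")
        digest = hashlib.sha3_256(blob).digest()  # stand-in hash for the sketch
        self._blobs[digest] = (blob, current_height + BLOB_TTL_BLOCKS)
        return digest  # this is what would fill the fixed 32-byte tx field

    def get(self, digest: bytes, current_height: int):
        entry = self._blobs.get(digest)
        if entry and entry[1] >= current_height:
            return entry[0]
        return None
```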

For every bit of information which is needed for cases 1) and 2), it is in the sender's own interest to communicate that data to counterparties if they go offline for more than 512 blocks. Storing and serving large blobs of non-consensus information should not be a task that the daemons shoulder; rather, the counterparties which need that functionality can just pass it to the person that needs that information. But especially for use case 2, there is easily verifiable cryptographic proof that some specific bit string was constructed and signed by the maker of that transaction (as long as you verify the rest of the transaction).

This scheme mitigates a lot of on-chain spam and the risk of storing illegal content while maintaining usability. It also slightly increases on-chain transaction uniformity (albeit with some flaws).


kayabaNerve commented 1 year ago

NACK. This does not properly solve the data availability problem which is why people want to use TX extra in the first place. While it does do better at it than just a 32-byte hash, with the nodes never seeing the payload, this will be extremely frustrating to design around, even with a long block period.

To be clear, I would prefer to steg data than to use such a scheme, just to ensure the data's lifetime is equivalent to the transaction's.

jeffro256 commented 1 year ago

with the nodes never seeing the payload, this will be extremely frustrating to design around, even with a long block period.

What kind of application requires that ephemeral data can't be sent alongside transactions?

kayabaNerve commented 1 year ago

In my use case, a DEX with its own blockchain, all interactions require validator interactions or transactions.

To start with the latter, that means users who send in Monero then have to send the payload, which costs fees, creating a chicken and egg problem (need gas to pay for payload, can't get gas because the payload wasn't sent). This does not work.

While the users could send it directly to the validators, this necessitates users being able to directly communicate with validators (which has its own commentary regarding DoSes and network architecture). This is an entire additional data pipeline. I also then have to publish that data onto my chain to ensure a copy is available for as long as it's relevant (it's relevant until Monero dies or the DEX dies, whichever comes first). While you can argue that's fair since it's my chain's data, as a hub it'll become excessive to store full data payloads from every chain vs minimal representations.

It's infinitely less of a headache to not necessitate that pipeline + publish IPs of validators + store full payload backups on my end to simply having the original chain carry the original data. It's more efficient for storage overall and has one pipeline for Monero, not one pipeline for Monero and one for payloads. Steganography is also far easier than that pipeline and tackles Monero's lack of data with a Monero specific solution.

You can say I'm an ass for this opinion/stance. That's the point. The discussion is on making TX extra not problematic. Nuking its functionality like this is damaging enough it's problematic to some users, who will then seek other options, making it damaging to Monero. It's an inevitability.

TX extra is just one way to store data on chain. Monero needs to either:

Make it the best option to store data
Remove it

IMO, your proposal makes it no longer the best option.

Gingeropolous commented 1 year ago

so in essence, no prunable solution would work because you ultimately need the data on chain forever.

jeffro256 commented 1 year ago

You can say I'm an ass for this opinion/stance. That's the point.

I don't mind, I don't think you're an ass :). I'm a gremlin that thrives on disagreement, so don't worry about that.

creating a chicken and egg problem (need gas to pay for payload, can't get gas because the payload wasn't sent). This does not work.

Now I won't claim to be an expert on Serai, but validators need to be able to access on-chain Monero transactions anyways, right? The rules can be changed such that required blockchain blobs which are verifiable can be allowed on the Serai chain, no?

as a hub that'll become excessive to store full data payloads from every chain vs minimal representations.

If it's excessive for your chain, imagine how much more excessive it is for every other node operator who has to store payloads for Serai when they don't even use it. These "excessive payloads" would make running a node on low grade hardware much less feasible, reducing decentralization.

It's more efficient for storage overall

Not necessarily, since on-chain Monero space is arguably (at least right now (; ) more expensive and more valuable than Serai chain space, especially due to the large PoW base.

Steganography is also far easier than that pipeline and tackles Monero's lack of data with a Monero specific solution.

This is a whataboutism, IMO. Just because there are other uniformity weaknesses in Monero's transaction protocol is not an argument to allow large arbitrary payloads embedded in transactions.

TX extra is just one way to store data on chain. Monero needs to either:

Make it the best option to store data
Remove it

I don't think it needs to be that black and white. You mentioned "minimal representations". You argue that the minimal representation should be stored on the Serai chain, while the Monero chain shoulders the burden of bulk data. I believe it should be the other way around, since I believe it makes more sense from a privacy standpoint and from a self-interest standpoint. We might just fundamentally disagree on that, though. I don't know how to fix that.

IMO, your proposal makes it no longer the best option.

Yes, I agree. tx_extra shouldn't be the most efficient way to store data, because storing data inside a PoW blockchain will always be 100,000x less efficient than just sending a regular old TCP message to a counterparty. In my opinion, if the data is not necessary for consensus, then it should not be on a blockchain. It makes sense to put hashes of stuff on chain, so as to indirectly verify that data through hard consensus, but it doesn't make sense to keep that stuff embedded in-chain forever for anyone who is not a counterparty. If some data is necessary for your consensus, Serai or otherwise, you will be self-interested in propagating/storing that data. Thus, those problems will sort themselves out naturally. Is that kind of a "screw you, deal with it" answer? Maybe. But it puts the burden on those who use the features, and not everyone else.

Hopefully, inb4 "[Discussion] Consider removing non 2-output transactions"

Mandatory: sorry for being an ass

spirobel commented 1 year ago

@kayabaNerve

TX extra is just one way to store data on chain. Monero needs to either:

Make it the best option to store data
Remove it

IMO, your proposal makes it no longer the best option.

and that is really the gist of it. We should also avoid adding more and more "clever" little bureaucratic rules to the monero codebase and consensus. Like this one for example: https://github.com/monero-project/monero/pull/8733

The tradeoff is between arbitrary data saved on the blockchain and low transaction fees. No amount of clever little rules is going to change this. Also the goal of @tevador to make this use case more inefficient is counterproductive and damaging to the long term value and health of the Monero chain and ecosystem. There are two reasons for this:

  1. The tradeoff of low fees and arbitrary data saved on the blockchain can't be avoided by making it more inefficient to save the data. It makes the bloat problem even worse. It will still be cheaper to save data on the Monero blockchain compared to Bitcoin, for example, no matter if you add a premium by making it more inefficient. All you achieve is adding even more bloat and overhead to the chain.
  2. Ad hoc additions of new rules damage the trust in the consistency and continuity of the meta consensus, that we create here about the monero protocol. We need to give people reason to believe that the rules can't be changed randomly because somebody does not like the aesthetics of what is being saved on chain. It is also important to keep sending this message to governments and regulators. Code is law. And this law can't be changed randomly because of someones feelings.

@jeffro256

It is hard to enforce randomness

it is impossible in general to prove that something is random. Can we please all agree on this assumption?

That is a philosophical and scientific fact. It is impossible to prove that something is random. I thought that was common knowledge and I am a bit surprised that we even need to have this conversation.

If anyone has doubt about this, we should walk it through and make sure we convince ourselves of this very basic truth: there is no way to prove that something is truly random.

tevador commented 1 year ago

We should also avoid adding more and more "clever" little bureaucratic rules to the monero codebase and consensus. Like this one for example: #8733

It's a relay rule, not a consensus rule. Node operators have every right to place restrictions on the data that resides on their machines. There is broad consensus that the default limit should be what the PR is proposing. Anyone who disagrees can simply change the limit before building their binary. That's the power of open source software.

The tradeoff is between arbitrary data saved on the blockchain and low transaction fees.

As I said before, fees only limit the volume of spam, not its content. This discussion is not about fees.

That is a philosophical and scientific fact. It is impossible to prove that something is random.

That's an irrelevant technicality. You can decide that something is not random with a probability of p, for a value of p arbitrarily close to 1. It's called statistical hypothesis testing.

j-berman commented 1 year ago

I don't think statistical testing for uniformity will work. It sounds trivial to fool. Simply use an encoding scheme where you XOR plaintext with a static random pad, and then decode by doing the same.

Example plaintext payload:

00000000 10101010

Static random pad:

01101000 11101001

Encoded payload that will pass a uniformity check:

01101000 01000011
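The same trick in code, to make the point concrete (the pad value is arbitrary and the scheme is deliberately weak, shown only to illustrate why a purely statistical relay test is easy to sidestep):

```python
import itertools

# Fixed "random" pad known to both sides; any value works for the trick.
STATIC_PAD = bytes.fromhex("68e9a37b0c51d4f2")

def xor_pad(data: bytes) -> bytes:
    """XOR with a repeating static pad: the same call encodes and decodes.
    The output no longer looks like plaintext, yet this is not real encryption."""
    return bytes(b ^ p for b, p in zip(data, itertools.cycle(STATIC_PAD)))

encoded = xor_pad(b"hello world")   # looks random-ish to a naive byte-level check
assert xor_pad(encoded) == b"hello world"
```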

tevador commented 1 year ago

At least it would force developers to think about encryption. Static key streams, reused nonces, AES-ECB are all better than plaintext. There is a good chance that at least some of them would use a common library that offers secure encryption.

spirobel commented 1 year ago

@tevador

As I said before, fees only limit the volume of spam, not its content.

So you want to police the content of transactions? Dangerous precedent to set. We are trying to build a censorship resistant system. It is not a good signal to be sent to governments and regulators that ad hoc rules can be added that police the content of transactions.

For sure there needs to be a consensus that clearly defines what is a valid transaction and what isn't. But these rules need to make sense.

At least it would force developers to think about encryption. Static key streams, reused nonces, AES-ECB are all better than plaintext. There is a good chance that at least some of them would use a common library that offers secure encryption.

"Developer education" is not a valid reason to add rules to the Monero network. A consensus or relay rule is not the right place to give lectures.

This discussion is not about fees.

Yes it is. That is the tradeoff. If you limit the size of tx_extra, the JPEG can just be split into parts and you end up with even more overhead.

Node operators have every right to place restrictions on the data that resides on their machines.

That means they are liable for its content. This logic is wrong. We should not go down this route because it will directly lead to censorship. This is similar to the discussion that happens in bitcoin right now. Code is law. The feelings of node operators should not matter. They need to be bound by this law and by nothing else.

You can decide that something is not random with a probability of p, for a value of p arbitrarily close to 1. It's called statistical hypothesis testing.

that is snake oil cryptography. You can't prove that something is random and you should not try.

jeffro256 commented 1 year ago

that is snake oil cryptography. You can't prove that something is random and you should not try.

No one said anything about "proving" randomness, though. A statistical randomness test is not a proof of anything; it would mainly just prevent low-effort abuse or accidents. I don't think we should do it, just because it might make UX a little worse unpredictably, but don't strawman.

tevador commented 1 year ago

So you want to police the content of transactions?

Yes, we want to police transactions for maximum uniformity. This is already done in Monero:

We require a specific ring size (since v8).
We require at least 2 outputs in all transactions (since v12).
The 10-block lock time is enforced (since v12).
etc.

Do you also consider this to be "censorship"?

A consensus or relay rule is not the right place to give lectures.

It is when the privacy of others or the security of the network is at stake. For example, the 10-block lock time was often violated (by mistake or on purpose) by wallet developers before being enforced by consensus in 2019.

If you limit the size of tx_extra the jpg can just be split into parts and you end up with even more overhead.

If someone wants to spam the network, they can do so even without tx_extra.

The feelings of node operators should not matter.

They do matter. Nodes are run by volunteers. Ideally, we don't want them to be forced to shut down.

spirobel commented 1 year ago

@jeffro256

If some data is necessary for your consensus, Serai or otherwise, you will be self-interested in propagating/storing that data. Thus, those problems will sort themselves out naturally. Is that kind of a "screw you, deal with it" answer? Maybe. But it puts the burden on those who use the features, and not everyone else.

There should be some consideration for data that is not directly necessary for the consensus of the main protocol. Without projects like @kayabaNerve 's Serai, Monero will become an island. All successful chains allow this to some degree or another.

While the main focus should be on the security of the consensus of the main chain, we should not ignore the outside world completely.

No one said anything about “proving” randomness, though.

You said enforcing randomness is hard. It is not hard. It is impossible.

don’t strawman.

Why are we discussing this with such seriousness then? From the suggestion that enforcing randomness is hard (and not impossible) and the vigor with which @tevador presents his ideas here, I gathered that maybe we were not as aware of this ground truth as we should be.

I don’t think we should do it just because it might make UX a little worse unpredictably,

I agree! Good point! Same goes for the size limit.

@tevador

Do you also consider this to be "censorship"?

I consider these duct-taped solutions until the root problem is fixed. It would be better if we spent our efforts on this.

Why are you not working on the issue that you mentioned earlier? https://github.com/monero-project/research-lab/issues/100 Would be a better use of your time!

It is when the privacy of others or the security of the network is at stake.

I totally agree with this. That is why we need to fix the root cause and not try to chase the aesthetics of randomness here, which will give us nothing. It just makes the user and developer experience worse.

They do matter. Nodes are run by volunteers. Ideally, we don't want them to be forced to shut down.

So it would be best not to make changes that enable selective censorship of "spam" at the node level. Let's just not go down this rabbit hole.

tevador commented 1 year ago

I don’t think we should do it just because it might make UX a little worse unpredictably

The test can be deterministic, so there would be no unpredictability.

I consider these duct taped solutions until the root problem is fixed.

As I said earlier, the problems caused by transaction non-uniformity cannot be fully solved just by a better membership proof.

See: https://github.com/zcash/zcash/issues/4332

jeffro256 commented 1 year ago

I agree! Good point! Same goes for the size limit.

The test can be deterministic, so there would be no unpredictability.

Yes and no. The test itself would be deterministic, but if you construct a payload and it happens to fail the randomness test, you have to change the message. Whether that is an issue or not depends on whether you expect the test to fail and have a nonce to inject randomness, or something else similar. It just adds an extra layer of tests which you can't necessarily predict before making the payload, unlike a length limit, which is very easy to verify beforehand.

spirobel commented 1 year ago

@tevador

As I said earlier, the problems caused by transaction non-uniformity cannot be fully solved just by a better membership proof.

but obviously they make the problem a lot less severe. Decoy selection is one of the weakest points of Monero. And I am sure you can come up with a way to overcome the challenge that you mentioned!

How would you deal with the arity correlation attacks? are there similar attacks? Maybe you can add that to the issue that you mentioned: https://github.com/monero-project/research-lab/issues/100

The test can be deterministic, so there would be no unpredictability.

the test would also be useless, because it can be circumvented easily. It is better to work on things where we can actually have an impact.

The last post by @narodnik (https://github.com/monero-project/research-lab/issues/100#issuecomment-1374407464) has not gotten an answer yet. He also seems very willing to help, as he indicated earlier. So it would be a good idea to confront him with these concerns you have identified.

jeffro256 commented 1 year ago

Yes, we want to police transactions for maximum uniformity. This is already done in Monero:

We require a specific ring size (since v8).
We require at least 2 outputs in all transactions (since v12).
The 10-block lock time is enforced (since v12).
etc.

+1. We should move towards transaction uniformity since fungibility has always been one of the, if not the most important, goals of Monero. It's what separates digital cash from speculative garbage coins. As @spirobel pointed out, removing tx_extra, or at least standardizing it, will not fix all transaction uniformity problems. Also, as @tevador pointed out, neither will simply changing membership proofs. Improving tx_extra does not need to be the silver bullet that fixes everything uniformity related, but it would help. IMO, we should explicitly leave in one field per transaction which would indirectly allow other arbitrary data to be verified through Monero PoW, but no more than that.

spirobel commented 1 year ago

+1. We should move towards transaction uniformity since fungibility has always been one of the, if not the most important, goals of Monero.

Ultimately it is about fungibility of coins and not of transactions. Transaction uniformity only matters because we need to select decoys and that means utxos are still somewhat connected to transactions.

https://github.com/monero-project/research-lab/issues/100 this issue is about trying to find a solution where utxos get completely disconnected from transactions.

It seems like @tevador has identified some caveat to this, but it is still better to investigate this caveat instead of trying to beat the dead horse of tx_extra removal or restrictions.

It will have a much bigger impact than this.

tevador commented 1 year ago

if you construct a payload and it happens to fail the randomness test, you have to change the message

Assuming you use a secure encryption method (IND-CPA), there must be a nonce, so the tx builder API can simply reroll the nonce until the test is passed. With a test confidence of 0.95, you can expect to have to encrypt 1.05x before you have a valid ciphertext. Note that a similar "reroll the nonce" process already exists when constructing the range proof.
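As a sketch of that reroll loop (using the third-party `cryptography` package and a placeholder `passes_test` callback purely for illustration; nothing here is an existing Monero API):

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305

def encrypt_until_uniform(key: bytes, plaintext: bytes, passes_test):
    """Reroll the nonce until the ciphertext passes the relay-side uniformity test.
    With a 0.95-confidence test, ~1.05 encryption attempts are expected on average."""
    aead = ChaCha20Poly1305(key)  # key must be 32 bytes
    while True:
        nonce = os.urandom(12)                   # fresh random nonce per attempt
        ciphertext = aead.encrypt(nonce, plaintext, None)
        if passes_test(nonce + ciphertext):      # test what would actually go on chain
            return nonce, ciphertext
```

`passes_test` could be something like the nibble-distribution check sketched earlier in the thread.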

IMO, we should explicitly leave in one field per transaction which would indirectly allow other arbitrary data to be verified through Monero PoW, but no more than that.

Ideally, the size of the "extra" field should not be arbitrary, but it could be a function of the number of outputs. For example 128 bytes for all 2-out transactions (this is enough to fit a refund address) and 32*num_outputs bytes otherwise. If the field is also encrypted, this would leak the least amount of information.
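Read literally, that size schedule would be (a sketch of the suggestion, not an implemented rule):

```python
def extra_field_size(num_outputs: int) -> int:
    """Suggested fixed size for the encrypted "extra" field:
    128 bytes for 2-out transactions, 32 bytes per output otherwise."""
    return 128 if num_outputs == 2 else 32 * num_outputs
```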

@spirobel I think most of us agree that implementing trustless zk-SNARKs would be great for Monero, but it's not something that can be done quickly. Feel free to continue the discussion there and let's focus on tx_extra here.

spirobel commented 1 year ago

@tevador

I think most of us agree that implementing trustless zk-SNARKs would be great for Monero, but it's not something that can be done quickly. Feel free to continue the discussion https://github.com/monero-project/research-lab/issues/100 and let's focus on tx_extra here.

Both of these issues are related. As @kayabaNerve mentioned earlier: a SNARKs PoC could be done in weeks and an actual implementation in months.

It seems that this kind of project is currently blocked by Seraphis and its outcome.

We should run a competition between Seraphis and a SNARK-based protocol.

What do you think about that? It would also have the added benefit that it would reduce researcher bias during the Seraphis audits and we would have an alternative if Seraphis does not come through for whatever reason.

kayabaNerve commented 1 year ago

1) This conversation has gone off the rails.

2) I don't want to infinitely discuss Serai's architecture. While I don't mind briefly doing so, I'll keep my commentary short.

It's more efficient for storage overall

Was a comment that given the full payload and the minimal payload, the storage cost is F+M. With an additional hash on Monero, the cost is H+F+M, compared to Monero just having F and us having M. While there are discussions on if Monero could have H and H could be of M which we have, as the 'minimal' payload is only slightly more minimal in this context, the main issue is chain-specific post-processing we'd have to move on-chain which is hell.

Also, yes, you can say Monero shouldn't have any payloads and you're not an ass for doing so. I'm telling you the path of least resistance for me is to put this data on Monero and I'm going to. You can either:

A) Let me do this in the best way for all of us. That means a sane TX extra amenable to users who want to place data on Monero.

B) Accept people will just use steganography, which may already be preferable.

Personally, I already summarized what I think makes sense for TX extra, yet I legitimately wouldn't mind only having steganography. I also want to comment that all my advocacy for Monero has been what I legitimately believe is best for Monero, not for my own project. I just acknowledge that this feature is meant to be used, so the question is what's legitimately useful, and I've advocated for it to be legitimately useful (or removed due to the vector it is).

3) > I don't think it needs to be that black and white.

If TX extra isn't the best way to store data on Monero, it has no reason to exist. The wallet protocol shouldn't be there, as universally agreed. If people aren't using it for arbitrary data, then no one is using it for anything except exploits due to our increased surface area of having it. While I won't say it must be the best way for every single use case and every person, I believe it not solving the data availability problems makes it far too infrequently the best way to store data on Monero. Since its only point is to be the method to store data on Monero, why keep it around then?

4) We should improve the protocol as we can. There are many issues with Monero and we should work to fix those. If you want a static protocol, go bother BSV or don't update your node. I don't care to halt discussion on the premise changes due to discussions are bad.

5) I don't want to comment on SNARKs, a completely unrelated topic, yet it keeps getting brought up. I call for everyone to stop discussing SNARKs in this discussion and if someone else brings it up, just ignore it.

spirobel commented 1 year ago

a completely unrelated topic, yet it keeps getting brought up.

It is getting brought up because it is related to transaction uniformity, which is the reason we are worried about tx_extra in the first place. It would address the root cause of this whole debate.

kayabaNerve commented 1 year ago

No, it wouldn't. There are many different ways transactions can be non-uniform. The ring is a completely different topic from TX extra. While both can make TXs non-uniform, one does not affect the other.

spirobel commented 1 year ago

No, it wouldn't.

yes it would.

The basis of the privacy properties of Zcash is that when a note is spent, the spender only proves that some commitment for it had been revealed, without revealing which one. This implies that a spent note cannot be linked to the transaction in which it was created. That is, from an adversary’s point of view the set of possibilities for a given note input to a transaction —its note traceability set — includes all previous notes that the adversary does not control or know to have been spent.

This contrasts with other proposals for private payment systems, such as CoinJoin [Bitcoin-CoinJoin] or CryptoNote [vanSaberh2014], that are based on mixing of a limited number of transactions and that therefore have smaller note traceability sets.

from: https://zips.z.cash/protocol/protocol.pdf

You would still have to worry about some meta data leakage as @tevador pointed out. But there is still a fundamental difference in the impact that transaction uniformity has on privacy. It turns "everything could be a problem" into "some things could be a problem."

There is no point in focusing just on tx_extra. If we are unwilling to make changes so that there is no link between notes and transactions anymore, then we need to focus on transaction uniformity in general and not just tx_extra.

UkoeHB commented 1 year ago

Transaction uniformity is not an all-or-nothing game, there are many incremental improvements that can and should be made as solutions present themselves.

LocalMonero commented 1 year ago

In addition to the tx uniformity argument, one more argument in favor of removing the tx_extra field is that the dynamic (i.e. potentially unbounded) block size that Monero has, combined with an arbitrary data field, leads to Monero becoming a vector for uses that aren't money, meaning it's going to be less efficient at its purpose: being next-gen money.

paulshapiro commented 1 year ago

Okay but what exactly is "money"?

Like ukoe says, there are increments.

We just need to make sure we're not making Monero useless at the expense of an ideal.