monero-project / monero

Monero: the secure, private, untraceable cryptocurrency
https://getmonero.org
Other
8.85k stars 3.09k forks source link

[Discussion] Raising the mandatory ringsize in the v6 hardfork, September 2017 #1673

Closed olarks closed 6 years ago

olarks commented 7 years ago

TL;DR There is an edit at the bottom of this post highlighting what is going on because a lot has changed over many discussions, and better alternatives are being explored.

You may want to grab a drink for this one...

As we all know there will be a hardfork in September 2017 that will raise the mandatory ringsize to 4 based on the recommendations of MRL-4, but this paper was published before RingCT was found to have extraordinary savings in transaction size for transactions with larger ringsizes.

We have 7 months to explore alternatives and I would like to discuss some viable options for raising the mandatory ringsize for the v5 hardfork, so that we do not need to again change it in a future hardfork unless certain circumstances arise.

It is important to note that raising the ringsize of a transaction will increase the time it takes to verify transactions for nodes, but as far as I am aware this scales linearly(?). So this can be maintained to a reasonable degree and if we assume processor speeds continue to improve, then this problem does not greatly impact the scalability of the Monero network in comparison to the relative gains of privacy for its users. Optimizations in ring signatures and rangeproofs will certainly ease these concerns.

There are two options I would like to explore:

1) Simply raising the mandatory ringsize to a value greater than 4 that can take full advantage of RingCT's optimizations in transaction size without sacrificing very much performance. Ring size 8 could for example easily be used instead of 4 and still have very good performance for example.

and the more interesting option

2) Raising the mandatory ringsize and have it be static so no other ringsizes are accepted in the Monero network. Static ringsizes 9, 12, or 15 would be good options and I will explain a bit more about why those ringsizes specifically are special later below.

The first option would be simple to implement and the optimizations of RingCT would be immediately realized with very little added cost. However the second option will bolster privacy on the Monero network by a much higher degree. By enforcing only one valid ringsize for transactions this removes any confusion and complexity for new users who do not understand why you would want to increase the ringsize of a transaction if transactions in Monero are all already private. Since the protocol would be enforcing a static ringsize, then all transactions would be homogenous and would circumvent analysis of transactions by adversaries who may know that for example exchange x uses ringsize y for all their outgoing transactions, or that there is a user that regularly specifies atypical ringsizes like 41 for all their transactions. This removes a human element to transactions that could save people from unintentionally reducing their privacy as well as other users' privacy.

In the future, businesses and services could be 'assigned' very specific ringsizes that would stick out on the blockchain for passive observers to more easily identify their outputs in other ring signatures, this is only hypothetical and one of many possible scenarios where irregular ringsizes could reduce privacy in the Monero network.

I believe that this is a very valuable and natural addition to Monero since the protocol already makes decisions to protect its users like dynamic fees and randomly choosing their ring partners according to specific parameters. A static ringsize would just further streamline the user experience while defending against many attacks via passive analysis of ring signatures.

In addition to the arguments I made for a static ringsize I will also propose a new output selection algorithm to assist a static ringsize. Going back to why I had specifically mentioned static ringsizes 9, 12, and 15 is because if a third of ring partners are chosen randomly from the past five days and the remaining two thirds of ring partners are chosen at random then a few narrow edge cases that would threaten privacy would be completely eliminated or at the very least much less impactful than only having a default ringsize 4.

I will demonstrate these claims with some hypothetical scenarios using a ringsize of 12 as an example.

Scenario 1: Alice sends Bob a transaction with an output that is two days old. In Alice's ring signature are four inputs from the past five days chosen by our output selection algorithm(plus Alice's real spend) and eight additional inputs chosen at random.

Alice has complete financial privacy as her real output is masked with the other 12 ring partners without any major inconsistencies in our output selection algorithm. Since there is a chance at least one of those eight outputs chosen at random could have been from the last five days the additional recent output in the ring signature is not entirely suspicious. This will still be taken into account by passive observers since a lot of people will be spending money soon after it has been received. Possible scenarios where this would definitely question the legitimacy of randomly chosen outputs and directly threaten privacy is discussed in more depth later.

Scenario 2: Same as Scenario 1 but instead Alice has an output that is one month old. Our output selection algorithm will again choose four random outputs from the past five days and eight outputs at random.

Now the problem is Alice has an output that is older than the outputs that would be specified from the last five days and if the other eight random outputs do not coincide with the last five days then only four recent outputs will exist in the ring signature. This is a direct indication that none of those outputs is the real spent output because only four recent outputs would be chosen from our algorithm anyway, hence being fake. Luckily the other eight random outputs still mask Alice's real output so this scenario does not greatly threaten her privacy as the transaction would effectively still have the strength of ringsize 8 down from 12.

Scenario 2 is less likely to occur if a larger timespan is chosen for recent outputs in our algorithm so the randomly chosen outputs have a higher chance to be in that timespan, but since money is typically spent within a short time after being received this will directly weaken the obfuscation of our recent outputs selected for a ring signature.

The output inconsistencies in Scenario 2 cannot be solved very well without adding additional formality to the output selection algorithm making it easier to single out specific outputs that are not consistent with our output selection rules like in Scenario 2. With a ringsize of 12 at the very least an effective strength of ringsize 8 would still exist for a transaction.

However if an observer were able to find inconsistencies within a transaction with multiple inputs, all or at least some, containing more than four outputs younger than five days old it is incredibly unlikely the other eight random outputs for each input's ring signature would contain very recent outputs. This would question the legitimacy of the eight other outputs older than five days giving the transaction a hypothetical strength equal to the number of outputs in the ring signature younger than five days old or at the absolute bare minimum the strength of ringsize 4.

These scenarios will need to be accepted to exist and thankfully do not completely destroy the privacy of a transaction with poorly chosen ring partners by the output selection algorithm.

Increasing the static ringsize would decrease the likelihood of these contradictions in ring signatures, but it is a tradeoff with transaction sizes. Ideally most transactions would be able to take full advantage of ringsize 12, but the privacy of ringsizes 4 and 8 can still be obtained in worst case scenarios.

I personally prefer ringsize 12 because at the bare minimum when the output selection is not in your favour then at the very least ringsize 4 can still be obtained, which is the original ringsize scheduled for v5 hardfork. Ultimately any ringsize greater than 9 that is a multiple of 3 can be used and if we take into account optimizations of ring signatures and rangeproofs then even larger ringsizes, factors of 3s, could be considered.

The privacy that would be gained with a static ringsize 12 far outweighs any short term scalability issues that may arise in my opinion and would be a boon to the privacy and security of the Monero network while having good safe guards for maintaining a minimum privacy level of ringsize 4.

I hope valuable discussions can be made from this post and further optimizations that can further the security of the network can be discovered.

Thanks for taking the time to read my writeup. Any input is greatly appreciated.

EDIT: If anyone is just now finding this discussion and does not want to go through the entire backlog(I don't blame you ;) ) then to catch everyone up to speed what is happening right now is I have been surveying bitcoin transactions for the age of outputs and categorizing them by the last day, week, month, year, and older than a year to get the probability of these outputs being spent to help construct an output selection algorithm for Monero's ring signatures.

After long discussions with @iamsmooth, having only one static ringsize may be a bit extreme and does not give a user the freedom to increase their ringsize for transactions. @moneromooo-monero mentioned possibly having three static ringsizes for users to choose from. I have been favoring having only three static ringsizes because of its similarity to users being able to choose three different fee options based on urgency, we could have three ringsize options based on paranoia for users to choose from.

So for the time being we are just trying to narrow down what an output selection algorithm would look like. I have to thank @iamsmooth for giving critical feedback and nitpicking different ideas. There is no need to be increasing the ringsize from 4 after we come to a conclusion on a suitable output selection, however I still favor giving the mandatory ringsize a slight bump(to 6?) based on ringct savings in transaction size, but if there is no consensus for doing so ringsize 4 is still a strong option.

ghost commented 7 years ago

This is a great topic to highlight. To my knowledge, we haven't really had a decent post-RingCT discussion of ringsize. I assume keeping the size of the database smaller rather than bigger will be a part of this discussion, but personally I like your thinking, particularly the idea of a static ringsize.

ghost commented 7 years ago

Is this inspired by the latest work by @knaccc describing the churn required for true anonymity in ring sets?

olarks commented 7 years ago

@NanoAkron I have not been around the last few weeks, so I must have missed that discussion. This writeup was inspired after looking deeper at the ring signatures and output selection algorithms. Though I am not aware of anything else that could fix these problems without setting a suitable static ringsize 12 like I discuss above. The problems are just inherent to the output selection algorithm and taking those flaws into account with a ringsize that can still protect users.

RandomRun commented 7 years ago

Hey, I am glad to see input selection and ringsize value being put under the microscope to fix some information leaks that still exist. Indeed, if one looks at the high ringsize transactions, it is possible to trace back the user's inputs with some probability of success, assuming that the user is "the high ringsize guy" that uses all the highest values in each ring.

I think that @olarks approach is in the right direction, but I'd like to make a few suggestions. But before moving on to those, I just wanted to note that there are two different issues at play here: (i) inputs age correlation, and (ii) their ringsize values correlation, both of which may leak information and help with traceability.

Inputs age:

If we knew the actual probability distribution for the ages of the inputs actually consumed on each new ring, then we could just sample that distribution as much as needed to build new transactions. AFAIK, we don't have that distribution yet, but it would be a worthy endeavour to try to obtain it by looking at the blockchain of some non-private coins, like Bitcoin. I bet that if some one did that, we would be looking at something that resembles a Zipf distribution, or otherwise some form of power law distribution. In that regard I think that the distribution suggested in the OP might not be the best fit for that, since it is the superposition of two uniform distributions.

If we assume for the moment that the actual distribution is a Zipf. Then all that the wallet has to do is pick a day according to it, and then, within that day, just choose one input uniformly at random. Repeat this process untill you get enough inputs to complete your ring. Add your own input to it, which by definition can be viewed as picked according to the actual distribution, and you are done.

[As a side note, if it turns out that that "the actual distribution" and hence the one we use to sample the inputs does follow a power law and has a long tail, it might make the blockchain a bit more prunable, in the sense that over time it would be very rare to see a transaction involving a very old input, real or decoy. And so, maybe not all nodes will have to keep the blockchain history older than say 5 years (although of course all the inputs would still have to exist somewhere). This would not be possible if we keep sampling inputs according to a uniform distribution, though.]

Ringsize:

Monero has a good level of privacy, and that comes in part from the size of your anonymity set in each transaction, which in this case is the ring size, so the bigger the better. I agree that we should make it easier for the user not to do something stupid everywhere we can, and that includes ringsize selection; and I like the idea of increasing the minimum ringsize way beyond 4, since that doesn't seem to be a bottleneck, at least in the forseeable future. But on the other hand, I would prefer it to be done by way of selecting good default options rather than banning some ringsize values (except for values under the minimum, that is good of course).

So instead of simply defaulting the wallets to set ringsize to 4 or 8, why not have the default be a random value as well? Perhaps let it be a uniform random value between 8 and 50, or a triangular distribution from 8 all the way down to 100, or even another power law. I don't have a completely well formed opinion about the choice of distribution for the ringsize, except that I really think it should be random by default. That way, if a user does decide to use a higher ringsize, hopefully they won't stand out as much as right now, when it is prety much all ringsize 2 and 4...


@NanoAkron: Could you please link to @knaccc's work that ou referenced?

hyc commented 7 years ago

@knaccc's latest draft is here http://termbin.com/kplo

JohnnyMnemonic22 commented 7 years ago

As a possible solution to the scenario 2 issue, instead of the number of selected 5-day-old ring partners being fixed to ringsize/3, why not make it a random number between 0 and ringsize-1? Wouldn't that eliminate the possibility of clever deductions based on input age?

kenshi84 commented 7 years ago

@RandomRun

So instead of simply defaulting the wallets to set ringsize to 4 or 8, why not have the default be a random value as well? Perhaps let it be a uniform random value between 8 and 50, or a triangular distribution from 8 all the way down to 100, or even another power law.

But the tx fee is proportional to the ringsize. For example, currently the fee for a 2-in/2-out transaction with ringsize of 8/50/100 would be 0.024/0.033/0.045, respectively. So I can imagine users will cancel and repeat the tx construction algorithm until the algorithm picks a smaller ringsize to reduce the fee. I think that was the point of the enforced minimum mixin.

olarks commented 7 years ago

@JohnnyMnemonic22 The main problem with a randomly variable output distribution is now your minimum privacy guarantee is random too. While this could make scenario 2 less likely to occur the minimum privacy level being guaranteed could be lower than ringsize 4, in some scenarios, which I think is undesirable. Having a concrete minimum ringsize that can be accepted to fall back on, like ringsize 4, when randomly chosen outputs are unfavorable for a user is the better alternative imo.

@RandomRun Zipf distribution or other power laws could be possible, but this would add an additional uniform pattern to how outputs are chosen wouldn't it? Any output that does not fit the uniformity of the power law would likely to be the real spent output right?

The reason why I chose the output selection to have a completely random distribution, within two seperate timeframes, is because it is less likely to be successfully scrutinized by an attacker sorting through randomness. By having the number of outputs from the last five days be ringsize/3. this can provide a 'minimum' functioning ringsize to ensure that even if our random output selection are poorly chosen we have a safety guard in place to still provide good privacy.

I had also pondered having a random ringsize be chosen for every transaction within a defined range, but this won't be enforced in all wallet software. The network would still allow any ringsize in our chosen range to be used which means not all transactions would be using random ringsizes, rather favoring the minimum ringsize to save on transaction fees like @kenshi84 points out.

I think having a static ringsize would be invaluable for the level of homogeneity it would provide because the more transactions stand out from eachother the more likely passive analysis can make a connection among them.

Thanks for your input @RandomRun

Gingeropolous commented 7 years ago

@RandomRun @olarks , I'm digging the higher fixed ringsize. And hopefully we're all agreed that its ringsize now . I've had a longstanding concern over inputs age.

@iamsmooth posted something recently in IRC and I dunno if it was followed up, but it sounds like its related, so I'd thought I'd post, and I hope smooth doesn't mind that I copy and pasted his words.

Jan 30 11:04:30 <smooth>        to pick some random recent transaction, pick a random input, and then use that (approx) age to pick your fake out
Jan 30 11:04:58 <smooth>        repeat until done
Jan 30 11:06:21 <smooth>        this will converge to fake outs having the same distribution as real outs, even though real outs cant be identified. statistical magic
Jan 30 11:10:53 <smooth>        there might be some reasonable refinements like picking a random recent transaction of the same or similar shape (in terms of number of inputs and outputs)
Jan 30 11:11:12 <smooth>        at the cost of needing even more volume for that to be reasonable
"

what do you guys think?

iamsmooth commented 7 years ago

The discussion of a single fixed ring size is worth having, from the perspective of increasing homogeneity, and therefore offering a potential system-wide benefit. I'm generally not a fan overall of proposals to blanket increase the minimum ring size because of the accompanying increase in tx size, cost, and therefore likelihood of reduced usage (which in turn carries its own cost in reduced privacy in practice in addition to probably increased likelihood that Monero fails to reach a critical mass of adoption quickly enough or at all, and fails entirely).

It is certainly usually true that one can improve the privacy of individual transactions in various ways by increasing the ring size, but this is different in kind from the sorts of catastrophic cascading system-wide failures/attacks that motivated the MRL-0004 recommendations. The bar to mandating increased costs is higher when it is a more direct quality/cost trade-off and not a system-wide failure mode or benefit (acknowledging the potential that a fixed size might carry such system-wide benefits).

I expect there will be a variety of wallets and tools built on top of the Monero protocol that will improve privacy at higher cost or with different tradeoffs (such as delayed availability of funds) by managing transactions in various useful ways (and likewise some that will minimize costs within the protocol rules for users who care more about that). Those don't all need to be built into the Layer 1 core protocol. Unless there is demonstrated systemic effect, I'm reluctant to (attempt to) impose a more expensive one-size-fits-all solution upon everyone.

olarks commented 7 years ago

Another concern is as the TXO set ages, then each randomly chosen output is less effective because in the future if four year old outputs are being selected for ring signatures it is likely those aren't the real outputs being spent. At the same time you don't want to be only favoring newer outputs for selection because then spending older outputs will stick out and be suspect for the real output being spent.

There would need to be in place some kind of schedule for increasing the mandatory ringsize every x years to offset this problem, so even though old outputs are being selected there is enough ring partners not to look suspicious in the ring signature. Hopefully by then we will have good scaling solutions, so this increasing bound does not bog down the network.

hyc commented 7 years ago

How about: for ring size N, we pick N/2 outputs around the same age as the real one, and N/2 outputs clustered around a randomly chosen one?

olarks commented 7 years ago

@hyc Overall I like your idea, but the group of outputs with the real spend would stand out by having (N / 2) + 1(real spend) outputs. So every transaction's effective ringsize would be cut in half immediately.

hyc commented 7 years ago

Assuming this approach is worth pursuing, then we can (1) make sure that the total number of outputs N is always even, so the real spend group is (N/2)-1 random outs + the real out, and the other group is N/2 random outs. Or, (2) if the total number of outputs is odd, randomly choose whether the real group or the random group gets the odd random out.

olarks commented 7 years ago

@hyc To expand on your idea, the outputs could be selected based on whether or not the spending output is in the last five days. We could use a static ringsize 10 to ensure that both groups of outputs have the strength of ringsize 5(4 ring partners + 1(potential spend)).

If the spend is in the last five days then four additional outputs are chosen at random in that timespan. For selecting the last five outputs then you could randomly choose one output in the TXO set and have it be a base point for randomly choosing four additional outputs within five days of the base output.

If the spend is older than five days then it is pretty much the same situation, but instead of choosing another random base point you would just randomly choose five outputs from the past five days. So in either scenario it is not obvious if the real spend is in the last five days or in the random group of outputs because both situations would be occurring in every ring signature regardless of the age of the real spending output.

A major hole in this though is the spend output only being a few days older than the randomly selected outputs from the past five days. What could occur is all the randomly chosen outputs within five days of the spend output all happen to also be less than five days old which would directly expose the real spend output since it would be the only output in the ring signature older than five days.

Choosing the extra four outputs randomly from the TXO set does not solve this problem very well either because now spend outputs that are only a bit older than five days look out of place in the ring signature because they are unlikely to be randomly chosen very often, and now you risk having the five outputs from the past five days being discarded as potential spend outputs. Giving the transaction effectively only a ringsize 4 with a threat of unmasking the spend output a little older than five days because it is unlikely to be selected often from four outputs in the TXO.

This goes back to my original proposal with static ringsize 12, but in the same scenario only four outputs would be discarded while giving a much higher level of plausible deniability of the real spend output because you get 8 random outputs being selected from the selection algorithm instead of 4, so the suspect spend output is scrutinized to a much lesser degree while only being slightly larger in ringsize 12 up from 10 for this selection scheme.

JamesCullum commented 7 years ago

I think in this discussion we should also keep in mind that a higher ring size would lead to a bigger transaction and therefore higher fees. We want to make sure that people are using it for normal transactions which don't necessarily require perfect privacy and are just not supposed to not threaten the legitimacy of the whole network, which is why MRL4 proposed the minimum ringsize. I agree with @iamsmooth that a higher ringsize could be suggested for people seeking better privacy, maybe including a different output picking algorithm, but increasing the ringsize this dramatically will have more downsides on the economic part than benefits from the privacy part.

Nobody wants to use a currency where the transaction fees are a major cost. If I want to buy a coffee with a cryptocurrency, I won't choose the one where I pay lots of money to make sure nobody knows I bought the coffee. If coffee is illegal in this country I may be willing to do so, hence I would increase the ringsize myself.

Mandatory ringsize of 4? Yes. Mandatory ringsize of >4? No.

hyc commented 7 years ago

@JamesCullum Your concern doesn't seem valid. The txn size (and thus the fee) is dominated by the number of outputs, not by the ringsize. http://monero.stackexchange.com/questions/3323/with-ringct-on-can-i-lower-mixin-safely-to-save-on-tx-fees

hyc commented 7 years ago

@olarks I have no problem with your suggestion of ringsize 12 in this context.

olarks commented 7 years ago

@JamesCullum Thanks for your input. The cost of increasing the ringsize with RingCT transactions is very overstated and I did also mention that MRL-4 did not take into account RingCT savings because that paper was written much before RIngCT existed, hence why I want to discuss increasing the ringsize now.

This proposal is to make sure that at the bare minimum a transaction gets the security of ringsize 4 when under heavy analysis. Ringsize 12 is just the cost of being able to strongly maintain this minimum security.

Your point regarding coffee is not really valid here because I am trying to make sure that when you do buy that coffee your money is still fungible. We would not want you to be getting investigated by authorities because your money had been used in the past by someone else for 'suspicious' things and now you get framed for it. This is what currently is plaguing Bitcoin for ever being used in the real world as a digital cash.

The results of this discussion and proposal I made will allow Monero to resist strong analysis of ring signatures and allow it to be real digital cash.

JamesCullum commented 7 years ago

Ups, sorry then. If I recall correctly, before RingCT the ring size was a bigger influence, hence my statement to keep the fees low. If we can do that while at the same time having more privacy, we should do this.

Of course it is important to stay fungible, but it is already at the moment and a transaction age correlation would still give sufficient plausible deniability, meaning that a filter to block tainted coins would hit high false negatives. So I think we are currently (maybe not for long though, hence a higher ringsize than 2) fungible and should work on maintaining that. 12 may be a little high, but if it doesn't cost much more its alright.

RandomRun commented 7 years ago

@kenshi84: I can see your point about users choosing to minimize their fees and therefore the ringsize of most transactions converging to the minimum alowed, as it already happens and motivated OP's suggestion to fix the ringsize. I guess if that is the way fees are computed we can't escape the scenario with only minimum ringsizes are used. Unless we change how fees are calculated to subsidize higher ringsize transactions (something like: flat fee of 0.033 for all transactions not bigger than the 2-in/2-out ringsize 100 you used as an example). But this is more me thinking out loud about your comment than giving a concrete suggestion. I actually don't fully understand how fees are calculated, or even why they are "fixed", as opposed to their price being defined by market forces on the network.

@olarks:

Zipf distribution or other power laws could be possible, but this would add an additional uniform pattern to how outputs are chosen wouldn't it?

I don't see how it would add any information. Zipf distribution or other power laws are biased distributions, and therefore not uniform, which is the reason why I was suggesting to use them, as I believe this is the way real outputs ages behave.

Any output that does not fit the uniformity of the power law would likely to be the real spent output right?

An output is just a possible outcome from the distribution. I believe it doesn't make sense to say that it doesn't fit its distribution. At best you could say that it is a rare outcome, but that is fine, as it would be rare both when it happens through a real spend, or through sampling.

The reason why I chose the output selection to have a completely random distribution

All distributions suggested on this thread are completely random. I believe you are using uniform and random interchangeably, but those are not the same thing. In fact, my main point has been that the actual distribution of real spent outputs is a biased distribution, and therefore if we use a uniform distribution to create our rings, that will leak information.

Let me exagerate the scale, and simplify your suggestion a bit, to hopefully make this more visible:

Let's say outputs ages are in fact a Zipf distribution, and that we are producing our rings sampling outputs from the blockchain uniformly at random. Assume we have ten years of blockchain history and that users are choosing to produce a rings of size 10.

The rings produced with this uniform sampler wouldn't be too far from say having one output from each year, roughly. But as a user, by assumption, you would be very likely spending a very recent output, and it would be very unlikely that one of those 10 outputs chosen over the course of a decade would be more recent than yours. So an attacker could guess that your real input is the most recent one, and be correct with high probability.

Now you may shrink the interval to 5 days, but the effect is still there, and in the end what you are suggesting is overlaying two samples obtained from two uniform distributions that suffer from the same attack just described.

At the same time you don't want to be only favoring newer outputs for selection because then spending older outputs will stick out and be suspect for the real output being spent.

They won't be suspect because it would be both rare occurences to see them being actually spent, or randomly selected as a decoy, since both events would be occuring according to the same distribution. You might feel exposed seeing your output alone as the oldest one in the ring, but that is only because you know a priori that that is your output. To everybody else, it will look just like any other ring in any other transactions, all with more more recent outputs than older ones.

The main point I am trying to make is that within each ring, all outputs should have equal probability of being the real one, and that doesn't happen if you are using a uniform distribution to mask events that are modeled by a non-unifrom distribution.

@Gingeropolous @iamsmooth: The problem with using outputs that are close in time to the real one is that it leaks information about the time of the real output being spent. And worse, the higher the number of decoys chosen according to that method, the better the estimation of that piece of information.

olarks commented 7 years ago

@RandomRun Ok, I understand what you mean a lot better now. I was getting caught up in semantics. We could try surveying Bitcoin, maybe Litecoin and Dogecoin too since they have good histories going back multiple years and still get good transaction volume to this day. Hopefully some insight can be gained from what their zipf distributions look like and hopefully can be translated into an output selection algorithm Monero can take advantage of.

luigi1111 commented 7 years ago

BTW, (I haven't read the entire thread) ring size in RingCT actually scales identically to pre-RCT. It scales very slightly worse with number of inputs (this could be remedied if there was strong need, which I do not believe is the case).

Now, as a % of base transaction size, it is of course smaller. It can still balloon with high input count with high ring size (using the two together is dubious anyway, due to likely real input correlation).

kenshi84 commented 7 years ago

I also wanted to bring up the subject about temporal alignments of ring members across different rings, for example:

If this kind of temporal alignment happens in a tx with many inputs, and especially if it happens for the ring members in the older time, then it becomes highly probable that those are the real spent outputs. knaccc expressed a similar concern on IRC. A possible remedy might be to artificially increase the chance of temporal alignments for even older outputs so that this kind of event is not too rare. I have no idea how to achieve that, though.

Edit: A similar question already asked on StackExchange

kenshi84 commented 7 years ago

@JamesCullum

We want to make sure that people are using it for normal transactions which don't necessarily require perfect privacy

I think your argument goes against the original idea of Monero, i.e., there's no such thing as 'perfect privacy' vs 'moderate privacy'. If the strongest possible privacy feature is not enforced to all participants by default at the protocol level, there's actually no privacy at all; look at Dash/Zcash etc. A small group of paranoid users choosing some 'more private' transaction method at higher cost won't help, because such transactions will stand out in the blockchain which would be easy to analyze. The more uniform transactions, the better privacy.

iamsmooth commented 7 years ago

If the strongest possible privacy feature is not enforced to all participants by default at the protocol level, there's actually no privacy at all

This is not the original idea of Monero at all. The changes to apply a mandatory minimum ring size were made to address specific issues identified in MRL-0001 and MRL-0004 where individual users choosing lower ring size could severely compromise the privacy of other users, and make certain Sybil attack vectors much cheaper for an attacker. That justified imposing higher costs, because the alternative meant severe damage or vulnerability to the system as a whole.

There is no such case being strongly made here, at least nowhere near the degree to which the case was made in MRL-0001 and MRL-0004,

A small group of paranoid users choosing some 'more private' transaction method at higher cost won't help, because such transactions will stand out in the blockchain which would be easy to analyze

This needs further study and characterization to better clarify what is meant by "won't help". It certainly does help in the sense of increasing the immediate anonymity set on a particular transaction, including the strength of plausibility deniability of having spent a particular output. This carries tradeoffs in terms of standing out from the crowd, but it is not clear (at least to me) without more rigorous analysis how or when these tradeoffs might apply, nor whether this imposes damage or vulnerability onto other users of the system (if a particular user finds the tradeoffs of using a larger ring size to be unattractive, that user can choose not to do so).

iamsmooth commented 7 years ago

@kenshi84

I also wanted to bring up the subject about temporal alignments of ring members across different rings

That should probably be made into a different issue. Simply increasing the ring size a little won't solve that problem, at least not entirely.

Time alignment is somewhat addressed in MRL-0004 and the recommended solution is largely to avoid or limit how related outputs are combined and instead send multiple transactions, but that has some costs, including potentially increased costs from RingCT.

iamsmooth commented 7 years ago

@RandomRun

@Gingeropolous @iamsmooth: The problem with using outputs that are close in time to the real one ...

That wasn't my suggestion. Maybe you meant to direct that at someone else though, I think @hyc made a suggestion something like that

iamsmooth commented 7 years ago

@hyc

The txn size (and thus the fee) is dominated by the number of outputs, not by the ringsize.

It isn't always dominated by the number of outputs. As @luigi1111 mentioned a few replies back, the signatures can still balloon up and exceed the size of the range proofs if the ring size and number of inputs is both large (since they scale with n*m), assuming of course that the number of outputs isn't also large.

A large number of inputs is less common with RingCT but currently it still does happen (I've seen txs with >100 inputs recently). There might be other reasons to discourage that though.

kenshi84 commented 7 years ago

@iamsmooth

This is not the original idea of Monero at all.

Well, maybe I just took the idea wrong and am being an extremist. But I do get a puzzled feeling when users are currently allowed to use Monero with different degrees of privacy by changing ringsize. I saw someone on IRC (forgot who) questioning: "Oh, isn't it always private when I use Monero?" There have been quite some questions asked about the degree of privacy w.r.t. ringsize, e.g.:

So at least this seems like a common point of question and possibly a source of confusion. Also, users might even damage their privacy by choosing high mixin inappropriately.

I really wonder: does using higher mixin really provide more privacy than using default mixin (currently 4, possibly higher in the future)? If so, does that mean users using the default mixin are somehow risking their privacy?

I wish to be able to tell people simply that "Your privacy is well protected if you're using Monero", instead of "Your privacy is protected very well/so so if you configure your Monero wallet in this/that manner". In my opinion, MRL should continuously conduct studies on the level of possible Sybil attacks and blockchain analysis at present time to come up with a particular ringsize that should provide good enough privacy for everyone, and enforce it at the protocol level on every hardfork.

The community decided to accept the increased cost of RingCT favoring the improved privacy, despite some users complaining. Likewise, shouldn't we accept the increased cost for the fixed higher ringsize to ensure high enough privacy?

Those who want to sacrifice their privacy to save fees may switch to other coins. Or alternatively, some Lightning Network built on top of Monero that offers lower fees at the cost of privacy may be able to satisfy such demand in the future.

kenshi84 commented 7 years ago

Maybe I'm being just too naive/opinionated and not seeing the big picture. Please understand that my intention is not to divide or troll this community; Monero's privacy functions only if its user base and transaction volume is large enough, after all. Probably I should carefully read MRL-0001 & MRL-0004 once again.

anonimal commented 7 years ago

He who from the Rhinegold fashioned the ring that would confer on him immensurable might could win the world's wealth for his own.

One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them.

JamesCullum commented 7 years ago

@kenshi84 Its a ring size. If you use 2, you can be either one of them and there is no way of really telling but if one of them is evil and tries to spam the network to reveal his outputs, his share is degrading over time and becomes unfeasible - this is why there is a network minimum. A minimum of 4 can do this even faster. This is the reason why Monero chose a minimum, not because that's perfect privacy. After all, with a ring size of N there is a 1/N chance of guessing correctly who you are, with good analysis (and maybe opsec mistakes) this can be drastically increased. If your brother had to choose between you and an evil robot imitating you, having 50% (ringsize 2) or 10% (ringsize 10) can be a big difference on your level of privacy.

So do we really want to force a better privacy even though this doesn't have anything to do with the reason for the minimum? Shouldn't it be the users choice to choose a balance between costs and privacy? This is why you can choose a ringsize and there is not one level for everybody. If we can increase it from 4 to N without more costs or bloat, its an obvious yes. Otherwise, this risk may not be worth it, like my example of buying a coffee or trading from/to an exchange that knows you anyways.

RandomRun commented 7 years ago

@iamsmooth: I was referring to the quote in this thread atributed to you by @Gingeropolous. I apologise if I misread what you meant, but if that was indeed you being quoted, could you please clarify what you meant by the "statistical magic"?


IMO the problem of temporal alignments of ring members across different rings brought up by @kenshi84 is a very serious one, and perhaps the solution is not that hard. @kenshi84 has already included code in the wallet that warns users when this issue arises, so how about just making it so that if the user has two or more outputs that are close in time, the wallet, if possible, simply picks at most one of them for a given transaction, otherwise warns the user about the issue? In fact, if the wallet notices that there are two such outputs, then it might prioritize spending just one of them in the first chance it gets just to avoid running out of options in the future.

If the user does get to a point where all their outputs are close in time, then perhaps they should be allowed/encouraged to spend (to themselves) one at a time. AFAIK there is no option yet for a user to send just one selected output entirely to himself, but it might be handy in these situations.

I realize that when @kenshi84 suggested giving users more say one output selection, he was met with some resistance on the basis that users messing up with output selection might invite more problems, perhaps of the likes of what we are seeing with ringsize selection, but if that is done in the wallet background, I believe this shouldn't be a problem.

IIRC there was also an objection aong the lines of "not allowing for outputs too close in time might reduce the randomness of the output selection algorithm", which I guess is technically true, but given that a time collision is overwhelmingly more likely to occur because they belong to the true sender than otherwise, I think avoiding these collisions in the transaction building algorithm might be a better way to go. Actually, since if this gets implemented only for the true spent outputs, an attacker would be able to deduce that any collision is therefore necessarily made of decoys, maybe it would be a good idea to make things more homogeneous and even not allow for decoy coillisions as well.

t4777sd commented 7 years ago

In my view, ever-increasing ring sizes is not the solution. This is not a discussion where simply increasing a value leads to more privacy. You increase this value and it may have the opposite effect where privacy is reduced by raising ring size. In this way, the correct value to use is non-obvious.

The best way for Monero to obtain privacy is to drive adoption. Not just in users, but in number of transactions. If ringsize is too high, then it will deter adoption and without adoption, monero has no utility. What is the point of having a fungible coin with a utility of 0? And, how private is monero when the population size is only a few thousand people? Certainly, even the blackbox solutions in more popular transparent chains could be more private than monero (given the higher adoption rate and hence higher # of people who use the blackbox solutions versus monero).

The existing ring system already has a lot of issues. For example, governments can create a lot of dust transactions on the network. When generating rings users will include these addresses when selecting what other addresses to use. And, this can be used by the attacker to reduce the effective size of the ring. The same effect occurs as people publicize their addresses (either willingly or they are forced to to meet compliance regimes, which erode privacy of the whole network).

The solution might be to increase the ring size, but you do that and you just decreased the number of users and transactions on the network. Few are going to pay all the costs that super-super obfuscated chain requires to purchase coffee. So, the result will be, those transactions disappear from Monero which is sad because those are the transactions that help provide anonymity for everyone else.

So, in my view, the solution is instead to offer "pretty good" privacy on-chain that is econmical, by default. A level of privacy that still allows a large number of people to use the coin (which increases absolute privacy offered) and still provide a lot of deniability. A ring-size of 4 does this IMO.

For those people that need more than coffee-purchase level of obfuscation, the best solution to them would probably be to churn transactions. Or, alternatively, they can use higher ring sizes for those specific transactions.

iamsmooth commented 7 years ago

@ RandomRun

My suggestion (quoted by @Gingeropolous) was to use the distribution of ages of outputs spent by inputs of recent transactions (say from the most recent month) as a model distribution for choosing random fake outputs. This does not suggest using outputs similar in time to your own real output at all.

What it does do is include a distribution of unknown real outputs in the fake output selection since each candidate model transaction input necessarily includes a real output, as well as several fake ones. If everyone uses this algorithm, and there is no apparent reason why they shouldn't, then everyone's fake outputs will be chosen with ages from a distribution that matches the distribution of ages of real outputs, even though there is no other way to direct measure the distribution of ages of real outputs. That is the 'statistical magic' part, and it is the gold standard of minimizing statistical inferences from the fake output selection being biased relative to real output usage.

The vulnerability I see in this is someone spamming the network with bogus transactions using a deliberately poor distribution of fake outputs to pollute the distribution used by the above method. As @t4777sd commented, this really needs to be addressed by maximizing legitimate usage (and I agree in principle with his comments on that point). Indeed someone performing this sort of spam/Sybil attack is already compromising privacy to a large degree, though in neither case is the attack fully effective, it is more of a degradation.

a time collision is overwhelmingly more likely to occur because they belong to the true sender than otherwise

This is not entirely clear. When there are many transactions being performed there will be many (at least approximate) time collisions that randomly occur. This assumes that the wallet avoids the worst failure modes, which it already does, such as using multiple outputs from the same transaction. A certain number of real collisions become largely if not entirely indistinguishable from the background noise, or at a minimum still provide plausible deniability (there is no way to know, with certainty or necessarily high confidence that a candidate collision is actually real, even if you might suspect this). To the extent that they don't, @t4777sd is also correct that those requiring absolute iron-clad protection against any possible inferences or statistical bias will need to take stronger measures at higher cost and at their own expense. Many people do not and will not need that (for example to prevent your landlord from easily seeing that you got a pay raise and raising your rent as a result).

olarks commented 7 years ago

@t4777sd I agree there needs to be a balance. Increasing the ringsize does not immediately mean there are less users and transactions taking place, it has a bell curve distribution. My intentions are for all transactions to obtain a minimum ringsize of 4 for plausible deniability, however there are many situations where having a ringsize 4 or even higher does not obtain our goal, as explained extensively in this thread.

The ringsize and output selection go hand in hand because the output selection algorithm is not a secret and if the algorithm is predictable then you are making an attacker's job very easy, hence why my OP favors a largely random output selection which can only realistically be used by having a larger ringsize to increase obfuscation.

Monero touts being a private currency, but most users do not know how to evaluate ringsizes. The user should not burden these decisions when the protocol by design is supposed to protect them. If a coffee transaction does not have strong privacy then many other types of transactions in the network also don't have strong privacy. I don't think this is acceptable regardless of how "unimportant" a transaction is. All transactions in Monero are supposed to have strong privacy.

iamsmooth commented 7 years ago

All transactions in Monero are supposed to have strong privacy.

No. All transactions provide privacy on the basis of a strong mathematical and cryptographic foundation. But it is not the case that any of these small ring sizes (3, 5, 12, etc.), by themselves, provide "strong privacy". That is just not a very high degree of obfuscation when viewed at the individual transaction level.

The system works by combining several techniques including stealth address, ring signatures, confidential transactions and later i2p to so that even narrowing down the potential output to a small number of candidates (as ring signatures inevitably do) still allows users to retain a reasonable degree of privacy in a majority of cases (even in some cases where a subset of these techniques individually fail). In cases where people want or need even more privacy than this base level, additional techniques and tools can be layered on top of the sound foundation.

olarks commented 7 years ago

@iamsmooth I understand what you are getting at, but for this discussion I am purely interested in the analysis of ring signatures. RingCT, stealth addressing etc are all strong as long as their underlying cryptographic assumptions are sound, which I believe they are.

Ring signatures on the other hand are bound to different assumptions relying on the output selection for the strength of the ring signature. Ring signatures are arguably the most important part of the system because if you know where money is coming from it does not necessarily matter where you sent money or the amount pertaining to the transaction if you can go back and decipher the history of all the transactions involved.

I am skeptical that regular users would use anything but the default option for sending transactions. The majority of transactions in Monero right now are all using ringsize 2 and 4 because of the default options in their wallet of choice. This is also evident in other cryptocurrencies with optional private transactions, they just are simply not used in 95 out of 100 transactions.

Thankfully Monero already protects users by default to a degree, but I want to ensure that there are as few holes as possible by default because most people will only be using the default option for transactions, hence why I favor a static ringsize that uses a larger ringsize so the output selection is less prone to advanced analysis by adversaries.

Treating some transactions differently than others in Monero is putting everyone's privacy at risk.

iamsmooth commented 7 years ago

@olarks The problem I have with your claims is that you aren't backing them up with any sort of research:

. Treating some transactions differently than others in Monero is putting everyone's privacy at risk. [Everyone? By how much?] . I favor a static ringsize that uses a larger ringsize so the output selection is less prone to advanced analysis by adversaries. [How much less prone? Under what conditions?] . if you can go back and decipher the history of all the transactions involved [Can all transactions involved really be deciphered, or only some of them? In the latter case especially, things like stealth addresses and arguably CT do matter, because a partial analysis that leads to a dead end (unknown amount coming from an output not tied to any other outputs nor any particular public address) does not necessary accomplish anything useful.]

You see what I'm getting at here. There are a lot of hasty assumptions and without support for those conclusion the case for imposing higher costs isn't very strong, nor is there a good basis for deciding how much higher those costs should be (why not make ring sizes even bigger!).

olarks commented 7 years ago

@iamsmooth I have made it clear that I am using a ringsize 4 as a baseline for the minimum amount of privacy a transaction should expect to have, and that just having a transaction default to ringsize 4 does not mean it actually achieves our goal because of how outputs are chosen. I explain quite thoroughly in the OP and throughout this thread why it does not in many situations.

The goal here is to have every transaction meet a MINIMUM effective ringsize of 4 for plausible deniability. I explain how static ringsizes help reduce connections made between transactions in my OP. The static ringsize idea is separate from our goal of maintaining ringsize 4 for all transactions. I only mentioned a static ringsize because it is in the same spirit of protecting users privacy in Monero by having all transactions look the similar and thus less prone to analysis, not much needs to explained here for how it does this.

I never said CT and stealth addressing do not matter. I am only saying that if a ring signature is exploited then it is not hard to find out, for example, that the real output could be linked back to an exchange or a store that will have both the identity of the spender and the amount for the transaction logged in their system, completely disregarding both CT and stealth addressing. Hence why getting ring signatures and the output selection right is so important.

The ringsize and output selection go hand in hand. It is an interesting information theory problem because if the selection algorithm is predictable an attacker can easily just pick out any outputs in a ring signature that are expected to occur based on the algorithm. A combination of randomness and formality is required in the selection and balancing this properly is the result of what assurances in ringsize we are trying to make when being analyzed. I chose ringsize 4 as this minimum because that is what MRL suggested and is why we are currently scheduled for a hardfork in September, seems most people agree this is a pretty good baseline for privacy.

In the end it does not matter if the mandatory ringsize we decide on is 7, 12, or 50, as long as the minimum assurance for privacy we can make is ringsize 4 and is accompanied with a selection algorithm that can ensure this, the rest is icing on the cake. Obviously we want to find the minimum mandatory ringsize to have these qualities so scalability is not greatly sacrificed.

I hope you see what I am trying to get at here.

JamesCullum commented 7 years ago

I am skeptical that regular users would use anything but the default option for sending transactions.

Actually most of them use the default in the wallet, not the protocol level minimum. This is why MRL already recommended to have a default of 4 but a minimum of 2. I think we should set the minimum to 4 (it doesn't endanger the privacy of others, so everybody should be able to choose if they want more obfuscation) and can set the default in the official GUI to 12.

olarks commented 7 years ago

@JamesCullum I mentioned that it is because of the default in the wallets in the sentence after the one you quoted :)

JamesCullum commented 7 years ago

You are correct, but in the context of you asking to raise the minimum it felt to me that you mean that a raise of the minimum would implicitly result in higher default transactions (which it would do), while I tried to say that you can raise the wallet defaults in the GUI very easily without changing the protocol minimum in a similiar manner.

And I think that we now need some kind of compromise (there are many different opinions here), hence I suggested the minimum of 4 and wallet default of 12 :)

olarks commented 7 years ago

@JamesCullum So far only @iamsmooth has spoken up against what I have written. My specific proposal only pertains really to ringsize 12. The algorithm does not work well enough for ringsize 4.

This is why I made this discussion 7 months before September so we can come to a conclusion on this. The problem is not simple and I don't think what I wrote is the best option or the only possible option. I imagine I am not the only person who has ever thought about these problems with ring signatures and I hope that we can all figure this out, so we don't just keep kicking the can down the road and need to hardfork again to fix any problems with ringsize/ring signatures etc.

JamesCullum commented 7 years ago

I understood that the ringsize is the major point of this discussion, not the algorithm for output selection. I would stick with the random triangle but allow users to pick own outputs. While this may not be used too often, it would effectively reduce the probability of making a connection between the outputs as this could always be a manual selection as well and statistical analysis can not analyse decentralised output selection.

However, this is a rather different approach to it and I can understand that it may not be feasible compared to other methods.

EDIT: I thought I made it obvious, but I was talking about an optional possibility, so that the current one would be used for the majority of transactions but during an analysis it wouldn't be certain if this was automatically or manually chosen.

olarks commented 7 years ago

The output selection algorithm is more important than the ringsize. If the outputs are poorly chosen the ring signature is wholly useless no matter how many ring partners exist. Allowing users to pick their own ring partners will likely have the opposite effect, as people would lazily choose them and open users up to another way to accidentally compromise theirs and others privacy.

ghost commented 7 years ago

@JamesCullum Users picking outputs == less random than a machine picking outputs. Always.

JollyMort commented 7 years ago

@olarks wrote:

In the end it does not matter if the mandatory ringsize we decide on is 7, 12, or 50, as long as the minimum assurance for privacy we can make is ringsize 4 and is accompanied with a selection algorithm that can ensure this, the rest is icing on the cake.

Do you mean to say that, regardless of what the actual ring size is, you want to make it near certain that you can't ever eliminate all of the ring partners as being the fake ones, and would always remain with minimum of 4 for which you can never tell for sure?

With actual ring size of 5, in some cases it could be trivial to narrow it down or even immediately tell which is the real one and which are fakes. Your proposal seems to want to make this narrowing down hit a wall around 4?

Like, using 4x3=12, where these groups of 4 would be scattered close to some random 3 heights picked as per triangular distribution?

RandomRun commented 7 years ago

@iamsmooth: Thank you for taking the time to write a more detailed explanation of your method, but I must confess I still don't understand it fully. For instance, when you say:

use the distribution of ages of outputs spent by inputs of recent transactions

I am understanding that you would take a recent transaction, and pick one of its inputs I (fake or real, no way to tell for sure), look at the outputs present in the rings that generated it, and keep their ages as a sample point. repeat this process for all inputs I being used in recent transactions. Infer the distribution behind all those ages, and use this distribution for generating your rings going forward.

If that is what you meant, and if people are just using say a uniform, or triangular distribution, because that is what is encoded in their wallets, then wouldn't you just be rediscovering that fact through your measurements? I understand that it shouldn't be exactly that, since whatever the distribution of the real outputs is, it is going to bias the sample even if just a bit. So are you saying that if this process is iterated, over time it should converge to the real outputs age distribution? (I guess assuming participants are not deliberately producing bad rings to destabilize this convergence?) If so that would be very nice. Do you by any chance have this formalized?


@iamsmooth:

["a time collision is overwhelmingly more likely to occur because they belong to the true sender than otherwise"] is not entirely clear. When there are many transactions being performed there will be many (at least approximate) time collisions that randomly occur.

After you pointed this out I realized that I wasn't so sure about the estimations either, so I did a few simulations in Sagemath to gauge how likely decoy collisions really are when using various distributions.

Using Sagemath, and this referrence, I simulated the scenarios where we pick 2 and 4 random decoys for each ring, and kept track of the number of collisions between the two rings we get after 1 million trials. In the table we can see what proportion of those trials contained collisions. (A collision here is when two inputs are picked from the same 2-minute interval.)

Ringsize Uniform Triangular Power(1/2) Zipf
2 0.000194 0.000259 0.000465 0.055021
4 0.000770 0.001017 0.001960 0.185124

So that if we are using a uniform, triangular, or even a power law with exponent 1/2, random decoy collisions are pretty rare leaving user's exposed. With the Zipf distribution, the collision level seems more in line with reality, if I had to guess.

Everybody please feel free to tinker with the code and test other scenarios! The code can easily be modified to check for likelihood of collisions happening between more than two rings in a transaction, and for other time frames, of course.


Code:

'''Slots are 2-minute intervals over the course of a month:'''

TimeSlots = 30*24*30

trials = 1000000
ringsize = 2

'''Distributions:'''

Unif = [1 for i in range(1, TimeSlots + 1)]
Tria = [1-(i-1)/TimeSlots for i in range(1, TimeSlots + 1)]
PowerHalf = [i**(-0.5) for i in range(1, TimeSlots + 1)]
Zipf = [1/i for i in range(1, TimeSlots + 1)]

'''Set RHS to desired distribution:'''
Distribution = Zipf

X = GeneralDiscreteDistribution(Distribution)
Collisions = 0

for j in range(trials):

    '''generating Ring1...'''
    L = []
    for i in range(ringsize):
        L.append(X.get_random_element())
    Ring1 = set(L)

    '''generating Ring2...'''
    L = []
    for i in range(ringsize):
        L.append(X.get_random_element())
    Ring2 = set(L)

    '''Set of collisions...'''
    Set = Ring1 & Ring2

    if Set != set([]):
        Collisions += 1
print "Proportion of trials containing collisions: ", float(Collisions/trials)