urbit / arvo

https://github.com/urbit/urbit
110 stars 58 forks source link

@p is not bijective #1105

Closed jtobin closed 5 years ago

jtobin commented 5 years ago

.. in other words, there exist multiple atom inputs that resolve to the same @p output.

For example:

> `@p`3.108.299.008
~morlyd-mogmev
> `@p`479.733.505
~morlyd-mogmev

@bencalder stumbled upon this problem early last week and brought it to our attention; I took a look into it and, after bashing around for awhile and getting some help, traced it to an off-by-one-like error in the implementation of a generalised Feistel cipher used to obfuscate @p values for planets. Stars and galaxies aren't affected.

Specifically, the core of the problem is right here in hoon.hoon:

  ++  fice                                              ::  adapted from
    |=  nor/@                                           ::  black and rogaway
    ^-  @                                               ::  "ciphers with
    =+  ^=  sel                                         ::   arbitrary finite
    %+  rynd  3                                         ::   domains", 2002
    %+  rynd  2
    %+  rynd  1
    %+  rynd  0
    [(mod nor 65.535) (div nor 65.535)]
    (add (mul 65.535 -.sel) +.sel)

As per the reference algorithm, one needs to return aR + L when using an even number of Feistel rounds. We use four rounds, but our implementation returns aL + R, i.e. what should be used for the odd parity case. The last line should read:

    (add (mul 65.535 +.sel) -.sel)

Thus, as-implemented fice is not a proper permutation cipher, and we eventually get @p collisions for various inputs.

There appear to be roughly ~360 collisions between planets spawned by stars under any given galaxy, so probably on the order of ~100k in total (I'm basing this estimate on collisions observed between planets under each of ~zod, ~nec, and ~bud, provided by @bencalder). In other words: there are probably about ~200k planets affected in total, or about one in every 25000. There are currently no active planets with conflicting @p's, though.

This is pretty easily fixed, and I've done a lot of testing of a fixed-up version to make sure it's sound -- including an exhaustive test of the entire planetary @p space using urbit-ob -- in order to completely rule out collision errors. And this definitely needs to be fixed.

The problem, as one might expect, is that the "fix" changes every planet's @p (and, in turn, its associated sigil). I'm ~nidsut-tomdun, but am staring at the prospect of becoming ~dasref-tapteg.

We've been discussing a few potential solutions for this internally, but want to get some public feedback before deciding on any of them.

ohAitch commented 5 years ago

The original 2013-release @p scrambling algorithm was as simple as possible, but perhaps slightly simpler than it needed to be. While it definitely had been subjected to this sort of testing before inclusion in hoon.hoon(!), it showed some minor patterning of planet names after their stars, which could be picked out with a trained eye.

When this became known a few years into (the public life of)the project, a more sophisticated pseudocryptographic algorithm was introduced hoping to address this issue. The reception was mixed, both due to urbit's core promise of unrevokable cryptographic titles to memorable names(that time even stars were affected), and the well-known fact that complexity breeds bugs; but Curtis reigned, no concrete issues could be demonstrated, deadlines had to be met. Over much grumbling, a rough consensus was reached that the new version was necessary, under the condition it never has to be changed again. Now it has to be changed again.

Besides the giant "I told you so" I'm going to most unprofessionally claim, my suggestion is simple, and I'm sure came up internally as one of the options:

(Presumably this all extends to moons in the straightforward fashion; comets, long @p secrets, etc should be unaffected, as in the first renaming they were stripped of per-u32 scrambling to guard against this kind of major hierarchy-obfuscation related failure)

On Tuesday, 12 March 2019, Anton Dyudin antechno777@gmail.com wrote:

((The previous great renaming was a mistake))

On Tuesday, 12 March 2019, Jared Tobin notifications@github.com wrote:

.. in other words, there exist multiple atom inputs that resolve to the same @p output.

For example:

@p3.108.299.008 ~morlyd-mogmev @p479.733.505 ~morlyd-mogmev

@bencalder https://github.com/bencalder stumbled upon this problem early last week and brought it to our attention; I took a look into it and, after bashing around for awhile and getting some help, traced it to an off-by-one-like error in the implementation of a generalised Feistel cipher used to obfuscate @p values for planets. Stars and galaxies aren't affected.

Specifically, the core of the problem is right here in hoon.hoon:

++ fice :: adapted from |= nor/@ :: black and rogaway ^- @ :: "ciphers with =+ ^= sel :: arbitrary finite %+ rynd 3 :: domains", 2002 %+ rynd 2 %+ rynd 1 %+ rynd 0 [(mod nor 65.535) (div nor 65.535)] (add (mul 65.535 -.sel) +.sel)

As per the reference algorithm http://web.cs.ucdavis.edu/~rogaway/papers/subset.pdf, one needs to return aR + L when using an even number of Feistel rounds. We use four rounds, but our implementation returns aL + R, i.e. what should be used for the odd parity case. The last line should read:

(add (mul 65.535 +.sel) -.sel)

Thus, as-implemented fice is not a proper permutation cipher, and we eventually get @p collisions for various inputs.

There appear to be roughly ~360 collisions between planets spawned by stars under any given galaxy, so probably on the order of ~100k in total (I'm basing this estimate on collisions observed between planets under each of ~zod, ~nec, and ~bud, provided by @bencalder https://github.com/bencalder). In other words: there are probably about ~200k planets affected in total, or about one in every 25000. There are currently no active planets with conflicting @p's, though.

This is pretty easily fixed, and I've done a lot of testing of a fixed-up version to make sure it's sound -- including an exhaustive test of the entire planetary @p space using urbit-ob https://github.com/urbit/ob-js -- in order to completely rule out collision errors. And this definitely needs to be fixed.

The problem, as one might expect, is that the "fix" changes every planet's @p (and, in turn, its associated sigil). I'm ~nidsut-tomdun, but am staring at the prospect of becoming ~dasref-tapteg.

We've been discussing a few potential solutions for this internally, but want to get some public feedback before deciding on any of them.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/urbit/arvo/issues/1105, or mute the thread https://github.com/notifications/unsubscribe-auth/ABxXhhmL_DcqwYGEnbQy0-Dk37K9w8t8ks5vWJKAgaJpZM4bsc-W .

galenwp commented 5 years ago

This is, indeed, a very significant bug. It’s good to have found it (and to have it fixed). But figuring out the right way forward is going to take a non-trivial amount of discussion.

Having talked about this a bit internally, and the solutions all seem to fall into three categories:

This doesn't include the suggestion above, which I'm not sure I entirely understand, but seems to fall into the 'whitelist exceptions' category that, to memory, was seen as too ugly. I'll let others respond specifically.

Whatever course we take, we will ship the old and new algorithms together (in hoon and JavaScript) so that all of our clients can fall back to the original mapping if it fails on the new mapping. For example: if you try to use an old-style shipname in Bridge, it will fall back gracefully.

We'd like to ship the updated @p by next week. Leaving the existing algorithm in the wild runs the risk of collisions. Let's try to keep this discussion on-topic and focussed.

galenwp commented 5 years ago

Also, given that this is a Curtis-era bug we also shared it with him, to which we got the following (blog-post-like) reply:

How I screwed up everyone's name -- twice

When I resigned, I promised not to post about Urbit again and I meant it. I didn't realize I would have any giant screwups to apologize for, so I'm breaking my promise just this once.

TLDR: there's a bug in the scrambling step that renders Arvo planet names from numbers in the @p format. The error affects only one of every 40,000 planets, but that's enough that the bug has to be fixed. This means everyone's name has to be changed. This is incredibly lame because the scrambler is trivial to test exhaustively. The fault is mine personally.

There is no security compromise. The Ethereum contracts are not affected -- this bug is not in the land registry, Azimuth, just the rendering algorithm in Arvo. Stars and galaxies are not affected. Urbit tickets and shards are not affected -- credentials look like they use @p, but they in fact use the related @q format which has no scrambling.

However, people bond with their Urbit names tightly and quickly, and it really stinks to have to change your name. If you have already used the system at all, arguably you'll never bond quite as well with your second name.

Also, some people have vanity names -- you can reverse any naming scheme to make something meaningful. I never liked vanity names. I feel that the uniform impersonality of Urbit names is crucial. Even seeing someone else's vanity name contaminates that quality of the impersonal, and makes you feel like you're back on Reddit. But this is just a personal bias. If you had a vanity name you liked, I apologize for blowing it up and turning you back into just another impersonal robot.

What the scrambler is

The @p format is basically a syllabic base-256, but it's not just a syllabic base-256.

In particular, you really don't want anyone to instantly be able to tell what star any planet is under. This damages the sense of the whole @p space as homogeneous. Humans are a tribal species and it's very important not to randomly trigger this reflex.

No problem -- just define a pseudorandom permutation of the integers from 2^16 through 2^32-1. Easy! Well, not quite...

The first screwup

A good thing to do at the start of any technical problem is to enumerate any potential areas that could turn into shortcuts. One obvious shortcut in writing the @p scrambler: the permutation does not need any kind of cryptographic soundness.

This shortcut attracted me because I am not a cryptographer. So I just wired up some crude S-boxes and called it a day. One has to move rapidly when sketching out a clean-slate stack.

What we discovered in practice, however, was that good scrambling actually mattered more than I thought. My crude scrambler produced noticeable groups of 256 "cousins" within each star -- planet names varying only in a single syllable.

In retrospect, this was as a charming irregularity, not a fatal showstopper. There's a fine line between perfectionism and neurotic, counterproductive perfectionism.

This was a mistake, but it wasn't worth disrupting anyone's identity. Of course, if you are going to do this -- do it as early as possible. But this first rescrambling (in 2015) was a product mistake that compounded an architecture mistake.

The second screwup

The second screwup was a management mistake that compounded a mathematics mistake.

Urbit having gotten a little way past being one man boiling the ocean, we managed to hire a young engineer who had minimal coding or other job experience, but was a talented if eccentric mathematician, certainly qualified to read and implement cryptography papers.

Assigned this problem, the engineer handled it the right way -- finding an an existing permutation algorithm in a paper by top-tier cryptographers (Black & Rogoway), and implementing it both in Hoon and (for a performance "jet") C. (There is now also a Javascript implementation of this bug.)

As a manager, it's important to try to only hire people who are smarter than you, or better at their jobs in some other way. However, it's also important not to assume that just because someone is smarter than you, they are otherwise self-managing and you can just let them do whatever.

I myself have a casual approach to testing -- but not this casual. Also, previous experience in implementing standard algorithms had convinced me intuitively that crypto either works perfectly or not at all. Apparently this is not the case, which I might have realized if I'd thought about it.

What this accident reveals about Arvo

What this reveals about Arvo is that it still contains a nontrivial amount of research-grade code -- code which evolved from exploratory programming, basically works, but is not adequately documented, tested, even sometimes architected.

That's okay. It's normal. Great code, code which lasts for generations with almost no maintenance, is never written once. It's often good to have a good excuse for another rewrite.

The good news is: (a) the Azimuth contracts on Ethereum are not affected; (b) all these pratfalls in architecture, product management, and quality control were mine, and I don't work on Urbit anymore. But sorry, everyone.

Curtis (formerly) ~sorreg-namtyv (previously) ~tasfyn-partyv

philipcmonk commented 5 years ago
  • assuming by "active" the issue refers to planets /claimed/, not just running, and none of them fall on /either/ side of the ambiguity, the additional complexity is relatively limited. Detect that a number falls in that set, use a sound algorithm(e.g. unbugged Feistel) to scramble planets just in that set. Unify the hacked-into-soundness partial majority algorithm and the fallback algorithm to get a complete sound algorithm, at the cost of a small codebase wart.

If I understand your proposal, the issue is that when we find a colliding old-@p, we can't just switch to new-@p in that case, because that result will itself collide one of the other old-@p's. You could hard-code new mappings for those 100k planets. But, if you're hardcoding, you should ought to do the ~3k existing planets. Still, that's probably too many to be reasonable.

However, that gets me thinking that we don't care about the scrambled number except its textual form. So what if we don't try to solve this as a number-to-number conversion, but the whole @p to text conversion?

Suppose old-scrambler and old-unscrambler is the current scrambler (from unscrambled number to scrambled number). Further, old-print takes a list of prefix syllables, a list of suffix syllables, and a scrambled number and produces the appropriate text; old-parse is the reverse (text to still-scrambled number). Finally, new-scrambler and new-unscrambler is the fixed cipher. Then define the new pretty-printer as a function of n (the number to be printed):

let old-scrambled = (old-scrambler n)
if (old-unscrambler old-scrambled) == n
   return (old-print prefix-syllables suffix-syllables old-scrambled)
else
  return (old-print suffix-syllables prefix-syllables (new-scrambler n))

And define the new parser as a function of p (the text to be parsed into unscrambled @p):

if not planet
  return (old-parse prefix-syllables suffix-syllables p)
if the first syllable of `p` is in prefix-syllables
  return (old-unscrambler (old-parse prefix-syllables suffix-syllables p))
else
  return (new-unscrambler (old-parse suffix-syllables prefix-syllables p))

Thus, in the example @jtobin gave,

`@p`3.108.299.008 = ~morlyd-mogmev
`@p`479.733.505 = ~devwic-rytwis

(second one is an example, I didn't actually calculate it). The trained eye will recognize that the syllables are backward (usually the vowels aio only appear in prefixes and euy only appear in suffixes), but that'll just be a cute piece of trivia.

If you don't want two versions of the cipher, you could just not scramble the backward ones. Sure, eg ~marzod will have one or two planets where the second syllable is ~zodmar, but that's no big deal.

The upshot would be that everyone who currently has a planet would keep that name, and one in every 25000 planets would be renamed to one of those "backward" names.

eglaysher commented 5 years ago

If the backwards ordering is bad for parsing reasons (ie, ambiguity, etc), we could possibly use a secondary prefix table. We ship two 256 phoneme tables, a third wouldn't be a big issue.

asssaf commented 5 years ago

Is there an option to simply prevent the affected planets from ever being spawned?

Fang- commented 5 years ago

@asssaf This would require changes to the smart contracts. They currently have no concept of point names, let alone scrambling. Implementing checks relating to that on the contract layer seems very inappropriate.
Besides, you'd be taking away otherwise usable address space. Bad for the network, but even worse for the stars who you'd be taking planets away from, even if just a few.

baudtack commented 5 years ago

If we can prove that @philipcmonk 's proposal would fix the problem, I think this is the best solution. While it's technically true that what you own with an Azimuth point is a number, it's not why anyone picked their points. People (myself included) are already attached to their ship names. While the thought of this code living in perpetuity revolts me, it's a small amount of kludge with a huge payoff and while it is a scar upon the codebase, it is the price we pay for the sin of not being more rigorous before this point. A thousand years from now when people are reading the code in the urbit of the future, legends will be told of the valiant @philipcmonk 's efforts to save us all from the @pocolypse that forever left a scar as a reminder.

ohAitch commented 5 years ago

@urbit urbit deleted a comment from ohAitch Mar 13, 2019 edited by galenwp

🙄 For anyone not following along over email, this was a single sentence that appears nearly verbatim in the later Curtis quote, and a two-word reference to that sentence. Given this is the case it is my opinion that moderator action was rather unwarranted, but this is, of course, subjective.

This means everyone's name has to be changed. This was a mistake, but it wasn't worth disrupting anyone's identity.

Everyone's name does not have to be changed, as I outlined. It is certainly not the case that it is /more/ worth disrupting anyone's identity in 2019 than in 2015

"Ugly" is obviously an insufficient excuse, especially when the plan is to ship both algorithms anyway. (It is a relatively strong argument against "add a third syllable set" or "make syllable order inconsistent", since those do leak into the UI, but if applied to only the "weird" planets they're better than nothing)

Good point about 5k existing planets: things are in fact even less dire, because all you need is an arbitrary discrimination function that happens to return one result for existing planets, and another for a set containing the 200k. My original proposal is "detect duplicacy during the scrambling algorithm itself somehow", but if that is mathematically infeasible(by running it thrice or sth) and you do need a lookup table, the next option is:

Though to be clear, "50 line constant frozen in hoon.hoon" is a smaller cost than accepting the claim "screwed up everyone's name -- twice". We can and must actually fix this, such that some names get invisibly screwed up but everyone retains their current planet.

Leaving the existing algorithm in the wild runs the risk of collisions.

This is needlessly alarmist, and imho an attempt to push through a bad solution. The chance of collision is 0.0025%, and correspondingly none have been observed in 4 years: we might get 1, we won't get 100, this can be discussed personally with the planets affected. Trying to solve an important problem in 4 days is how you get JavaScript - vintage quirks mode.

Furthermore, since the bad solution being attempted to push through is "revoke everyone's names", someone whose collided name gets revoked and swapped out for a new one will not have a significantly different experience than everyone else on the network whose name gets revoked and swapped out for a new one.

galenwp commented 5 years ago

@ohAitch I deleted a comment that was not productive, and removed a line in the second that made it confusing. Please keep your comments to the technical specifics. Keeping this organized is very important.

There's no hard deadline here. Let's stay focussed on finding the best solution.

ohAitch commented 5 years ago

Everything @baudtack said, except for a subjective judgement that the scar can be significantly larger if it makes @p actually bijective.

A question for consideration under any replacement scheme*: we have 100k planet names that two addresses map to, presumably this leaves over 100k planet names that no addresses map to. What do those planet names map to, hopefully the duplicate addresses? How to they fit in with a third(or swapped-pair) syllabary, just a parse error?

*forgive me if this has already been addressed

jtobin commented 5 years ago

we have 100k planet names that two addresses map to, presumably this leaves over 100k planet names that no addresses map to. What do those planet names map to, hopefully the duplicate addresses?

@ohAitch Right, presumably the @p function is also not onto. I haven't looked into this in any detail, though -- it wasn't necessary to establish a fix for the cipher, and requires a bit more legwork.

max19 commented 5 years ago

Curious if this works

?:  =(65.535 +.sel)
  (add (mul 65.535 +.sel) -.sel)
(add (mul 65.535 -.sel) +.sel)
jtobin commented 5 years ago

@max19 Just thinking through it, I suspect that works. I just tested it on the atoms of a known-collision @p and indeed get different ones back.

I'm going to run a small-scale test to confirm, but reckon that you've just solved this.

jtobin commented 5 years ago

Ok, I've just checked the fix @max19 proposed on a smaller input space. Here's the result of the broken cipher when mapped over the array [0, .., 11]:

[ 3, 5, 4, 6, 7, 6, 1, 3, 8, 0, 9, 2 ]

Note that 3 and 6 are duplicated. And here's the result when using the corrected cipher:

[ 9, 5, 4, 10, 7, 6, 1, 3, 8, 0, 11, 2 ]

It fixes the duplicates, although there is some collateral damage in that 9, which is not duped, gets changed to 11. Otherwise it works perfectly. 🍾

We should be able to check to see if any currently-active planets are at risk of being changed. I can write up some code to do that.

belisarius222 commented 5 years ago

@max19 or @jtobin, could one of you please explain the goal of @max19's proposal? I don't know the details of the Feistel thing well enough to understand the implications.

ohAitch commented 5 years ago

The high-level goal is "make the scrambling produce the same numbers for most non-broken planets, and different numbers for broken planets"; via some math wizardry over what is causing the scrambling to break in the first place.

Further testing is required(in particular someone should pay the Weregild to AWS and just brute-force the whole address space using 1000 parallel fakezods, as a consistency check) but this looks extremely promising!

On Wednesday, 13 March 2019, Ted Blackman notifications@github.com wrote:

@max19 https://github.com/max19 or @jtobin https://github.com/jtobin, could one of you please explain the goal of @max19 https://github.com/max19's proposal? I don't know the details of the Feistel thing well enough to understand the implications.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/urbit/arvo/issues/1105#issuecomment-472607534, or mute the thread https://github.com/notifications/unsubscribe-auth/ABxXhpwqMu6brwm8ZoZy2uroXmLSUHX9ks5vWWnPgaJpZM4bsc-W .

jtobin commented 5 years ago

@belisarius222 What @ohAitch said. It appears that the solution converts the cipher into a real permutation cipher, which will allow us to preserve most (but not all) of the existing @p's. However, it may be the case that no currently-claimed planet is at risk of being changed.

I'll indeed exhaustively test this, though I've been paying the weregild to Linode instead of AWS. 😄

ohAitch commented 5 years ago

Would love to see the full list of before and after changed planets! That should only come out to a couple dozen MB right? 200k collisions + ~200k collateral damage x 14 chars @p + 10 chars @ux + 2 whitespace x 2 copies, if you don't attempt to optimize the thing like at all

On Wednesday, 13 March 2019, Jared Tobin notifications@github.com wrote:

@belisarius222 https://github.com/belisarius222 What @ohAitch https://github.com/ohAitch said. It appears that the solution converts the cipher into a real permutation cipher, which will allow us to preserve most (but not all) of the existing @p's.

I'll indeed exhaustively test this, though I've been paying the weregild to Linode instead of AWS. 😄

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/urbit/arvo/issues/1105#issuecomment-472615004, or mute the thread https://github.com/notifications/unsubscribe-auth/ABxXhvxyp-JncVXoI3pU7FUJUBygyCS4ks5vWW7-gaJpZM4bsc-W .

philipcmonk commented 5 years ago

Everyone's name does not have to be changed, as I outlined. It is certainly not the case that it is /more/ worth disrupting anyone's identity in 2019 than in 2015

To be clear, this was believed to be the case. We had of course thought of your idea to "change only the colliding planets", but hadn't found a way to map them to new planets without that new mapping colliding with the ones you're not remapping -- you can't just use the "fixed" scrambler as you suggested. Other than hardcoding all the existing planets, unless I misunderstand your proposal, I think mine was the first serious one that could actually keep everyone's names. So it was appropriate to warn people that we didn't have a viable solution at the time that would let people keep their names.

If Max's solution works, then it's an even cleaner way to remap to the unused planet space. I eagerly await @jtboin's field results!

ohAitch commented 5 years ago

I think posting on twitter with "warning: we might have a problem with keeping people's names" was reasonable, but opening the technical discussion with "so obviously there's no way to keep people's names" was not - I was responding specifically to the quote from Curtis.

My proposal was really answering the general OP question of "how should we handle this" with a set of bounds on what the results of correct handling would look like: the "determine whether this planet is a collision or not" section was not trying to finalize a specific implementation, just establishing what looked like the correct direction of investigation. The Max solution certainly is a direct answer to the question "what's a heuristic that will let us detect problematic planets, and map them to each other"!

I maintain that hardcoding a compressed representation of the planet list is aesthetically painful but the right call, if we don't have a more clever solution. (It looks like we do have a more clever solution.) I was certainly trying to compose a Questionable Hack based on bloom filters when the Max email came in, which would have been a lot more messy but possible to tune for the desired property of "collided planets off to one set, spawned planets in another, some unspawned uncollided planets distributed haphazardly into the "collided" bucket but we don't care"

On Wednesday, 13 March 2019, Philip Monk notifications@github.com wrote:

Everyone's name does not have to be changed, as I outlined. It is certainly not the case that it is /more/ worth disrupting anyone's identity in 2019 than in 2015

To be clear, this was believed to be the case. We had of course thought of your idea to "change only the colliding planets", but hadn't found a way to map them to new planets without that new mapping colliding with the ones you're not remapping -- you can't just use the "fixed" scrambler as you suggested. Other than hardcoding all the existing planets, unless I misunderstand your proposal, I think mine was the first serious one that could actually keep everyone's names. So it was appropriate to warn people that we didn't have a viable solution at the time that would let people keep their names.

If Max's solution works, then it's an even cleaner way to remap to the unused planet space. I eagerly await @jtboin's field results!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/urbit/arvo/issues/1105#issuecomment-472621012, or mute the thread https://github.com/notifications/unsubscribe-auth/ABxXhjWJlhAxkZTYQbmg9i-UCaxl542fks5vWXNfgaJpZM4bsc-W .

jtobin commented 5 years ago

I've since run another exhaustive test of the planetary @p space using @max19's solution and can confirm that it fixes the problem, while also preserving the vast majority of existing planet names.

I've also checked a list of currently-claimed planets that @Fang- provided me, and none appear to be at risk of changing due to the fix.

Fang- commented 5 years ago

Closing now that #1110 has made it to live. Rejoice!