
Why KeyKOS is fascinating #41


void4 commented 4 years ago

Historical background

Welcome to the 1970s.

Personal computers don't exist.

Computers look like this:

[image]

[image]

This one in particular, the SDS 940, costs 400,000 1970s dollars, or about 2.6 million 2020 dollars. Running a computer is expensive, and every second it sits idle is wasted money.

The SDS 940 is the first commercially available computer built for time-sharing, allowing multiple people to use it at the same time. Users connect to the mainframe with a terminal - at first directly, in the physical vicinity of the machine.


Later, terminals are connected via phone lines, and using a computer over a distance becomes possible. Users no longer have to sit in the loud operating room; even non-experts can start to use software without knowing how a computer works or how to operate it. Previously, only governments, large companies, or universities could afford this. For these reasons, many expect commercial time-sharing to grow rapidly; demand for computers is already high.

Several companies enter the market, among them GE and Tymshare, Inc.


Tymshare is immensely successful and profitable, globally, with both companies and governments as customers. They expand rapidly and acquire a company every year. By 1972 Tymshare owns 23 SDS 940 machines, among others.

To allow remote usage, the company creates its own international telecommunications network, Tymnet, which thousands of large companies, educational institutions, and government agencies use to handle cheque clearing, airline ticket booking and other administrative tasks.

This is Tymnet in 1972:

[image]

They also connect to the Arpanet, the military predecessor of the internet; in the map below you can see them as a node on the west coast in 1973.

[image]

This is the first time people have to think about security in digital multi-user systems. It is the first time one can hack a system that is not in one's physical presence. And people do:

[image]

And the companies have to care about this: find ways to secure their systems, develop new architectures.

Tymshare can't allow its users to conduct (corporate) espionage on each other, or to siphon off each other's computational resources non-cooperatively, so the company has an economic incentive to support such research.

Enter Norman

[image]

and Ann Hardy.

[image]

They architect and program the GNOSIS (later KeyKOS) operating system, which is centered around a key principle: rights as the fundamental abstraction of the system.

A one-of-a-kind architecture

KeyKOS had a very different notion of what a process is: instead of a process having fixed memory and other resources attached to (only) it, a process is just a collection of 'keys' - access rights - to such resources. In other words, a process does not own resources, it only owns rights to resources, such as memory areas.

This brought to digital systems what the analog world has had for millennia: a lock-and-key mechanism.


You can only access something if you have a key to it. A key may open/refer to a container which contains more keys. In the resulting object graph it's suddenly trivial to tell what can access what. You can make copies of your keys and pass them on, allowing multiple processes to access and share the same resources.

[image]

Unconnected sub-networks are unable to acquire new rights. They can't access anything they do not explicitly have the keys to. They cannot steal your secrets because they are encapsulated, sandboxed.
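
As a rough analogy in plain Python (the Logger and Sandboxed classes and the log_key parameter are invented for illustration, and Python itself does not enforce this discipline the way a capability OS would), object references play the role of keys: a component can only touch what it was explicitly handed.

class Logger:
    def __init__(self):
        self.lines = []

    def log(self, msg):
        self.lines.append(msg)

class Sandboxed:
    # This component can only use what it is explicitly handed.
    def __init__(self, log_key):
        self._log = log_key  # the only "key" it holds

    def run(self):
        self._log("hello")  # allowed: it holds a key to this operation
        # It holds no reference to the Logger itself, the filesystem or
        # anything else - what is unreachable in the graph is inaccessible.

logger = Logger()
# Pass an attenuated key: just the bound method, not the whole logger.
Sandboxed(logger.log).run()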

Since keys represent the right to access other objects, and other objects can control low-level functionality of the system, keys can also represent access to, or control over, the physical world.

In KeyKOS, one special type of key is the meter key, which every process has to have. It refers to an account from which the running process draws consumable resources - CPU time and memory quotas. When spawning a new subprocess, these meter keys are typically passed on to the children (possibly attenuated, so they cannot exhaust the full balance). Processes are therefore organized in a metering tree, which looks like this:

[image]
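
The idea can be sketched in a few lines of Python. This is not KeyKOS's actual interface - the Meter and SubMeter classes and their draw/attenuate methods are invented here - but it shows how attenuated meter keys compose into a tree:

class Meter:
    """Holds a consumable balance, e.g. CPU milliseconds."""
    def __init__(self, balance):
        self.balance = balance

    def draw(self, amount):
        if amount > self.balance:
            raise RuntimeError("meter exhausted")
        self.balance -= amount

    def attenuate(self, limit):
        """Hand out a child meter that can never draw more than
        limit through this one - this is what builds the tree."""
        return SubMeter(self, limit)

class SubMeter(Meter):
    def __init__(self, parent, limit):
        super().__init__(limit)
        self.parent = parent

    def draw(self, amount):
        super().draw(amount)      # enforce the child's own limit...
        self.parent.draw(amount)  # ...and charge everything up the tree

root = Meter(1_000_000)        # the account the customer paid for
team = root.attenuate(10_000)  # a department's share
job = team.attenuate(500)      # a single subprocess's share
job.draw(200)                  # charges job, team and root alike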

Customers of Tymshare could buy CPU time and distribute it to their employees by giving them a key to it (or to a subset of it). In fact, thanks to their OS, Tymshare could sell CPU seconds, and they charged for connection time and for storage units of memory. This level of granularity is pretty much non-existent in contemporary systems, because without the 'key' (capability) abstraction there is no way to keep rights apart or to share (attenuated versions of) them within and between units of computation.

In current systems, everything from file access rights to process resource quotas is bound to the 'identity' of a process; a process is therefore the tightest boundary of control. A process cannot be compartmentalized into smaller regions of rights.

Why rights are a powerful abstraction, and where current systems fail

Imagine a function that can write a joke to a file of your choice. However, it costs $5 to use. The function keeps track of the number of times it was invoked in a separate billing.txt file:

from random import choice

def expensiveJoke(filepath):
    # Append a $5 charge to the billing file
    with open("billing.txt", "a") as billing:
        billing.write("$5\n")

    # Write a random joke to the filepath of your choice
    with open(filepath, "w+") as file:
        jokes = ["Where are average things manufactured? The satisfactory.", "What’s orange and sounds like a carrot? A parrot."]
        joke = choice(jokes)
        file.write(joke)

The output of this function strikes you as so consistently hilarious that you can't help but call it again and again. After some time you realize that you and your extended family will be indebted for generations, because the billing.txt file now has a zillion lines charging you $5 each.

The joke is on you now. Or is it? Think about it for a minute. Your life and the lives of your family are on the line. Can you do something weird that will undo your mistakes?

expensiveJoke("billing.txt")

See what I did there?

Yes, INFINITE jokes for free, for everyone, because the billing file is overwritten with a joke.


The right to access the billing file is tied to the identity of the process that the function is part of. This is ambient authority: it's just there in the background, and we don't usually think about it, because in contemporary systems all processes are allowed to access basically anything by default - all files, keyboard input, screen output, the network - without any limits. Where permission systems exist, they are typically very coarse-grained, granting access to either everything or nothing all at once.

Our function here cannot differentiate between its own rights and the rights given to it. It is already allowed to access everything when it starts; we just pass filepaths to it. It isn't limited the way we'd like it to be, and it can't receive and handle rights like individual variables.

This is why this issue is called the confused deputy problem. It is like handing a slip of paper with a cell number to a prison guard and instructing them to open the cell of your friend - while the guard is allowed to do that (and usually supposed to do it only in special circumstances), you are not - but you trick them into misusing the authority they already have. As you can see, only being able to pass names is not enough.

What is the solution to this problem?

Make rights objects that can be treated like all other objects. Remove/"explode" the key ring, so an actor does not act with its full authority every time. Make rights individually transferable, distinguishable entities.


A filepath by itself designates the location of a file, but it carries no information about whether whoever uses it has the right to access the file.

"Do not separate designation and authority" - Norm Hardy

Instead of a filepath, you would give the function a key (also called a "capability"), like a file handle, which you could only obtain if you already had the right to access the file. By receiving the key, the function gains the right to access the file as well.

In a capability world, assuming our language supported it, our function would not look much different:

from random import choice

def expensiveJoke(keyToWriteToMyFile):
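    # keyToAppendToTheBillingFile is not a parameter: we assume it was
    # granted to this function when it was created, so callers never see it.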
    # Append a $5 charge to the billing file
    with open(keyToAppendToTheBillingFile) as billing:
        billing.write("$5\n")

    # Write a random joke to the filepath designated by the key with the authority it confers
    with open(keyToWriteToMyFile) as file:
        jokes = ["How do you drown a hipster? You throw him in the mainstream.", "What did the buddhist say to the hot dog vendor? Make me one with everything."]
        joke = choice(jokes)
        file.write(joke)

Since you as a user cannot obtain a key to the billing file, and the function can keep its keys apart like variables, the problem is solved (we assume the function was given the key to the billing file when it was created). Contrast this with current access-rights mechanisms, where every program, and even every library imported by a program, can access pretty much everything.
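
You can approximate this in ordinary Python today by using open file objects as the keys and closing over the billing capability at creation time. A sketch only - make_expensive_joke and the file names are made up, and real Python still leaves open() ambiently available to everyone, which a capability system would not:

from random import choice

def make_expensive_joke(billing_key):
    # billing_key is granted once, at creation time; callers never see it.
    def expensive_joke(out_key):
        billing_key.write("$5\n")  # charge via the private key
        billing_key.flush()
        jokes = ["Why did the capability cross the road? Someone passed it there."]
        out_key.write(choice(jokes))
    return expensive_joke

# Setup, done by whoever rightfully holds the billing file:
expensive_joke = make_expensive_joke(open("billing.txt", "a"))

# The user only ever holds a key to their own file; there is no
# argument through which billing.txt could be named.
with open("joke.txt", "w") as mine:
    expensive_joke(mine)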

In 98% of programming languages you can't import a library and limit what it can do. There is no way to attenuate its authority. Importing something may have any effect: it may steal your secrets, redefine what the number 2 means, go into an infinite loop until the end of the universe, or crash other programs by allocating more and more memory.

Thus, it becomes impossible to execute untrusted code, because it is impossible to limit its effects. If any third-party code you use is either malicious or compromised on its way to your codebase, your entire system is infected.

KeyKOS shows how this can be solved: with an access rights architecture that integrates all aspects of computing, from inter-object communication to resource limits, as the substrate.

The capability security paradigm inverts the traditional "everything is accessible" approach and makes trust and rights relationships explicit. To send a message to an object, you need to obtain a key referring to it.

Unlocking the future

When rights are reified into objects and can be transferred, things suddenly become possible that weren't before.


Now why the heck isn't this used already?!

It is! But so far only in very special contexts, where security absolutely had to be considered. The OKL4 operating system uses keys/capabilities and has been deployed on over 2 billion mobile phones. It probably runs on yours, too, on the modem processor that connects you to the cellular network.

The secure enclave of iOS devices uses an OS derived from this line, too. seL4 is now starting to ship in products. Genode has keys, too. Some government agencies and the military seem to be using it as well; they don't want their rocket launchers or their secrets to be hacked. Still, why isn't this more developed, or at least better known, in commercial or open source environments?

Mostly due to economics, most people I've talked to think.

"The problems remain the same today. Corporations consider it less expensive to live with insecurity and insurance than to try to market a new operating system." - Ann Hardy

Most companies prefer to insure themselves against losses incurred from cyber risks. Often, risks are ignored or not accounted for. Long-term strategies are lacking, which leads to computer security research and development, as opposed to mere threat mitigation, being underfunded.

Besides operating systems, there have been a few programming languages that embraced this paradigm; they never really took off, but they influenced a few of our now-popular languages. There are also hobby projects that try to develop it further.

The greatest obstacle to its adoption is the same eternal thing that prevents other new technologies from being used: backward compatibility and mindshare. The Windows, macOS and Linux operating systems are big monoliths with their own access rights systems. Adapting them - and the software built on top of and adapted to them - to use keys is logistically, financially and socially practically impossible. Capability security systems are better suited to being realized as, or built on, microkernels. Perhaps there will be more virtual-machine or language-level projects in the future.

There's also a learning curve associated with it, a conceptual inversion to overcome for the programmer used to today's systems. Being explicit about (transferring) rights also incurs an overhead for the programmer. If systems are developed that allow concise expression of it, or that make its application possible but optional, the paradigm might see more adoption.

When a system's security becomes relevant due to increased user numbers etc., it could be secured when it is economically sensible - something which is impossible with most of today's languages because they lack these security primitives at their lowest level. Nowadays, large companies have to rewrite their entire application stack when this happens.

The capability security discipline allows one to secure a system better over time without changing its structure: by defining rights policies, considering which parts of the application hold which keys, and attenuating or removing keys that grant too much authority, thus "hollowing out" the attack surface. The Sandboxing Cycle is avoided. CapSec makes connectivity software-defined, not dependent on the very coarse-grained choices made by picking a certain tech stack.
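
Two classic capability patterns make this "hollowing out" concrete: attenuating proxies and revocable forwarders (caretakers). A minimal Python sketch, with invented ReadOnly/Revocable names - the patterns themselves are standard in the capability literature:

import io

class ReadOnly:
    """Attenuating proxy: forwards only the read side of a file key."""
    def __init__(self, file_key):
        self._file = file_key

    def read(self, *args):
        return self._file.read(*args)

class Revocable:
    """Caretaker: wraps any key so it can be severed later, without
    touching the code that was handed the wrapper."""
    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        if self._target is None:
            raise PermissionError("key revoked")
        return getattr(self._target, name)

    def revoke(self):
        self._target = None

key = Revocable(ReadOnly(io.StringIO("secret contents")))
print(key.read())  # fine: read-only access through both wrappers
key.revoke()       # sever it; any further key.read() raises PermissionError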

KeyKOS within Tymshare was an enormous financial success in its historical context, and there are reasons to believe these principles could be highly useful in a different one. Best example: the internet of things everyone in the corporate world is talking about.

If that becomes a reality, in a few decades, a trillion tiny processors will be everywhere. It'd be better if we had appropriate mechanisms by then that make them secure. Even better, with the right substrate, they might even be able to manage themselves some day - there won't be enough humans to manage them anyway.

Sufficient security is a necessary requirement for cooperation; this is as true for digital systems as for analog ones. A system that provides a rights mechanism can give rise to higher-level constructs: promises on rights, binding promises on rights (contracts), and contracts in turn to reliable higher-order patterns of (self-)organization and large-scale cooperation such as trade - something that modern human civilization is built upon.

"Petabyte memory, farming robots: the realization that we could watch every single seed individually" - Joe Armstrong

You probably don't want to have the job of supplying SeedWatcher#9871031415 with a new battery. Nah, it should call for a drone by itself :)


The slides to a talk I gave about this: https://github.com/void4/rarinfo/blob/main/rarinfo.pdf

mikehearn commented 4 years ago

Well, capabilities do exist in most operating systems.

The JVM lets you do the joke example because File/InputStream/OutputStream are capabilities. However, not many people know how to set up the SecurityManager for sandboxing, and the docs for it are poor. Pure capability systems are quite hard to work with, which is why the JVM provides a hybrid model.

zarutian commented 4 years ago

Pure capability systems are quite hard to work with

In which ways?

mikehearn commented 4 years ago

In the pure model, every object reference to a controlled resource has to come from some other object, which takes some sort of identity object you're granted at the start and must keep secret. That isn't how real OO APIs are designed, and for good reasons, e.g. it's incompatible with modularity.

To be fair it depends how much you squint and what definition of "capability system" you use, as it's not a well defined term.

Once you have an API with many different entrypoints, like the File constructor in Java, or a third-party library you added that also wishes to be usable in a sandboxed context, you need a way to decide whether to vend a capability that isn't a pure method call on an existing capability, and you need some notion of identity, along with what rights that identity should have access to. You may also give out your own rights as capabilities, but experience shows that it's surprisingly hard to get that right 100% of the time. A lot of the old Java applet sandbox escapes were variants of incorrectly implemented capability systems. Modern sandboxes are coarse-grained, simple static lists of "code identity -> permission", partly because that's easier to reason about and blocks capability leaks.

I don't really disagree with anything you wrote here, by the way - just noting that real systems evolved towards a mix of (mostly) static sandbox permissions and (a bit of) capability passing despite having the potential for both, because in the end it's easier to look at a little text file that says "this library shouldn't access disk" and go "yup, that sounds about right" than it is to do what amounts to a global points-to analysis over a program to decide there's no way it can end up with a capability it shouldn't have.

void4 commented 4 years ago

it's easier to reason about and blocks capability leaks

a global points-to analysis over a program to decide there's no way it can end up with a capability it shouldn't have

In the Sculpt operating system, the cap architecture and UI make everything easily configurable, though the manual process seems to be a bit tedious. But most capability assignments only have to happen once, because most parts of the system aren't very dynamic on the OS component level (filesystems, user interfaces, device drivers).

What I'd love to see are analyses of past deployed cap systems: statistics on how many caps there are per process/domain, the global Gini coefficient of that, "rights liquidity" (going with the fluid analogy), cap creation/attenuation/destruction measures, etc.

It would also be interesting to survey the meta-capabilities of such systems, e.g. system image snapshots, higher level right creation patterns etc.

Tloru commented 3 years ago

What's interesting to note is that the Zircon microkernel of Google's Fuchsia OS uses a similar concept, namely channels and handles. It all seems very interesting, so I'm keeping an eye on the project to see where it goes.