radicle-dev / radicle-link

The second iteration of the Radicle code collaboration protocol.

RFC: Application Architecture #682

Closed kim closed 3 years ago

kim commented 3 years ago

RFC #673 starts out by describing various functionality in terms of "capability traits". While a worthwhile exercise for mapping out "core" functionality and defining surface APIs, it is difficult to see how the RFC can reach its stated goal of "[allowing to] compose them easily, allowing upstream consumers to mix-and-match them in any way they desire" without committing to a concrete application architecture.

Specifically, it does not consider that different applications can have (very) different lifecycles, yet will need certain ways to communicate with each other (eg. to be notified of certain changes). It also doesn't talk about key management and authentication, process management, and the distinction between "online" and "offline" operations (recall that only one instance per device key of a peer-to-peer stack is allowed to run at any given time). Somewhat worrying is the implicit tendency to model the system as a single, monolithic server, which is in diametrical opposition to the stated modularity goal.

This proposal aims to fill those gaps, and answer some of the more tricky questions raised in discussions so far. It deliberately leverages platform features, which naturally narrows down the scope of what the core team may support. This is also a matter of focus: we cannot expect others to build on library-level modules, if we haven't shown how to compose them in one particular way.

Constraints liberate: this proposal does not aim to render #673 obsolete, but on the contrary allow it to focus on what it started out with: defining core APIs and module boundaries.

~Rendered~

Edit by Fintan: The rendered link above pointed to the old markdown document. Here's the new rendered link: Rendered

kim commented 3 years ago

Team Call Notes:

  1. It is useful to think of a peer-to-peer node as a separate process (daemon). Only one per device key should be running at any point in time.
  2. It is not entirely clear just how much CLI (-framework) the team is willing to own

    1. If it doesn't, then who does?

    2. If it provides some commands, then how do those fit into a larger framework, if at all?

    3. If anything not strictly protocol related is up to someone else, we end up with:

      1. Functionality which requires connectivity. This is not recomposable -- it's just what the protocol exposes.
      2. Plain git
      3. The Storage interface

        The storage can be generalised to: assumptions about certain refs under the refs/rad namespace, and data structures those peel to (a sketch follows at the end of these notes). As these are assumptions by the protocol, one could argue that it is not even a public interface: either the protocol needs to base some behaviour on the presence or absence of data at those locations -- and therefore requires a specification -- or it doesn't. Thus, plain git provides extensibility for arbitrary applications.

    4. Assuming 1., 2.iii.a and 2.iii.b, we still need to solve most of the issues addressed in the proposal:

      • How do we communicate with the daemon process?
      • How is the daemon process managed in the first place?
      • How is access to keys managed, and can we add stronger forms of authz later?
      • How do we dispatch events, where the sender might not know the recipients up front, and doesn't even care?

        Arguably, with only two components (daemon + git), we could simplify this to not require a standalone service, and simply provide a streaming endpoint for the daemon. The problem of application-level notifications doesn't go away, though; it just becomes someone else's.

  3. The author did not mention this in the document, but one solution for authz could be macaroons

    The issue is that macaroons rely on a shared secret: say we have a tool which loads the passphrase-encrypted key from disk and issues a macaroon, which is passed to an application which wishes to sign something with that key. The application can assert the validity of the macaroon, but it doesn't have access to the key itself (if it had, the mechanism would be redundant -- just prompt for the passphrase twice). The same problem exists with JWT or CWT: those could carry the private key as an encrypted payload, but the recipient needs access to the decryption key (or, respectively, the key that decrypts the decryption key).

    Securing localhost is just... mindbogglingly recursive, it seems.
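
To make the storage point (2.iii.c) concrete, here is a minimal sketch of a ref under the refs/rad namespace peeling to a data structure. It assumes the `git2` crate; `refs/rad/id` is used as the example ref, and peeling to a blob is illustrative only -- the actual data structures are whatever the protocol specifies.

```rust
use git2::{ObjectType, Repository};

// A minimal sketch: resolve a well-known ref under refs/rad and peel
// it to the object it points to. The ref name and the assumption that
// it peels to a blob are illustrative, not normative.
fn read_rad_id(repo: &Repository) -> Result<Vec<u8>, git2::Error> {
    // The protocol's assumption: a well-known ref exists here ...
    let reference = repo.find_reference("refs/rad/id")?;
    // ... and peels to a defined data structure.
    let object = reference.peel(ObjectType::Blob)?;
    let blob = object.as_blob().expect("peeled to a blob");
    Ok(blob.content().to_vec())
}
```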

cloudhead commented 3 years ago

Interesting! I have a few questions:

  1. If the idea is that the daemon is "short-lived", ie. you mention it shutting down after a few minutes, does it only wake up when the local user runs a command, or does it also wake up when the external TCP socket has activity? I would hope that there is a way for applications to receive events as soon as possible, eg. a remote ref was updated, a comment was posted on an issue, etc., without requiring user action. I guess this could also be solved by configuration, eg. daemon start --keep-alive.

  2. I'm unclear on how the CLIaaS works, ie. who is supposed to be spawning these CLI processes? Is it the daemon, or is this the application's responsibility, eg. through its own backend?

  3. The PubSub system could be specified for one platform at least, as an example. Currently it's hard to imagine how that would work. Or perhaps having an idea of what a high-level API would look like, and which service/component would be serving this API.

  4. When thinking about the general architecture, if I understand correctly, all "capabilities" are to be implemented as command-line tools that can input/output CBOR. Since this requires some kind of backend service to run (to be able to spawn these commands), I wonder if at that point it's not easier to directly invoke library functions? Basically, I'm trying to understand in which case(s) spawning a CLI makes more sense than calling into a library.

kim commented 3 years ago

If the idea is that the daemon is "short-lived", ie. you mention it shutting down after a few minutes, does it only wake up when the local user runs a command, or does it also wake up when the external TCP socket has activity?

In a lot of situations it is highly undesirable to have some kind of server connected to the internet running in the background: mobile connection, VPN, firewall, general paranoia,...

The idea is that by connecting to the "control" (UNIX) socket, the daemon is started via socket activation. As long as there is a connection it will keep running; when there are none, it will exit after a while. This way, the lifecycle is tied to interactive use.
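
For illustration, a minimal sketch of that lifecycle, assuming the daemon is launched by a supervisor following the systemd LISTEN_FDS socket-activation convention (the socket path, idle timeout, and omitted LISTEN_PID check are simplifications):

```rust
use std::os::unix::io::FromRawFd;
use std::os::unix::net::UnixListener;
use std::time::{Duration, Instant};

fn main() -> std::io::Result<()> {
    // Under the LISTEN_FDS convention, inherited sockets start at fd 3.
    // (A real implementation would also verify LISTEN_PID.)
    let listener = if std::env::var("LISTEN_FDS").ok().as_deref() == Some("1") {
        unsafe { UnixListener::from_raw_fd(3) }
    } else {
        // Fallback for running without a supervisor (path is hypothetical).
        UnixListener::bind("/tmp/radicle-control.sock")?
    };
    listener.set_nonblocking(true)?;

    let idle_timeout = Duration::from_secs(300);
    let mut last_activity = Instant::now();
    loop {
        match listener.accept() {
            Ok((_conn, _addr)) => {
                // Serve the connection (elided), noting the activity.
                last_activity = Instant::now();
            }
            Err(e) if e.kind() == std::io::ErrorKind::WouldBlock => {
                // Tie the lifecycle to interactive use: exit when idle,
                // and let the supervisor re-activate on the next connect.
                if last_activity.elapsed() > idle_timeout {
                    return Ok(());
                }
                std::thread::sleep(Duration::from_millis(200));
            }
            Err(e) => return Err(e),
        }
    }
}
```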

guess this could also be solved by configuration, eg. daemon start --keep-alive

I think users should not interact with the executable; that's what process management is for. One could emulate "keep alive" by making a command which keeps a connection to the control socket.

I'm unclear on how the CLIaaS works, ie. who is supposed to be spawning these CLI processes?

Same thing: as soon as something connects to the IPC socket, the command server gets spawned.

The PubSub system could be specified for one platform at least, as an example. Currently it's hard to imagine how that would work. Or perhaps having an idea of what a high-level API would look like, and which service/component would be serving this API.

I'm not sure how to answer that. There are a bunch of events emitted by the net::Peer already, which in the case of a single p2p daemon are of interest to more than one recipient. Similarly, when you push through the git CLI.
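
As a sketch of the fan-out this implies, using tokio's broadcast channel (the `PeerEvent` variant is a hypothetical stand-in for what net::Peer emits):

```rust
use std::time::Duration;
use tokio::sync::broadcast;

#[derive(Clone, Debug)]
enum PeerEvent {
    RefsUpdated { urn: String },
}

#[tokio::main]
async fn main() {
    let (tx, _keep_alive) = broadcast::channel::<PeerEvent>(64);

    // Any number of components (CLI, UI backend, git hook) subscribe;
    // the sender neither knows nor cares who they are.
    let mut sub = tx.subscribe();
    tokio::spawn(async move {
        while let Ok(event) = sub.recv().await {
            println!("subscriber saw: {:?}", event);
        }
    });

    // The daemon publishes, eg. after a push through the git CLI.
    tx.send(PeerEvent::RefsUpdated { urn: "rad:git:example".into() })
        .expect("at least one subscriber");

    // Give the subscriber task a chance to run before exiting.
    tokio::time::sleep(Duration::from_millis(50)).await;
}
```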

"capabilities"

Not sure this is good terminology anymore: what is called a capability in #673 is not divisible -- it's just the set of functions you can call on the protocol stack.

It's more that this is additional functionality which is not protocol-specific (ie. the protocol doesn't need to know about it).

I wonder if at that point it's not easier to directly invoke library functions?

Yes, that's why those subcommands should be their own library modules. If you don't want to link (eg. because you are using a SCRIPTING LANGUAGE), you don't have to. For example, you can deploy an Electron app without any native code backend if you can instead instruct the package manager to make sure the CLI server is installed.
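
To sketch how the same code can serve both worlds (assuming the serde and serde_cbor crates; the request/response types are hypothetical): the subcommand is a thin CBOR-on-stdio wrapper around a function that is also exported as a library module.

```rust
use serde::{Deserialize, Serialize};

#[derive(Deserialize)]
struct TrackRequest {
    urn: String,
    peer: Option<String>,
}

#[derive(Serialize)]
struct TrackResponse {
    tracked: bool,
}

// The library entry point; Rust consumers link against this directly.
pub fn track(req: &TrackRequest) -> TrackResponse {
    TrackResponse {
        tracked: !req.urn.is_empty() && req.peer.is_some(),
    }
}

// The CLI entry point; non-Rust consumers exec the binary and speak
// CBOR over stdin/stdout.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let req: TrackRequest = serde_cbor::from_reader(std::io::stdin())?;
    serde_cbor::to_writer(std::io::stdout(), &track(&req))?;
    Ok(())
}
```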

FintanH commented 3 years ago

It is not entirely clear just how much CLI (-framework) the team is willing to own

Thanks for laying out the questions. @xla and I had a chat and we landed on the following thoughts.

The team should own and maintain a certain subset of CLI commands. Since the goal is to dogfood our own tasty treats we should build the CLIs that we will want to use. Inspired by tiered systems, we outlined what we will want to maintain as first-class citizens of CLI-topia, followed by what the radicle-link team will have opinionated CLIs on, and finally things that we will not consider as part of our responsibilities.

First-class

The following components are owned and maintained by the radicle-link team and are part of the core experience.

Opinionated-class

The following are components that we will have opinionated versions of, but we will not commit to any SLAs around their maintenance. We expect that the community will like to use our versions of these components but could also form their own.

No-class

We will not be providing any HTTP/TCP implementations -- in contrast to what we first thought we would do.

Open Questions

The first open question is around the rad-git CLI. In this RFC you made a note:

The obvious drawback is that this requires binding to a TCP socket, and thus breaks the desired isolation to the logged-in user. A workaround could be to supply the ProxyCommand option to ssh, which proxies the connection over a UNIX socket. While possible to achieve by modifying the user's git configuration, a more robust and flexible solution might be to provide a wrapper command (rad push/pull).

Is the surface area simply push/pull or do we expect more git commands to be needed/added? If so, we could likely commit to that being part of the First-class set, if the proxy over the UNIX socket is too much of a PITA.
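
For reference, the ProxyCommand workaround quoted above might look roughly like this (the host alias and socket path are hypothetical; requires socat):

```
# ~/.ssh/config
Host rad
    # Proxy the ssh connection over a UNIX socket instead of TCP,
    # preserving the isolation to the logged-in user.
    ProxyCommand socat - UNIX-CONNECT:/run/user/1000/radicle/git.sock
```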

The second grey area is around key management. Despite chatting about it on the team call, it's still a bit unclear to both of us what kind of problems we need to solve here.

Something we're not sure about is the validation of online requests to the peer. For example, say we ask to replicate an Urn we don't know about, will that need some verification? Is this what the IPC section is attempting to cover?

We also don't have bright ideas of how to handle the "unlocking" of a key for use in CLI actions, e.g. creating a project. Clearly, we need to provide primitives that work in both a CLI world and a CLIaaS world.

So you're right, key management does seem to be the more interesting problem given the number of open questions about it :)

Conclusion

We see owning the core CLI components as a positive thing for the team, as it will allow us to dogfood more easily and will be synergistic with the protocol development. It will also allow us to think about how the ecosystem may be extended while guarding important namespaces that we care about.

All that being said, we have formed two immediate goals:

  1. Draft up an RFC that discusses the daemon peer and its API. This will include the expectations of what functionality we will need to provide, e.g. asking and replicating a project.
  2. Draft up an RFC that discusses the CLI plan and infrastructure which will include the points outlined above.

If I missed anything that you think should be included @xla, please feel free to fill it in :)

As always, looking forward to feedback on this for improvement.

kim commented 3 years ago

The team should own and maintain a certain subset of CLI commands.

I just don't see how this yields the coherent experience that is so highly valued, without a framework those commands adhere to.

The first open question is around the rad-git CLI. [...] Is the surface area simply push/pull or do we expect more git commands to be needed/added?

The surface area is the entirety of the git suite of commands. The interesting part here is the "server" end: it should be able to rewrite refs when fetching, eg. to display user names (instead of the client-side rewriting), or do Gerrit-style rewrites on push (for/review/master). It is also about notifying other processes of modifications to the repo.
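
As a hedged sketch of such a server-side rewrite (the ref layout is hypothetical): a push to refs/for/review/<branch> is diverted into a per-review namespace instead of updating the branch directly.

```rust
// Map a pushed ref to its rewritten destination, or None if the push
// should proceed unmodified. The layout is illustrative only.
fn rewrite_push_ref(pushed: &str, review_id: u64) -> Option<String> {
    pushed
        .strip_prefix("refs/for/review/")
        .map(|branch| format!("refs/reviews/{branch}/{review_id}"))
}

#[test]
fn rewrites_review_pushes_only() {
    assert_eq!(
        rewrite_push_ref("refs/for/review/master", 42).as_deref(),
        Some("refs/reviews/master/42")
    );
    assert_eq!(rewrite_push_ref("refs/heads/master", 42), None);
}
```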

The second grey area is around key management. [...] Something we're not sure about is the validation of online requests to the peer. For example, say we ask to replicate an Urn we don't know about, will that need some verification?

The p2p process needs access to the key for that. Whether the client initiating this request needs to be authenticated depends on the security model: whoever has access to RAD_DAEMON_SOCK also has access to SSH_AUTH_SOCK. So what's the point of authenticating again? Because there is no way to assert that this command was initiated interactively -- it could be a background process, a malicious node module, a DNS rebinding attack on your awesome browser app.

Iow, it is desirable to have a proof of user presence for both security and "experience" reasons (having 25GiB worth of git history downloaded to your hdd without your approval is certainly not a quality experience).

  • Draft up an RFC that discusses the daemon peer and its API. This will include the expectations of what functionality we will need to provide, e.g. asking and replicating a project.
  • Draft up an RFC that discusses the CLI plan and infrastructure which will include the points outlined above.

Huh? That's exactly what the two open RFCs #673 and #682 discuss.

FintanH commented 3 years ago

without a framework those commands adhere to.

Sorry, I'm not sure what you mean by "framework" in this context. Could you expand on this?

The surface area is the entirety of the git suite of commands. The interesting part here is the "server" end

Just to clarify, the "server" end being the monorepo storage in our case?

Because there is no way to assert that this command was initiated interactively

Ah, I think part of my confusion was that there was an attempt to solve this by signing a request payload with the key at some point, but it was dropped.

Huh? That's exactly what the two open RFCs #673 and #682 discuss.

Sorry, 1. being that #673 gets reworked/rewritten to cover the daemon-specific functionality, whereas 2. builds on #682 to better outline the CLI tiers that we covered (after this discussion is resolved). Alternatively, what were you expecting?

Apologies if I'm misunderstanding some points you're trying to get across.

kim commented 3 years ago

Sorry, I'm not sure what you mean by "framework" in this context.

If there is just an assortment of binaries which happen to be maintained in this repo, then how do they fit with other binaries which want to be part of the radicle suite of commands? For example, you say you want to own "tracking". Someone else is assembling other sorts of commands, but wants that functionality, too. Do they need to assemble the argument list and execve? How do they pass "global" options (verbosity being the classic)? What happens after the tracking state was mutated, is it the responsibility of the "tracking" command here to notify other interested parties? Or is that up to someone else as well? (Nb. tracking is an offline operation)

Just to clarify, the "server" end being the monorepo storage in our case?

The server end is what is implemented as a remote helper currently.

FintanH commented 3 years ago

For example, you say you want to own "tracking". Someone else is assembling other sorts of commands, but wants that functionality, too. Do they need to assemble the argument list and execve?

Well, as mandated by this RFC -- which I'm in agreement with -- "tracking" is also a library module that can be re-used. If for some reason this outside set of commands is not in Rust then ya they need to exec the command I guess.

How do they pass "global" options (verbosity being the classic)?

More reason for us to own central infrastructure, I would have thought. If a set of global options is owned by radicle-link as library modules, it makes it easier for others to import them in their command extensions.
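
A minimal sketch of what that could look like, assuming the clap crate (the struct and command names are hypothetical): extensions flatten the shared options into their own parsers, so flags like --verbose behave the same everywhere.

```rust
use clap::Parser;

/// Global options exported by the shared library module.
#[derive(Parser, Debug)]
pub struct GlobalOptions {
    /// Increase output verbosity (-v, -vv, ...).
    #[arg(short, long, action = clap::ArgAction::Count)]
    pub verbose: u8,
}

/// A hypothetical command extension pulling in the shared options.
#[derive(Parser, Debug)]
struct TrackingCommand {
    #[command(flatten)]
    global: GlobalOptions,
    /// The peer to track.
    peer: String,
}

fn main() {
    let cmd = TrackingCommand::parse();
    println!("tracking {} (verbosity {})", cmd.peer, cmd.global.verbose);
}
```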

What happens after the tracking state was mutated, is it the responsibility of the "tracking" command here to notify other interested parties? Or is that up to someone else as well? (Nb. tracking is an offline operation)

The architecture diagram has a link between CLI and PubSub, so wouldn't the "tracking" component use this service to publish that it decided to track something, and then it's up to other interested parties to be subscribed?

kim commented 3 years ago

this RFC -- which I'm in agreement with

We seem to be talking past each other -- does your comment not say “this is too much stuff, we’re gonna do some commands to control a p2p daemon, and that’s it”?
alexjg commented 3 years ago

I've written a little bit about other projects with a monolithic architecture in order to make the ideas outlined in this RFC clearer to me, it might be helpful to others: https://gist.github.com/alexjg/e2b63dc103d0a5a895b2a4588a38e2af

FintanH commented 3 years ago

We seem to be talking past each other

Ya, I think so too :sweat_smile:

does your comment not say “this is too much stuff, we’re gonna do some commands to control a p2p daemon, and that’s it”?

That wasn't my intent. My (and xla's) attempt was to build on top of this RFC filling in the gaps based on the architecture proposed. Those gaps leaning more towards implementation details, e.g. what we foresee the daemon peer being able to do (i.e. sprucing up #673), laying out the CLI components radicle-link should own. Maybe this attempt was a mistake due to me misunderstanding something or miscommunicating something? :) Maybe a less async discussion could happen (since we all know async is hard ;)) so that we can get on the same page again.

kim commented 3 years ago

@alexjg That's a nice writeup of some of the implicits which led me to propose a different approach.

One thing actually stood out from your points about IPFS: you have separate bullet points for DHT, bitswap, and graph syncing (which I'm not sure I understand exactly). There is a possible separation of networked operations, which I've always imagined for link:

  1. Keep a daemon running for some period of time
  2. Connect to a list of well-known nodes (seeds, coworker machine, standard git server), bulk sync, disconnect

Perhaps something to keep in mind.

@FintanH Ok, well then. I guess we agree that it makes sense for #673 to talk about just the daemon (called the peer-to-peer node here).

I intentionally tagged @cloudhead here, because he had expressed interest in driving a larger "official CLI" kind of experience. Perhaps we could dial him in for synchronously discussing some tactical details (beware of cancellation =]).