Profiles and profile-scoped paths

geigerzaehler commented 3 years ago

We want to support multiple user profiles with different identities and Git data.

Profiles are identified by an opaque string (we will use UUIDs as identifiers). The active profile is stored as a string in $XDG_CONFIG_HOME/radicle/active_profile on Linux. Profile-specific data is stored in $XDG_{CONFIG|DATA}_HOME/radicle/<profile_id>. For example, the key directory will be $XDG_{CONFIG|DATA}_HOME/radicle/<profile_id>/keys. On other platforms the location is similarly scoped by inserting the profile identifier.

If the active_profile file does not exist, it is created: a new profile identifier is generated and written to the file.
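A minimal sketch of the read-or-create behaviour described above, using only the standard library. The function name `load_or_init_active_profile` and the `generate_id` hook are hypothetical and not part of the proposal; a real implementation would presumably generate a UUID at that point.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reads the active profile identifier from `file`. If the file does not
/// exist, a new identifier is obtained from `generate_id` (a hypothetical
/// hook standing in for UUID generation), written to the file, and returned.
fn load_or_init_active_profile(
    file: &Path,
    generate_id: impl FnOnce() -> String,
) -> io::Result<String> {
    match fs::read_to_string(file) {
        // The file exists: its trimmed contents are the profile identifier.
        Ok(contents) => Ok(contents.trim().to_owned()),
        // The file is missing: create it with a fresh identifier.
        Err(err) if err.kind() == io::ErrorKind::NotFound => {
            let id = generate_id();
            if let Some(parent) = file.parent() {
                fs::create_dir_all(parent)?;
            }
            fs::write(file, &id)?;
            Ok(id)
        }
        // Any other I/O error is passed on to the caller.
        Err(err) => Err(err),
    }
}
```

Calling the function twice with the same path returns the identifier written by the first call, so the generator is only consulted when the file is missing.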

Profiles and the new paths will be used by git-remote-rad and radicle-upstream.

If possible, we will provide functionality to migrate the paths.

APIs

We introduce a new Profile struct with the following methods:

impl Profile {
   /// Tries to read the profile identifier from a well-known path on disk. If the file does not
   /// exist, it is created, and a new identifier is generated and written to the file.
   fn init() -> Result<Self, io::Error> {
       todo!()
   }

   /// Returns the profile identifier
   fn id(&self) -> &str {
       todo!()
   }

   /// Returns [`Paths`] for this profile and creates all the directories.
   fn paths(&self) -> Result<Paths, io::Error> {
       todo!()
   }
}

We remove Paths::new() from the public API so that Paths can only be obtained through Profile::paths.

The directory paths returned by Paths will include the profile identifier as specified above.
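To illustrate the scoping, here is a small sketch of how profile-scoped directories could be derived and created. `ProfilePaths`, `profile_paths` and `create_dirs` are hypothetical names used only for this example; the actual `Paths` type has more directories, and the base directories are passed in explicitly here rather than read from the environment.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the profile-scoped part of `Paths`.
struct ProfilePaths {
    config_dir: PathBuf,
    data_dir: PathBuf,
}

/// Builds the profile-scoped directories by inserting the profile
/// identifier below the `radicle` directory of each base path.
fn profile_paths(config_home: &Path, data_home: &Path, profile_id: &str) -> ProfilePaths {
    ProfilePaths {
        config_dir: config_home.join("radicle").join(profile_id),
        data_dir: data_home.join("radicle").join(profile_id),
    }
}

impl ProfilePaths {
    /// Creates all directories, mirroring what `Profile::paths` is
    /// documented to do.
    fn create_dirs(&self) -> io::Result<()> {
        fs::create_dir_all(&self.config_dir)?;
        fs::create_dir_all(&self.data_dir)
    }
}
```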

Alternatives

We considered using the identity as the profile identifier but this has a couple of shortcomings: It requires us to generate an identity before we can initialize a profile which complicates a lot of code. This approach also precludes us from having multiple profiles with the same identity. One advantage of using identities to identify profiles is that it would make profile selection easier for the user. But this can also be solved through better tooling, especially given that it is not clear yet how the user would select different profiles.

cloudhead commented 3 years ago

Hey could you say a few words about how this will be used?

dragon788 commented 3 years ago

How does this work on a multi-user system, each user if they have multiple profiles would have them exist under their own XDG directory, but there wouldn't be coordination sharing via the system level XDG (/usr/share/ or similar?).

geigerzaehler commented 3 years ago

> Hey could you say a few words about how this will be used?

At first, the purpose of this is to isolate profile-specific data for the same user and provide an easy way to start fresh when a profile is corrupted: one can just delete the active_profile file. We’ll gradually expand the capabilities so that devs and power users can play around with this. For example, we’ll provide a RAD_PROFILE environment variable. Down the road it may be possible to also expose it to all users in the UI, but we have no plans for this yet.

> How does this work on a multi-user system, each user if they have multiple profiles would have them exist under their own XDG directory, but there wouldn't be coordination sharing via the system level XDG (/usr/share/ or similar?).

Since the directories are based on XDG_*_HOME directories on Linux, different users don’t share any data. If you don’t mess with these variables they will point to $HOME/.config, etc. Does this answer your question?

cloudhead commented 3 years ago

Makes sense! I got somehow confused by the terminology and thought we were talking about public profiles.

kim commented 3 years ago

@dragon788 "Profiles" as introduced here are inherently tied to the private key used, which shall never be used by more than one system user account (nor device, for that matter). System-wide, or cross-profile sharing of the git object database may at some point become a thing, although I don't currently see the benefits outweigh the additional complexity -- most ways to do this would require some kind of storage maintenance tasks, which is probably not worth it until we see peers storing a lot of data.

dragon788 commented 3 years ago

These answers make sense.

For the many/large repository storage scenario in a multi-user system I'd imagine once private repositories are supported there will be a way to host/mirror these repositories on a seed machine without the seed machine actually examining the contents of the repository (due to encryption of the contents of files other than the necessary git ones to host a copy of the code).

If a seed machine/instance is the best way to handle multiple users needing to access the same code on one machine especially since they would still need their local copies for making changes (that could possibly be shallow clones of the larger work), the seed instance could run in a Docker container on that machine or on another machine on the same network.

kim commented 3 years ago

> encryption of the contents of files

Well that is a complex topic on its own, and would not help deduplicating objects. Without replacing git's storage layer entirely, there is -- as far as I know -- no efficient (and hash-preserving!) way to manage re-encryption whenever the set of eligible users changes.

dragon788 commented 3 years ago

Yeah I am curious what Keybase uses for their encrypted git where they have no knowledge of what you are hosting but visibility of repositories is managed by team membership.

gopass does team-based GPG encryption of passwords and other secrets but that explicitly requires new commits for any secrets being rotated after a team member has been removed because the new commit triggers re-encryption of the file. Anything not rotated remains readable by former team members who have/get a copy somewhere.

But my train of thought was more wondering whether there would be a way to allow seeding nodes to be given repositories to seed without actually knowing the contents of the repositories, or if that would require a large amount of extra complexity.

kim commented 3 years ago

On Thu Jan 21, 2021 at 7:46 PM CET, dragon788 wrote:

> Yeah I am curious what Keybase uses for their encrypted git where they have no knowledge of what you are hosting but visibility of repositories is managed by team membership.

It's backed by KBFS, where they store entire packfiles as seen locally. This means fetch is a lot less efficient, and just GCs the objects which turn out to be unreachable. This scales only to a few collaborators. The local repository is not encrypted.

geigerzaehler commented 3 years ago

What we haven’t figured out yet is how this feature interacts with the RAD_HOME environment variable. (This came up in https://github.com/radicle-dev/radicle-link/pull/496#discussion_r565931339). There are two potential approaches.

Independent `RAD_HOME`

In this approach using RAD_HOME is independent from profile selection. RAD_HOME effectively serves as a shorthand for setting the XDG_*_HOME base paths:

If RAD_HOME is set, the active profile file is stored in $RAD_HOME/config/active_profile instead of $XDG_CONFIG_HOME/radicle-link/active_profile. Profile-specific data is stored in $RAD_HOME/data/$profile_id instead of $XDG_DATA_HOME/radicle-link/$profile_id, where $profile_id is obtained from the active profile file or freshly generated.

This approach makes it possible to combine RAD_PROFILE with RAD_HOME. If both are set, profile-specific data will be read from $RAD_HOME/data/$RAD_PROFILE.
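The resolution rules of this first approach can be sketched as a pure function. The names `Locations` and `resolve_independent` are hypothetical, and the environment values are passed in as parameters instead of being read from the process environment; `stored_profile_id` stands for the identifier obtained from the active profile file.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical result type: where the active profile file and the
/// profile-specific data live under the "independent RAD_HOME" approach.
#[derive(Debug, PartialEq)]
struct Locations {
    active_profile_file: PathBuf,
    profile_data_dir: PathBuf,
}

/// Resolves locations when RAD_HOME only replaces the XDG base paths and
/// RAD_PROFILE, if set, selects the profile.
fn resolve_independent(
    rad_home: Option<&Path>,
    rad_profile: Option<&str>,
    xdg_config_home: &Path,
    xdg_data_home: &Path,
    stored_profile_id: &str,
) -> Locations {
    // RAD_HOME acts as a shorthand for the two XDG base directories.
    let (config_base, data_base) = match rad_home {
        Some(home) => (home.join("config"), home.join("data")),
        None => (
            xdg_config_home.join("radicle-link"),
            xdg_data_home.join("radicle-link"),
        ),
    };
    // RAD_PROFILE takes precedence over the identifier stored on disk.
    let profile_id = rad_profile.unwrap_or(stored_profile_id);
    Locations {
        active_profile_file: config_base.join("active_profile"),
        profile_data_dir: data_base.join(profile_id),
    }
}
```

Because profile selection and base-path selection are separate steps, each of the four combinations of RAD_HOME and RAD_PROFILE being set or unset yields a well-defined result.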

`RAD_HOME` controls profile

With this approach, if RAD_HOME is set, profile-specific data is stored in $RAD_HOME/data instead of $XDG_DATA_HOME/radicle-link/$profile_id. This means that the profile is fully determined by RAD_HOME, which requires us to ignore RAD_PROFILE. However, since we always provide a profile ID through the APIs, we still need to store the active profile in some file, say $RAD_HOME/active_profile.
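For comparison, a sketch of the data directory lookup under this second approach. `resolve_home_controls_profile` is a hypothetical name, and the assumption that RAD_PROFILE still selects the profile when RAD_HOME is unset is mine, not stated in the proposal.

```rust
use std::path::{Path, PathBuf};

/// Resolves the profile data directory when RAD_HOME fully determines the
/// profile. RAD_PROFILE is ignored whenever RAD_HOME is set.
fn resolve_home_controls_profile(
    rad_home: Option<&Path>,
    rad_profile: Option<&str>,
    xdg_data_home: &Path,
    stored_profile_id: &str,
) -> PathBuf {
    match rad_home {
        // The data directory is not scoped by a profile identifier.
        Some(home) => home.join("data"),
        // Assumption: without RAD_HOME, RAD_PROFILE overrides the stored id.
        None => xdg_data_home
            .join("radicle-link")
            .join(rad_profile.unwrap_or(stored_profile_id)),
    }
}
```

The special case in the first match arm is what makes RAD_PROFILE and RAD_HOME impossible to combine here.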

Recommendation

I suggest we go with the first approach. It is conceptually simpler, since profile selection and RAD_HOME are independent, and it is also easier to implement.

kim commented 3 years ago

I'm +1 for approach one, for much the same reasons. It also has the nice property that XDG-savvy *nix users don't need to bother, while platforms which don't natively adhere to the XDG convention gain a simplified version of it.

FintanH commented 3 years ago

You had me at ~compose~ combine. Option one makes the most sense to me :+1: Thanks for laying it out like this, it made it much easier to see the benefits and weigh up the choices :ok_hand:

MeBrei commented 3 years ago

I am also in favour of option 1. Thanks for the write up!

radicle-dev / radicle-link

Profiles and profile-scoped paths #488