Pluggable credential manager framework proposal [PoC included]

mattsb42 commented 6 years ago

Problem

Currently, twine supports obtaining credentials (username, password, and cert locations) from four sources: environment variables, the .pypirc config file, values passed in from the command line, and (specifically for the password) keyring.

These sources provide reasonable options if you are using twine interactively as a human from a trusted computer, but more and more package owners want to push to PyPI (or repository of their choice) automatically as part of their CI pipeline.

I think that a good way to securely enable this usage pattern is to define a pluggable framework for credential managers, similar to how commands are currently managed. Not only will this make things simpler for integration into CI systems, it will also enable the creation of plugins to source credentials from secure credential storage systems such as LastPass, OnePassword, Signet, Amazon Secrets Manager, Hashicorp Vault, HSMs, etc.

PoC Contents

I have put together a proof of concept for what this could look like here[1]. This includes:

an interface for credential managers to implement [2]
- Because it seemed like a good fit, I also added support for #334 in this interface.
credential managers for:
- sourcing credentials from the command line [3]
- sourcing other credentials from the command line but the password only from keyring [4]
- sourcing other credentials from the command line and the password from either keyring or the command line, preferring keyring (this is equivalent to the existing behavior) [5]
tooling to make it simple for commands to obtain the correct credential manager based on user configuration [6]
modification of the upload command to use the new framework [7][8]

Specific areas that need discussion

What should the credential manager interface look like?

The current interface is based on the inputs to the current credential loaders. I think we can probably simplify this.

Passive credential sources.

Similar to how KeyringCredentialManager will specifically source the password only from keyring, ignoring all other sources, I would like to move the other existing passive sources (environment variables and .pypirc configuration) into credential managers that are opportunistically used by DefaultCredentialManager but that can be explicitly avoided if so desired.

Certificates

Currently, certificates are passed through to requests by identifying a local file that contains the certificates in question. I would like to change this to have the credential manager provide the actual certificate data directly. This will enable the use of secure certificate storage systems and avoid having to place sensitive information on the local filesystem.

mattsb42 commented 6 years ago

I've been thinking about the credential manager interface and separation of passive sources, and I think I have a better model. Working on sketching it out in my fork now.

Existing behavior

user input -> ENV -> pypirc -> keyring -> request user input

Existing PoC

user input -> ENV -> pypirc -> [credential manager]

New structure

My goal is to remove the special case of the environment variables and config file credential sourcing, resulting in:

user input -> [credential manager]

Credentials

container for credentials
is not required to contain all types of credentials
is required to contain identifier for package index

Credential Manager

provides credentials
is not required to be able to provide all types of credentials
root parent constructor optionally accepts another credential manager, to which each request is passed off if this credential manager cannot provide the requested credential

Deferred Credentials

configured with at least one of:
- credentials
- credential manager
if credentials are not provided, just uses the credential manager
if credential manager is not provided, just uses the credentials
if both are provided, uses the credential manager for any credential values missing for the provided credentials

Conceptually, this pattern would result in:

DeferredCredentials[Credentials, CredentialManager[CredentialManager[...CredentialManager]]]

With the default credential manager defining its own internal credential manager that would look like:

EnvironmentVariableCredentialManager(
    credential_manager=ConfigFileCredentialManager(
        config_file='~/.pypirc',
        credential_manager=KeyringCredentialManager(
            credential_manager=UserInputCredentialManager()
        )
    )
)

sigmavirus24 commented 6 years ago

See also #361

mattsb42 commented 6 years ago

@sigmavirus24 Yup, I saw that. It was actually what started me thinking down the path of a composable hierarchy of credential managers. Defining a chain like this will be complex from the CLI, but simple if using programmatically via an API.

sigmavirus24 commented 6 years ago

@mattsb42 I like your proposal. I'm thinking that if we're going to provide an interface for plugin developers then we probably need a simpler structure for dynamically aggregating these managers. As such, I've come up with an alternative proposal.

I'm wondering what you think of using a Pipeline object that composes the Manager objects. In your example, this might look like:

username_pipeline = Pipeline([
    EnvironmentVariableManager('TWINE_USERNAME'),
    ConfigFileManager(config=parsed_config, ...),
    KeyringManager('username'),
    UserInputManager(config=parsed_config, 'username'),
])
username_pipeline.append(SomeOtherManager(...))
username = username_pipeline.retrieve()

I think we could use these Managers for most of the configuration options too. We should start with credentials and maybe extend outwards based on initial feedback from these.

Would you be willing to write this up and send it as a design doc? (I think twine will start needing to aggregate separate design docs like #361 and this one but I'm not sure how we want to do that yet. :smile: )

mattsb42 commented 6 years ago

I think that is a sensible approach, and one I had considered. It's largely just a matter of where we put the selection logic. If we're looking at a more general pattern for value managers, I agree, placing the selection logic in a separate component probably makes more sense.

Yeah, I can write this up as a design doc. I'll take a closer look through #361 and model the doc structure on that. Might not be until next week though: PyCon sprints ended for me today, so back to other stuff fighting for priority.

sigmavirus24 commented 6 years ago

Yeah, I can write this up as a design doc. I'll take a closer look through #361 and model the doc structure on that. Might not be until next week though: PyCon sprints ended for me today, so back to other stuff fighting for priority.

Thank you! It doesn't need to be exactly like #361 I was just trying to make something to describe what I was thinking of doing. Also, there's absolutely no rush. I appreciate you being willing to write this up at all. Your thoughtful consideration of this whole topic is sincerely appreciated. It's a lot more thought than some of us have put into this topic.

xavfernandez commented 6 years ago

FWIW it kinda looks like https://getconf.readthedocs.io/en/latest/reference.html#getconf.BaseConfigGetter where you can define a list of "finders" to load configuration/secrets (from env, ini files, python dictionnary, etc).

(disclaimer: I'm one of the employee maintaining this software)

sigmavirus24 commented 6 years ago

From a skim, that looks like what I authored almost 2 years ago for Flake8. Nothing is original. But I would appreciate an example of how your company's project would work in the same way @mattsb42 described

xavfernandez commented 6 years ago

Yes I'm sure dozens of projects have implemented the same kind of logic :)

Using getconf, the code would look like:

import getconf
import getconf.finders

CONFIG = getconf.BaseConfigGetter(
    #getconf.finders.SysArgFinder(),  # This one does not exist
    getconf.finders.NamespacedEnvFinder('twine'),
    getconf.finders.MultiINIFilesParserFinder(['~/twine.ini', '/etc/twine/twine.ini']),
    getconf.finders.SectionDictFinder({'DEFAULT': {'pypi_url': 'https://pypi.org'}}),
)

print(CONFIG.getstr('username'))
print(CONFIG.getint('verbose', default=0))

CONFIG.getstr('username', default='nobody') will then look in order:

in sys.argv for an --username option (again the SysArgFinder currently does not exist)
in os.env for a TWINE_USERNAME env var
in ~/twine.ini then /etc/twine/twine.ini files for a DEFAULT.username value
return 'nobody' otherwise.

Initially, the project was developed to only deal with .ini files and env vars hence the tendency to expect keys to follow a (non-mandatory)section.key pattern.

A nice thing is that it is quite simple and it is easy for a plugin to query a new key twine.CONFIG.getlist('some.new_option') but one major weakness is that it requires rigor to keep the "discoverability" of options (like would be required by a possible SysArgFinder).

di commented 4 years ago

The pip project has a similar feature request: https://github.com/pypa/pip/issues/4475

pypa / twine