pypa / twine

Utilities for interacting with PyPI
https://twine.readthedocs.io/
Apache License 2.0
1.58k stars 303 forks source link

Pluggable credential manager framework proposal [PoC included] #362

Open mattsb42 opened 6 years ago

mattsb42 commented 6 years ago

Problem

Currently, twine supports obtaining credentials (username, password, and cert locations) from four sources: environment variables, the .pypirc config file, values passed in from the command line, and (specifically for the password) keyring.

These sources provide reasonable options if you are using twine interactively as a human from a trusted computer, but more and more package owners want to push to PyPI (or repository of their choice) automatically as part of their CI pipeline.

I think that a good way to securely enable this usage pattern is to define a pluggable framework for credential managers, similar to how commands are currently managed. Not only will this make things simpler for integration into CI systems, it will also enable the creation of plugins to source credentials from secure credential storage systems such as LastPass, OnePassword, Signet, Amazon Secrets Manager, Hashicorp Vault, HSMs, etc.

PoC Contents

I have put together a proof of concept for what this could look like here[1]. This includes:

Specific areas that need discussion

What should the credential manager interface look like?

The current interface is based on the inputs to the current credential loaders. I think we can probably simplify this.

Passive credential sources.

Similar to how KeyringCredentialManager will specifically source the password only from keyring, ignoring all other sources, I would like to move the other existing passive sources (environment variables and .pypirc configuration) into credential managers that are opportunistically used by DefaultCredentialManager but that can be explicitly avoided if so desired.

Certificates

Currently, certificates are passed through to requests by identifying a local file that contains the certificates in question. I would like to change this to have the credential manager provide the actual certificate data directly. This will enable the use of secure certificate storage systems and avoid having to place sensitive information on the local filesystem.

[1] https://github.com/pypa/twine/compare/master...mattsb42:pluggable-creds [2] https://github.com/mattsb42/twine/blob/pluggable-creds/twine/credential_managers/__init__.py#L52 [3] https://github.com/mattsb42/twine/blob/pluggable-creds/twine/credential_managers/user.py [4] https://github.com/mattsb42/twine/blob/pluggable-creds/twine/credential_managers/keyring.py [5] https://github.com/mattsb42/twine/blob/pluggable-creds/twine/credential_managers/default.py [6] https://github.com/mattsb42/twine/blob/pluggable-creds/twine/credential_managers/__init__.py#L29-L33 [7] https://github.com/mattsb42/twine/blob/pluggable-creds/twine/commands/upload.py#L230-L242 [8] https://github.com/mattsb42/twine/blob/pluggable-creds/twine/commands/upload.py#L119-L145

mattsb42 commented 6 years ago

I've been thinking about the credential manager interface and separation of passive sources, and I think I have a better model. Working on sketching it out in my fork now.

Existing behavior

user input -> ENV -> pypirc -> keyring -> request user input

Existing PoC

user input -> ENV -> pypirc -> [credential manager]

New structure

My goal is to remove the special case of the environment variables and config file credential sourcing, resulting in:

user input -> [credential manager]

Credentials

Credential Manager

Deferred Credentials

Conceptually, this pattern would result in:

DeferredCredentials[Credentials, CredentialManager[CredentialManager[...CredentialManager]]]

With the default credential manager defining its own internal credential manager that would look like:

EnvironmentVariableCredentialManager(
    credential_manager=ConfigFileCredentialManager(
        config_file='~/.pypirc',
        credential_manager=KeyringCredentialManager(
            credential_manager=UserInputCredentialManager()
        )
    )
)
sigmavirus24 commented 6 years ago

See also #361

mattsb42 commented 6 years ago

@sigmavirus24 Yup, I saw that. It was actually what started me thinking down the path of a composable hierarchy of credential managers. Defining a chain like this will be complex from the CLI, but simple if using programmatically via an API.

sigmavirus24 commented 6 years ago

@mattsb42 I like your proposal. I'm thinking that if we're going to provide an interface for plugin developers then we probably need a simpler structure for dynamically aggregating these managers. As such, I've come up with an alternative proposal.

I'm wondering what you think of using a Pipeline object that composes the Manager objects. In your example, this might look like:

username_pipeline = Pipeline([
    EnvironmentVariableManager('TWINE_USERNAME'),
    ConfigFileManager(config=parsed_config, ...),
    KeyringManager('username'),
    UserInputManager(config=parsed_config, 'username'),
])
username_pipeline.append(SomeOtherManager(...))
username = username_pipeline.retrieve()

I think we could use these Managers for most of the configuration options too. We should start with credentials and maybe extend outwards based on initial feedback from these.

Would you be willing to write this up and send it as a design doc? (I think twine will start needing to aggregate separate design docs like #361 and this one but I'm not sure how we want to do that yet. :smile: )

mattsb42 commented 6 years ago

I think that is a sensible approach, and one I had considered. It's largely just a matter of where we put the selection logic. If we're looking at a more general pattern for value managers, I agree, placing the selection logic in a separate component probably makes more sense.

Yeah, I can write this up as a design doc. I'll take a closer look through #361 and model the doc structure on that. Might not be until next week though: PyCon sprints ended for me today, so back to other stuff fighting for priority.

sigmavirus24 commented 6 years ago

Yeah, I can write this up as a design doc. I'll take a closer look through #361 and model the doc structure on that. Might not be until next week though: PyCon sprints ended for me today, so back to other stuff fighting for priority.

Thank you! It doesn't need to be exactly like #361 I was just trying to make something to describe what I was thinking of doing. Also, there's absolutely no rush. I appreciate you being willing to write this up at all. Your thoughtful consideration of this whole topic is sincerely appreciated. It's a lot more thought than some of us have put into this topic.

xavfernandez commented 6 years ago

FWIW it kinda looks like https://getconf.readthedocs.io/en/latest/reference.html#getconf.BaseConfigGetter where you can define a list of "finders" to load configuration/secrets (from env, ini files, python dictionnary, etc).

(disclaimer: I'm one of the employee maintaining this software)

sigmavirus24 commented 6 years ago

From a skim, that looks like what I authored almost 2 years ago for Flake8. Nothing is original. But I would appreciate an example of how your company's project would work in the same way @mattsb42 described

xavfernandez commented 6 years ago

Yes I'm sure dozens of projects have implemented the same kind of logic :)

Using getconf, the code would look like:

import getconf
import getconf.finders

CONFIG = getconf.BaseConfigGetter(
    #getconf.finders.SysArgFinder(),  # This one does not exist
    getconf.finders.NamespacedEnvFinder('twine'),
    getconf.finders.MultiINIFilesParserFinder(['~/twine.ini', '/etc/twine/twine.ini']),
    getconf.finders.SectionDictFinder({'DEFAULT': {'pypi_url': 'https://pypi.org'}}),
)

print(CONFIG.getstr('username'))
print(CONFIG.getint('verbose', default=0))

CONFIG.getstr('username', default='nobody') will then look in order:

Initially, the project was developed to only deal with .ini files and env vars hence the tendency to expect keys to follow a (non-mandatory)section.key pattern.

A nice thing is that it is quite simple and it is easy for a plugin to query a new key twine.CONFIG.getlist('some.new_option') but one major weakness is that it requires rigor to keep the "discoverability" of options (like would be required by a possible SysArgFinder).

di commented 4 years ago

The pip project has a similar feature request: https://github.com/pypa/pip/issues/4475