python-poetry / poetry

Python packaging and dependency management made easy
https://python-poetry.org
MIT License
31.76k stars 2.27k forks

Replacing the URL of a source (e.g. PyPI) at the global level #1632

Open JacobHenner opened 4 years ago

JacobHenner commented 4 years ago

Feature Request

Similar to one of the proposals in https://github.com/sdispater/poetry/issues/1070 (which was recently marked stale), Poetry should allow the user to override the default repository URL (PyPI). The user should be able to do this without modifying pyproject.toml.

In certain environments (e.g. corporate networks) PyPI is unavailable, but a mirror exists. These users should be able to specify the address of the mirror without modifying project files, as the mirror settings are irrelevant to contributors in different environments. Similarly, if a mirror user adds a dependency, the generated lock file should not list the user's mirror as the source. The source should remain the default (which in most cases would refer to standard PyPI).

This feature exists in pipenv, see https://github.com/pypa/pipenv/issues/2075 (where the need for this functionality is described in greater detail) and https://github.com/pypa/pipenv/pull/2281.

countergram commented 4 years ago

This is essential for many business uses, not simply when PyPI is unavailable but also in any case where the organization has its own libraries (not uncommon). Note that since some private repo tools (e.g. Nexus) use basic auth URLs, putting the repo URL into a project config file is absolutely inappropriate and a global config or environment variable (e.g. pip.conf, PIP_INDEX_URL) is necessary.

cjw296 commented 4 years ago

https://github.com/sdispater/poetry/issues/625 also seems related.

Something I tried, which might be nice to make work:

poetry config repositories.pypi https://.../+simple/
vlcinsky commented 4 years ago

@sdispater, I wonder whether the elaboration of the requested feature in #1070 is usable as is or needs some update. If it needs updating, I volunteer to join forces with one or a few others, have a call, and try to move this request forward, as this is one of two showstoppers for our use of poetry (the other is managing versions of the resulting package, but I definitely do not want to discuss that here).

lovepocky commented 4 years ago

a simple patch for ci/cd:

# URLs to swap, passed with --build-arg at build time
ARG origin_pypi_url
ARG private_pypi_cache_url

# install dependencies from lock file
COPY pyproject.toml poetry.lock /opt/app/
WORKDIR /opt/app

# "|" as the sed delimiter, because the URLs themselves contain "/"
RUN sed -i "s|${origin_pypi_url}|${private_pypi_cache_url}|g" poetry.lock pyproject.toml

RUN poetry install -vvv
bjoernpollex commented 4 years ago

@lovepocky Wouldn't that break the content-hash in poetry.lock? I think this might cause poetry to refresh the lock file.
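The concern can be illustrated with a simplified model of a lock-file content hash. The sketch below assumes, for illustration only, that the hash is a SHA-256 over a canonical JSON dump of the dependency-related parts of pyproject.toml, including the source URLs; Poetry's real algorithm and inputs may differ, and `content_hash` is a hypothetical name.

```python
import hashlib
import json


def content_hash(relevant: dict) -> str:
    """Hypothetical stand-in for a lock-file content hash.

    Serialises the dependency-related parts of pyproject.toml in a
    canonical (sorted-key) form and hashes them with SHA-256.
    """
    canonical = json.dumps(relevant, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = {
    "dependencies": {"requests": "^2.25"},
    "source": [{"name": "internal", "url": "https://pypi.example.org/simple/"}],
}
# The same project after sed has rewritten the source URL.
rewritten = {
    "dependencies": {"requests": "^2.25"},
    "source": [{"name": "internal", "url": "https://cache.example.net/simple/"}],
}

print(content_hash(original) == content_hash(rewritten))  # False
```

Under this model, any edit to a hashed input (such as rewriting a source URL in pyproject.toml) produces a different digest, which is what would make the lock file look stale.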

sdispater commented 4 years ago

Poetry needs the url information of a dependency for a private repository. Otherwise, it cannot guarantee the determinism of the lock file since two files, even with the same name, may not have the same information.

And if it's a question of not storing the private index credentials in the pyproject.toml, only the base url should be put in the pyproject.toml file. The credentials should be configured separately via the config command or via environment variables, see https://python-poetry.org/docs/repositories/#configuring-credentials

JacobHenner commented 4 years ago

Poetry needs the url information of a dependency for a private repository. Otherwise, it cannot guarantee the determinism of the lock file since two files, even with the same name, may not have the same information.

The idea here is the private repo specified as the override will be a PyPI mirror. The packages served by the mirror will be exact copies of the ones from https://pypi.org/, without any modifications. Anything else belongs in a separate repo, with URLs included explicitly.

vlcinsky commented 4 years ago

Poetry needs the url information of a dependency for a private repository.

I agree. This contributes to usability of poetry as it provides complete information where to install from.

it cannot guarantee the determinism of the lock file since two files, even with the same name, may not have the same information.

{file = "yarl-1.4.2.tar.gz", hash = "sha256:58cd9c469eced558cd81aa3f484b2924e8897049e06889e8ff2510435b7ef74b"}

I thought that the hash above is calculated from the package file content and does not depend on the filename or URL, and thus allows checking that two files (even from different URLs) provide exactly the same information.
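A minimal sketch of that check, using only the standard library. The function name `matches_lock_entry` is hypothetical, and the payload is dummy data rather than a real yarl archive.

```python
import hashlib


def matches_lock_entry(data: bytes, lock_hash: str) -> bool:
    """Check downloaded bytes against a lock-file hash such as "sha256:58cd...".

    Only the file content is hashed, so the result does not depend on the
    URL the file was fetched from or the filename it was saved under.
    """
    algorithm, sep, expected = lock_hash.partition(":")
    if not sep:
        raise ValueError(f"expected '<algorithm>:<hexdigest>', got {lock_hash!r}")
    return hashlib.new(algorithm, data).hexdigest() == expected


# Dummy bytes standing in for an archive; a real check would hash the downloaded file.
payload = b"example archive bytes"
entry = "sha256:" + hashlib.sha256(payload).hexdigest()

print(matches_lock_entry(payload, entry))      # True, whichever mirror served the bytes
print(matches_lock_entry(b"tampered", entry))  # False
```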

Treat source url the way git treats remote configuration

The analogy is not perfect, but it is very close to the use case.

git allows you to clone a repository with an initial remote configured, and it is easy to change the remote to another git server (e.g. from GitHub to GitLab, or to an alternative repo name) and everything will still work. If I configure the remote badly, git will complain immediately at the first command dealing with the remote server, because the commit hashes will not match.

I hope poetry will one day allow me to keep the existing pyproject.toml and poetry.lock untouched and accept an alternative URL (e.g. configured via an environment variable) of my private PyPI for a given source (name), a sort of "temporary git remote reconfiguration".

If my alternative private PyPI URL serves exactly the same packages for my installation (checked by comparing hashes), everything should run as usual; if the alternative URL provides different package content, the installation should fail.

Such level of determinism would still provide all the service I appreciate from poetry today and would provide enough flexibility to fit common CI/CD processes.

cjw296 commented 4 years ago

As above, my use case is a private PyPI mirror. At some stage, the public PyPI may even be firewalled off, and it doesn't feel right to need a different pyproject.toml behind a firewall than in front of it, for the same code.

jhbuhrman commented 4 years ago

I fully agree with https://github.com/python-poetry/poetry/issues/1632#issuecomment-568199401.

IIRC poetry is using pip already under the hood for a certain part of its functionality. Wouldn't it be sufficient if poetry would simply adhere to the pip.conf (Unix-derived) or pip.ini (Windows) [global] configuration items index, index-url, and trusted-host? (see https://pip.pypa.io/en/stable/user_guide/#config-file)
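For reference, the [global] settings mentioned above can be read with the standard-library configparser. This is only a sketch of parsing one file's text: it does not reproduce where pip looks for its config files, it is not something Poetry does, and the Nexus host is a placeholder.

```python
import configparser


def pip_index_settings(config_text: str) -> dict:
    """Return the index-related options from the [global] section of a pip config file."""
    # interpolation=None so that '%' in URL-encoded credentials is taken literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section("global"):
        return {}
    keys = ("index", "index-url", "trusted-host")
    return {k: parser.get("global", k) for k in keys if parser.has_option("global", k)}


sample = """
[global]
index-url = https://nexus.example.com/repository/pypi/simple
trusted-host = nexus.example.com
"""
print(pip_index_settings(sample))
# {'index-url': 'https://nexus.example.com/repository/pypi/simple', 'trusted-host': 'nexus.example.com'}
```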

NateScarlet commented 4 years ago

@jhbuhrman https://github.com/python-poetry/poetry/issues/1554#issuecomment-553113626 says poetry is not going to respect pip.ini.

mcouthon commented 4 years ago

I'm a little confused. I would've assumed that this would've been sufficient:

poetry config repositories.REPO_NAME https://artifactory.XXX.com/artifactory/api/pypi

But it seems that setting the config globally doesn't negate the need for setting the URL in each pyproject.toml file. Is that by design or is it a bug? If it's by design, then what's the rationale behind it?

swist commented 4 years ago

This feature would be very useful for scenarios where a JWT for authenticating with the registry is prepended to the repo URL; AWS CodeArtifact, for example, builds repo URLs like so:

https://aws:<JWT>@<domain>-<aws-account>.d.codeartifact.eu-west-1.amazonaws.com/pypi/python/simple/

The current setup, which requires poetry users to define this as a static URL inside pyproject.toml, makes it impossible to use (because these sessions last ~12 hours, after which the JWT gets re-rolled).

I see the workaround to the effect of:

re log-in whenever the session expires

but that still requires me to set the url on every project rather than once and for all for my docker image builder

vlcinsky commented 4 years ago

@swist I think that in this case you can manage with existing poetry, as the part in front of @ is the username (aws) and password (<JWT>), which can be kept out of the pyproject.toml file. poetry will store them either in ~/.config/pypoetry/auth.toml or in a system credential store such as seahorse (I am working on Debian Buster).

Just configure url in form of https://<domain>-<aws-account>.d.codeartifact.eu-west-1.amazonaws.com/pypi/python (for me the form without the /+simple suffix works)
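A small sketch of separating the credentials from such a URL with urllib.parse. The domain, account ID and token are placeholders, and `split_credentials` is a hypothetical helper, not part of Poetry.

```python
from urllib.parse import urlsplit, urlunsplit


def split_credentials(url: str):
    """Split 'https://user:password@host/path' into (clean_url, user, password)."""
    parts = urlsplit(url)
    # Drop everything up to and including the last '@' in the network location.
    host = parts.netloc.rpartition("@")[2]
    clean = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    return clean, parts.username, parts.password


# Placeholder domain, account ID and token; not real values.
url = "https://aws:TOKEN@my-domain-111122223333.d.codeartifact.eu-west-1.amazonaws.com/pypi/python/simple/"
clean, user, password = split_credentials(url)
print(clean)           # https://my-domain-111122223333.d.codeartifact.eu-west-1.amazonaws.com/pypi/python/simple/
print(user, password)  # aws TOKEN
```

Only the clean URL would then go into pyproject.toml; the user and token are the kind of values `poetry config http-basic.<name> <user> <password>` keeps outside the project.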

swist commented 4 years ago

Turns out there's a magic envvar (should have finished reading the docs) that does the auth. Still doesn't quite solve the problem when you're accessing the same repository via different vpc endpoints (for example building your images in multiple clusters but pushing to same registry) - that would still require a rewrite of pyproject.toml (and the lockfile I suppose) at build time

m1hawkgsm commented 4 years ago

Turns out there's a magic envvar (should have finished reading the docs) that does the auth. Still doesn't quite solve the problem when you're accessing the same repository via different vpc endpoints (for example building your images in multiple clusters but pushing to same registry) - that would still require a rewrite of pyproject.toml (and the lockfile I suppose) at build time

@swist Do you mind sharing how exactly you're using Poetry with CodeArtifact? Ignoring the rolling creds bit (I'm aware of it), and assuming a hard coded or configured set of creds, that's fine. I'm having a hard time understanding how to get Poetry to work without getting 403's and such (and yes, I've seen the docs for using config, env vars, etc).

Apologies for piggybacking off this thread, I'd message directly or open an issue but looks like you have something already :)

swist commented 4 years ago

@m1hawkgsm turns out there are two separate urls you need to use.

If you want to pull you need to set the url to be

[[tool.poetry.source]]
name = "my_org"
url = "https://my_org-my_account_id.d.codeartifact.region.amazonaws.com/pypi/repo_name/simple/"

But if you want to push, use the following CLI call:

poetry config repositories.myorg https://my_org-my_account_id.d.codeartifact.region.amazonaws.com/pypi/repo_name
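Both URLs can be derived from the same parts, following the pattern given in this comment. `codeartifact_urls` is a hypothetical helper and the arguments are the placeholders used above; check the exact endpoint format against the AWS documentation.

```python
def codeartifact_urls(domain: str, account_id: str, region: str, repo: str) -> dict:
    """Build both CodeArtifact URLs from the URL pattern quoted in this thread.

    "install" is the /simple/ index URL for [[tool.poetry.source]];
    "publish" is the bare repository URL for `poetry config repositories.<name>`.
    """
    base = (
        f"https://{domain}-{account_id}.d.codeartifact.{region}"
        f".amazonaws.com/pypi/{repo}"
    )
    return {"install": f"{base}/simple/", "publish": base}


urls = codeartifact_urls("my_org", "my_account_id", "eu-west-1", "repo_name")
print(urls["install"])  # https://my_org-my_account_id.d.codeartifact.eu-west-1.amazonaws.com/pypi/repo_name/simple/
print(urls["publish"])  # https://my_org-my_account_id.d.codeartifact.eu-west-1.amazonaws.com/pypi/repo_name
```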
bjoernpollex-sc commented 4 years ago

Is there any update on this? On the one hand, this ticket is still open, on the other hand, this comment seems to hint that this might never be implemented.

brandon-leapyear commented 4 years ago

:sparkles: This is an old work account. Please reference @brandonchinn178 for all future communication :sparkles:


As another data point, I tried to hack around this by doing find/replace for all mentions of pypi.org with our Nexus URL in POETRY_INSTALL/lib/poetry/repositories/pypi_repository.py. It turns out that Nexus doesn't currently support the package JSON endpoint, so using Nexus would require using the LegacyRepository.

Long story short, it would be great if poetry could allow overriding the PyPI URL, but also allow specifying if poetry needs to use the legacy endpoint for the repository
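The difference between the two endpoints can be sketched as follows, assuming a mirror exposes the PEP 503 simple API under `<base>/simple/`. The Nexus host and the `project_url` helper are hypothetical; the PyPI JSON path `https://pypi.org/pypi/<project>/json` is PyPI's own.

```python
import re


def normalize(project: str) -> str:
    """PEP 503 name normalisation: runs of '-', '_' and '.' become '-', lowercased."""
    return re.sub(r"[-_.]+", "-", project).lower()


def project_url(index_base: str, project: str, legacy: bool) -> str:
    """URL where an index lists a project's files.

    legacy=True  -> PEP 503 "simple" page, which most mirrors and proxies serve.
    legacy=False -> PyPI-style JSON API page, which not every proxy implements.
    """
    base = index_base.rstrip("/")
    name = normalize(project)
    return f"{base}/simple/{name}/" if legacy else f"{base}/pypi/{name}/json"


print(project_url("https://pypi.org", "Zope.Interface", legacy=False))
# https://pypi.org/pypi/zope-interface/json
print(project_url("https://nexus.example.com/repository/pypi-proxy", "Zope.Interface", legacy=True))
# https://nexus.example.com/repository/pypi-proxy/simple/zope-interface/
```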

Update: seems like I got a workaround working

  1. Edit POETRY_INSTALL_DIR/lib/poetry/factory.py:

    
    @@ -88,6 +88,14 @@
    
             poetry.pool.add_repository(repository, is_default, secondary=is_secondary)
mfriedenhagen commented 3 years ago

Hello,

jhbuhrman commented 3 years ago

I am giving up on poetry, it is close to unusable in a shielded development environment with a Nexus, and the maintainer does not seem to understand the frequently brought up issues regarding this. This is sad, because I think it has the greatest dependency-resolver around.

pawamoy commented 3 years ago

Keep the :+1: votes on the issue coming, it could eventually land in the feature roadmap. It's already in the first page of issues when you sort by :+1:

You could also take over or upvote this PR #2074 which, to me, is even better than what this feature request is asking for.

sinoroc commented 3 years ago

My first impression is that this PR https://github.com/python-poetry/poetry/pull/2074 is going in the right direction as well (I do not know if it is the right implementation, I did not look at the code). I guess if you all manage to collaborate on such a PR, it might get released quicker.

I'd also like to draw attention to this PyPA discussion. In my opinion it gives good background insight why the proposed changes here are the right way to go, and why indexes do not belong in pyproject.toml. I also discussed this in https://github.com/python-poetry/poetry/issues/3355#issuecomment-726683158.

mfriedenhagen commented 3 years ago

Coming from Java world, Apache Maven, one of the two de facto standard build tools has the ability as well to define additional repositories for consumption in the pom.xml, the equivalent of pyproject.toml.

However, it is considered bad practice to use the <repositories> element, because it makes scanning artifacts for malware much more difficult and because your projects may start to pull things from anywhere on the internet.

For Maven you need to reserve a namespace at Maven Central, normally for a reverse-domain you somehow own (poetry could probably reserve com/github/python-poetry/ or org/python-poetry/ for example)

Artifacts are referenced with a complete path, i.e. something like org/python-poetry/poetry-core/1.1.4. However a simple caching mirror (use Nginx e.g.) pointing to https://repo1.maven.org/maven2/ is sufficient to do all caching.

You just state the location of your mirror in the user's .m2/settings.xml file, as you do for pip, and you are done.

Even git allows this kind of mirroring centrally in the user's .gitconfig: https://coderwall.com/p/sitezg/force-git-to-clone-with-https-instead-of-git-urls

Agalin commented 3 years ago

@mfriedenhagen it's pretty similar in the Docker world. Images from Docker Hub can be used without any domain prefix, and it's considered bad practice to add different repositories to the default namespace: build results should be repeatable across environments, and the same image name pointing to different content breaks that idea. Other sources use their full domain name as a namespace. But it's still possible to set a global registry mirror in the Docker daemon settings, which is the same behaviour as PIP_INDEX_URL.

Also keep in mind that as long as you don't use a private repo, poetry's legacy installer seems to work just fine with the PIP_INDEX_URL env variable or the pip config file, as it just calls pip without specifying registries. The problem arises with a private repo, because then PyPI is added as an extra registry unless a default registry is set. It doesn't break builds, it just makes them really slow due to pip trying to connect to PyPI (in my case, from 2-3 min to 50 min when using 2 private packages).

mfriedenhagen commented 3 years ago

@Agalin, maybe I do misunderstand you here:

mcsheehan commented 3 years ago

I have tried poetry. I like it and want to use it, but I am bitten too hard when trying to use an AWS CodeArtifact repository. I can't keep pasting the key into pyproject.toml, and I can't check it into git this way. People add their private repos to their pip.conf. Please let poetry read pip.conf, or add a flag for that.

mattmess1221 commented 3 years ago

I haven't read the entire thread here, but I feel like automatically adding the proper [[tool.poetry.source]] entry to the pyproject.toml file during poetry init would be useful and not invasive to existing projects. It would be pretty much running this after init. Pardon my shell.

url=$(pip config get global.index-url)
[ -n "$url" ] && echo "
[[tool.poetry.source]]
name = \"pypi-mirror\"
url = \"$url\"
default = true
" >> pyproject.toml
JacobHenner commented 3 years ago

I haven't read the entire thread here, but I feel like automatically adding the proper [[tool.poetry.source]] entry to the pyproject.toml file during poetry init would be useful and not invasive to existing projects. It would be pretty much running this after init. Pardon my shell.


url=$(pip config get global.index-url)
[ -n "$url" ] && echo "
[[tool.poetry.source]]
name = \"pypi-mirror\"
url = \"$url\"
default = true
" >> pyproject.toml

The mirror should not be added to pyproject.toml, since it's likely org-internal. From the description:

In certain environments (e.g. corporate networks) PyPI is unavailable, but a mirror exists. These users should be able to specify the address of the mirror without modifying project files, as the mirror settings are irrelevant to contributors in different environments.

mcsheehan commented 3 years ago

I haven't read the entire thread here, but I feel like automatically adding the proper [[tool.poetry.source]] entry to the pyproject.toml file during poetry init would be useful and not invasive to existing projects. It would be pretty much running this after init. Pardon my shell.

url=$(pip config get global.index-url)
[ -n "$url" ] && echo "
[[tool.poetry.source]]
name = \"pypi-mirror\"
url = \"$url\"
default = true
" >> pyproject.toml

aws codeartifact and many others use the security token in their url - this would mean you'd be storing the current security key (invalid after one day) in the toml - and therefore in git too, and would have to constantly manually change it. At worst it's a security risk, at best it's manual and laborious - the exact thing that you want tooling, such as poetry to make go away.

mfriedenhagen commented 3 years ago

I completely agree with @mcsheehan and @JacobHenner. The only thing which currently works for me in a corporate environment is to run:

poetry config experimental.new-installer false

Then, as @Agalin pointed out, poetry just seems to use pip. So this works in our data center with a .pip/pip.conf like this:

[global]
cert=/etc/ssl/certs/ca-certificates.crt
index-url = https://artifactory.example.com/artifactory/api/pypi/pypi-mam/simple

pypi-mam is a view which aggregates both a private Python repository and pypi.org.

Agalin commented 3 years ago

I’d say that the existing poetry config is nearly sufficient. There is nothing wrong with a source in pyproject.toml and the source name in the lock file; just don’t require the source to have a URL. Then there is the already existing repositories config, which stores the URL and credentials of a named repo. It just needs to be merged with the sources at runtime. No entry in the repositories config? Use the toml data. A repository matches the source name? Use the URL and credentials from that repo.
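A sketch of the merge rule proposed here, assuming the user-level config is represented as plain dicts keyed by repository name (loosely mirroring the `repositories.<name>` and `http-basic.<name>` settings). This is the commenter's proposal, not current Poetry behaviour, and `Source`/`resolve_source` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Source:
    name: str
    url: Optional[str] = None  # under the proposal, the URL may be omitted


def resolve_source(
    source: Source,
    repositories: Dict[str, Dict[str, str]],
    http_basic: Dict[str, Tuple[str, str]],
) -> Tuple[str, Optional[Tuple[str, str]]]:
    """Return (url, credentials) for a source, preferring user config by name."""
    configured = repositories.get(source.name)
    url = configured["url"] if configured else source.url
    if url is None:
        raise LookupError(f"no URL configured for source {source.name!r}")
    return url, http_basic.get(source.name)


# Hypothetical user-level config, keyed by repository name.
repositories = {"internal": {"url": "https://artifactory.example.com/api/pypi/pypi/simple"}}
http_basic = {"internal": ("me", "s3cr3t")}

print(resolve_source(Source("internal"), repositories, http_basic))
# ('https://artifactory.example.com/api/pypi/pypi/simple', ('me', 's3cr3t'))
print(resolve_source(Source("public", "https://pypi.org/simple/"), repositories, http_basic))
# ('https://pypi.org/simple/', None)
```

Because the lock file would only record the source name, the same project files could then resolve to different URLs on different machines.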

ShayNehmad-RecoLabs commented 3 years ago

This is a huge blocker for us to completely move to poetry; we're using a combination of poetry and twine at the moment: poetry for publishing to our local (on-prem) PyPI server, and twine for publishing to CodeArtifact. This is a real pain point. If there's an agreed-upon spec, I can take a crack at solving this issue...

Also (this is an afterthought) perhaps native poetry support for CodeArtifact can be developed by the AWS team (maybe reaching out to https://twitter.com/bellevuesteve)? It's in their interest as well :)

ZeroAurora commented 3 years ago

I'm trying to work on this (no guarantees, just have a try)

ZeroAurora commented 3 years ago

Drafted a PR #3624. I hope you can give it a shot and give feedback!

sinoroc commented 3 years ago

@ShayNehmad-RecoLabs @mcsheehan @bjoernpollex and others... There is a PR that could help solve this issue. Would you be able to test it, give feedback? https://github.com/python-poetry/poetry/pull/3624

espdev commented 3 years ago

I absolutely do not understand where the logic is in the source/repo design in Poetry.

First, we have config. We can set repo URL in the config:

poetry config repositories.my_repo https://my-repo-url

However, we can use this URL only for publishing. We cannot use it when installing packages. For installation we need to hardcode the same my_repo URL in pyproject.toml:

[[tool.poetry.source]]
name = "my_repo"
url = "https://my-repo-url"

Why? I do not understand.

Moreover, if the repo requires auth, we need to have credentials, which we set via config:

poetry config http-basic.my_repo user password

And surprise! The credentials will work with both URLs, when publishing and when installing. It's completely illogical and unintuitive.

Why do we need to additionally hardcode the URL in pyproject.toml? Why can't I set the URL via config and write something like this in pyproject.toml:

[[tool.poetry.source]]
name = "my_repo"
secondary = true

Currently, I cannot use hardcoded repo URL in our production infrastructure when deploying. I need to set the URL via config! But it just doesn't work.

Felix-neko commented 2 years ago

Hi folks! I'm working at an infosec-fetishist organization and have to include

[[tool.poetry.source]]
name = "blablabla_bank"
url = "http://binary/artifactory/api/pypi/pipy-virtual/simple"
default = true

in each of my projects that I deploy on our private servers, which don't have direct access to PyPI (we only have access to a private Artifactory server in the local network).

And when I'm working remotely and debugging my projects at home, I have to comment this out to be able to download and install dependencies from PyPI.

If I include this snippet in ~/.config/pypoetry/config.toml it does not help: poetry still tries to download directly from PyPI.

I need a way to set a per-user global config option to download packages from a given private repo as a default one (just like it's done in pip.conf). Please add this, we really need it.

Felix-neko commented 2 years ago

https://github.com/python-poetry/poetry/pull/4944 -- looks like it's implemented here.

IceTDrinker commented 2 years ago

Hello,

Having the global config would be great if #4944 is accepted. Now, as mentioned in this comment: https://github.com/python-poetry/poetry/issues/1632#issuecomment-953736600, being able to not set the internal repo URL in pyproject.toml (if the private repo config is available) would be very useful as well. Any comment on that, maybe?

Edit: is it possible to also keep the URL out of the lock file? I know it's a stretch, but we would rather not have those URLs committed.

JacobHenner commented 2 years ago

For the record, #4944 was rejected earlier today, so this issue remains open without a clear solution proposed. I would be interested in working on an alternative proposal to #4944, but I'm not sure if I'll have an opportunity to do so within the next month or so.

neersighted commented 2 years ago

For clarity, here's my final comment on that PR:

The project is of course open for contributions, and you are welcome to explore a design and even implementation if you want.

Keep in mind that for complex changes like this, it can often be a process to gain consensus. The design has to be something that is generic (useful to all/doesn't disadvantage some users), maintainable (as we are all volunteers and time is limited), and consistent with the existing design/scope/goals of the project.

Generally, before embarking on an ambitious change, I would suggest starting with smaller contributions so that you can gain experience with the code base and process. Making large changes without having experience contributing to the project can often end in disappointment, as you may not be able to come up with a mergeable design/implementation.

I think @bmarroquin is potentially in a good spot to work on a V2 of this, if there is time, as they have experience contributing to Poetry in the past, as well having obviously thought about this problem space. However, if you are dead set on trying to take this on as your first contribution, I strongly suggest that you at least join Discord and try and workshop concepts there. After coming up with a design that you are happy with and you think would be accepted for merge, I would create an issue describing it for discussion of the specifics.

Keep in mind that this is a very hard problem -- sources are very much coupled to the project level with the current design and architecture of Poetry, and it may be difficult/not desirable to change that. Also, keep in mind that what many people want is disparate despite it sounding similar. Some users are looking to add additional sources to all their projects (and many are in monorepos where monorepo features might make more sense anyway), and others are looking to do some sort of blanket URL replacement.

That URL replacement becomes difficult when you consider that the most common use for this functionality, files.pythonhosted.org, does not follow the typical file layout of a PEP 503 repository. Indeed, most existing proxies operate at the index level and not at the level of individual package files.

Finally keep in mind that this is complicated by other items that we already intend to implement in Poetry, such as 'lock file aware sdist builds' that we would like to introduce (e.g. the install-ability of your project as a dependency will be affected by any features in this space and needs to be factored in).

Basically, what I'm saying is that this is hard for even a regular contributor to attempt, and all of us are volunteers without particular interest in exploring this feature/problem space. This is much too complex and far-reaching for a drive-by pull request to be very successful: implementing sources past the project/monorepo level will have to be a thoughtful process and will require a lot of patience and motivation. I don't want to discourage people from contributing, but I do want people to realize this is a lot harder than "why don't you just do X."

Edit: Brett Cannon's blog post on the social dynamics of open source is quite helpful.

neersighted commented 2 years ago

Related: #5958

JacobHenner commented 2 years ago

I've published poetry-plugin-pypi-mirror, a plugin that allows pypi.org to be replaced by a mirror specified in an environment variable. It's available on PyPI. Hopefully others will find this useful.

The plugin satisfies the original subject of this issue (Allow user to override PyPI URL without modifying pyproject.toml), but it does not satisfy the current subject as it's not intended to handle replacement of arbitrary sources at the global level.

BaxHugh commented 2 years ago

I've forked @JacobHenner's plugin: poetry-plugin-use-pip-global-index-url. Instead of specifying the mirror URL in an environment variable, the global.index-url from pip config is used. This is good for the use case where credentials to a private mirror are managed in the pip config and possibly change regularly for security reasons. It's also available on PyPI as poetry-plugin-use-pip-global-index-url.

mfriedenhagen commented 2 years ago

@BaxHugh, great to read, did you consider to create a PR in project of @JacobHenner? I like the idea of reusing PIP. Maybe look for the env var and if that one is not set, fall back to pip.conf?
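The suggested lookup order (environment variable first, then pip's global.index-url, then PyPI) can be sketched as below. The variable name `PYPI_MIRROR_URL` is a hypothetical placeholder, not necessarily what either plugin uses, and the pip config is passed in as text rather than located on disk; in real use `env` would be `os.environ`.

```python
import configparser
from typing import Mapping, Optional

PYPI_SIMPLE = "https://pypi.org/simple/"
MIRROR_ENV_VAR = "PYPI_MIRROR_URL"  # hypothetical name, not necessarily the plugin's


def pick_index_url(env: Mapping[str, str], pip_conf_text: Optional[str] = None) -> str:
    """Resolve the index URL: env var first, then pip's global.index-url, then PyPI."""
    url = env.get(MIRROR_ENV_VAR)
    if url:
        return url
    if pip_conf_text:
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(pip_conf_text)
        url = parser.get("global", "index-url", fallback=None)
        if url:
            return url
    return PYPI_SIMPLE


pip_conf = "[global]\nindex-url = https://mirror.example.com/simple/\n"
print(pick_index_url({MIRROR_ENV_VAR: "https://env.example.com/simple/"}, pip_conf))
# https://env.example.com/simple/
print(pick_index_url({}, pip_conf))  # https://mirror.example.com/simple/
print(pick_index_url({}))            # https://pypi.org/simple/
```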

BaxHugh commented 2 years ago

@mfriedenhagen I didn't really consider it, but as you say, it could be good to add it as a feature to the original, though I feel it should probably be configurable if so. I'm glad you think it's a good feature. The reason I prefer this behaviour to the original is that the credentials in my pip.conf for our private PyPI mirror change every day, so taking them directly from there works better than using the environment variable. I'm not sure this use case is common enough to justify adding it to @JacobHenner's plugin.

neersighted commented 2 years ago

I think people might be seeing (part of why) this is not implemented in Poetry yet -- coming up with a universal design is hard, and whatever we settle on will be stable/supported for a long time to come, with additions/changes being constrained by the first iteration. Hopefully what y'all learn with plugins can be used to inform a well-thought design for Poetry down the line.

mfriedenhagen commented 2 years ago

Right @neersighted, maybe adding the configuration to $USER_CONFIGDIR/pypoetry/config.toml would be better. Is there already a concept of namespacing in the file? E.g. something like

[plugins]
[plugins.poetry_plugin_pypi_mirror]
pypi_mirror_url = "https://example.org/repository/pypi-proxy/simple/"

and then in auth.toml:

[http-basic]
[http-basic.poetry_plugin_pypi_mirror]
username = "me"
password = "s3cr3t"
neersighted commented 2 years ago

That's pretty much up to plugin authors; Poetry will not reject unknown keys.

e.g.

# config.toml
[foo]
bar = true
$ poetry config --list
...
foo.bar = true
...
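As a rough illustration of how nested tables map to the dotted keys in that listing (so a namespaced plugin table would show up as plugins.<plugin>.<key>), here is a small flattening function. It only mimics the shape of the `poetry config --list` output shown above; it is not Poetry's implementation, and the `plugins` table is the hypothetical layout proposed earlier in the thread.

```python
def flatten(table: dict, prefix: str = "") -> dict:
    """Turn nested tables into dotted keys, e.g. {"foo": {"bar": True}} -> {"foo.bar": True}."""
    flat = {}
    for key, value in table.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, f"{dotted}."))
        else:
            flat[dotted] = value
    return flat


# Stand-in for a parsed config.toml, using the plugin namespace suggested above.
config = {
    "foo": {"bar": True},
    "plugins": {
        "poetry_plugin_pypi_mirror": {
            "pypi_mirror_url": "https://example.org/repository/pypi-proxy/simple/"
        }
    },
}
for key, value in flatten(config).items():
    # Print booleans in TOML style (true/false) to match the listing's format.
    shown = str(value).lower() if isinstance(value, bool) else value
    print(f"{key} = {shown}")
# foo.bar = true
# plugins.poetry_plugin_pypi_mirror.pypi_mirror_url = https://example.org/repository/pypi-proxy/simple/
```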