python-poetry / poetry

Python packaging and dependency management made easy
https://python-poetry.org
MIT License

`NotGitRepository` error when installing multiple packages from one git repository #6958

Closed gnuletik closed 1 month ago

gnuletik commented 1 year ago

Issue

It seems that a race condition occurs when installing two packages from the same git repository.

Repro:

cd /tmp
git clone https://github.com/gnuletik/poetry-lib-monorepo-issue
cd poetry-lib-monorepo-issue
poetry install

It fails with

Package operations: 2 installs, 0 updates, 0 removals

  • Installing package1 (0.1.0 c6f487b): Failed

  NotGitRepository

  No git repository was found at /private/tmp/test-poetry/.venv/src/poetry-multipackages-example

  at /opt/homebrew/Cellar/poetry/1.2.2/libexec/lib/python3.10/site-packages/dulwich/repo.py:1090 in __init__
      1086│             elif (os.path.isdir(os.path.join(root, OBJECTDIR))
      1087│                     and os.path.isdir(os.path.join(root, REFSDIR))):
      1088│                 bare = True
      1089│             else:
    → 1090│                 raise NotGitRepository(
      1091│                     "No git repository was found at %(path)s" % dict(path=root)
      1092│                 )
      1093│
      1094│         self.bare = bare

The following error occurred when trying to handle this error:

NB: output of poetry install -vvv can be found here: https://gist.github.com/gnuletik/ddcb05ff3467f022f9d3540f379763df

Please note that subsequent calls may succeed but a fresh install (after a poetry env remove --all) always fails.

24rr commented 1 year ago

Based on the error message you provided, it looks like the package you are trying to install requires a git repository, but the installation process is unable to find one at the specified location: /private/tmp/test-poetry/.venv/src/poetry-multipackages-example.

To fix this error, you will need to first determine the root cause of the problem. This may involve examining the package's code, as well as the installation process, to identify any issues. It may also be helpful to consult the documentation for the package, or seek help from the package's maintainers or the community.

Once you have determined the cause of the error, you can then take the appropriate steps to fix it. This may involve modifying the package's code, changing the way it is installed, or taking some other action.

neersighted commented 1 year ago

@pneb In this case, the fault lies with Poetry; the diagnosis in the original issue appears correct to me. Related: #7113.

danieldanciu commented 1 year ago

We are also seeing this issue with a docker build that depends on multiple packages from the same git repository.

I suspect this will become more common as more and more people adopt the monorepo strategy, which is now quite well supported by Poetry.

None of the workarounds presented here worked for us. We had to manually serialize the installation of the packages to avoid the race condition.

gnuletik commented 1 year ago

@danieldanciu can you elaborate on the following?

we had to manually serialize the installation of the packages to avoid the race condition

Did you run a pip install (in your venv) before running poetry install?

pdarulewski commented 1 year ago

Are there any workarounds for this? I have multiple misc modules in a utilities repo and I'd really like to use a few of them in other projects. The issue is pretty annoying because it's hard to pinpoint the exact problem. Especially when the installation seems to work locally but then it randomly fails in CI or in a Docker container, and after retrying, it works again. I have the same issue for Poetry 1.3.2, 1.4.2, and 1.5.1.

gnuletik commented 1 year ago

@pdarulewski I think that the root issue is in the way Poetry clones multiple dependencies in parallel.

The fix could be something that disables parallel installation for dependencies that come from the same repository.

https://github.com/python-poetry/poetry/blob/6e942983dff1bcc6d307c7704e8159df0c959a16/src/poetry/installation/executor.py#L71-L77

You could try disabling the parallel installer entirely with:

poetry config installer.parallel false

as stated here https://github.com/python-poetry/poetry/issues/7949#issue-1716659814
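
A rough sketch of the idea above, not Poetry's actual executor code: group the pending installs by their git URL, run the groups in parallel, and install the packages inside each group one after another. The names `group_by_source`, `install_all`, and `install_one` are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_source(packages):
    """Group (name, git_url) pairs so packages sharing a repository stay together."""
    groups = defaultdict(list)
    for name, git_url in packages:
        groups[git_url].append(name)
    return dict(groups)


def install_all(packages, install_one, max_workers=4):
    """Run groups in parallel, but install packages within one group serially.

    install_one is a hypothetical callable that installs a single package.
    """
    def run_group(names):
        # Same repository: strictly one install at a time.
        return [install_one(name) for name in names]

    groups = group_by_source(packages)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_group, groups.values()))
    # Flatten the per-group results into one list.
    return [result for group in results for result in group]
```

With this shape, only packages that share a repository lose parallelism, so the slowdown should be smaller than turning `installer.parallel` off for everything.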

pdarulewski commented 1 year ago

@gnuletik yes, I think so too. I guess I've had other errors related to the .git directory of the monorepo inside the project's virtualenv directory. Setting parallel to false seems to work, although, as expected, installation is much slower. It's fine for now, thanks for the hint.

Oblynx commented 1 year ago

This would be a great fix! We also use monorepos to handle private python packages and end up with this issue. Turning parallelism off can increase the build time x10 for a large project...

ogreyesp commented 9 months ago

@gnuletik

Setting the parallel to false didn't work in my case.

JonathanRayner commented 3 months ago

Please note that subsequent calls may succeed but a fresh install (after a poetry env remove --all) always fails.

Does anyone have any ideas on how to reproduce this more consistently? I can reproduce it sometimes locally, but not always, which is making fixing it a pain. @gnuletik I was able to reproduce it a few times with your repos, but not every time (even after deleting the environment).

*edit: I seem to be able to reproduce it more consistently running poetry install with this repo https://github.com/JonathanRayner/some_other_repo

JonathanRayner commented 3 months ago

I see a few possible ways forward, but can I ask: what is the expected behavior?

Suppose the following monorepo structure:

monorepo/pkg_1/pyproject.toml
monorepo/pkg_2/pyproject.toml

and another repo that wants to use pkg_1 and pkg_2 as git dependencies:

some_repo/pyproject.toml

which is

[tool.poetry]
name = "some_repo"
version = "0.1.0"
description = ""
authors = ["my name <my_name@myemail.com>"]

[tool.poetry.dependencies]
python = "^3.10 <3.13"

pkg_1 = {git = "git@github.com:MyOrg/monorepo.git", subdirectory = "pkg_1"}
pkg_2 = {git = "git@github.com:MyOrg/monorepo.git", subdirectory = "pkg_2"}

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

When the user installs some_repo, there are a few possibilities for what should happen:

  1. The repo monorepo is cloned once and reused to install pkg_1 and pkg_2. This is advantageous for large repos. We would need to either throw an error if pkg_1 and pkg_2 point to different branches/revs or allow for reverting to two separate clones if this is the case.
  2. The repo monorepo is cloned twice, completely independently for pkg_1 and pkg_2.
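
A minimal sketch of option 1 with the fallback to separate clones, assuming clones are keyed by (URL, revision): two packages at the same revision share one clone, while different revisions each get their own. `CloneCache` and the `clone` callable are hypothetical names, not part of Poetry.

```python
import threading
from collections import defaultdict


class CloneCache:
    """Clone each (url, rev) pair at most once, even when called from many threads."""

    def __init__(self, clone):
        self._clone = clone  # hypothetical callable: (url, rev) -> local path
        self._paths = {}
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()

    def get(self, url, rev):
        key = (url, rev)
        with self._guard:  # protect the lock table itself
            lock = self._locks[key]
        with lock:  # only one thread clones a given key
            if key not in self._paths:
                self._paths[key] = self._clone(url, rev)
            return self._paths[key]
```

Because the lock is per key, installs that need the same clone wait for it to finish instead of racing, and installs from unrelated repositories or revisions are not blocked.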

Jozefiel commented 3 months ago

  • The repo monorepo is cloned once and reused to install pkg_1 and pkg_2. This is advantageous for large repos. We would need to either throw an error if pkg_1 and pkg_2 point to different branches/revs or allow for reverting to two separate clones if this is the case.

Option 1, with throwing an error, would probably be a breaking change for us. We use a monorepo approach to store microservice APIs. Then, in other projects, we combine package releases (tags) based on the deployment. If an error is thrown, the monorepo approach will no longer be suitable for us.

JonathanRayner commented 3 months ago

Fair! It sounds like a separate clone per parallel install is a sensible default then? I.e., each package is completely separate. Perhaps people with very large monorepos use other tooling to reduce the redundancy of multiple clones anyway?

Jozefiel commented 3 months ago

Maybe git worktree can solve both problems?
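
To make the worktree idea concrete, here is a sketch that only builds the git command lines: one bare clone, then one detached worktree per requested revision. `worktree_commands` is a hypothetical helper, and whether Poetry's git backend could do the equivalent is not something this thread establishes.

```python
def worktree_commands(url, bare_dir, checkouts):
    """Return the git commands for one bare clone plus one worktree per revision.

    checkouts maps an absolute target directory to the revision to check out there.
    """
    # A single bare clone holds the object store shared by every worktree.
    commands = [["git", "clone", "--bare", url, bare_dir]]
    for path, rev in checkouts.items():
        # Each worktree is a separate checkout backed by the same objects.
        commands.append(
            ["git", "-C", bare_dir, "worktree", "add", "--detach", path, rev]
        )
    return commands
```

Each command could then be run in order with `subprocess.run(cmd, check=True)`. The repository is downloaded once, yet pkg_1 and pkg_2 can still point at different tags.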

github-actions[bot] commented 3 days ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.