python-poetry / poetry

Python packaging and dependency management made easy
https://python-poetry.org
MIT License

Poetry 1.3 dies when run with a TTY reporting size 0/0 #7184

Closed calebnorman closed 1 year ago

calebnorman commented 1 year ago

Issue

Running as part of a CircleCI workflow (steps below). The command poetry install identifies the Package Operations and then exits with code 1 (output below). SSHing into the box and running poetry install manually produces the normal, expected behavior.

workflow steps

...
pyenv global 3.10.2
pip install poetry==1.2.2
export PATH="/home/circleci/.local/bin:$PATH"

poetry install -vvv
...

Poetry Install Step

#!/bin/bash -eo pipefail
poetry install -vvv

Creating virtualenv fastapi-template-repository-3aSsmiER-py3.10 in /home/circleci/.cache/pypoetry/virtualenvs
Using virtualenv: /home/circleci/.cache/pypoetry/virtualenvs/fastapi-template-repository-3aSsmiER-py3.10
Installing dependencies from lock file

Finding the necessary packages for the current system

Package operations: 71 installs, 1 update, 0 removals

Exited with code exit status 1

neersighted commented 1 year ago

A user on Discord has been debugging this as well; it appears to happen only in CircleCI when using the same base image. I've asked that user to post the debugging they've done here, but the basic thing we know right now is that Poetry is receiving a SIGINT right after printing the installer summary (determined via strace).

maurczz commented 1 year ago

Had the same problem this morning with the latest poetry. Pinning back to 1.2.2 during the build solves the issue.

TheKevJames commented 1 year ago

I'm the user @neersighted mentioned from Discord. Here's what I've got so far:

Reproduction code: https://github.com/TheKevJames/experiments/blob/3c986b0df2c2a3cfac52118daa654d00250838eb/.circleci/config.yml#L13-L57
CI run: https://app.circleci.com/pipelines/github/TheKevJames/experiments/221/workflows/2a9215b4-fe0f-4c07-be6e-162f47de689a

tl;dr is python 3.11 works, 3.10 breaks, using the latest docker builds of both. resource_class seems irrelevant. Doing a pip upgrade to latest seems irrelevant.

TheKevJames commented 1 year ago

~Using alpine as a base image (so, eg. python:3.10-alpine) seems to fail on earlier versions of poetry as well. That seems to show poetry failing as far back as v1.0.0: CI.~ EDIT: I was wrong about this, I made a mistake in specifying the poetry version here. Alpine behaves the same as debian.

Also confirmed this is only ever headless runs, eg. I cannot reproduce when ssh'd in and running poetry commands myself.

TheKevJames commented 1 year ago

strace --follow-forks is interesting here as well: failure output contrasted with successful output.

Most notably in that, I see that the SIGINT is happening in both cases -- on further inspection, I think we may have been incorrect in our first read through @neersighted : I don't think this is poetry reporting a SIGINT, but rather us cleaning up our SIGINT handler as part of normal shutdown procedures. It looks like poetry is actually self-terminating in response to the forked process exiting (with exit code 0), I think??

brandon-leapyear commented 1 year ago

:sparkles: This is an old work account. Please reference @brandonchinn178 for all future communication :sparkles:


I also see this happen in python versions 3.8 to 3.10, but it doesn't happen on 3.11 (both cimg/python:3.x and python:3.x). It also succeeds if you SSH in and manually run poetry install. I'm also unable to repro with Docker. Overall, very frustrating to debug 😢

neersighted commented 1 year ago

Good job gathering those straces, it looks like what is happening is the worker is throwing an exception and we're swallowing it. This appears to be due to some issue getting Cleo's ExceptionTrace imported (maybe related to crashtest?):

https://github.com/python-poetry/poetry/blob/128c528f392cd79b3f19f0dbf09b6e4c74809e2a/src/poetry/installation/executor.py#L276-L297

I don't have more time to dig into this right now, but the exception handling here is plainly suspect (and we definitely should not be doing a conditional import in an exception handler).

TheKevJames commented 1 year ago

Yup, definitely a Cleo issue:

Traceback (most recent call last):
  File "/root/.local/share/pypoetry/venv/lib/python3.10/site-packages/poetry/installation/executor.py", line 244, in _execute_operation
    self._sections[id(operation)].write_line(
  File "/root/.local/share/pypoetry/venv/lib/python3.10/site-packages/cleo/io/outputs/output.py", line 88, in write_line
    self.write(messages, new_line=True, verbosity=verbosity, type=type)
  File "/root/.local/share/pypoetry/venv/lib/python3.10/site-packages/cleo/io/outputs/output.py", line 109, in write
    self._write(message, new_line=new_line)
  File "/root/.local/share/pypoetry/venv/lib/python3.10/site-packages/cleo/io/outputs/section_output.py", line 83, in _write
    self.add_content(message)
  File "/root/.local/share/pypoetry/venv/lib/python3.10/site-packages/cleo/io/outputs/section_output.py", line 69, in add_content
    len(self.remove_format(line_content).replace("\t", "        "))
ZeroDivisionError: division by zero

TheKevJames commented 1 year ago

https://github.com/python-poetry/cleo/blob/35896a0bc27e92f1d1f96d66facbf31531356ed7/src/cleo/io/outputs/section_output.py#L70

neersighted commented 1 year ago

Ah, _terminal is the smoking gun. cc @Secrus
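For anyone following along, the failure mode in the traceback can be sketched roughly like this. It is an illustration, not Cleo's actual code: `wrapped_line_count` and `wrapped_line_count_guarded` are hypothetical names, and it assumes the section output divides the visible line length by the terminal width to count how many rows a line occupies.

```python
import math


def wrapped_line_count(text: str, terminal_width: int) -> int:
    """Count the rows a line occupies (hypothetical, unguarded version)."""
    visible = text.replace("\t", " " * 8)
    # Raises ZeroDivisionError when the terminal reports a width of 0.
    return math.ceil(len(visible) / terminal_width) or 1


def wrapped_line_count_guarded(
    text: str, terminal_width: int, fallback_width: int = 80
) -> int:
    """Same calculation, but treat a non-positive width as 'unknown'."""
    # fallback_width is an assumed default, not a value taken from Cleo.
    width = terminal_width if terminal_width > 0 else fallback_width
    visible = text.replace("\t", " " * 8)
    return math.ceil(len(visible) / width) or 1


if __name__ == "__main__":
    print(wrapped_line_count_guarded("Installing foo (1.0.0)", 0))
    try:
        wrapped_line_count("Installing foo (1.0.0)", 0)
    except ZeroDivisionError as exc:
        print(f"unguarded version failed: {exc}")
```

With a width of 0 the unguarded version fails exactly like the traceback above, while the guarded one falls back to an assumed width.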

TheKevJames commented 1 year ago

Ultimate cause is this PR: https://github.com/python-poetry/cleo/pull/175

Note that the stdlib method we're using acts differently in 3.11, which masks the issue: https://docs.python.org/3/library/shutil.html#shutil.get_terminal_size

TheKevJames commented 1 year ago

Note also that the stdlib method falls back to using the terminal height/width values, which explains why the behaviour is different for interactive/non-interactive shells.
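A minimal sketch of a defensive wrapper around the stdlib call, for illustration only. `safe_terminal_size` is a hypothetical helper name, not something Poetry or Cleo provides. It applies the fallback whenever a dimension comes back as 0, which is roughly what Python 3.11 does on its own according to the linked docs.

```python
import os
import shutil


def safe_terminal_size(fallback=(80, 24)) -> os.terminal_size:
    """Return a terminal size whose columns and lines are never zero."""
    size = shutil.get_terminal_size(fallback)
    # Before Python 3.11, a terminal reporting 0/0 is passed through as-is,
    # so replace any non-positive dimension with the fallback value.
    columns = size.columns if size.columns > 0 else fallback[0]
    lines = size.lines if size.lines > 0 else fallback[1]
    return os.terminal_size((columns, lines))


if __name__ == "__main__":
    print(safe_terminal_size())
```

Anything that later divides by the width can then rely on it being positive.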

TheKevJames commented 1 year ago

Confirming this by running the following on CircleCI, in otherwise equivalent jobs:

# python3.10
$ python3 -c 'import shutil; print(shutil.get_terminal_size())'
os.terminal_size(columns=0, lines=0)
# python3.11
$ python3 -c 'import shutil; print(shutil.get_terminal_size())'
os.terminal_size(columns=80, lines=24)

neersighted commented 1 year ago

I think what makes Circle unique here is the fact that it's not allocating a TTY, unlike most CI systems (which, at this point, tend to provide full VT emulation so they can capture nice colorful/fancy logs). This should be possible to reproduce locally with docker run (without -t).

TheKevJames commented 1 year ago

Yeah, it didn't even occur to me earlier on that this might be the case. Fancy logs generally show in CircleCI, so I made the erroneous assumption it was just via standard emulation.

Tossed up a quick fix PR here, though having someone with more Cleo-specific experience take a look would be fantastic.

I'd also note that it took patching poetry source code to discover this issue, given the cleo stuff was eating the trace. I wonder if there's a way we could at least fall back to not-so-fancy output in case of any errors?

neersighted commented 1 year ago

I don't think we should try to do anything fancy here, this is a pretty rare and pathological case that has revealed a big hole in test coverage. I would definitely drop the conditional import and reduce the complexity of what we do in the exception handler, but I don't think that bailing out of Cleo I/O is useful (especially as it's needed to test this code).

TheKevJames commented 1 year ago

You'd definitely have a better understanding than I on this! I was thinking it could potentially be done fairly simply, though, e.g. as a global try-except:

import traceback

try:
    ...  # the operation that may fail
except Exception:
    try:
        ...  # all the existing exception handler stuff, fancy formatting, whatever
    except Exception:
        traceback.print_exc()

rlgomes commented 1 year ago

FYI, a workaround is to run with --no-ansi. I stumbled upon this error, and while debugging I used that option to clear up some of the output, and things ran successfully.

qwertyuu commented 1 year ago

@rlgomes you've fixed our work deployment pipeline! Thanks. --no-ansi makes poetry install work as intended, on the cimg/python:3.8 docker image at least.

jfroy commented 1 year ago

Just to add that install hangs on macOS 13.1 with Homebrew python 3.10.

neersighted commented 1 year ago

That is unrelated to what is being discussed here, which is related to CircleCI reporting a TTY with dimensions 0/0.

wadimiusz commented 1 year ago

Just confirming that I also faced this bug and that using poetry install --no-ansi did the trick.

jaoxford commented 1 year ago

Same thing happens for us, when running poetry install -vvv; echo $? from CodeFresh. We get the following:

Loading configuration file /codefresh/volume/yo-main-application/poetry.toml
Using virtualenv: /root/.venv
Installing dependencies from lock file

Finding the necessary packages for the current system

Package operations: 211 installs, 0 updates, 0 removals

1

1 being the exit code.

neersighted commented 1 year ago

Is it the same thing? Please report your tty size; if it's not 0/0 it's #7148 instead, which we still don't understand the root cause of.
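For anyone unsure how to report this, here is a small sketch that prints whether stdout is a TTY and what size it reports. `tty_report` is a hypothetical helper name; it only uses `os.isatty` and `os.get_terminal_size` from the stdlib. Run it inside the failing CI step rather than an SSH session, since the two can differ.

```python
import os
import sys


def tty_report(stream=sys.stdout) -> dict:
    """Describe whether a stream is a TTY and what size it reports."""
    try:
        fd = stream.fileno()
    except (AttributeError, OSError, ValueError):
        # The stream has no underlying file descriptor (e.g. captured output).
        return {"isatty": False, "size": None}
    try:
        size = os.get_terminal_size(fd)
    except OSError:
        # Not a terminal at all, so there is no size to report.
        size = None
    return {"isatty": os.isatty(fd), "size": size}


if __name__ == "__main__":
    print(tty_report())
```

A result with `isatty` true and a size of 0 columns by 0 lines would match the case discussed in this issue.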

jaoxford commented 1 year ago

--no-ansi solved our issue.

yukw777 commented 1 year ago

Happens to me when running poetry as a pre-commit local repo. Fixed by using --no-ansi.

dhart09 commented 1 year ago

--no-ansi to the rescue, worked for us

KhaledMohamedP commented 1 year ago

--no-ansi worked for me too.

I'm testing a GitHub Actions workflow, and the Poetry installation failed when this flag was not passed in. Surprisingly, it worked fine in GitHub CI but failed when testing the workflow locally using act.

After hours of debugging poetry install --no-ansi

bnorick commented 1 year ago

FWIW, I ran into this issue when using ray which runs some setup commands. The resulting command looked like the following:

ssh -tt -i ~/ray_bootstrap_key.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_X/Y/%C -o ControlPersist=10s -o ConnectTimeout=120s ubuntu@WORKER_IP 'bash --login -c -i '"'"'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (export RAY_HEAD_IP=HEAD_IP; cd /tmp/library; POETRY_VIRTUALENVS_CREATE=0 /home/ubuntu/.local/bin/poetry install)'"'"

When executed by ray, it would fail. If I executed the same thing from a shell on the head node it would be successful.

--no-ansi resolved the issue.

okasen commented 1 year ago

We've been experiencing this issue with poetry install and poetry update (and self update) on circleci. I just tested out a self update to use Poetry version 1.4, and it seems the issue still exists. (The release notes didn't indicate otherwise, but in case that confirmation is helpful)

wlonkly commented 1 year ago

As a data point and for searchability, I encountered this with Poetry 1.4, but on Buildkite and not CircleCI. Same terminal issue there though:

[2023-03-14T14:18:32Z] python3 -c 'import shutil; print(shutil.get_terminal_size())'
[...]
[2023-03-14T14:18:52Z] os.terminal_size(columns=0, lines=0)

Running tty in a Buildkite job outputs /dev/pts/0.
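This combination (a real TTY that reports 0 columns and 0 lines) can likely be reproduced locally on a POSIX system with a pseudo-terminal. The sketch below is an assumption about how to simulate it, not how Buildkite or CircleCI set up their terminals; `zero_sized_tty_report` is a hypothetical helper name.

```python
import fcntl
import os
import pty
import struct
import termios


def zero_sized_tty_report():
    """Open a pseudo-terminal, force its window size to 0x0, and inspect it."""
    master_fd, slave_fd = pty.openpty()
    try:
        # struct winsize holds rows, columns, x pixels, y pixels (all unsigned shorts).
        zero_winsize = struct.pack("HHHH", 0, 0, 0, 0)
        fcntl.ioctl(slave_fd, termios.TIOCSWINSZ, zero_winsize)
        return os.isatty(slave_fd), os.get_terminal_size(slave_fd)
    finally:
        os.close(slave_fd)
        os.close(master_fd)


if __name__ == "__main__":
    is_tty, size = zero_sized_tty_report()
    print(is_tty, size)
```

The file descriptor is a genuine TTY, yet its reported size is zero in both dimensions, which is the situation that trips up code assuming a positive width.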

TheKevJames commented 1 year ago

For current status, we've got a proposed fix here from me that won't get accepted as per @neersighted 's comment and a better fix here from the cleo maintainer @Secrus , that unfortunately doesn't yet solve the problem. As I understand, we're waiting on @Secrus to fix and then merge&release that PR.

Until then, all versions of Poetry v1.3+ on CircleCI and Buildkite (maybe others?) are broken for any commands not using the --no-ansi flag.

Plozano94 commented 1 year ago

Airflow with DockerOperator and the option 'tty=True' also seems to fail for the same reason.

navaati commented 1 year ago

Hello. This affects us within docker build for Docker versions 18.09.4 and 20.10.14 at least, which are pretty old versions but it could still affect a lot of people.

TheKevJames commented 1 year ago

I think the current solution is to just make sure you always use --no-ansi everywhere you run poetry commands at this point, if those commands might be run in any of the several listed places folks have found 0/0 TTYs occur. So far, no working fix has been accepted for this issue, as far as I am aware.
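If Poetry is invoked from Python tooling, one way to avoid forgetting the flag is a thin wrapper along these lines. This is only a sketch: `poetry_command` and `run_poetry` are hypothetical helper names, and the only thing taken from this thread is the --no-ansi flag itself.

```python
import subprocess


def poetry_command(*args: str) -> list[str]:
    """Build a Poetry command line that always includes --no-ansi."""
    command = ["poetry", *args]
    if "--no-ansi" not in command:
        command.append("--no-ansi")
    return command


def run_poetry(*args: str) -> subprocess.CompletedProcess:
    """Run Poetry with --no-ansi, raising CalledProcessError on failure."""
    return subprocess.run(poetry_command(*args), check=True)


if __name__ == "__main__":
    # Printing instead of running, so the sketch works without Poetry installed.
    print(poetry_command("install"))
```

Calling `run_poetry("install")` would then execute `poetry install --no-ansi`.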

iainelder commented 1 year ago

@Secrus , you closed this as completed, but I'm still seeing the problem today.

poetry install fails with exit code 1 in act without the --no-ansi option set.

Versions:

When can we expect to see this work without --no-ansi?

Secrus commented 1 year ago

@iainelder it was closed because the PR with the solution was merged. It will be visible after Cleo gets a new release (it's close, probably sometime in mid-July).

Aea commented 1 year ago

> Until then, all versions of Poetry v1.3+ on CircleCI and Buildkite (maybe others?) are broken for any commands not using the --no-ansi flag.

--no-ansi fixes poetry 1.5.1 silently exiting in Google Colab environment as well 👍
