Closed calebnorman closed 1 year ago
A user on Discord has been debugging this as well, it appears to only happen in CircleCI when using the same base image. I've asked that user to post the debugging they've done here, but the basic thing we know right now is that Poetry is receiving a SIGINT right after printing the installer summary (determined via an strace).
Had the same problem this morning with the latest poetry. Pinning back to 1.2.2 during the build solves the issue.
I'm the user @neersighted mentioned from Discord. Here's what I've got so far:
Reproduction code: https://github.com/TheKevJames/experiments/blob/3c986b0df2c2a3cfac52118daa654d00250838eb/.circleci/config.yml#L13-L57 CI run: https://app.circleci.com/pipelines/github/TheKevJames/experiments/221/workflows/2a9215b4-fe0f-4c07-be6e-162f47de689a
tl;dr is python 3.11 works, 3.10 breaks, using the latest docker builds of both. resource_class seems irrelevant. Doing a pip upgrade to latest seems irrelevant.
~Using alpine as a base image (so, eg. python:3.10-alpine) seems to fail on earlier versions of poetry as well. That seems to show poetry failing as far back as v1.0.0: CI.~ EDIT: I was wrong about this, I made a mistake in specifying the poetry version here. Alpine behaves the same as debian.
Also confirmed this is only ever headless runs, eg. I cannot reproduce when ssh'd in and running poetry commands myself.
strace --follow-forks is interesting here as well: failure output contrasted with successful output.
Most notably in that, I see that the SIGINT is happening in both cases -- on further inspection, I think we may have been incorrect in our first read through @neersighted : I don't think this is poetry reporting a SIGINT, but rather us cleaning up our SIGINT handler as part of normal shutdown procedures. It looks like poetry is actually self-terminating in response to the forked process exiting (with exit code 0), I think??
:sparkles: This is an old work account. Please reference @brandonchinn178 for all future communication :sparkles:
I also see this happen in python versions 3.8 to 3.10, but it doesn't happen on 3.11 (both cimg/python:3.x and python:3.x). It also succeeds if you SSH in and manually run poetry install. I'm also unable to repro with Docker. Overall, very frustrating to debug 😢
Good job gathering those straces, it looks like what is happening is the worker is throwing an exception and we're swallowing it. This appears to be due to some issue getting Cleo's ExceptionTrace imported (maybe related to crashtest?):
I don't have more time to dig into this right now, but the exception handling here is plainly suspect (and we definitely should not be doing a conditional import in an exception handler).
Yup, definitely a Cleo issue:
Traceback (most recent call last):
  File "/root/.local/share/pypoetry/venv/lib/python3.10/site-packages/poetry/installation/executor.py", line 244, in _execute_operation
    self._sections[id(operation)].write_line(
  File "/root/.local/share/pypoetry/venv/lib/python3.10/site-packages/cleo/io/outputs/output.py", line 88, in write_line
    self.write(messages, new_line=True, verbosity=verbosity, type=type)
  File "/root/.local/share/pypoetry/venv/lib/python3.10/site-packages/cleo/io/outputs/output.py", line 109, in write
    self._write(message, new_line=new_line)
  File "/root/.local/share/pypoetry/venv/lib/python3.10/site-packages/cleo/io/outputs/section_output.py", line 83, in _write
    self.add_content(message)
  File "/root/.local/share/pypoetry/venv/lib/python3.10/site-packages/cleo/io/outputs/section_output.py", line 69, in add_content
    len(self.remove_format(line_content).replace("\t", " "))
ZeroDivisionError: division by zero
Ah, _terminal is the smoking gun. cc @Secrus
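For anyone following along, the failure mode in that traceback can be sketched in isolation. This is not Cleo's actual code: rows_needed is a hypothetical helper, and it assumes the section output estimates how many terminal rows a line occupies by dividing its visible length by the terminal width. The tab expansion width is also an assumption.

```python
import math


def rows_needed(text: str, terminal_width: int) -> int:
    """Estimate how many terminal rows `text` occupies (hypothetical helper)."""
    # Expand tabs before measuring; the expansion width here is an assumption.
    visible = text.replace("\t", "        ")
    # With terminal_width == 0 this division raises ZeroDivisionError.
    return math.ceil(len(visible) / terminal_width) or 1


print(rows_needed("Installing dependencies from lock file", 80))

try:
    rows_needed("Installing dependencies from lock file", 0)
except ZeroDivisionError as exc:
    print(f"zero-width terminal: {exc!r}")
```

Any computation of this shape works on a normal terminal and only breaks once the reported width is 0.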
Ultimate cause is this PR: https://github.com/python-poetry/cleo/pull/175
Note that the stdlib method we're using acts differently in 3.11, which masks the issue: https://docs.python.org/3/library/shutil.html#shutil.get_terminal_size
Note also that the stdlib method falls back to using the terminal height/width values, which explains why the behaviour is different for interactive/non-interactive shells.
Confirming this by running the following on CircleCI, in otherwise equivalent jobs:
# python3.10
$ python3 -c 'import shutil; print(shutil.get_terminal_size())'
os.terminal_size(columns=0, lines=0)
# python3.11
$ python3 -c 'import shutil; print(shutil.get_terminal_size())'
os.terminal_size(columns=80, lines=24)
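One way to guard against the zero-size report is to treat non-positive dimensions as unknown and substitute a default. A minimal sketch, assuming an 80x24 default; safe_terminal_size and its query parameter are hypothetical names for illustration and are not part of Cleo or Poetry.

```python
import os
import shutil


def safe_terminal_size(fallback=(80, 24), query=shutil.get_terminal_size):
    """Return the terminal size, replacing non-positive dimensions with the fallback."""
    size = query(fallback)
    columns = size.columns if size.columns > 0 else fallback[0]
    lines = size.lines if size.lines > 0 else fallback[1]
    return os.terminal_size((columns, lines))


# Simulate the 0x0 report seen on the python3.10 job above.
print(safe_terminal_size(query=lambda fallback: os.terminal_size((0, 0))))
# Real query against the current environment.
print(safe_terminal_size())
```

The query parameter exists only so the zero-size case can be exercised without a special CI environment.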
I think what makes Circle unique here is the fact that it's not allocating a TTY, unlike most CI systems (which, at this point, tend to provide full VT emulation so they can capture nice colorful/fancy logs). This should be possible to reproduce locally with docker run (without -t).
Yeah, it didn't even occur to me earlier on that this might be the case. Fancy logs generally show in CircleCI, so I made the erroneous assumption it was just via standard emulation.
Tossed up a quick fix PR here, though having someone with more Cleo-specific experience take a look would be fantastic.
I'd also note that it took patching poetry source code to discover this issue, given the cleo stuff was eating the trace. I wonder if there's a way we could at least fall back to not-so-fancy output in case of any errors?
I don't think we should try to do anything fancy here, this is a pretty rare and pathological case that has revealed a big hole in test coverage. I would definitely drop the conditional import and reduce the complexity of what we do in the exception handler, but I don't think that bailing out of Cleo I/O is useful (especially as it's needed to test this code).
You'd definitely have a better understanding than I on this! I was thinking it could potentially be done fairly simply, though, eg. as a global try-except:
except ...:
    try:
        # all the existing exception handler stuff, fancy formatting, whatever
    except Exception:
        traceback.print_exc()
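A runnable version of that idea might look like the following. It is only a sketch: report_error and broken_renderer are hypothetical names, and nothing here reflects how Poetry's executor is actually structured.

```python
import io
import sys
import traceback


def report_error(exc, fancy_render, stream=None):
    """Try the rich renderer first; fall back to plain tracebacks if it fails."""
    stream = stream if stream is not None else sys.stderr
    try:
        fancy_render(exc, stream)
    except Exception:
        # The renderer itself broke, so print the original error plainly ...
        traceback.print_exception(type(exc), exc, exc.__traceback__, file=stream)
        # ... and the renderer's own failure, so neither one is swallowed.
        traceback.print_exc(file=stream)


def broken_renderer(exc, stream):
    # Stand-in for a formatter that fails, e.g. on a zero-width terminal.
    raise ZeroDivisionError("division by zero")


try:
    raise RuntimeError("worker failed")
except RuntimeError as err:
    buffer = io.StringIO()
    report_error(err, broken_renderer, buffer)
    print(buffer.getvalue())
```

The point of the design is that a bug in the pretty-printing path degrades the output instead of hiding the underlying error.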
FYI a workaround is to run with --no-ansi. I stumbled upon this error, and while debugging I used that option to clear up some of the output, and things ran successfully.
@rlgomes you've fixed our work deployment pipeline! Thanks. --no-ansi makes poetry install work as intended, on the cimg/python:3.8 docker image at least.
Just to add that install hangs on macOS 13.1 with Homebrew python 3.10.
That is unrelated to what is being discussed here, which is related to CircleCI reporting a TTY with dimensions 0/0.
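A TTY with dimensions 0/0 can be approximated locally on Unix by opening a pseudo-terminal and setting its window size to zero. This is a sketch under that assumption; open_pty_with_size is a hypothetical helper, and it says nothing about how CircleCI actually provisions its terminals.

```python
import fcntl
import os
import pty
import struct
import termios


def open_pty_with_size(rows: int, columns: int):
    """Open a pseudo-terminal pair and set its window size (Unix only)."""
    master_fd, slave_fd = pty.openpty()
    # struct winsize: rows, columns, x pixels, y pixels (unsigned shorts).
    winsize = struct.pack("HHHH", rows, columns, 0, 0)
    fcntl.ioctl(slave_fd, termios.TIOCSWINSZ, winsize)
    return master_fd, slave_fd


master_fd, slave_fd = open_pty_with_size(0, 0)
try:
    # The descriptor is a TTY, yet the size it reports is 0 columns by 0 lines.
    print(os.isatty(slave_fd))
    print(os.get_terminal_size(slave_fd))
finally:
    os.close(master_fd)
    os.close(slave_fd)
```

Running a command with its standard streams attached to such a descriptor should exercise the same code path as the affected CI jobs.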
Just confirming that I also faced this bug and that using poetry install --no-ansi did the trick.
Same thing happens for us, when running poetry install -vvv; echo $? from CodeFresh. We get the following:
Loading configuration file /codefresh/volume/yo-main-application/poetry.toml
Using virtualenv: /root/.venv
Installing dependencies from lock file
Finding the necessary packages for the current system
Package operations: 211 installs, 0 updates, 0 removals
1
1 being the exit code.
Is it the same thing? Please report your tty size; if it's not 0/0 it's #7148 instead, which we still don't understand the root cause of.
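For reporting the tty size from inside a CI job, something like the following could be dropped into a build step. It is a rough sketch; describe_streams is a hypothetical helper name.

```python
import os
import sys


def describe_streams():
    """Collect isatty() and terminal size for stdin, stdout and stderr."""
    report = {}
    for name in ("stdin", "stdout", "stderr"):
        stream = getattr(sys, name)
        try:
            fd = stream.fileno()
            is_tty = os.isatty(fd)
            size = tuple(os.get_terminal_size(fd)) if is_tty else None
        except (OSError, ValueError, AttributeError):
            # Stream is missing, closed, or replaced by a non-file object.
            is_tty, size = False, None
        report[name] = {"isatty": is_tty, "size": size}
    return report


for name, info in describe_streams().items():
    print(name, info)
```

A stream that reports isatty as True with a size of (0, 0) matches the case discussed in this issue.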
--no-ansi solved our issue.
Happens to me when running poetry as a pre-commit local repo. Fixed by using --no-ansi.
--no-ansi to the rescue, worked for us
--no-ansi worked for me too.
I'm testing a github action workflow and the poetry installation failed when this flag is not passed in. Surprisingly, it worked fine in github CI, but failed when testing this workflow locally using act.
After hours of debugging: poetry install --no-ansi ✅
FWIW, I ran into this issue when using ray, which runs some setup commands. The resulting command looked like the following:
ssh -tt -i ~/ray_bootstrap_key.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_X/Y/%C -o ControlPersist=10s -o ConnectTimeout=120s ubuntu@WORKER_IP 'bash --login -c -i '"'"'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (export RAY_HEAD_IP=HEAD_IP; cd /tmp/library; POETRY_VIRTUALENVS_CREATE=0 /home/ubuntu/.local/bin/poetry install)'"'"
When executed by ray, it would fail. If I executed the same thing from a shell on the head node it would be successful.
--no-ansi resolved the issue.
We've been experiencing this issue with poetry install and poetry update (and self update) on circleci. I just tested out a self update to use Poetry version 1.4, and it seems the issue still exists. (The release notes didn't indicate otherwise, but in case that confirmation is helpful)
As a data point and for searchability, I encountered this with Poetry 1.4, but on Buildkite and not CircleCI. Same terminal issue there though:
[2023-03-14T14:18:32Z] python3 -c 'import shutil; print(shutil.get_terminal_size())'
[...]
[2023-03-14T14:18:52Z] os.terminal_size(columns=0, lines=0)
Running tty in a Buildkite job outputs /dev/pts/0.
For current status, we've got a proposed fix here from me that won't get accepted as per @neersighted 's comment and a better fix here from the cleo maintainer @Secrus , that unfortunately doesn't yet solve the problem. As I understand, we're waiting on @Secrus to fix and then merge&release that PR.
Until then, all versions of Poetry v1.3+ on CircleCI and Buildkite (maybe others?) are broken for any commands not using the --no-ansi flag.
Airflow with DockerOperator and the option 'tty=True' also seems to fail for the same reason.
Hello. This affects us within docker build for Docker versions 18.09.4 and 20.10.14 at least, which are pretty old versions but it could still affect a lot of people.
I think the current solution is to just make sure you always use --no-ansi everywhere you run poetry commands at this point, if those commands might be run in any of the several listed places folks have found 0/0 TTYs occur. So far, no working fix has been accepted for this issue, as far as I am aware.
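For pipelines that invoke Poetry from Python, the "always pass --no-ansi" advice could be centralised in a small wrapper. A sketch only: poetry_command and run_poetry are hypothetical helper names, and run_poetry assumes a poetry executable is on PATH.

```python
import subprocess


def poetry_command(*args: str) -> list:
    """Build a Poetry command line that always passes --no-ansi."""
    command = ["poetry", *args]
    if "--no-ansi" not in command:
        command.append("--no-ansi")
    return command


def run_poetry(*args: str) -> int:
    """Run Poetry and return its exit code (requires poetry on PATH)."""
    return subprocess.run(poetry_command(*args), check=False).returncode


print(poetry_command("install"))
```

Returning the exit code rather than raising keeps the caller in control of how a failed install is reported.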
@Secrus , you closed this as completed, but I'm still seeing the problem today.
poetry install fails with exit code 1 in act without the --no-ansi option set.
Versions:
When can we expect to see this work without --no-ansi?
@iainelder it was closed because the PR with the solution was merged. It will be visible after Cleo gets a new release (it's close, probably sometime in mid-July).
--no-ansi fixes poetry 1.5.1 silently exiting in Google Colab environment as well 👍
-vvv option) and have included the output below.
Issue
Running as part of a circle ci workflow. Steps below. The command poetry install identifies the Package Operations and then exits with code 1. (output below) SSH into the box and running poetry install produces the normal expected behavior.
workflow steps
Poetry Install Step