Closed ltalirz closed 1 year ago
If you have an idea what could be the reason, I'd be happy to look deeper inside `get_shell` to see where the problem lies.
Further info by @ramirezfranciscof in https://github.com/aiidalab/aiidalab-docker-stack/issues/202#issuecomment-937847268
Is it possible that shellingham somehow gets confused when the architecture the docker image was built for does not match the architecture of the host OS?
I managed to get a "more minimal" example, if that helps. This is the Dockerfile:
# syntax=docker/dockerfile:1
FROM python:3-slim
RUN pip3 install shellingham
ENTRYPOINT ["tail", "-f", "/dev/null"]
Then I build it with `docker build --platform linux/amd64 -t "baseimage_test" .`, run the container and log in (`docker exec -it <container_name> /bin/bash`) to execute the following:
root@baseimage_test:/# python3
Python 3.10.0 (default, Oct 5 2021, 23:49:26) [GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import shellingham
>>> shellingham.detect_shell()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.10/site-packages/shellingham/__init__.py", line 24, in detect_shell
raise ShellDetectionFailure()
shellingham._core.ShellDetectionFailure
If I build without `--platform linux/amd64`, then this doesn't happen:
root@baseimage_test:/# python3
Python 3.10.0 (default, Oct 6 2021, 00:09:42) [GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import shellingham
>>> shellingham.detect_shell()
('bash', '/bin/bash')
I don't have an ARM machine to test this out, so you'll probably need to debug this mostly on your own. Note that macOS and Linux are likely using different implementations (macOS uses the `ps` implementation, and Linux the `/proc`-based one) and will need to be debugged separately (although I do kind of suspect the root cause is the same).
I'd probably start with doing something like
>>> import shellingham.posix.proc
>>> print(shellingham.posix.proc.get_process_mapping())
and see if there's anything like a shell in there. If not, I'd manually break the loop apart and see where the parsing code went wrong. The fact that this does not happen if you use a native container seems to also indicate that this is something related to the cross-arch translation; maybe a process of Python built against Intel can't map its pid correctly to native ARM? No idea to be honest.
Thanks for the hints @uranusjr !
Indeed, there is nothing that looks like a shell in the process mapping
(base) aiida@b92ecb60a87f:/$ python
Python 3.7.9 (default, Aug 31 2020, 12:42:55)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import shellingham.posix.proc
>>> from pprint import pprint
>>> pprint(shellingham.posix.proc.get_process_mapping())
{'1191': Process(args=('/usr/bin/qemu-x86_64', '/usr/lib/erlang/erts-9.2/bin/beam.smp', '-W', 'w', '-A', '64', '-P', '1048576', '-t', '5000000', '-stbt', 'db', '-zdbbl', '32000', '-K', 'true', '-B', 'i', '--', '-root', '/usr/lib/erlang', '-progname', 'erl', '--', '-home', '/var/lib/rabbitmq', '--', '-pa', '/usr/lib/rabbitmq/lib/rabbitmq_server-3.6.10/ebin', '-noshell', '-noinput', '-s', 'rabbit', 'boot', '-sname', 'rabbit@localhost', '-boot', 'start_sasl', '-kernel', 'inet_default_connect_options', '[{nodelay,true}]', '-sasl', 'errlog_type', 'error', '-sasl', 'sasl_error_logger', 'false', '-rabbit', 'error_logger', '{file,"/home/aiida/.rabbitmq/log/rabbit@localhost.log"}', '-rabbit', 'sasl_error_logger', '{file,"/home/aiida/.rabbitmq/log/rabbit@localhost-sasl.log"}', '-rabbit', 'enabled_plugins_file', '"/etc/rabbitmq/enabled_plugins"', '-rabbit', 'plugins_dir', '"/usr/lib/rabbitmq/plugins:/usr/lib/rabbitmq/lib/rabbitmq_server-3.6.10/plugins"', '-rabbit', 'plugins_expand_dir', '"/home/aiida/.rabbitmq/rabbit@localhost-plugins-expand"', '-os_mon', 'start_cpu_sup', 'false', '-os_mon', 'start_disksup', 'false', '-os_mon', 'start_memsup', 'false', '-mnesia', 'dir', '"/home/aiida/.rabbitmq/rabbit@localhost"', '-kernel', 'inet_dist_listen_min', '25672', '-kernel', 'inet_dist_listen_max', '25672'), pid='1191', ppid='829'),
'1393': Process(args=('/usr/bin/qemu-x86_64', '/usr/lib/erlang/erts-9.2/bin/erl_child_setup', '1048576'), pid='1393', ppid='1191'),
'1449': Process(args=('/usr/bin/qemu-x86_64', '/usr/lib/erlang/erts-9.2/bin/inet_gethost', '4'), pid='1449', ppid='1393'),
'1453': Process(args=('/usr/bin/qemu-x86_64', '/usr/lib/erlang/erts-9.2/bin/inet_gethost', '4'), pid='1453', ppid='1449'),
'1607': Process(args=('/usr/bin/qemu-x86_64', '/usr/lib/postgresql/10/bin/postgres', '-D', '/home/aiida/.postgresql'), pid='1607', ppid='1'),
'1953': Process(args=('/usr/bin/qemu-x86_64', '/usr/lib/postgresql/10/bin/postgres', '-D', '/home/aiida/.postgresql'), pid='1953', ppid='1607'),
'1955': Process(args=('/usr/bin/qemu-x86_64', '/usr/lib/postgresql/10/bin/postgres', '-D', '/home/aiida/.postgresql'), pid='1955', ppid='1607'),
'1957': Process(args=('/usr/bin/qemu-x86_64', '/usr/lib/postgresql/10/bin/postgres', '-D', '/home/aiida/.postgresql'), pid='1957', ppid='1607'),
'1958': Process(args=('/usr/bin/qemu-x86_64', '/usr/lib/postgresql/10/bin/postgres', '-D', '/home/aiida/.postgresql'), pid='1958', ppid='1607'),
'1959': Process(args=('/usr/bin/qemu-x86_64', '/usr/lib/postgresql/10/bin/postgres', '-D', '/home/aiida/.postgresql'), pid='1959', ppid='1607'),
'1960': Process(args=('/usr/bin/qemu-x86_64', '/usr/lib/postgresql/10/bin/postgres', '-D', '/home/aiida/.postgresql'), pid='1960', ppid='1607'),
'2053': Process(args=('/usr/bin/qemu-x86_64', '/usr/bin/runsv', 'cron'), pid='2053', ppid='2050'),
'2055': Process(args=('/usr/bin/qemu-x86_64', '/usr/bin/runsv', 'sshd'), pid='2055', ppid='2050'),
'2059': Process(args=('/usr/bin/qemu-x86_64', '/usr/sbin/cron', '-f'), pid='2059', ppid='2053'),
'2081': Process(args=('/opt/conda/bin/python',), pid='2081', ppid='0'),
'598': Process(args=('/usr/bin/qemu-x86_64', '/usr/bin/ssh-agent'), pid='598', ppid='1'),
'624': Process(args=('/usr/bin/qemu-x86_64', '/usr/lib/erlang/erts-9.2/bin/epmd', '-daemon'), pid='624', ppid='1'),
'715': Process(args=('/usr/bin/qemu-x86_64', '/bin/sh', '/usr/sbin/rabbitmq-server'), pid='715', ppid='1'),
'829': Process(args=('/usr/bin/qemu-x86_64', '/bin/sh', '/usr/lib/rabbitmq/bin/rabbitmq-server'), pid='829', ppid='715')}
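A mapping like the one above can also be scanned for shell-like entries programmatically. The following is a minimal sketch, assuming (based only on the output above) that emulated processes carry the emulator binary `/usr/bin/qemu-x86_64` as their first argument. `Process`, `KNOWN_SHELLS`, `real_command` and `find_shells` are hypothetical names used for illustration; they are not part of shellingham.

```python
import os
from collections import namedtuple

# Hypothetical stand-in mirroring the fields printed in the output above.
Process = namedtuple("Process", ["args", "pid", "ppid"])

# Illustrative list only; this is not shellingham's own table of shells.
KNOWN_SHELLS = {"sh", "bash", "zsh", "fish", "dash", "ksh", "csh", "tcsh"}


def real_command(args):
    """Return the command name, skipping a leading qemu-* emulator binary."""
    if not args:
        return ""
    first = os.path.basename(args[0])
    if first.startswith("qemu-") and len(args) > 1:
        return os.path.basename(args[1])
    return first


def find_shells(mapping):
    """Return the pids (sorted numerically) whose command is a known shell."""
    return sorted(
        (pid for pid, proc in mapping.items()
         if real_command(proc.args) in KNOWN_SHELLS),
        key=int,
    )


# Three entries copied from the output above.
mapping = {
    "2081": Process(("/opt/conda/bin/python",), "2081", "0"),
    "715": Process(
        ("/usr/bin/qemu-x86_64", "/bin/sh", "/usr/sbin/rabbitmq-server"),
        "715", "1"),
    "1607": Process(
        ("/usr/bin/qemu-x86_64", "/usr/lib/postgresql/10/bin/postgres",
         "-D", "/home/aiida/.postgresql"),
        "1607", "1"),
}
print(find_shells(mapping))
```

On these entries only the `/bin/sh` wrapper that runs rabbitmq-server matches. The Python process itself reports ppid '0', so there is no parent entry to follow from it.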
I should note that there is also no shell in the output of `ps` (is this expected)?
(base) aiida@b92ecb60a87f:/$ ps
PID TTY TIME CMD
598 ? 00:00:00 ssh-agent
1607 ? 00:00:00 postgres
1953 ? 00:00:00 postgres
1955 ? 00:00:00 postgres
1957 ? 00:00:00 postgres
1958 ? 00:00:00 postgres
1959 ? 00:00:00 postgres
1960 ? 00:00:00 postgres
2864 ? 00:00:00 ps
Finally, here is an example of a `stat` file and the result of the parsing. Not sure whether this is intended behavior?
(base) aiida@b92ecb60a87f:/$ cat /proc/1/stat
1 (my_init) S 0 1 1 34816 1 4194560 6753 806908 0 11299 26 2 4432 469 20 0 2 0 21127 263921664 7921 18446744073709551615 1 1 0 0 0 0 0 16781312 1988161279 0 0 0 17 1 0 0 0 0 0 0 0 0 0 0 0 0 0
(base) aiida@b92ecb60a87f:/$ cat /proc/1607/stat
1607 (postgres) S 1 1502 1502 0 -1 4194560 4609 49891 5296 2087 11 4 226 43 20 0 2 0 23504 855425024 9612 18446744073709551615 4194304 7388541 281474097834016 0 0 0 0 19935232 1988218623 0 0 0 17 2 0 0 0 0 0 7455192 7778992 275849216 281474097838868 281474097838952 281474097838952 281474097840084 0
(base) aiida@b92ecb60a87f:/$ python
Python 3.7.9 (default, Aug 31 2020, 12:42:55)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import shellingham.posix.proc as p
>>> p.detect_proc()
'stat'
>>> p._get_stat(1,'stat')
('34816', '0')
>>> p._get_stat(1607,'stat')
('0', '1')
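For reference, the two values in those tuples line up with the `tty_nr` and `ppid` fields of the stat lines. Here is a minimal sketch of such a parse, assuming the field layout documented in proc(5). `parse_stat` and `decode_tty_nr` are hypothetical helpers written for this illustration, not shellingham's implementation.

```python
def parse_stat(line):
    """Return (tty_nr, ppid) as strings from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')'.
    _, _, rest = line.rpartition(")")
    fields = rest.split()
    # After the command name: state, ppid, pgrp, session, tty_nr, ...
    return fields[4], fields[1]


def decode_tty_nr(tty_nr):
    """Split tty_nr into (major, minor) device numbers, per proc(5)."""
    n = int(tty_nr)
    major = (n >> 8) & 0xFF                          # bits 15 to 8
    minor = (n & 0xFF) | (((n >> 20) & 0xFFF) << 8)  # bits 31-20 and 7-0
    return major, minor


# Shortened versions of the two stat lines shown above.
print(parse_stat("1 (my_init) S 0 1 1 34816 1 4194560"))
print(parse_stat("1607 (postgres) S 1 1502 1502 0 -1 4194560"))
print(decode_tty_nr("34816"))
```

Under this reading, pid 1 is attached to the device with major number 136 and minor number 0 (on Linux that major number is conventionally used for `/dev/pts` pseudo-terminals), while the postgres process has no controlling terminal (`tty_nr` 0).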
P.S. This is just for future reference: it turns out that even on my machine, using the same container, the shell detection error is not always raised.
I was recently able to launch the docker container and enter it without the error occurring.
I removed the container (`docker rm -f`), created a new one and the error was still gone.
Then I deactivated my conda environment on the host, launched a new container and the problem reappeared.
After this, activating the conda environment again did not remove the error, however - it now persisted. It is not clear to me what was going on here and how to reliably reproduce it.
> I should note that there is also no shell in the output of `ps` (is this expected)?
Maybe. I don't really understand this behaviour either; it is probably due to some kind of magic in Docker or the Linux kernel. Either way, if the OS is not reporting the existence of a shell, there's really nothing we can do… The application using shellingham is supposed to provide a reasonable default because oddities like this do happen, and shell detection can only do so much. This can probably be better explained by some Docker or OCI or Linux on ARM expert, and I am really none of those.
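The "reasonable default" mentioned above could look roughly like the following. This is a sketch under stated assumptions: `fallback_shell` and `detect_or_default` are hypothetical helper names, and falling back to the `SHELL` environment variable and then `/bin/sh` is one possible policy for POSIX systems, not something shellingham prescribes.

```python
import os


def fallback_shell(env=None):
    """Return (name, path) from $SHELL, or /bin/sh if it is unset."""
    env = os.environ if env is None else env
    path = env.get("SHELL") or "/bin/sh"
    return os.path.basename(path), path


def detect_or_default(detect, failure=Exception, env=None):
    """Call detect(); if it raises `failure`, use the fallback instead."""
    try:
        return detect()
    except failure:
        return fallback_shell(env)


# With shellingham installed, the call would be along these lines:
#   import shellingham
#   name, path = detect_or_default(
#       shellingham.detect_shell, shellingham.ShellDetectionFailure)
```

Passing the detector and the exception type in as arguments keeps the helper independent of shellingham, so it can be exercised with a stub that always fails.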
git clone git@github.com:HenryFBP/trading-bot.git
cd trading-bot/
DOCKER_DEFAULT_PLATFORM=linux/amd64 docker run -it -v $(pwd):/mnt python:3.7-slim bash
# from inside the running Docker shell
cd /mnt
pip install pipenv
pipenv --python /usr/local/bin/python install
pipenv shell
I get this even trying to "abstract" away the arm64 and stick to good ol x86_64 (amd64) through emulation
# pipenv shell
Traceback (most recent call last):
File "/usr/local/bin/pipenv", line 8, in <module>
sys.exit(cli())
File "/usr/local/lib/python3.7/site-packages/pipenv/vendor/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/pipenv/cli/options.py", line 56, in main
return super().main(*args, **kwargs, windows_expand_args=False)
File "/usr/local/lib/python3.7/site-packages/pipenv/vendor/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.7/site-packages/pipenv/vendor/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.7/site-packages/pipenv/vendor/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.7/site-packages/pipenv/vendor/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/pipenv/vendor/click/decorators.py", line 84, in new_func
return ctx.invoke(f, obj, *args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/pipenv/vendor/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/pipenv/cli/command.py", line 429, in shell
pypi_mirror=state.pypi_mirror,
File "/usr/local/lib/python3.7/site-packages/pipenv/core.py", line 2442, in do_shell
shell = choose_shell(project)
File "/usr/local/lib/python3.7/site-packages/pipenv/shells.py", line 239, in choose_shell
type_, command = detect_info(project)
File "/usr/local/lib/python3.7/site-packages/pipenv/shells.py", line 29, in detect_info
raise ShellDetectionFailure
pipenv.vendor.shellingham._core.ShellDetectionFailure
# PIPENV_SHELL='/bin/bash' pipenv shell
Launching subshell in virtual environment...
. /root/.local/share/virtualenvs/mnt-MaCywDhH/bin/activate
root@65c4a982e2d5:/mnt# . /root/.local/share/virtualenvs/mnt-MaCywDhH/bin/activate
(mnt) root@65c4a982e2d5:/mnt#
That fixes it for some reason?
Shellingham is never called if you set `PIPENV_SHELL`, since the variable forces `pipenv shell` to use that instead of doing any detection. So yeah, you could use that to work around whatever the problem is here.
Alright, I finally have a chance to look into this. So for this specific environment combination (an Intel image running in Docker on an ARM Mac), the parent process is (interestingly) hooked to a different TTY from the Python process itself. I suspect this is due to some emulation implementation detail that leaked into the container. So this is “easily” amendable by removing the TTY check, but I’m hesitant to just do that since it makes the proc implementation quite a bit slower.
An alternative approach would be to do some refactoring and make process lookup lazier, so we only access process IDs that are related to the current process. I’m not particularly motivated to do this myself (especially considering that setting `PIPENV_SHELL` is a pretty adequate workaround), but anyone is very welcome to contribute this.
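The "lazier" lookup described above might be sketched as follows: instead of building a mapping of every process, read only the stat file of the current process and then of each parent in turn. This is an illustration under assumptions, not shellingham's code. `read_proc_stat` and `iter_ancestors` are hypothetical names, the field positions follow proc(5), and no TTY filtering is done at all.

```python
import os


def read_proc_stat(pid):
    """Read /proc/<pid>/stat and return (ppid, tty_nr) as strings."""
    with open(f"/proc/{pid}/stat") as f:
        # Split on the last ')' because the command name may contain spaces.
        _, _, rest = f.read().rpartition(")")
    fields = rest.split()
    # After the command name: state, ppid, pgrp, session, tty_nr, ...
    return fields[1], fields[4]


def iter_ancestors(pid, read_stat=read_proc_stat, max_depth=10):
    """Yield (pid, ppid, tty_nr) for pid and its ancestors, one read each."""
    pid = str(pid)
    for _ in range(max_depth):
        try:
            ppid, tty = read_stat(pid)
        except OSError:
            return  # process vanished, or /proc is not available
        yield pid, ppid, tty
        if ppid == "0":
            return  # reached the top of the visible process tree
        pid = ppid


# Only the current process and its parents are touched.
for entry in iter_ancestors(os.getpid()):
    print(entry)
```

Because each step touches a single `/proc/<pid>/stat` file, the cost scales with the depth of the process tree rather than with the number of processes, which is the point of the proposed refactoring. The reader function is injectable so the walk can be tried against recorded data.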
We have a docker container, for which we're running into a shell detection failure:
For some reason, this only happens when running the docker container on M1 MacBooks (on Intel MacBooks, the error in the container does not occur). Observations:
- `os.name` is `posix`
- For some reason, the shell detection for posix returns `None`: https://github.com/sarugaku/shellingham/blob/325c643e89877eb325adf44bc62547251e87acef/src/shellingham/posix/__init__.py#L82-L90

Steps to reproduce: