Closed: al3x609 closed this issue 6 years ago.
This is likely a bug! Could you give me the exact version of Singularity (master branch?) along with the recipe for the image you are testing? I can help and debug if I can reproduce this.
Hi, the error also occurs when creating the instance from Python. spython 0.0.43.
Test recipe:
BootStrap: docker
From: centos:latest
%labels
test "test-label"
It does not release the Python prompt. This happens with IPython and with the spython shell too, but the process is created.
From your recipe, you are doing "run" on an instance that doesn't have a declared runscript (based on your recipe), so the default is to call an interactive shell. In the terminal you would go into that shell. In spython, you hang because the shell is another process and it doesn't go in. So you perhaps want to first identify what you are trying to do. If you want an interactive shell, doing this from spython doesn't make sense. If you want to truly run a command, then you need to define what it is in the runscript. If you want to quickly "send" a command, then do exec. The behavior that I see above is expected given this observation. Let me know what you are trying to do and I can offer suggestions for how to help you achieve that.
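For reference, a rough sketch of those two options from the Python side, assuming an image built as container.simg; the Client.execute and Client.run calls are spython's API of that era, so treat the exact signatures as approximate:

from spython.main import Client as cl

# exec: quickly "send" a single command into the container
output = cl.execute('container.simg', ['echo', 'hello from inside'])
print(output)

# run: only meaningful once the recipe defines a %runscript
# output = cl.run('container.simg')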
Look, what happens is that I need to run several services inside a container. I understand, according to your advice in previous conversations, that I must use "instances". Now, these services are started by a script [.sh] within the %runscript section, but I must send some parameters to that script. Therefore, as I understood from the Singularity manual, for an instance I must put my script in the %startscript section, but if I do this I cannot pass parameters to it; it does not take them. I have already asked several times, in several places, how to send parameters to the %startscript section, without an answer.
My intention is to launch those containers from Python with the spython API; this script will be launched with SLURM.
The error that I mentioned before also happens if I define a %runscript; it also hangs.
I'm sorry that the Singularity maintainers have not been responsive to you. I no longer wear that hat (I am lead developer for several supporting software projects, but Sylabs is now in charge of Singularity itself) and I don't follow development beyond releases. I will help with the issue of the start script taking arguments; I'm not sure about the hanging, but that will take much more work on my part. Let me see if I can help with arguments, since that seems to be the root of many of your troubles.
BootStrap: docker
From: centos:latest
# sudo singularity build container.simg Singularity
# singularity start container.simg myinstance hello moto
%labels
test "test-label"
%startscript
echo "The arguments are $@"
And you can see how I built the image,
sudo singularity build container.simg Singularity
If you report this not working, I don't expect it to either. But let's try anyway.
singularity start container.simg myinstance hello moto
I don't see any output. But maybe it's an issue that the echo is happening in the container's namespace (not back on the host)?
So let's try echoing to a file that we can see! I'm adjusting the build slightly:
singularity instance.stop -a
BootStrap: docker
From: centos:latest
# sudo singularity build container.simg Singularity
# singularity start container.simg myinstance hello moto
%labels
test "test-label"
%startscript
echo "The arguments are $@" >> /tmp/tacomoco.txt
And now build and run again...
sudo singularity build container.simg Singularity
singularity instance.start container.simg myinstance hello moto
Do we have the file? We do! But are there arguments (should be hello moto)? NOPE.
$ cat /tmp/tacomoco.txt
The arguments are
Trying again, here is the new Singularity recipe
BootStrap: docker
From: centos:latest
# sudo singularity build container.simg Singularity
# singularity start container.simg myinstance hello moto
%labels
test "test-label"
%startscript
echo "The arguments are ${ARRGS}" >> /tmp/tacomoco.txt
And we feel like pirates today. So we have Arrrgs!
singularity instance.stop -a
sudo singularity build container.simg Singularity
But to start, let's instead place the args in front of the process:
$ SINGULARITYENV_ARRGS="hello moto" singularity instance.start container.simg myinstance
vanessa@vanessa-ThinkPad-T460s:/tmp$ cat /tmp/tacomoco.txt
The arguments are
The arguments are hello moto
There you go! It's a hack, but it's workable. You would want to define the variables in your startscript, and then export each variable in Python prefixed with SINGULARITYENV_. If you have issues with os.environ["key"], don't forget there is also os.putenv.
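For example, a minimal sketch of the hack from Python, assuming the container.simg built from the recipe above:

import os
from spython.main import Client as cl

# export with the SINGULARITYENV_ prefix so the startscript sees ARRGS inside
os.environ['SINGULARITYENV_ARRGS'] = 'hello moto'
# os.putenv('SINGULARITYENV_ARRGS', 'hello moto')  # the alternative mentioned above

instance = cl.instance('container.simg', name='myinstance')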
Want to give that a try?
It works! Indeed, attempt 1 did not work.
It is something strange, but :) this lets me use instances much better, because before I would only start an instance without any value in %startscript and then, with
singularity run instance://test [args]
send the arguments to %runscript.
Thanks for your patience, I'll check from Python.
The only bad thing is that the temporary directory fills up with files, which are not deleted when an instance ends. :| ...
.............................Sep 21 18:23 container.simg.myinstance.singularity-debug.ZjLSpb
-rw------- 1 root root 0 Sep 21 18:23 container.simg.myinstance.stderr.Inceuu
-rw------- 1 root root 0 Sep 21 18:23 container.simg.myinstance.stdout.C9l3WP
It would be very useful to improve the instance.stop command to eliminate those residual files
Woohoo! Just a heads up I'm going to be starting making dinner soon, so likely I'll help you at some ungodly hour in the middle of the night, or sometime tomorrow.
@al3x609 that's an issue to take up with Singularity, not spython. I experienced the same frustration with general control of instances and these temporary files, so I'd suggest any of the following. These are suggestions for giving the user control over the temporary files, which isn't currently possible afaict:
Flags to disable writing the files, e.g.:
singularity instance.start container.simg web --nostderr --nostdout
singularity instance.start container.simg web --nostd # implies both
singularity instance.start container.simg web --noall # disable all
Or the ability to point them at a custom location (e.g., /dev/null):
singularity instance.start container.simg web --stdout=/dev/null
There might be an equivalent %exitscript that handles cleanup. This could be done automatically, or, if done by the user, Singularity should natively provide variables for an instance that point to its specific files. Something like:
SINGULARITY_INSTANCE_STDERR: designates path to the error log
SINGULARITY_INSTANCE_STDOUT: designates path to the output log
SINGULARITY_INSTANCE_PIDFILE: designates path to the pid file
SINGULARITY_INSTANCE_PID: the pid itself
and then the %exitscript:
%exitscript
echo "Stopping ${SINGULARITY_INSTANCE_PID}"
cp ${SINGULARITY_INSTANCE_STDERR} /tmp/logs/archive/
rm ${SINGULARITY_INSTANCE_STDERR}
So the above variables would need to be provided for the user.
Another clear need is the ability to get status, or to define a custom status with minimal work. I would want to control a script so that a user can get the status of my service. For example:
singularity instance.status instance://web
and then, without anything defined, the default would be to echo the content of SINGULARITY_INSTANCE_STATUS, which could do something simple like check whether the startscript process is running or not. Pseudocode would be:
if startscript is running:
SINGULARITY_INSTANCE_STATUS=started
else
SINGULARITY_INSTANCE_STATUS=stopped
export SINGULARITY_INSTANCE_STATUS
But the user could define a custom recipe to also derive this variable (and override the default). The script gets called when the user asks for a status. In my recipe:
%status
if my custom logic passes:
SINGULARITY_INSTANCE_STATUS=custom
else if something else
SINGULARITY_INSTANCE_STATUS=pancaketime
else
SINGULARITY_INSTANCE_STATUS=crapitsmessedup
export SINGULARITY_INSTANCE_STATUS
It's this kind of control / specificity that is needed for services I think! Feel free to open an issue and link this issue comment (and mention me) as I'd be interested to weigh in on the conversation.
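To make the default check above concrete, here is a rough Python sketch of the "is the startscript process running" logic, using the hypothetical SINGULARITY_INSTANCE_PIDFILE proposed above (none of this exists in Singularity today; it is illustration only):

import os

def instance_status(pidfile):
    # Hypothetical helper: 'started' if the pid recorded in pidfile is alive.
    try:
        with open(pidfile) as fh:
            pid = int(fh.read().strip())
    except (FileNotFoundError, ValueError):
        return 'stopped'
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return 'stopped'
    except PermissionError:
        pass  # the process exists but is owned by another user
    return 'started'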
The real test:
#!/usr/bin/env python3.7
# -*- coding: utf-8 -*-
import os
from spython.main import Client as cl

def run_cont(app_name, end_time, vnc_display):
    os.environ['SINGUARITYENV_APP'] = app_name
    os.environ['SINGUARITYENV_ENDTIME'] = end_time
    os.environ['SINGUARITYENV_DISP'] = vnc_display
    instance = cl.instance(
        image='/data/singularity/containers/paraview_5.5.2.simg',
        name='test_instance',
        sudo=False,
        options=['--nv']
    )

if __name__ == '__main__':
    run_cont(app_name='paraview', end_time='2018-09-21T21:07:00', vnc_display='1')
My recipe:
BootStrap: localimage
From: /data/singularity/containers/base.1.0.simg
%labels
name "ParaView-5"
%environment
export PATH=/opt/ParaView/bin:${PATH}
%files
entrypoint.sh /opt
%post
export PARAVIEW_URL="https://www.paraview.org/paraview-downloads/download.php?submit=Download&version=v5.5&type=binary&os=Linux&downloadFile=ParaView-5.5.2-Qt5-MPI-Linux-64bit.tar.gz"
cd /opt
chmod +x entrypoint.sh
# .... install paraview dependences .....
yum --disablerepo=epel install -y \
qt5-qtbase-common.noarch
# Paraview installation
wget --no-check-certificate ${PARAVIEW_URL} -O t.tar.gz
tar -xvf t.tar.gz
mv ParaView-5.5.2-Qt5-MPI-Linux-64bit ParaView
# Clean Section
rm t.tar.gz
yum clean all
rm -rf /var/cache/yum
%startscript
exec /opt/entrypoint.sh ${SINGUARITYENV_APP} ${SINGUARITYENV_ENDTIME} ${SINGUARITYENV_DISP} > /dev/null 2>&1 &
As you can see, the Python script works, but it does not release the terminal.
hey @al3x609 any variables that you want passed into the container MUST start with SINGULARITYENV_. I don't see that you did that anywhere here?
ok, :( yes, I'm sorry, I forgot it. I fixed it and created the image again, but the result is the same. I have updated the comments with the changes.
Can you please provide a way to get / build the paraview image?
(or a simpler example works too, if it's top secret)
And also please provide the example of running the container in the same way on the host, minus the spython bit, so I can see the expected (correct) output. Thanks!
:) not top secret. The base image:
BootStrap: docker
From: centos:latest
%labels
name "Base Imagen"
%environment
export PATH=/opt/TurboVNC/bin:/opt/VirtualGL/bin:${PATH}
export LANG=en_US.UTF-8
export LANGUAGE=en_US:en
export LC_ALL=en_US.UTF-8
export TZ=America/Bogota
%post
export TURBOVNC_URL="https://sourceforge.net/projects/turbovnc/files/2.1.90%20%282.2beta1%29/turbovnc-2.1.90.x86_64.rpm"
export EPEL_URI="http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm"
export VIRTUALGL_RPM="https://sourceforge.net/projects/virtualgl/files/2.6/VirtualGL-2.6.x86_64.rpm"
export TZ="America/Bogota"
cd /opt
# deps for noVNC
yum -y update; yum clean all
yum install -y \
ca-certificates \
wget \
libXt \
libSM \
deltarpm \
urw-fonts \
wget \
dbus-x11 \
which
# settings turbovnc
wget --no-check-certificate ${TURBOVNC_URL}
yum install -y \
xauth \
xorg-x11-xkb-utils.x86_64 \
libxkbcommon-x11 \
xkeyboard-config \
turbovnc-2.1.90.x86_64.rpm
# settings openbox
wget --no-check-certificate ${EPEL_URI}
rpm -ivh epel-release-latest-7.noarch.rpm
sed -i "s/#baseurl/baseurl/" /etc/yum.repos.d/epel.repo
sed -i "s/metalink/#metalink/" /etc/yum.repos.d/epel.repo
yum -y update
yum --enablerepo=epel -y install openbox
# configure locales
dbus-uuidgen > /etc/machine-id
ln -snf /usr/share/zoneinfo/${TZ} /etc/localtime
echo ${TZ} > /etc/timezone
# configuration VirtualGL
curl -SL ${VIRTUALGL_RPM} -o VirtualGL-2.6.x86_64.rpm
yum -y --nogpgcheck localinstall VirtualGL-2.6.x86_64.rpm
/opt/VirtualGL/bin/vglserver_config -config +s +f -t
# Clean Section
rm epel-release-latest-7.noarch.rpm
rm turbovnc-2.1.90.x86_64.rpm
rm VirtualGL-2.6.x86_64.rpm
rm -r /usr/share/info/*
rm -r /usr/share/man/*
rm -r /usr/share/doc/*
yum clean all
rm -rf /var/cache/yum
okay, building!
sudo singularity build paraview.simg Singularity
okay built! What is the command I can run on the host to get a "successful" run?
This starts (and returns to the command line) okay for me, is this enough to test the case?
export SINGULARITYENV_APP=paraview
export SINGULARITYENV_ENDTIME="2018-09-21T20:56:00"
export SINGULARITYENV_DISP=1
singularity instance.start paraview.simg test_instance
yea, it works
hmm that's strange, the first time I totally forgot to create entrypoint.sh and it didn't show an error - I'm rebuilding now.
Where are you seeing that output? I don't see anything printed to the console.
With ps aux and pstree -p.
Ahh another derp for me, I only built the base! That explains the entrypoint.sh. Apologies, it's after 11:30pm here and I've well turned into a pumpkin hours ago :P I'm still wanting to try this though!
The entrypoint.sh ends the processes it raised so that, after the time limit expires, only it remains, but the prompt with spython is not released.
Forgive me, you do not have to try now; here in my country, Colombia, we are only one hour apart :P. If you want, we can leave it for later.
Okay, running from the command line still didn't show that output, so I'm shelling into the instance to hit the entrypoint manually. First, confirm the environment variables:
Singularity paraview.simg:/opt> echo $APP
paraview
Singularity paraview.simg:/opt> echo $DISP
1
Singularity paraview.simg:/opt> echo $ENDTIME
2018-09-21T20:56:00
Next, run the command /opt/entrypoint.sh ${APP} ${ENDTIME} ${DISP}:
Singularity paraview.simg:/opt> echo $APP
paraview
Singularity paraview.simg:/opt> echo $ENTRYPOINT
Singularity paraview.simg:/opt> echo $DISPLAY
:1
Singularity paraview.simg:/opt> echo $DISP
1
Singularity paraview.simg:/opt> echo $ENDTIME
2018-09-21T20:56:00
Singularity paraview.simg:/opt> env | grep SCIF
Singularity paraview.simg:/opt> /opt/entrypoint.sh ${APP} ${ENDTIME} ${DISP}
29
30
(EE)
Fatal server error:
(EE) Could not create server lock file: /tmp/.X1-lock
(EE)
Invalid MIT-MAGIC-COOKIE-1 key[VGL] ERROR: Could not open display :0.
Openbox-Warning: Openbox is configured for 4 desktops, but the current session has 1. Overriding the Openbox configuration.
^[^[^[^[...Session Expierd...
Terminated
Singularity paraview.simg:/opt>
You just totally crashed my computer. This has only happened before with Docker. You can either give me a simple test case to reproduce and I will debug it for you, or else you're on your own, dude! I must protect my computer, so I won't be using this paraview thing again.
(EE) Could not create server lock file: /tmp/.X1-lock
is because of a write failure in /tmp, and
(EE)
Invalid MIT-MAGIC-COOKIE-1 key[VGL] ERROR: Could not open display :0.
means you need a real X server running on your machine for hardware rendering.
Then I cannot debug this use case, I cannot meet these special requirements. Perhaps someone else in the community with this setup can help. Good luck!
ok, I'll put together a simpler case, I'm sorry, thank you very much.
Sounds good! I greatly appreciate it.
Since you helped me correct the problem of the input parameters, they are no longer necessary. :) The test recipe:
BootStrap: docker
From: nginx
%startscript
service nginx start
Then in IPython:
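(The original screenshot is not reproduced here; a minimal sketch of the call, assuming the nginx.simg built from the recipe above:)

from spython.main import Client as cl

instance = cl.instance('nginx.simg')  # hangs here and never returns the prompt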
As you can see, it does not release the terminal, and I do not know why.
I apologize for the length of the previous case. I just wanted to show you the case I need to run, and why releasing the terminal is important: the user on the cluster must request a SLURM job, and since the node does not release the terminal, other users cannot request the same service; they remain queued, even though squeue says the job is already running.
I took the example from the Singularity website.
I performed a trace with pdb for the last test code I sent you.
(Pdb) p cmd
['singularity', 'instance.start', '/data/singularity/containers/test/nginx.simg', 'milky_gato_2427']
(Pdb) s
> /usr/local/lib/python3.7/site-packages/spython/instance/cmd/start.py(68)start()
-> output = run_command(cmd, sudo=sudo, quiet=True)
(Pdb) l
63
64 # Save the options and cmd, if the user wants to see them later
65 self.options = options
66 self.cmd = cmd
67
68 -> output = run_command(cmd, sudo=sudo, quiet=True)
69
70 if output['return_code'] == 0:
71 self._update_metadata()
72
73 else:
(Pdb) n
> /usr/local/lib/python3.7/site-packages/spython/utils/terminal.py(135)run_command()
-> for line in process.communicate():
(Pdb) l
130 stdout = stdout)
131 lines = ()
132 found_match = False
133 print('hoa mundo')
134 breakpoint()
135 -> for line in process.communicate():
136 if line:
137 if isinstance(line, bytes):
138 line = line.decode('utf-8')
139 lines = lines + (line,)
140 if re.search(no_newline_regexp, line) and found_match is True:
(Pdb) n
and it stops here; at the next step the variable output is not defined when checking the thread exit (https://github.com/singularityhub/singularity-cli/blob/871132931706803036ee8df26a9e26774b8e690e/spython/utils/terminal.py#L134). The last value from process.communicate():
(Pdb) p process.communicate()
*** ValueError: Invalid file object: <_io.BufferedReader name=5>
(Pdb) p process.communicate()
*** ValueError: Invalid file object: <_io.BufferedReader name=5>
(Pdb) l
130 stdout = stdout)
131 lines = ()
132 found_match = False
133 print('hoa mundo')
134 breakpoint()
135 -> for line in process.communicate():
136 if line:
137 if isinstance(line, bytes):
138 line = line.decode('utf-8')
139 lines = lines + (line,)
140 if re.search(no_newline_regexp, line) and found_match is True:
(Pdb) s
KeyboardInterrupt
> /usr/local/lib/python3.7/site-packages/spython/instance/cmd/start.py(68)start()
-> output = run_command(cmd, sudo=sudo, quiet=True)
(Pdb) p output
*** NameError: name 'output' is not defined
and here an error is generated similar to the one we had already dealt with in a previous case.
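For what it's worth, the same hang can be reproduced with plain subprocess: communicate() waits for EOF on the captured pipe, and instance.start presumably leaves a long-lived process holding that pipe open. A minimal sketch of the mechanism (not the actual spython code):

import subprocess

# The background child inherits stdout, so the pipe never reaches EOF even
# though the shell exits immediately; communicate() blocks until sleep ends.
proc = subprocess.Popen(
    ['sh', '-c', 'sleep 60 & echo started'],
    stdout=subprocess.PIPE,
)
out, _ = proc.communicate()  # blocks for about 60 seconds
print(out)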
Reproduced, woohoo! Okay, here is what I did to debug. First, since we know the error occurs when the instance is started (start is True by default), I ran the command with start set to False. This worked okay and returned to the console:
(note that Instance is the same thing that sits behind Client.instance())
ins = Instance('nginx.simg',start=False)
Give it a good name! (this usually also happens when the client wraps it):
name=RobotNamer().generate()
# 'delicious-hippo-2497'
ins.name=name.replace('-','_')
ins
instance://delicious_hippo_2497
Now we can look at instance --> cmd --> start, where the start functions are. These are where we are hanging, so we can debug interactively. The instance already has an associated image:
ins._image
'nginx.simg'
Initialize command
cmd = init_command(ins,'instance.start')
cmd
['singularity', 'instance.start']
add together with no special options
options=[]
cmd = cmd + options + [ins._image, ins.name]
['singularity', 'instance.start', 'nginx.simg', 'delicious_hippo_2497']
This should hang
output = run_command(cmd)
(hangs)
but what if we...
output = run_command(cmd,capture=False)
Doesn't hang :)
So the fix is to not try to capture output, BUT still give the user a handle to this if for some reason output capturing is needed (it will likely hang).
Here you go! https://github.com/singularityhub/singularity-cli/pull/65 Test away!
The difference (what the boolean capture sets) is this:
stdout = None
if capture is True:
    stdout = subprocess.PIPE
It's useful if you need to get the output (e.g., a run or similar), but for anything else I think it makes sense that it hangs, because it's waiting... for Godot.
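With capture off, the parent never reads from that pipe, so after upgrading, the original call should come back right away. A quick sanity check, assuming spython >= 0.0.44, the nginx.simg from above, and the Instance.stop helper for cleanup:

from spython.main import Client as cl

instance = cl.instance('nginx.simg')  # should return immediately now
print(instance.name)
instance.stop()  # clean up the test instance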
Issue fixed in https://pypi.org/project/spython/0.0.44/, closed and thank you @al3x609 for the very detailed debugging! It was essential for reproducing and then finding the fix. Paraview... awaaaaaay!
Thank you for your time and your prompt technical support on these issues. :)
Expected Behavior
For example, without spython this executes the %runscript in my container with args:
It releases the terminal, waiting for more commands: $
Actual Behavior
The terminal in Python is not released after running a command on an instance.
Steps to Reproduce
Context