singularityhub / singularity-cli

streamlined singularity python client (spython) for singularity
https://singularityhub.github.io/singularity-cli/
Mozilla Public License 2.0

How to run an instance command in the background? #64

Closed al3x609 closed 6 years ago

al3x609 commented 6 years ago

Expected Behavior

For example, without spython this executes the %runscript in my container with args:

$ singularity instance.start image.img --nv test
$ singularity run instance://test [args...]

and it releases the terminal, waiting for more commands at the $ prompt.

Actual Behavior

The terminal in Python is not released after running a command on an instance.

Steps to Reproduce

>>> from spython.main import Client as cl
>>> my = cl.instances('test')
>>> cl.run(my, [..args..])

Context

vsoch commented 6 years ago

This likely is a bug! Could you give me the exact version of singularity (master branch?) along with the recipe for the image you are testing? I can help and debug if I can reproduce this.

al3x609 commented 6 years ago

Hi, the error also occurs at the time of creating the instance from Python. spython 0.0.43

test recipe:

BootStrap: docker
From: centos:latest
%labels
 test "test-label"

[screenshot]

It does not release the Python prompt.

This happens with IPython and with the spython shell too,

but the process is created:

[screenshot]

vsoch commented 6 years ago

From your recipe, you are doing "run" on an instance that doesn't have a declared runscript, so the default is to call an interactive shell. In the terminal you would go into that shell. In spython, you hang because the shell is another process and it doesn't go in. So you perhaps want to first identify what you are trying to do. If you want an interactive shell, doing this from spython doesn't make sense. If you want to truly run a command, then you need to define what it is in the runscript. If you want to quickly "send" a command, then do exec. The behavior above is expected given this observation. Let me know what you are trying to do and I can offer suggestions for how to achieve it.
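
For the quick "send a command" case, here is a minimal spython sketch; the image name and command are illustrative, and it assumes Client.execute accepts an instance as its first argument:

from spython.main import Client

# Create and start the instance, then send a one-off command into it
instance = Client.instance('image.img', name='test')
result = Client.execute(instance, ['echo', 'hello from the instance'])
print(result)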

al3x609 commented 6 years ago

Look, what happens is that I need to run several services inside a container. I understand, according to your advice in previous conversations, that I must use "instances". These services are started by a script [.sh] within the %runscript section, but I must send some parameters to that script; therefore, as I understood from the Singularity manual, for an instance I must put my script in the %startscript section. But if I do this, I cannot pass parameters to it; it does not take them. I have already asked several times, in several places, how to send parameters to the %startscript section, without an answer.

My intention is to start those containers through Python with the spython API; this script will be launched with SLURM. The error that I mentioned before also happens if I define a %runscript; it also hangs.

vsoch commented 6 years ago

I'm sorry that the singularity maintainers have not been responsive to you - I no longer wear that hat (I am lead developer for several supporting software packages, but Sylabs is now in charge of Singularity itself) and I don't follow development beyond releases. I will help with the issue of the start script taking arguments; I'm not sure about the hanging, but that will take much more work on my part. Let me see if I can help with arguments, since that seems to be the root of many of your troubles.

BootStrap: docker
From: centos:latest
# sudo singularity build container.simg Singularity
# singularity instance.start container.simg myinstance hello moto
%labels
    test "test-label"
%startscript
    echo "The arguments are $@"

And you can see how I built the image,

sudo singularity build container.simg Singularity

Attempt 1: Just start the instance

Given your report, I don't expect this to work either. But let's try anyway.

singularity instance.start container.simg myinstance hello moto

I don't see any output. But maybe it's an issue that the echo is happening in the container's namespace (not back on the host)?

Attempt 2: Write to file

So let's try an echo to a file that we can see! I'm adjusting the build slightly:

singularity instance.stop -a
BootStrap: docker
From: centos:latest
# sudo singularity build container.simg Singularity
# singularity instance.start container.simg myinstance hello moto
%labels
    test "test-label"
%startscript
    echo "The arguments are $@" >> /tmp/tacomoco.txt

And now build and run again...

sudo singularity build container.simg Singularity
singularity instance.start container.simg myinstance hello moto

Do we have the file? We do! But are there arguments (should be "hello moto")? NOPE.

$ cat /tmp/tacomoco.txt 
The arguments are 

Attempt 3: Hack with variables passed to container

Trying again, here is the new Singularity recipe

BootStrap: docker
From: centos:latest
# sudo singularity build container.simg Singularity
# singularity instance.start container.simg myinstance hello moto
%labels
    test "test-label"
%startscript
    echo "The arguments are ${ARRGS}" >> /tmp/tacomoco.txt

And we feel like pirates today, so we have Arrrgs!

singularity instance.stop -a
sudo singularity build container.simg Singularity

But to start, let's instead place the args in front of the process:

$ SINGULARITYENV_ARRGS="hello moto" singularity instance.start container.simg myinstance
vanessa@vanessa-ThinkPad-T460s:/tmp$ cat /tmp/tacomoco.txt 
The arguments are 
The arguments are hello moto

There you go! It's a hack, but it's workable. You would want to use the variables in your startscript, and then export each variable in Python prefixed with SINGULARITYENV_. If you have issues with os.environ["key"], don't forget there is also os.putenv. Want to give that a try?
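
A minimal Python sketch of that flow (names follow the example above; note that, per the rest of this issue, the instance call itself hung in spython 0.0.43 until the capture fix below):

import os
from spython.main import Client

# Export with the SINGULARITYENV_ prefix on the host side; inside the
# container, %startscript sees the variable as plain ${ARRGS}
os.environ['SINGULARITYENV_ARRGS'] = 'hello moto'

# Start the instance built from the recipe above
instance = Client.instance('container.simg', name='myinstance')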

al3x609 commented 6 years ago

It works! Effectively, attempt 1 did not work.

It is something strange, but :) this allows me to use the instances much better, because before I would only start an instance without any value in %startscript, and then

singularity run instance://test [args]

would send the arguments to %runscript.

Thanks for your patience; I'll check with Python.

al3x609 commented 6 years ago

The only bad thing is that the temporary directory fills up with files, which are not deleted when an instance ends. :| ...

.............................Sep 21 18:23 container.simg.myinstance.singularity-debug.ZjLSpb
-rw------- 1 root    root     0 Sep 21 18:23 container.simg.myinstance.stderr.Inceuu
-rw------- 1 root    root     0 Sep 21 18:23 container.simg.myinstance.stdout.C9l3WP

It would be very useful to improve the instance.stop command to remove those residual files.

vsoch commented 6 years ago

Woohoo! Just a heads up, I'm going to start making dinner soon, so likely I'll help you at some ungodly hour in the middle of the night, or sometime tomorrow.

vsoch commented 6 years ago

@al3x609 that's an issue to take up with singularity, not spython. I experienced the same frustration with general control of instances and these temporary files, so I'd suggest any of the following:

Suggestions for Instance Files

These are suggestions for giving the user control over temporary files, which isn't currently possible afaict.

singularity instance.start container.simg web --nostderr --nostdout
singularity instance.start container.simg web --nostd  # implies both
singularity instance.start container.simg web --noall # disable all
singularity instance.start container.simg web --stdout=/dev/null

and then the %exitscript

%exitscript
echo "Stopping ${SINGULARITY_INSTANCE_PID}"
cp ${SINGULARITY_INSTANCE_STDERR} /tmp/logs/archive/
rm ${SINGULARITY_INSTANCE_STDERR}

So the above variables would need to be provided to the user.

Instance Status

Another clear need is the ability to get status, or to define a custom status with minimal work. I would want to be able to control a script with which a user can get the status of my service. For example:

singularity instance.status instance://web

and then, with nothing custom defined, the default would do something simple like check whether the startscript process is running or not. Pseudocode would be:

if startscript is running:
   SINGULARITY_INSTANCE_STATUS=started
else
   SINGULARITY_INSTANCE_STATUS=stopped
export SINGULARITY_INSTANCE_STATUS

But the user could define a custom recipe section to derive this variable (and override the default). The script gets called when the user asks for a status. In my recipe:

%status
if my custom logic passes:
    SINGULARITY_INSTANCE_STATUS=custom
else if something else
    SINGULARITY_INSTANCE_STATUS=pancaketime
else
    SINGULARITY_INSTANCE_STATUS=crapitsmessedup
export SINGULARITY_INSTANCE_STATUS

It's this kind of control / specificity that is needed for services, I think! Feel free to open an issue and link this comment (and mention me), as I'd be interested to weigh in on the conversation.
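
In the meantime, a rough status check is possible from the spython side by treating "listed by singularity" as started. A sketch, assuming Client.instances() returns the running instances, each with a .name attribute (the helper itself is hypothetical):

from spython.main import Client

def instance_status(name):
    # Hypothetical helper: 'started' if singularity lists an instance
    # with this name, 'stopped' otherwise
    running = Client.instances() or []
    names = {instance.name for instance in running}
    return 'started' if name in names else 'stopped'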

al3x609 commented 6 years ago

The real test:

#!/usr/bin/env python3.7
# -*- coding: utf-8 -*-

import os
from spython.main import Client as cl

def run_cont(app_name, end_time, vnc_display):

    os.environ['SINGUARITYENV_APP'] = app_name
    os.environ['SINGUARITYENV_ENDTIME'] = end_time
    os.environ['SINGUARITYENV_DISP'] = vnc_display

    instance = cl.instance(
        image='/data/singularity/containers/paraview_5.5.2.simg',
        name='test_instance',
        sudo=False,
        options=['--nv']
    )

if __name__ == '__main__':
    run_cont(app_name='paraview', end_time='2018-09-21T21:07:00', vnc_display='1')

[screenshot]

[screenshot]

al3x609 commented 6 years ago

my recipe

BootStrap: localimage

From: /data/singularity/containers/base.1.0.simg

%labels
  name "ParaView-5"

%environment
  export PATH=/opt/ParaView/bin:${PATH}

%files
 entrypoint.sh /opt

%post

  export PARAVIEW_URL="https://www.paraview.org/paraview-downloads/download.php?submit=Download&version=v5.5&type=binary&os=Linux&downloadFile=ParaView-5.5.2-Qt5-MPI-Linux-64bit.tar.gz"
  cd /opt
  chmod +x entrypoint.sh

  # ....  install paraview  dependences .....
  yum --disablerepo=epel install -y \
      qt5-qtbase-common.noarch

  # Paraview installation
  wget --no-check-certificate ${PARAVIEW_URL} -O t.tar.gz
  tar -xvf t.tar.gz
  mv ParaView-5.5.2-Qt5-MPI-Linux-64bit ParaView

  # Clean Section
  rm t.tar.gz
  yum clean all
  rm -rf /var/cache/yum

%startscript
 exec /opt/entrypoint.sh ${SINGUARITYENV_APP} ${SINGUARITYENV_ENDTIME} ${SINGUARITYENV_DISP} > /dev/null 2>&1   &

As you can see, the Python script works, but it does not release the terminal.

vsoch commented 6 years ago

hey @al3x609, any variables that you want passed into the container MUST start with SINGULARITYENV_; I don't see that you did that anywhere here?
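
For reference, the exports in the script above are missing the "L" (SINGUARITYENV_ instead of SINGULARITYENV_), so the values never reach the container. The corrected lines would be:

import os

# Corrected prefix: SINGULARITYENV_, not SINGUARITYENV_
os.environ['SINGULARITYENV_APP'] = app_name
os.environ['SINGULARITYENV_ENDTIME'] = end_time
os.environ['SINGULARITYENV_DISP'] = vnc_display

and, following the ARRGS example earlier, %startscript would then read the variables without the prefix: ${APP}, ${ENDTIME} and ${DISP}.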

al3x609 commented 6 years ago

ok :( yes, I'm sorry, I forgot it. I fixed it and created the image again, but the result is the same.

I updated the comments with the changes.

vsoch commented 6 years ago

Can you please provide a way to get / build the paraview image?

vsoch commented 6 years ago

(or a simpler example works too, if it's top secret)

vsoch commented 6 years ago

And also please provide the example of running the container in the same way on the host, minus the spython bit, so I can see the expected (correct) output. Thanks!

al3x609 commented 6 years ago

:) not top secret. The base image:

BootStrap: docker

From: centos:latest

%labels
  name "Base Imagen" 

%environment
  export PATH=/opt/TurboVNC/bin:/opt/VirtualGL/bin:${PATH}
  export LANG=en_US.UTF-8  
  export LANGUAGE=en_US:en  
  export LC_ALL=en_US.UTF-8 
  export TZ=America/Bogota

%post

  export TURBOVNC_URL="https://sourceforge.net/projects/turbovnc/files/2.1.90%20%282.2beta1%29/turbovnc-2.1.90.x86_64.rpm"
  export EPEL_URI="http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm"
  export VIRTUALGL_RPM="https://sourceforge.net/projects/virtualgl/files/2.6/VirtualGL-2.6.x86_64.rpm"
  export TZ="America/Bogota"

  cd /opt 

  # deps for noVNC
  yum -y update; yum clean all 
  yum install -y       \
      ca-certificates  \
      wget             \
      libXt            \
      libSM            \
      deltarpm         \
      urw-fonts        \
      wget             \
      dbus-x11         \
      which          

  # settings turbovnc
  wget --no-check-certificate ${TURBOVNC_URL} 
  yum install -y \
      xauth \
      xorg-x11-xkb-utils.x86_64 \
      libxkbcommon-x11 \
      xkeyboard-config \
      turbovnc-2.1.90.x86_64.rpm 

  # settings openbox
  wget --no-check-certificate ${EPEL_URI}
  rpm -ivh epel-release-latest-7.noarch.rpm
  sed -i "s/#baseurl/baseurl/" /etc/yum.repos.d/epel.repo
  sed -i "s/metalink/#metalink/" /etc/yum.repos.d/epel.repo
  yum -y update
  yum --enablerepo=epel -y install openbox

  # configure locales
  dbus-uuidgen > /etc/machine-id  
  ln -snf /usr/share/zoneinfo/${TZ} /etc/localtime 
  echo ${TZ} > /etc/timezone 

  # configuration VirtualGL
  curl -SL ${VIRTUALGL_RPM} -o VirtualGL-2.6.x86_64.rpm 
  yum -y --nogpgcheck localinstall VirtualGL-2.6.x86_64.rpm  
  /opt/VirtualGL/bin/vglserver_config -config +s +f -t 

  # Clean Section
  rm epel-release-latest-7.noarch.rpm
  rm turbovnc-2.1.90.x86_64.rpm 
  rm VirtualGL-2.6.x86_64.rpm        
  rm -r /usr/share/info/*            
  rm -r /usr/share/man/*             
  rm -r /usr/share/doc/*             
  yum clean all                      
  rm -rf /var/cache/yum

vsoch commented 6 years ago

okay, building!

sudo singularity build paraview.simg Singularity

vsoch commented 6 years ago

okay built! What is the command I can run on the host to get a "successful" run?

vsoch commented 6 years ago

This starts (and returns to the command line) okay for me; is this enough to test the case?


export SINGULARITYENV_APP=paraview
export SINGULARITYENV_ENDTIME="2018-09-21T20:56:00"
export SINGULARITYENV_DISP=1
singularity instance.start paraview.simg test_instance

al3x609 commented 6 years ago

[screenshot]

[screenshot] Yeah, it works!

vsoch commented 6 years ago

hmm that's strange, the first time I totally forgot to create entrypoint.sh and it didn't show an error - I'm rebuilding now.

vsoch commented 6 years ago

Where are you seeing that output? I don't see anything printed to the console.

al3x609 commented 6 years ago

with ps aux and pstree -p

vsoch commented 6 years ago

Ahh another derp for me, I only built the base! That explains the entrypoint.sh. Apologies, it's after 11:30pm here and I've well turned into a pumpkin hours ago :P I'm still wanting to try this though!

al3x609 commented 6 years ago

The entrypoint.sh ends the processes it raised, so that after the time limit expires, only this remains: [screenshot]

but the prompt with spython is not released.

Forgive me, you do not have to try it now; here in my country, Colombia, we are only one hour apart :P If you want, we can leave it for later.

vsoch commented 6 years ago

okay, running from the command line still didn't work to see that output, so I'm shelling into the instance to hit the entrypoint manually. First, confirm the environment variables:

Singularity paraview.simg:/opt> echo $APP
paraview
Singularity paraview.simg:/opt> echo $DISP
1
Singularity paraview.simg:/opt> echo $ENDTIME
2018-09-21T20:56:00

Next, run the command /opt/entrypoint.sh ${APP} ${ENDTIME} ${DISP}:

Singularity paraview.simg:/opt> echo $APP
paraview
Singularity paraview.simg:/opt> echo $ENTRYPOINT

Singularity paraview.simg:/opt> echo $DISPLAY
:1
Singularity paraview.simg:/opt> echo $DISP
1
Singularity paraview.simg:/opt> echo $ENDTIME
2018-09-21T20:56:00
Singularity paraview.simg:/opt> env | grep SCIF
Singularity paraview.simg:/opt> /opt/entrypoint.sh ${APP} ${ENDTIME} ${DISP}
29
30
(EE) 
Fatal server error:
(EE) Could not create server lock file: /tmp/.X1-lock
(EE) 
Invalid MIT-MAGIC-COOKIE-1 key[VGL] ERROR: Could not open display :0.
Openbox-Warning: Openbox is configured for 4 desktops, but the current session has 1.  Overriding the Openbox configuration.
^[^[^[^[...Session Expierd...
Terminated
Singularity paraview.simg:/opt> 

vsoch commented 6 years ago

You just totally crashed my computer. This has only happened before with Docker. You can either give me a simple test case to reproduce and I will debug for you, or else you're on your own, dude! I must protect my computer, so I won't be using this paraview thing again.

al3x609 commented 6 years ago

(EE) Could not create server lock file: /tmp/.X1-lock comes from a write failure in /tmp,
and (EE) Invalid MIT-MAGIC-COOKIE-1 key [VGL] ERROR: Could not open display :0.

means you need a real X server running on your machine for hardware rendering.

vsoch commented 6 years ago

Then I cannot debug this use case; I cannot meet these special requirements. Perhaps someone else in the community with this setup can help. Good luck!

al3x609 commented 6 years ago

ok, I'll put together a simpler case. I'm sorry, and thank you very much.

vsoch commented 6 years ago

Sounds good! I greatly appreciate it.

al3x609 commented 6 years ago

Since you helped me correct the problem of the input parameters, they are no longer necessary. :) The test recipe:

BootStrap: docker
From:   nginx

%startscript
service nginx start

then in ipython,

[screenshot]
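
A hypothetical reconstruction of that session, based on the pdb trace later in this thread (the instance name is auto-generated by spython):

from spython.main import Client as cl

# hangs here; the prompt never returns
instance = cl.instance('/data/singularity/containers/test/nginx.simg')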

As you can see, it does not release the terminal, and I do not know why.

al3x609 commented 6 years ago

I apologize for the length of the previous case; I just wanted to show you the case I need to run, and why releasing the terminal matters: the user on the cluster must request a SLURM job, and since the node does not release the terminal, other users cannot ask for the same service. They remain in the queue, even though squeue says the job is already running.

I took the example from the Singularity website.

al3x609 commented 6 years ago

I performed a tracing test with pdb for the last test code I sent you.


(Pdb) p cmd

['singularity', 'instance.start', '/data/singularity/containers/test/nginx.simg', 'milky_gato_2427']

(Pdb) s
> /usr/local/lib/python3.7/site-packages/spython/instance/cmd/start.py(68)start()
-> output = run_command(cmd, sudo=sudo, quiet=True)
(Pdb) l
 63
 64         # Save the options and cmd, if the user wants to see them later
 65         self.options = options
 66         self.cmd = cmd
 67
 68  ->     output = run_command(cmd, sudo=sudo, quiet=True)
 69
 70         if output['return_code'] == 0:
 71             self._update_metadata()
 72
 73         else:

(Pdb) n
> /usr/local/lib/python3.7/site-packages/spython/utils/terminal.py(135)run_command()
-> for line in process.communicate():
(Pdb) l
130                                    stdout = stdout)
131         lines = ()
132         found_match = False
133         print('hoa mundo')
134         breakpoint()
135  ->     for line in process.communicate():
136             if line:
137                 if isinstance(line, bytes):
138                     line = line.decode('utf-8')
139                 lines = lines + (line,)
140                 if re.search(no_newline_regexp, line) and found_match is True:
(Pdb) n

and it stops here; next, the variable output is not defined when checking the [thread exit](https://github.com/singularityhub/singularity-cli/blob/871132931706803036ee8df26a9e26774b8e690e/spython/utils/terminal.py#L134).

the last value from process.communicate()

(Pdb) p process.communicate()
*** ValueError: Invalid file object: <_io.BufferedReader name=5>
(Pdb) p process.communicate()
*** ValueError: Invalid file object: <_io.BufferedReader name=5>
(Pdb) l
130                                    stdout = stdout)
131         lines = ()
132         found_match = False
133         print('hoa mundo')
134         breakpoint()
135  ->     for line in process.communicate():
136             if line:
137                 if isinstance(line, bytes):
138                     line = line.decode('utf-8')
139                 lines = lines + (line,)
140                 if re.search(no_newline_regexp, line) and found_match is True:
(Pdb) s
KeyboardInterrupt
> /usr/local/lib/python3.7/site-packages/spython/instance/cmd/start.py(68)start()
-> output = run_command(cmd, sudo=sudo, quiet=True)
(Pdb) p output
*** NameError: name 'output' is not defined

and here an error is generated similar to the one we had already dealt with in a previous case.

vsoch commented 6 years ago

Reproduced, woohoo! Okay, here is what I did to debug. First, since we know the error occurs when the instance is started (start is True by default), I ran the command with start set to False. This worked okay and returned to the console:

(note that Instance is the same thing that sits behind Client.instance)

ins = Instance('nginx.simg', start=False)

Give him a good name! (This naming usually also happens when the client wraps the instance.)

name = RobotNamer().generate()
# 'delicious-hippo-2497'
ins.name = name.replace('-', '_')
ins
# instance://delicious_hippo_2497

Now we can look at instance --> cmd --> start, where the start functions are. This is where we are hanging, so we can debug interactively. The instance already has an associated image:

ins._image
 'nginx.simg'

Initialize the command:

cmd = init_command(ins, 'instance.start')

cmd
['singularity', 'instance.start']

Add together with no special options:

options=[]
cmd = cmd + options + [ins._image, ins.name]
['singularity', 'instance.start', 'nginx.simg', 'delicious_hippo_2497']

This should hang

output = run_command(cmd)

(hangs)

but what if we...

output = run_command(cmd, capture=False)

Doesn't hang :)

So the fix is to not try to capture output, BUT still give the user a handle to this if for some reason output capturing is needed (it will likely hang).

Here you go! https://github.com/singularityhub/singularity-cli/pull/65 Test away!
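
Once that lands, expected usage is roughly this (a sketch, not the PR's actual test; the Instance.stop() cleanup call is assumed):

from spython.main import Client

instance = Client.instance('nginx.simg', name='web')  # returns promptly
print(instance)                                       # instance://web
instance.stop()                                       # clean up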

vsoch commented 6 years ago

The difference (what the boolean capture sets) is this:

    stdout = None
    if capture is True:
        stdout = subprocess.PIPE

It's useful if you need to get the output (e.g., a run or similar), but for anything else I think it makes sense that it hangs, because it's waiting... for Godot.
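
A standalone illustration of the mechanism with plain subprocess (not spython code; the hang is likely because the daemonized instance process inherits the stdout pipe and never closes it):

import subprocess

cmd = ['singularity', 'instance.start', 'nginx.simg', 'web']

# With capture: communicate() reads the pipe until EOF, so it blocks for
# as long as any child process keeps the write end open
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()  # hangs while the daemon holds the fd

# Without capture: stdout=None inherits the parent's stdout, so there is
# nothing to drain and wait() returns as soon as the starter exits
proc = subprocess.Popen(cmd, stdout=None)
proc.wait()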

vsoch commented 6 years ago

Issue fixed in https://pypi.org/project/spython/0.0.44/, closed, and thank you @al3x609 for the very detailed debugging! It was essential for reproducing and then finding the fix. Paraview... awaaaaaay!

al3x609 commented 6 years ago

Thank you for your time and your prompt technical support on these issues. :)