riga / law

Build large-scale task workflows: luigi + job submission + remote targets + environment sandboxing using Docker/Singularity
http://law.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Issues with Python interpreter for singularity sandbox #166

Closed alecgunny closed 10 months ago

alecgunny commented 11 months ago

Full reproducing code can be found here; this issue concerns issue 1 in that repo's README. When a task in a singularity sandbox tries to run code that was installed at container build time, the Python interpreter (which I believe is the host's interpreter) cannot find the corresponding libraries, even when PYTHONPATH is set via sandbox_env or via sys.path.insert inside the container. Running the linked example, which prints the desired and actual values of PYTHONPATH, should produce output like:

DEBUG: Checking if Greet(name=Thom) is complete
INFO: Informed scheduler that task   Greet_Thom_af59ec2587   has status   PENDING
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 1
INFO: [pid 2788032] Worker Worker(salt=2417965045, workers=1, host=dgx1, username=alec.gunny, pid=2788032) running   Greet(name=Thom)
Attempting to set PYTHONPATH to: /usr/local/lib/python3.10/site-packages:

=============================== entering sandbox ===============================
task   : Greet_Thom_af59ec2587
sandbox: singularity::app.sif
================================================================================

/home/alec.gunny/.bashrc: line 30: bind: warning: line editing not enabled
/home/alec.gunny/miniconda3/lib/python3.9/site-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.9) or chardet (5.2.0)/charset_normalizer (2.0.12) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported "
DEBUG: Checking if Greet(name=Thom) is complete
INFO: Informed scheduler that task   Greet_Thom_af59ec2587   has status   PENDING
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Pending tasks: 0
INFO: [pid 2788123] Worker Worker(salt=2417965045, workers=1, host=dgx1, username=alec.gunny, pid=2788032) running   Greet(name=Thom)
ERROR: [pid 2788123] Worker Worker(salt=2417965045, workers=1, host=dgx1, username=alec.gunny, pid=2788032) failed    Greet(name=Thom)
Traceback (most recent call last):
  File "/law_forward/py/luigi/worker.py", line 203, in run
    new_deps = self._run_get_new_deps()
  File "/law_forward/py/luigi/worker.py", line 138, in _run_get_new_deps
    task_gen = self.task.run()
  File "/home/alec.gunny/projects/law-config-repro/law_repro/__init__.py", line 43, in run
    from greeter import greet
ModuleNotFoundError: No module named 'greeter'
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task   Greet_Thom_af59ec2587   has status   FAILED
INFO: This progress looks :( because there were failed tasks
Inside container, PYTHONPATH is: /law_forward/py:
Attempting to set PYTHONPATH to: /usr/local/lib/python3.10/site-packages:

=============================== leaving sandbox ================================
task   : Greet_Thom_af59ec2587
sandbox: singularity::app.sif
================================================================================

ERROR: [pid 2788032] Worker Worker(salt=2417965045, workers=1, host=dgx1, username=alec.gunny, pid=2788032) failed    Greet(name=Thom)
Traceback (most recent call last):
  File "/home/alec.gunny/miniconda3/envs/law-repro-DBCx7gBt-py3.9/lib/python3.9/site-packages/luigi/worker.py", line 203, in run
    new_deps = self._run_get_new_deps()
  File "/home/alec.gunny/miniconda3/envs/law-repro-DBCx7gBt-py3.9/lib/python3.9/site-packages/luigi/worker.py", line 138, in _run_get_new_deps
    task_gen = self.task.run()
  File "/home/alec.gunny/miniconda3/envs/law-repro-DBCx7gBt-py3.9/lib/python3.9/site-packages/law/sandbox/base.py", line 350, in run
    raise Exception(
Exception: sandbox 'singularity::app.sif' failed with exit code 40, please see the error inside the sandboxed context above for details
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task   Greet_Thom_af59ec2587   has status   FAILED
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
DEBUG: There are 1 pending tasks possibly being run by other workers
DEBUG: There are 1 pending tasks unique to this worker
DEBUG: There are 1 pending tasks last scheduled by this worker
INFO: Worker Worker(salt=2417965045, workers=1, host=dgx1, username=alec.gunny, pid=2788032) was stopped. Shutting down Keep-Alive thread
INFO: 
===== Luigi Execution Summary =====

Scheduled 1 tasks of which:
* 1 failed:
    - 1 Greet(...)

This progress looks :( because there were failed tasks

===== Luigi Execution Summary =====

Here, greeter is a library that was installed at container build time but evidently can't be found at run time. If you build the corresponding container from the repro code, it's easy to check that this command works:

singularity exec app.sif python -c 'from greeter import greet;greet("Thom")'
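
For anyone reproducing this, a small diagnostic along these lines could be called at the top of the task's run method to confirm which interpreter is executing and whether it can see the module. This is only a sketch: describe_import is a hypothetical helper name, not part of law or luigi.

```python
import importlib.util
import sys


def describe_import(module_name):
    """Report which interpreter is running and where it would load a module from."""
    # find_spec returns None for a top-level module that cannot be located.
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "module": module_name,
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
        "sys_path": list(sys.path),
    }


if __name__ == "__main__":
    info = describe_import("greeter")
    print("interpreter:", info["executable"])
    print("greeter found:", info["found"], "at", info["origin"])
    for entry in info["sys_path"]:
        print("  sys.path entry:", repr(entry))
```

If the interpreter line shows a host path such as a miniconda directory rather than a path inside the container, that would support the suspicion that the host's Python is being used.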

riga commented 11 months ago

Hey @alecgunny !

> which I believe is the host's python interpreter

I think this is due to the absolute path to the python interpreter in the shebang of the law executable.
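
To check this, one can look at the first line of the installed executable. Console scripts installed by pip typically start with a shebang that hard-codes the absolute path of the interpreter that installed them. The sketch below reads that line; read_shebang is a hypothetical helper, and the interpreter path in the demo script is made up for illustration.

```python
import os
import tempfile


def read_shebang(path):
    """Return the interpreter named in a script's shebang, or None if there is none."""
    with open(path, "rb") as f:
        first_line = f.readline().strip()
    if first_line.startswith(b"#!"):
        return first_line[2:].strip().decode()
    return None


# Demo on a throwaway script standing in for the law executable.
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "law")
    with open(script, "w") as f:
        f.write("#!/home/user/miniconda3/envs/demo/bin/python\nprint('hi')\n")
    shebang = read_shebang(script)

    plain = os.path.join(tmp, "no_shebang.py")
    with open(plain, "w") as f:
        f.write("print('hi')\n")
    no_shebang = read_shebang(plain)

print(shebang)
```

If the path reported for the real law executable only exists on the host (or is bind-mounted from it), running that executable inside the container would start the host's interpreter rather than the container's.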

With #165 included, can you try adding

[singularity_sandbox]
law_executable = "<the_python_to_use_in_the_container> -m law"

to your law.cfg? I even think that "python -m law" should do the trick. If this works, I would make this the default for all container-based sandboxes.

alecgunny commented 11 months ago

Hi @riga, working from the latest branch, I tried explicitly specifying the Python interpreter to use. The issue is that this interpreter doesn't know anything about law, so it can't find it with a simple /usr/local/bin/python -m law:

/bin/bash: line 1: /usr/local/bin/python -m law: No such file or directory

I can try specifying the full path to the mapped-in law executable, where my law.cfg looks like this:

[singularity_sandbox]
allow_binds = true
forward_law = true
law_executable = "/usr/local/bin/python -m /law_forward/py/law/cli/law"

but this doesn't work either, for some reason:

/bin/bash: line 1: /usr/local/bin/python -m /law_forward/py/law/cli/law: No such file or directory
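
One possible reading of both error messages, not confirmed in this thread: bash reports the whole string, spaces included, as a single command that does not exist, which is what happens when the value reaches the shell still wrapped in double quotes. The sketch below assumes the config is read the way Python's standard configparser reads it (quotes are kept as part of the value); whether law strips them is not verified here. Separately, python -m expects a module name such as law, not a file path.

```python
import configparser
import shlex

# The config fragment from above, quotes included.
CFG = """
[singularity_sandbox]
law_executable = "/usr/local/bin/python -m law"
"""

parser = configparser.ConfigParser()
parser.read_string(CFG)
raw = parser["singularity_sandbox"]["law_executable"]

# configparser does not strip the surrounding double quotes.
print(repr(raw))

# A POSIX shell treats the quoted value as one word, i.e. one command name.
quoted_tokens = shlex.split(raw)
# Without the quotes, the same text splits into a command plus two arguments.
unquoted_tokens = shlex.split(raw.strip('"'))

print(quoted_tokens)
print(unquoted_tokens)
```

If this is what is happening, writing the value without quotes in law.cfg would be worth a try.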

I guess this probably makes sense anyway: law outside the container was installed without knowing anything about the dependencies of the libraries inside the container, and vice versa, so there's really no reason to expect these to all be compatible at run time (e.g. when law attempts to import anything, it will look for the package installed in the container's site-packages directory first, and there's no guarantee that the correct version will be there).
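
The ordering concern can be illustrated with a minimal sketch: when two directories on sys.path both provide a module of the same name, Python loads the one from whichever directory comes first, regardless of version. The module name shadow_demo and the version strings are invented for the demo and have nothing to do with law itself.

```python
import importlib
import os
import sys
import tempfile


def make_module(directory, version):
    """Write a tiny module named shadow_demo that only records a version string."""
    with open(os.path.join(directory, "shadow_demo.py"), "w") as f:
        f.write(f"VERSION = {version!r}\n")


with tempfile.TemporaryDirectory() as forwarded, tempfile.TemporaryDirectory() as container:
    make_module(forwarded, "host-1.0")
    make_module(container, "container-2.0")

    # Put both directories at the front of the search path, forwarded one first.
    sys.path[:0] = [forwarded, container]
    try:
        importlib.invalidate_caches()
        shadow_demo = importlib.import_module("shadow_demo")
        loaded_version = shadow_demo.VERSION
    finally:
        # Restore the interpreter state so the demo has no side effects.
        sys.path.remove(forwarded)
        sys.path.remove(container)
        sys.modules.pop("shadow_demo", None)

print(loaded_version)
```

Swapping the order of the two directories makes the other copy win, which is why mixing forwarded host packages with container packages gives no guarantee about which version ends up imported.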

It seems like the most sensible solution in the short term is to just install law inside all of our containers, that way we'll know it's present and compatible with the rest of the libraries in the environment.

riga commented 11 months ago

This would indeed solve things and be a little more robust against incompatibilities between existing and forwarded Python packages, but I'd also be curious why the forwarding doesn't seem to work in the first place.

Just as a test, could you add

[logging]
law.sandbox.base: DEBUG

to your law.cfg (on the latest master)? This will show you the full sandbox command right after the entering sandbox banner. I suspect that the PYTHONPATH is not properly set.
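
Once the full sandbox command is visible, the PYTHONPATH value it contains can be checked entry by entry. The following is a rough sketch of such a check; scan_pythonpath is a hypothetical helper, it only looks for a package directory or a single .py file, and the demo directory is a temporary stand-in for a real site-packages path.

```python
import os
import tempfile


def scan_pythonpath(pythonpath, module_name):
    """For each PYTHONPATH entry, report whether it exists and holds the module."""
    report = []
    for entry in pythonpath.split(os.pathsep):
        if not entry:
            continue  # skip empty entries, e.g. the one after a trailing separator
        candidates = [
            os.path.join(entry, module_name),          # package directory
            os.path.join(entry, module_name + ".py"),  # single-file module
        ]
        report.append({
            "entry": entry,
            "exists": os.path.isdir(entry),
            "has_module": any(os.path.exists(c) for c in candidates),
        })
    return report


# Demo: one directory that contains greeter.py, one that does not exist.
with tempfile.TemporaryDirectory() as site_packages:
    with open(os.path.join(site_packages, "greeter.py"), "w") as f:
        f.write("def greet(name):\n    print('hello', name)\n")
    missing_dir = os.path.join(site_packages, "does_not_exist")
    value = os.pathsep.join([missing_dir, site_packages, ""])
    demo_report = scan_pythonpath(value, "greeter")

for row in demo_report:
    print(row)
```

Run inside the container against the value from the log (for example the /law_forward/py entry printed above), this would show whether any entry on the path actually provides greeter.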

riga commented 10 months ago

Closing this for now, but feel free to open again to continue the discussion if needed :+1: