Closed: 0ctagon closed this issue 1 year ago.
I have had a WIP PR in place at #162 which could fix this issue. I regret the design decision of providing a `gbasf2_install_directory` setting, which is actually only used for finding the setup file. It would have been better to just provide a setting with the path to the setup file, which #162 does. I just didn't merge that old PR because I didn't want to deprecate the old setting and break backwards compatibility...
The error below is due to the `HOME` environment variable not being set, which is why `os.environ["HOME"]` doesn't work. In b2luigi I often use my `run_with_gbasf2`, which calls gbasf2 commands with a gbasf2 environment, even though b2luigi runs in a basf2/python3 environment. It provides the environment as a dictionary, but it can be that the `HOME` variable is not set in that temporary environment. In the past that never caused problems, but maybe the new dirac/gbasf2 tools now use that variable. Just a guess; I can't really test it as I'm in the last weeks of my PhD and not sure if my grid access still works. If it's a hotfix I might try that, but I would ask you to test it...
Thank you for your response!
I don't know if it's a hotfix or not, but I would be happy to help.
After some testing I changed the `gbasf2_install_directory` setting to the path `"BelleDIRAC/Belle-KEK.v5r7/BelleDIRAC/gbasf2/tools/setup.sh"` and the script managed to correctly submit the jobs, without the `HOME` error.
But then, when it tries to check the status of the jobs, the worker crashes:
```
<=====v5r7=====>
JobID = 324794689 ... 324794888 (200 jobs)
INFO: Worker Worker(salt=5862976996, workers=500, host=ccw01.cc.kek.jp, username=tfilling, pid=115331) was stopped. Shutting down Keep-Alive thread
Traceback (most recent call last):
  File "b2luigi_gridSubmitDL.py", line 99, in <module>
    main()
  File "b2luigi_gridSubmitDL.py", line 94, in main
    b2luigi.process([main_task_instance], workers=n_gbasf2_tasks,
  File "/home/belle2/tfilling/.local/lib/python3.8/site-packages/b2luigi/cli/process.py", line 113, in process
    runner.run_local(task_list, cli_args, kwargs)
  File "/home/belle2/tfilling/.local/lib/python3.8/site-packages/b2luigi/cli/runner.py", line 46, in run_local
    run_luigi(task_list, cli_args, kwargs)
  File "/home/belle2/tfilling/.local/lib/python3.8/site-packages/b2luigi/cli/runner.py", line 62, in run_luigi
    luigi.build(task_list, **kwargs)
  File "/home/belle2/tfilling/.local/lib/python3.8/site-packages/luigi/interface.py", line 239, in build
    luigi_run_result = _schedule_and_run(tasks, worker_scheduler_factory, override_defaults=env_params)
  File "/home/belle2/tfilling/.local/lib/python3.8/site-packages/luigi/interface.py", line 173, in _schedule_and_run
    success &= worker.run()
  File "/home/belle2/tfilling/.local/lib/python3.8/site-packages/luigi/worker.py", line 650, in __exit__
    if task.is_alive():
  File "/home/belle2/tfilling/.local/lib/python3.8/site-packages/b2luigi/batch/processes/__init__.py", line 135, in is_alive
    job_status = self.get_job_status()
  File "/home/belle2/tfilling/.local/lib/python3.8/site-packages/b2luigi/batch/processes/gbasf2.py", line 282, in get_job_status
    job_status_dict = get_gbasf2_project_job_status_dict(
  File "/home/belle2/tfilling/.local/lib/python3.8/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/home/belle2/tfilling/.local/lib/python3.8/site-packages/retry/api.py", line 90, in retry_decorator
    return __retry_internal(partial(f, *args, **kwargs), exceptions, tries, delay, max_delay, backoff, jitter,
  File "/home/belle2/tfilling/.local/lib/python3.8/site-packages/retry/api.py", line 35, in __retry_internal
    return f()
  File "/home/belle2/tfilling/.local/lib/python3.8/site-packages/b2luigi/batch/processes/gbasf2.py", line 1021, in get_gbasf2_project_job_status_dict
    return json.loads(job_status_json_string)
  File "/cvmfs/belle.cern.ch/el7/externals/v01-11-01/Linux_x86_64/common/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/cvmfs/belle.cern.ch/el7/externals/v01-11-01/Linux_x86_64/common/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/cvmfs/belle.cern.ch/el7/externals/v01-11-01/Linux_x86_64/common/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```
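For context, this exact `JSONDecodeError` message appears whenever `json.loads` receives input that does not start with valid JSON, e.g. an empty string or an error message that a subprocess printed instead of the expected status JSON. A minimal illustration (not the b2luigi code itself):

```python
import json

# json.loads raises "Expecting value: line 1 column 1 (char 0)" whenever its
# input is empty or starts with non-JSON text, e.g. when the gbasf2 subprocess
# printed an error message instead of the expected status JSON
def describe_json_error(text):
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        return str(err)

print(describe_json_error(""))                      # Expecting value: line 1 column 1 (char 0)
print(describe_json_error("ERROR: proxy expired"))  # Expecting value: line 1 column 1 (char 0)
```

So the traceback suggests the gbasf2 status command produced no (or non-JSON) output, rather than a bug in the JSON parsing itself.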
I can pull #162 and try if it works there, if there are no easy fixes for the error above.
I didn't know about the `run_with_gbasf2` function; I use a simple `Basf2PathTask` with a `create_path` and an `output` function, and it worked perfectly before this gbasf2 update.
I just rebased it on the latest `main`, but not sure if that works, it's really WIP. I'm still working on it a bit.

`run_with_gbasf2` is used internally by b2luigi a lot for submitting, monitoring and downloading gbasf2 jobs. It is basically used to run the `gb2_...` commands which you would run from the terminal, and it ensures they run in an environment like the one you would get from `source /cvmfs/belle.kek.jp/grid/gbasf2/pro/setup.sh` or whatever.
Regarding the JSON error, it might well be that something else changed. As said, I don't have much time to work on that; this is basically just a hobby of mine, not a service task. But PRs are welcome. For debugging it's helpful that you can run most functions that the b2luigi gbasf2 wrapper calls internally interactively in IPython to make sure they work. E.g. for checking the job status it runs `get_gbasf2_project_job_status_dict`, which you can test with

```python
from b2luigi.batch.processes import get_gbasf2_project_job_status_dict
print(get_gbasf2_project_job_status_dict("<project_name>"))
```
In #162 I changed most commands to take the setup path as a command parameter, so if you try that branch you should use

```python
print(get_gbasf2_project_job_status_dict(
    "<project_name>",
    gbasf2_setup_path="/cvmfs/belle.kek.jp/grid/gbasf2/pro/setup.sh"
))
```

or something like that.
I just thought about trying the latest gbasf2, and it doesn't even work in the terminal anymore, because the latest gbasf2 setup tools use non-POSIX-conformant parameters in their shell script, which breaks it for zsh. Seems they didn't run shellcheck on their script...
EDIT: Okay, I just saw on the comp-users-forum mailing list that zsh is not supported. Okay, then forget the previous comment.
Anyway, I think one problem is that the setup script now automatically tries to initialize the DIRAC proxy, and this requires user input. Before, it just set up the environment but didn't require input, and the b2luigi gbasf2 wrapper doesn't expect that. The proxy initialization is also the part that requires `$HOME` to tell gbasf2 where to look for the `.globus` files.
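To illustrate the `HOME` dependency (a sketch with a hypothetical environment dict, not the wrapper's actual code): anything that builds a path under the home directory fails with a `KeyError` when the environment dict handed to the subprocess lacks `HOME`.

```python
import os

# hypothetical environment dict without HOME, as the wrapper might build one
subprocess_env = {"PATH": "/usr/bin"}

def find_globus_dir(env):
    # raises KeyError if the environment has no HOME entry
    return os.path.join(env["HOME"], ".globus")

try:
    find_globus_dir(subprocess_env)
    result = "found"
except KeyError:
    result = "missing HOME"
print(result)  # missing HOME
```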
So we need to refactor the whole `get_gbasf2_env` function:

https://github.com/nils-braun/b2luigi/blob/b61f94864fbf8d459fad6e754627739a03569e11/b2luigi/batch/processes/gbasf2.py#L1167-L1195
The command

```python
echo_gbasf2_env_command = shlex.split(
    f"env -i bash -c '{gbasf2_setup_command_str} > /dev/null && env'"
)
```

is used to source the `setup.sh` file from a new bash shell with an empty environment and then print the resulting environment. I use that to get an environment dict that I can pass to `subprocess.call`. Maybe instead of `env -i`, which starts a process with an empty environment, we should use something like

```python
echo_gbasf2_env_command = shlex.split(
    f"env -i HOME={os.environ['HOME']} bash -c '{gbasf2_setup_command_str} > /dev/null && env'"
)
```
And then below, when running that command, we ideally need a way to deal with potential password queries...

I don't think I have time to work on that. I'm describing this so that others can help with a PR, which is why I added the `help wanted` label...
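For reference, here is a self-contained sketch of the capture-the-environment trick described above, using `true` as a stand-in for the real setup command (an assumption for this sketch) and forwarding `HOME` into the otherwise empty environment:

```python
import os
import shlex
import subprocess

# stand-in for "source /path/to/setup.sh"; the real command is an assumption here
setup_command_str = "true"

# run the setup command in a fresh bash with only HOME set, then print the env
command = shlex.split(
    f"env -i HOME={os.environ['HOME']} bash -c '{setup_command_str} > /dev/null && env'"
)
output = subprocess.check_output(command, encoding="utf-8")

# parse the printed `env` output into a dict usable as subprocess(..., env=...)
env_dict = dict(line.split("=", 1) for line in output.splitlines() if "=" in line)
print(env_dict["HOME"] == os.environ["HOME"])  # True
```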
Hi @meliache, I can test the code or help with PR, if needed
@Bilokin Help would be great and very appreciated.
I've now just merged PR #162, which adds a `gbasf2_setup_path` setting that will now be preferred over `gbasf2_install_directory`, but I haven't made a tagged and published release yet, as there seem to be other issues remaining and I don't really have time to investigate what they are and what all changed. I'm not sure if the missing `HOME` is a problem, or the proxy is failing, or the JSON response for getting the job status changed; that might require some digging.
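The setting precedence can be sketched like this (the helper name and the fallback's relative path are assumptions based on this thread, not copied from the PR):

```python
import os

def resolve_gbasf2_setup_path(settings):
    # prefer the new explicit gbasf2_setup_path setting; otherwise derive the
    # path from the old gbasf2_install_directory for backwards compatibility
    if "gbasf2_setup_path" in settings:
        return settings["gbasf2_setup_path"]
    return os.path.join(settings["gbasf2_install_directory"], "gbasf2/pro/setup.sh")

print(resolve_gbasf2_setup_path(
    {"gbasf2_setup_path": "/cvmfs/belle.kek.jp/grid/gbasf2/pro/setup.sh"}
))
print(resolve_gbasf2_setup_path(
    {"gbasf2_install_directory": "/cvmfs/belle.kek.jp/grid"}
))
```

Both calls resolve to the same setup file, which is why the old setting can stay supported without deprecation.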
If I had the time, I would first interactively test the different gbasf2 utility functions in b2luigi, e.g. from IPython, and try to find out what the issue is, e.g.

```python
# import different utility functions from b2luigi/batch/processes/gbasf2.py
from b2luigi.batch.processes.gbasf2 import (
    get_gbasf2_env,
    get_gbasf2_project_job_status_dict,
    run_with_gbasf2,
    ...
)

# test utility functions
print(get_gbasf2_env(gbasf2_setup_path="/cvmfs/belle.kek.jp/grid/gbasf2/pro/setup.sh"))

# requires a running project to test
print(get_gbasf2_project_job_status_dict("<project_name>", gbasf2_setup_path="/cvmfs/belle.kek.jp/grid/gbasf2/pro/setup.sh"))
```
Yesterday I checked `get_gbasf2_env` and it seems to work. It printed an error message but continued regardless. With that, `run_with_gbasf2` should be able to run gb2 executables. For checking the project status it's important that getting a dictionary of job statuses works via `get_gbasf2_project_job_status_dict`, though that function requires the name of an existing gbasf2 project, which is why I didn't test it yet.

Of course you can also just insert some printouts and then run some b2luigi scripts, but that I find more difficult to debug...
I checked `get_gbasf2_project_job_status_dict`; it asked for my certificate password repeatedly.

Edit: This is because it fails to retrieve the job status dict and attempts it 4 times, each time asking for a password. The problem seems to be within `importlib_resources`?:
```
Traceback (most recent call last):
  File "/nfs/dust/belle2/user/hohmann/b2luigi/b2luigi/b2luigi/batch/processes/gbasf2_utils/gbasf2_job_status.py", line 19, in <module>
    from BelleDIRAC.gbasf2.lib.job.information_collector import InformationCollector
  File "/cvmfs/belle.kek.jp/grid/BelleDIRAC/5.7.0/BelleDIRAC/gbasf2/lib/job/information_collector.py", line 15, in <module>
    from DIRAC import S_OK
  File "/cvmfs/belle.kek.jp/grid/BelleDIRAC/5.7.0/DIRAC/__init__.py", line 212, in <module>
    from DIRAC.Core.Utilities.Network import getFQDN
  File "/cvmfs/belle.kek.jp/grid/BelleDIRAC/5.7.0/DIRAC/Core/Utilities/Network.py", line 20, in <module>
    from DIRAC.Core.Utilities.ReturnValues import S_OK, S_ERROR
  File "/cvmfs/belle.kek.jp/grid/BelleDIRAC/5.7.0/DIRAC/Core/Utilities/ReturnValues.py", line 18, in <module>
    from DIRAC.Core.Utilities.DErrno import strerror
  File "/cvmfs/belle.kek.jp/grid/BelleDIRAC/5.7.0/DIRAC/Core/Utilities/DErrno.py", line 49, in <module>
    from DIRAC.Core.Utilities.Extensions import extensionsByPriority
  File "/cvmfs/belle.kek.jp/grid/BelleDIRAC/5.7.0/DIRAC/Core/Utilities/Extensions.py", line 16, in <module>
    import importlib_resources
  File "/cvmfs/belle.kek.jp/grid/diracos2/2.31/Linux-x86_64/diracos/lib/python3.9/site-packages/importlib_resources/__init__.py", line 3, in <module>
    from ._common import (
  File "/cvmfs/belle.kek.jp/grid/diracos2/2.31/Linux-x86_64/diracos/lib/python3.9/site-packages/importlib_resources/_common.py", line 52
    def files(anchor: Optional[Anchor] = None) -> Traversable:
                     ^
SyntaxError: invalid syntax
```
@MarcelHoh @meliache, so it seems the DIRAC scripts have been converted to python3, and the format of the X509Chain objects/certificates is a bit different, so `CertEncoder` in `gbasf2_proxy_info.py` is not correctly converted to JSON.

After changing the shebang to `#!/usr/bin/env python3` in the script, I have a few JSON parsing errors from single quotes, capitalized `True`/`False`, and a raw `<X509Chain 3 certs with key>` object.

My feeling is that there is a conversion or parsing option missing somewhere that would fix this bunch of issues.
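Both failure modes are easy to reproduce in isolation (a minimal sketch, independent of the DIRAC code): `bytes` values cannot be serialized by the stock JSON encoder, and a Python `repr` string with single quotes and capitalized `True` is not valid JSON.

```python
import json

# bytes values (like a PEM certificate string) are not JSON serializable
# without a custom encoder
try:
    json.dumps({"Value": b"-----BEGIN CERTIFICATE-----"})
    bytes_result = "serialized"
except TypeError:
    bytes_result = "TypeError"

# and a Python repr (single quotes, capitalized True) cannot be parsed as JSON
try:
    json.loads("{'OK': True}")
    repr_result = "parsed"
except json.JSONDecodeError:
    repr_result = "JSONDecodeError"

print(bytes_result, repr_result)  # TypeError JSONDecodeError
```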
Using the below JSON encoder in both `gbasf2_proxy_info.py` and `gbasf2_job_status.py` I think fixes the new issues (that `X509.dumpAllToString()` actually gives `bytes` objects and that the job status now returns `datetime` objects).

```python
class Gbasf2ResultJsonEncoder(json.JSONEncoder):
    """
    JSON encoder for data structures returned by gbasf2.
    """

    def default(self, obj):
        if isinstance(obj, X509Chain):
            x509dict = obj.dumpAllToString()
            x509dict["Value"] = x509dict["Value"].decode()
            return x509dict
        elif isinstance(obj, (datetime.date, datetime.datetime)):
            return obj.isoformat()
        return json.JSONEncoder.default(self, obj)
```
I haven't tried running an actual project yet.
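A trimmed-down, runnable version of the same idea (only the `datetime` branch, since `X509Chain` is not importable outside the DIRAC environment, and with an assumed sample status dict):

```python
import datetime
import json

class DatetimeJsonEncoder(json.JSONEncoder):
    # handles the datetime objects that the new job status dicts contain
    def default(self, obj):
        if isinstance(obj, (datetime.date, datetime.datetime)):
            return obj.isoformat()
        return json.JSONEncoder.default(self, obj)

# hypothetical job status entry with a datetime value
status = {"SubmissionTime": datetime.datetime(2023, 1, 2, 3, 4, 5)}
print(json.dumps(status, cls=DatetimeJsonEncoder))
# → {"SubmissionTime": "2023-01-02T03:04:05"}
```

Passing the encoder via `cls=` keeps the default behavior for everything JSON already handles and only kicks in for the otherwise unserializable types.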
I also tried to play around with keyring to store my certificate password, but it looks like NAF does not have the dbus backend needed.
I submitted a small project with b2luigi. The project was submitted successfully, the job status was displayed as before and the files were downloaded. I can open a PR tomorrow morning with the fix (if none of you get there first :P).
I'll continue the discussion regarding the solution to this in PR #197, which looks good to me so far, though I haven't tested it yet.
I'll use this issue for some chitchat not really related to the issue title:

@Bilokin

> so it seems the dirac scripts have been converted to python3

I just checked and they use python3.9 for gbasf2 now, nice. Now it looks like it's possible to install and run b2luigi in the gbasf2 environment and call the gbasf2/DIRAC API directly instead of going through subprocesses. The problem is that it's still a separate environment from the basf2 one. Not sure if we will ever get to a point where both run in the same terminal/environment, I don't know if that is a goal, but we're much closer to that. That could make a lot of complexity in b2luigi obsolete.
https://github.com/nils-braun/b2luigi/blob/7e43a8f2e45afafc800cc7304a71207ca3a523a9/b2luigi/batch/processes/gbasf2.py#L1099
In the new gbasf2 release v5r7, the default setup path changed (and there is no need for `proxy_init` anymore). I tried to modify the line to

```python
gbasf2_setup_path = os.path.join(gbasf2_install_directory, "gbasf2/pro/setup.sh")
```

but then get a crash with the error: