Closed: MarcelHoh closed this pull request 1 year ago.
**Codecov report:** patch coverage has no change; project coverage change: -0.08% :warning:
Comparison is base (`897a17d`) 60.59% compared to head (`9ab7c28`) 60.51%.
Thanks a lot for the PR!
There is a silent prompt for a password from the setup script, and then it crashes when parsing JSON in `get_gbasf2_project_job_status_dict`.
The entire error message is below:
```
Traceback (most recent call last):
  File "/cvmfs/belle.cern.ch/el7/externals/v01-11-01/Linux_x86_64/common/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/cvmfs/belle.cern.ch/el7/externals/v01-11-01/Linux_x86_64/common/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/gpfs/home/belle2/bilokin/code/syscorrfw/syscorrfw/__main__.py", line 56, in <module>
    main()
  File "/gpfs/home/belle2/bilokin/code/syscorrfw/syscorrfw/__main__.py", line 46, in main
    b2luigi.process([main_task_instance],
  File "/gpfs/home/belle2/bilokin/code/test/b2luigi/b2luigi/cli/process.py", line 113, in process
    runner.run_local(task_list, cli_args, kwargs)
  File "/gpfs/home/belle2/bilokin/code/test/b2luigi/b2luigi/cli/runner.py", line 46, in run_local
    run_luigi(task_list, cli_args, kwargs)
  File "/gpfs/home/belle2/bilokin/code/test/b2luigi/b2luigi/cli/runner.py", line 62, in run_luigi
    luigi.build(task_list, **kwargs)
  File "/home/belle2/bilokin/.local/lib/python3.8/site-packages/luigi/interface.py", line 239, in build
    luigi_run_result = _schedule_and_run(tasks, worker_scheduler_factory, override_defaults=env_params)
  File "/home/belle2/bilokin/.local/lib/python3.8/site-packages/luigi/interface.py", line 173, in _schedule_and_run
    success &= worker.run()
  File "/home/belle2/bilokin/.local/lib/python3.8/site-packages/luigi/worker.py", line 645, in __exit__
    if task.is_alive():
  File "/gpfs/home/belle2/bilokin/code/test/b2luigi/b2luigi/batch/processes/__init__.py", line 135, in is_alive
    job_status = self.get_job_status()
  File "/gpfs/home/belle2/bilokin/code/test/b2luigi/b2luigi/batch/processes/gbasf2.py", line 319, in get_job_status
    job_status_dict = get_gbasf2_project_job_status_dict(
  File "/home/belle2/bilokin/.local/lib/python3.8/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/home/belle2/bilokin/.local/lib/python3.8/site-packages/retry/api.py", line 90, in retry_decorator
    return __retry_internal(partial(f, *args, **kwargs), exceptions, tries, delay, max_delay, backoff, jitter,
  File "/home/belle2/bilokin/.local/lib/python3.8/site-packages/retry/api.py", line 35, in __retry_internal
    return f()
  File "/gpfs/home/belle2/bilokin/code/test/b2luigi/b2luigi/batch/processes/gbasf2.py", line 1100, in get_gbasf2_project_job_status_dict
    return json.loads(job_status_json_string)
  File "/cvmfs/belle.cern.ch/el7/externals/v01-11-01/Linux_x86_64/common/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/cvmfs/belle.cern.ch/el7/externals/v01-11-01/Linux_x86_64/common/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/cvmfs/belle.cern.ch/el7/externals/v01-11-01/Linux_x86_64/common/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
```
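This failure mode (a bare `JSONDecodeError` on whatever the gbasf2 command printed) is hard to debug because the offending output is never shown. A defensive wrapper could surface it; this is an illustrative sketch, not the actual b2luigi code:

```python
import json


def parse_job_status(job_status_json_string):
    """Illustrative helper: parse gbasf2 job-status output, but include the
    raw text in the error message when it is not valid JSON (e.g. when a
    password prompt or warning was mixed into stdout)."""
    try:
        return json.loads(job_status_json_string)
    except json.JSONDecodeError as err:
        raise RuntimeError(
            "Expected JSON from the gbasf2 job status command, got: "
            + repr(job_status_json_string)
        ) from err
```

With such a wrapper, the user would immediately see the silent password prompt (or any other stray text) quoted in the exception instead of an opaque "Expecting value" error.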
@Bilokin, strange, do you have a proxy initialised already?
Yes, but I retried after deleting the proxy manually, and I still see the same behaviour. Also, I am submitting a completely new project.
@Bilokin, in my effort to make my code a bit more legible before the PR I messed up a bracket placement. Hopefully it works now.
After the latest commit it proceeds to the project submission, but then it fails because I was submitting with the custom DIRAC group `belle_systematics`.
Now that gbasf2's `setup.sh` calls the `gb2_proxy_init` command automatically, we need to pass the custom group name here: https://github.com/nils-braun/b2luigi/blob/main/b2luigi/batch/processes/gbasf2.py#L1182
Does it make sense to pass this just as a b2luigi setting? Say, `gbasf2_group`?
Then I would just add

```python
gbasf2_group = b2luigi.get_setting("gbasf2_group", "belle")
gbasf2_setup_command_str = f"source {gbasf2_setup_path} -g {gbasf2_group} > /dev/null"
```
It is already registered as the `gbasf2_proxy_group` setting; one can see a usage example here: https://github.com/nils-braun/b2luigi/blob/main/b2luigi/batch/processes/gbasf2.py#L1238
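To make the parameterization concrete, here is a minimal sketch of building that setup command with a configurable proxy group. The function name is hypothetical; the command string mirrors the snippet discussed above:

```python
def build_gbasf2_setup_command(gbasf2_setup_path, proxy_group="belle"):
    """Return the shell command that sources the gbasf2 setup script with
    the given DIRAC proxy group (``-g``), discarding the script's stdout."""
    return f"source {gbasf2_setup_path} -g {proxy_group} > /dev/null"
```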
Thanks a lot! Now the job submission works with the custom proxy group.
I think we need to make sure that the other parameters, such as `gbasf2_proxy_lifetime`, are passed correctly to the gbasf2 proxy in the new setup file.
And, in the future, one can also avoid the call of `gb2_proxy_init` if it is not needed anymore.
I had a look through the `/cvmfs/belle.kek.jp/grid/gbasf2/pro/setup.sh` script before, as I also wanted to add the lifetime as done in the proxy init. From my quick look, it seems the setup script only supports the options `-g`, `-C`, `-N`, `-K`:
```
-C --Cert <value>  : File to use as user certificate
-K --Key <value>   : File to use as user key
-N --no-upload     : Do not upload a long lived proxy to the ProxyManager
```
Thanks for the good work on this 👍.
> And, in the future, one can also avoid the call of `gb2_proxy_init` if it is not needed anymore.
The only issue being when the b2luigi process runs longer than the proxy lifetime. Currently, we have code to try to automatically renew the proxy, but I admit that wasn't that reliable anyway. But it would be nice if it were possible to get rid of all that code, which adds complexity and is an entry point for bugs.
One annoying thing is that the gbasf2 `setup.sh` asks for the certificate password on every invocation, even if there is still an alive, initialized DIRAC proxy. b2luigi was a bit smarter in that it first checked the lifetime of the existing proxy via the programmatic equivalent of `gb2_proxy_info`.
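That lifetime check amounts to parsing the remaining proxy validity out of the proxy-info output. A minimal sketch of the idea, with the function name and the exact output format being assumptions rather than the real b2luigi implementation:

```python
import re


def proxy_seconds_left(proxy_info_output):
    """Illustrative parser: extract the remaining proxy lifetime in seconds
    from gb2_proxy_info-style output containing a line such as
    ``timeleft : 23:59:58``. Unparsable output is treated as an expired proxy."""
    match = re.search(r"timeleft\s*:\s*(\d+):(\d{2}):(\d{2})", proxy_info_output)
    if match is None:
        return 0
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds
```

A caller would then only re-run the proxy initialization when `proxy_seconds_left(...)` falls below some threshold, instead of prompting for the password unconditionally.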
b2luigi uses the function `get_basf2_env` a lot, in every invocation of `run_with_gbasf2`. That function returns the environment variables that the `setup.sh` sets, as a dictionary. For that it sources the file, but luckily it only needs to do so once, because I decorated it with `@lru_cache`, so the dictionary is saved for the runtime of the Python process. But it still means that if you run a b2luigi/gbasf2 steering file multiple times in a day, e.g. `python gb2_tasks.py`, you now have to enter the password every time, whereas in the past once per proxy lifetime was enough.
But that could be a gbasf2 feature request. I would like `setup.sh` to only initialize the proxy if it is not alive, and to offer a CLI parameter to force proxy re-initialization. If I had a wishlist, I'd also like a parameter that restores the old behavior by turning off automatic initialization. Maybe, if some of you have time, you could contribute to the BelleDirac gitlab repo to implement that...
There is another issue linked to the custom proxy group: the `gb2_ds_get` command with the `--new` argument downloads files into a different folder than b2luigi expects.
Does it work well for the standard `belle` proxy?
I tried to add the `-o` option to specify the output path, but it still creates a weird folder structure.
I think one can remove `--new` if the group name is not `belle`, and then put it back once the downloaded folder structure is the same for all proxy groups.
I've removed the `--new` flag if the proxy group is not `belle`.
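The conditional can be pictured as a small command builder. This is an illustrative sketch only: the function name is hypothetical, and the exact `gb2_ds_get` options (`-o`, `--new`) are taken from the discussion above rather than from its documentation:

```python
def build_ds_get_command(project_name, output_dir, proxy_group="belle"):
    """Illustrative sketch: only pass ``--new`` to gb2_ds_get for the
    default ``belle`` proxy group, since the new-style download currently
    produces an unexpected folder layout for custom groups."""
    command = ["gb2_ds_get", project_name, "-o", output_dir]
    if proxy_group == "belle":
        command.append("--new")
    return command
```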
While there are still several issues caused by the changes to gbasf2 (mainly the password requests), I suggest we merge this now to have at least a reasonably working version available.
The other option is to use the Python 2 client at `/cvmfs/belle.kek.jp/grid/gbasf2/pro-py2/tools/setup.sh`. I haven't tested it, but I assume it works as before; however, I'm concerned it won't be available for a future release of gbasf2 and we'll be right back here.
> I've removed the `--new` flag if the proxy group is not `belle`.
Thanks! However, it would be best to properly fix this in the future and have the same download method in both cases. After all, the `--new` method might become the default in the future, and I don't know how long the old one will still be supported. Also, we now need to make sure that both download types work and are correctly handled, e.g. a bugfix for downloads with `--new` might break old-style downloads. I've opened issue #200 for that, and am happy to merge now with this hotfix.
I merged now, sorry for being late with that. I also tagged this as release candidate v0.10.1-rc1, and you can see the release notes on the GitHub releases page (as well as in the changelog). But I didn't publish this on PyPI yet, meaning that to install it you would need to use

```
pip3 install "git+https://github.com/nils-braun/b2luigi"
```

instead of just `pip3 install b2luigi`. The reason is just that I expect there might still be hotfixes for gbasf2 coming in over the next couple of days, and publishing a release to PyPI with every hotfix PR every couple of days is a bit much work for me right now, so I'd rather wait at least a couple of days to see that we have something reasonably stable.
If having the hotfixes on PyPI is really important, I'd be happy if you could give me feedback on that. Otherwise I would recommend telling users you know to install from GitHub if they want the bleeding-edge hotfixes for the latest gbasf2 release.
Thanks for the fix! It still works as of today. One minor thing is missing: there is no explicit message displayed when the user should input the certificate password (you just have to input it after the tasks are scheduled).
Thanks @0ctagon for reporting, that's a valid issue and I created #202 for it. I'm not under contract at any Belle II institution anymore, so I hope somebody will look at it; it should not be too difficult a fix.
Fixes the issues caused by gbasf2 v5r7. I've tested `gbasf2_ds_list.py` directly and the others through `b2luigi.batch.processes.gbasf2.get_gbasf2_project_job_status_dict` and `b2luigi.batch.processes.gbasf2.get_proxy_info`. It would be good if you can confirm this works for you @Bilokin @sadlm8 @0ctagon.
Resolves #195