Closed by meliache 11 months ago
After running both examples you provided, I found out that in `gbasf2_job_status.py` the import `from BelleDIRAC.Client.helpers.auth import userCreds` doesn't exist anymore and was moved to `from BelleDIRAC.gbasf2.lib.auth import userCreds`.
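If the script has to keep working with both older and newer gbasf2 releases, one option is to try the new import location first and fall back to the old one. The following is only a sketch under that assumption: the helper name `import_attr_from_first` is hypothetical and is not part of b2luigi or BelleDIRAC, and the commented usage has not been tested against a real gbasf2 installation.

```python
import importlib


def import_attr_from_first(module_names, attr_name):
    """Return `attr_name` from the first module in `module_names` that provides it."""
    errors = []
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed and try the next one.
            errors.append(f"{module_name}: {exc}")
    raise ImportError(
        f"Could not import {attr_name!r} from any of: " + "; ".join(errors)
    )


# Hypothetical usage inside gbasf2_job_status.py (needs a gbasf2 environment):
# userCreds = import_attr_from_first(
#     ["BelleDIRAC.gbasf2.lib.auth", "BelleDIRAC.Client.helpers.auth"],
#     "userCreds",
# )
```

Listing the new location first means the fallback is only used on older installations.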
After making this change in `gbasf2_job_status.py` (and `gbasf2_df_list.py`), I get a new error:
In [4]: print(get_gbasf2_project_job_status_dict("testproject"))
---------------------------------------------------------------------------
JSONDecodeError Traceback (most recent call last)
<ipython-input-4-a714f0056404> in <module>
----> 1 print(get_gbasf2_project_job_status_dict("testproject"))
~/.local/lib/python3.8/site-packages/decorator.py in fun(*args, **kw)
230 if not kwsyntax:
231 args, kw = fix(args, kw, sig)
--> 232 return caller(func, *(extras + args), **kw)
233 fun.__name__ = func.__name__
234 fun.__doc__ = func.__doc__
~/.local/lib/python3.8/site-packages/retry/api.py in retry_decorator(f, *fargs, **fkwargs)
88 args = fargs if fargs else list()
89 kwargs = fkwargs if fkwargs else dict()
---> 90 return __retry_internal(partial(f, *args, **kwargs), exceptions, tries, delay, max_delay, backoff, jitter,
91 logger, log_traceback, on_exception)
92
~/.local/lib/python3.8/site-packages/retry/api.py in __retry_internal(f, exceptions, tries, delay, max_delay, backoff, jitter, logger, log_traceback, on_exception)
33 while _tries:
34 try:
---> 35 return f()
36 except exceptions as e:
37 if on_exception is not None:
~/.local/lib/python3.8/site-packages/b2luigi/batch/processes/gbasf2.py in get_gbasf2_project_job_status_dict(gbasf2_project_name, dirac_user, gbasf2_setup_path)
1107 )
1108 job_status_json_string = proc.stdout
-> 1109 return json.loads(job_status_json_string)
1110
1111
/cvmfs/belle.cern.ch/el7/externals/v01-12-01/Linux_x86_64/common/lib/python3.8/json/__init__.py in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
355 parse_int is None and parse_float is None and
356 parse_constant is None and object_pairs_hook is None and not kw):
--> 357 return _default_decoder.decode(s)
358 if cls is None:
359 cls = JSONDecoder
/cvmfs/belle.cern.ch/el7/externals/v01-12-01/Linux_x86_64/common/lib/python3.8/json/decoder.py in decode(self, s, _w)
338 end = _w(s, end).end()
339 if end != len(s):
--> 340 raise JSONDecodeError("Extra data", s, end)
341 return obj
342
JSONDecodeError: Extra data: line 1 column 5 (char 4)
If I print the `job_status_json_string` that `get_gbasf2_project_job_status_dict()` loads at the end of the script, I get:
2023-11-03 01:00:57 UTC Framework ERROR: ERROR: proxy has not Belle VOMS extensions
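This output also explains the exact wording of the `JSONDecodeError` in the traceback. A minimal sketch using only the standard `json` module and the string printed above: the decoder accepts the leading `2023` as a complete JSON number and then complains about the rest of the line as extra data, starting at character 4.

```python
import json

# The string that the job-status script printed to stdout instead of JSON.
output = (
    "2023-11-03 01:00:57 UTC Framework ERROR: "
    "ERROR: proxy has not Belle VOMS extensions"
)

error = None
try:
    json.loads(output)
except json.JSONDecodeError as exc:
    error = exc
    # Show which part was parsed as JSON and where the leftover text begins.
    parsed = output[: exc.pos]
    leftover = output[exc.pos : exc.pos + 6]
    print(f"{exc.msg} at char {exc.pos}: parsed {parsed!r}, leftover starts with {leftover!r}")
```

So the decode error is only a symptom. The actual failure is the missing Belle VOMS extension on the proxy.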
Thanks for figuring this out. Maybe setting up the gbasf2 proxy fails, or there is an issue with the Belle II environment? With your fix, does the script `gbasf2_job_status.py` return proper JSON when you run it from a terminal with a gbasf2 environment and an active proxy? I'm just wondering whether the remaining error is within the `gbasf2_job_status.py` script or somewhere else, e.g. in the functions `get_gbasf2_env` or `setup_dirac_proxy`, which set up the environment in which `gbasf2_job_status.py` is executed.
BTW, I'm really annoyed that we don't get any errors when calling `gbasf2_job_status.py`. IMO, when the script fails and returns something that is not JSON, then b2luigi should raise an exception earlier and with a better message, and not just return a random string as output. But this error handling is not just our fault: an error message should usually be sent to stderr and not stdout. But I'm getting off track here...
Also, if you have some fixes, feel free to create a PR early; you could mark it as a draft.
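The point above about failing early with a better message could look roughly like the following. This is a sketch, not b2luigi's actual implementation: the function name `run_json_command` is hypothetical, and the two stand-in commands only imitate a script that prints JSON or an error line to stdout, so that the example runs without a gbasf2 environment.

```python
import json
import subprocess
import sys


def run_json_command(cmd):
    """Run `cmd` and return its stdout parsed as JSON, failing loudly otherwise."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(
            f"Command {cmd} exited with code {proc.returncode}.\n"
            f"stdout: {proc.stdout!r}\nstderr: {proc.stderr!r}"
        )
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError as exc:
        # Include the raw output so the real problem (e.g. a proxy error
        # printed to stdout) is visible in the exception message.
        raise RuntimeError(
            f"Command {cmd} did not print valid JSON ({exc}).\n"
            f"stdout: {proc.stdout!r}\nstderr: {proc.stderr!r}"
        ) from exc


# Stand-in commands (assumptions for illustration only).
good = [sys.executable, "-c", "print('{\"Done\": 3}')"]
bad = [sys.executable, "-c", "print('Framework ERROR: no proxy')"]
```

With such a wrapper, `run_json_command(good)` returns a dictionary, while `run_json_command(bad)` raises a `RuntimeError` that quotes the offending output instead of surfacing a bare `JSONDecodeError`.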
Resolved by #209
Originally reported by @0ctagon in https://github.com/nils-braun/b2luigi/issues/206#issuecomment-1790328890, who saw the following error message
Traceback
```
INFO: Worker Worker(salt=8402490694, workers=1, host=cc.kek.jp, username=a, pid=255855) was stopped. Shutting down Keep-Alive thread
Traceback (most recent call last):
File "b2luigi_gridSubmitDL.py", line 128, in
```
I cannot really test and thus fix this, as I don't have a grid certificate anymore and am no longer employed by a Belle II institution, so I need help here. But I can give some debugging hints based on what I understand from the error message.
So the error happens in the function `get_gbasf2_project_job_status_dict`. I recommend debugging this function by calling it in an interactive IPython session. This requires creating and submitting a gbasf2 project first, which I cannot do anymore.
Internally, this function calls the script `b2luigi/batch/processes/gbasf2_utils/gbasf2_job_status.py` as a subprocess. That script is supposed to return all job statuses in a project in JSON format. Maybe that script stopped working. So I would test running that script directly from the command line on an existing gbasf2 project. If that file is buggy, then we need the help of somebody with some gbasf2 code knowledge to fix it.
Or maybe `gb2_job_status.py` has already introduced a `--json` flag or something like that to make its output machine-readable? I didn't follow the latest release notes, but I remember that this had been requested once. If such a flag exists, we could replace our custom script with that.