radiasoft / sirepo

Sirepo is a framework for scientific cloud computing. Try it out!
https://sirepo.com
Apache License 2.0

fedora36: get all tests passing #4989

Closed e-carlin closed 1 year ago

e-carlin commented 2 years ago

Use this issue to collect all problems that are preventing tests from running on Fedora 36.

e-carlin commented 2 years ago

https://github.com/radiasoft/sirepo/issues/4993

e-carlin commented 2 years ago

adm_own_jobs_test

Oct 22 12:06:09 25765     0 ../../sirepo/job_driver/__init__.py:296:_receive LocalDriver(a=xZ5W k=parallel u=r7L0 [_Op(run, zNjX)]) error msg={'agentId': 'xZ5WTJsGxZuWDSl5KD8aAirc0BIiOZRe', 'opId': 'zNjXjJMvANxPCZQOCf9ZKlknT1MJDoZ8', 'opName': 'error', 'reply': {'error': 'unable to parse job_cmd output', 'state': 'error', 'stdout': 'b'UtiMathEigen WARNING: SciPy unavailable; to install: pip install scipy
''}}

Update: fixed by https://github.com/radiasoft/download/pull/356

e-carlin commented 2 years ago

  File "/home/vagrant/src/radiasoft/sirepo/sirepo/pkcli/job_cmd.py", line 221, in _do_get_simulation_frame
    return template_common.sim_frame_dispatch(
  File "/home/vagrant/src/radiasoft/sirepo/sirepo/template/template_common.py", line 648, in sim_frame_dispatch
    res = o(frame_args)
  File "/home/vagrant/src/radiasoft/sirepo/sirepo/template/warppba.py", line 246, in sim_frame_particleAnimation
    return extract_particle_report(frame_args, "electrons")
  File "/home/vagrant/src/radiasoft/sirepo/sirepo/template/warppba.py", line 96, in extract_particle_report
    opmd = _opmd_time_series(data_file)
  File "/home/vagrant/src/radiasoft/sirepo/sirepo/template/warppba.py", line 295, in _opmd_time_series
    prev = main.list_h5_files
AttributeError: module 'openpmd_viewer.openpmd_timeseries.main' has no attribute 'list_h5_files'

It looks like quite a few things in openpmd_viewer were moved around.

robnagler commented 2 years ago

Make a separate issue. I seem to remember them moving things around before.

e-carlin commented 2 years ago

radia

Oct 23 02:32:40 114274     0 ../../../../../../../../sirepo/mpi.py:62:restrict_op_to_first_rank op=<function <lambda> at 0x7fc9ebb8feb0> exception=PY_SSIZE_T_CLEAN macro must be defined for '#' formats stack=Traceback (most recent call last):
  File "/home/vagrant/src/radiasoft/sirepo/sirepo/mpi.py", line 60, in restrict_op_to_first_rank
    res = op()
  File "/home/vagrant/src/radiasoft/sirepo/tests/animation_work/db/user/nFIo7VvZ/radia/k5IpJxhL/solverAnimation/parameters.py", line 166, in <lambda>
    sirepo.mpi.restrict_op_to_first_rank(lambda: _write_dmp(g_id, 'geometry.dat'))
  File "/home/vagrant/src/radiasoft/sirepo/tests/animation_work/db/user/nFIo7VvZ/radia/k5IpJxhL/solverAnimation/parameters.py", line 154, in _write_dmp
    f.write(radia_util.dump_bin(g_id))
  File "/home/vagrant/src/radiasoft/sirepo/sirepo/template/radia_util.py", line 338, in dump_bin
    return radia.UtiDmp(g_id, "bin")
SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats

Exception was printed at:

SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats
  File "/home/vagrant/src/radiasoft/sirepo/tests/animation_work/db/user/nFIo7VvZ/radia/k5IpJxhL/solverAnimation/parameters.py", line 166, in <module>
    sirepo.mpi.restrict_op_to_first_rank(lambda: _write_dmp(g_id, 'geometry.dat'))
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0

Fix: https://github.com/radiasoft/download/pull/363

e-carlin commented 2 years ago

zgoubi

Traceback (most recent call last):
  File "/home/vagrant/.pyenv/versions/py3/bin/sirepo", line 33, in <module>
    sys.exit(load_entry_point('sirepo', 'console_scripts', 'sirepo')())
  File "/home/vagrant/src/radiasoft/sirepo/sirepo/sirepo_console.py", line 18, in main
    return pkcli.main("sirepo")
  File "/home/vagrant/src/radiasoft/pykern/pykern/pkcli/__init__.py", line 157, in main
    res = argh.dispatch(parser, argv=argv)
  File "/home/vagrant/.pyenv/versions/3.10.5/envs/py3/lib/python3.10/site-packages/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "/home/vagrant/.pyenv/versions/3.10.5/envs/py3/lib/python3.10/site-packages/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "/home/vagrant/.pyenv/versions/3.10.5/envs/py3/lib/python3.10/site-packages/argh/dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "/home/vagrant/src/radiasoft/sirepo/sirepo/pkcli/zgoubi.py", line 52, in run_background
    _bunch_match_twiss(cfg_dir, data)
  File "/home/vagrant/src/radiasoft/sirepo/sirepo/pkcli/zgoubi.py", line 87, in _bunch_match_twiss
    _run_zgoubi(cfg_dir, python_file="twiss.py")
  File "/home/vagrant/src/radiasoft/sirepo/sirepo/pkcli/zgoubi.py", line 163, in _run_zgoubi
    template_common.exec_parameters(python_file)
  File "/home/vagrant/src/radiasoft/sirepo/sirepo/template/template_common.py", line 358, in exec_parameters
    return pkrunpy.run_path_as_module(path or PARAMETERS_PYTHON_FILE)
  File "/home/vagrant/src/radiasoft/pykern/pykern/pkrunpy.py", line 28, in run_path_as_module
    exec(code, m.__dict__)
  File "twiss.py", line 4, in <module>
    from zgoubi import core, utils
  File "/home/vagrant/.pyenv/versions/3.10.5/envs/py3/lib/python3.10/site-packages/zgoubi/core.py", line 70, in <module>
    sys.setcheckinterval(10000)
AttributeError: module 'sys' has no attribute 'setcheckinterval'. Did you mean: 'setswitchinterval'?

Fix: https://github.com/radiasoft/download/pull/365
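
For context, `sys.setcheckinterval` was removed in Python 3.9; its replacement, `sys.setswitchinterval`, takes a duration in seconds rather than a bytecode count. A minimal sketch of a version-tolerant call follows. The helper name `set_interpreter_interval` and the 0.005 s value are illustrative assumptions, and this is not necessarily what the linked PR does.

```python
import sys


def set_interpreter_interval(check_interval=10000, switch_interval=0.005):
    """Tune thread switching on old and new interpreters (hypothetical helper).

    Returns the name of the sys function that was actually called.
    """
    if hasattr(sys, "setcheckinterval"):
        # Python < 3.9: the argument is a count of bytecode instructions.
        sys.setcheckinterval(check_interval)
        return "setcheckinterval"
    # Python >= 3.9: the argument is a duration in seconds.
    sys.setswitchinterval(switch_interval)
    return "setswitchinterval"


used = set_interpreter_interval()
```

The two arguments are not interchangeable, so a straight rename of the call would pass an instruction count where seconds are expected.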

e-carlin commented 2 years ago

uwsgi

WSGI app 0 (mountpoint='') ready in 2 seconds on interpreter 0x1213c90 pid: 102842 (default app)
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 102842)
spawned uWSGI worker 1 (pid: 102848, cores: 10)
*** Stats server enabled on /home/vagrant/src/radiasoft/sirepo/tests/nginx_uwsgi_work/db/uwsgi.sock fd: 9 ***
Traceback (most recent call last):
  File "/home/vagrant/.pyenv/versions/3.10.5/envs/py3/lib/python3.10/site-packages/flask/app.py", line 2091, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/vagrant/.pyenv/versions/3.10.5/envs/py3/lib/python3.10/site-packages/flask/app.py", line 2076, in wsgi_app
    response = self.handle_exception(e)
  File "/home/vagrant/.pyenv/versions/3.10.5/envs/py3/lib/python3.10/site-packages/flask/app.py", line 2073, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/vagrant/.pyenv/versions/3.10.5/envs/py3/lib/python3.10/site-packages/flask/app.py", line 1518, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/vagrant/.pyenv/versions/3.10.5/envs/py3/lib/python3.10/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/vagrant/.pyenv/versions/3.10.5/envs/py3/lib/python3.10/site-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/home/vagrant/src/radiasoft/sirepo/sirepo/uri_router.py", line 309, in _dispatch
    return _call_api(None, route, kwargs=kwargs)
  File "/home/vagrant/src/radiasoft/sirepo/sirepo/uri_router.py", line 257, in _call_api
    sirepo.auth.init_quest(qcall)
  File "/home/vagrant/src/radiasoft/sirepo/sirepo/auth/__init__.py", line 100, in init_quest
    o._set_log_user()
  File "/home/vagrant/src/radiasoft/sirepo/sirepo/auth/__init__.py", line 727, in _set_log_user
    sirepo.flask.set_log_user(_user)
  File "/home/vagrant/src/radiasoft/sirepo/sirepo/flask.py", line 70, in set_log_user
    a.sirepo_uwsgi.set_logvar(_UWSGI_LOG_KEY_USER, user_op())
SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats

Fix: A uWSGI release on Oct 23 (2.0.21, see https://pypi.org/project/uWSGI/#history) thankfully fixed the issue. I think just rebuilding the Sirepo image should pick it up; pip install --upgrade uwsgi worked on my VM.

e-carlin commented 2 years ago

shadow

Traceback (most recent call last):
  File "/home/vagrant/.pyenv/versions/py3/bin/sirepo", line 33, in <module>
    sys.exit(load_entry_point('sirepo', 'console_scripts', 'sirepo')())
  File "/home/vagrant/src/radiasoft/sirepo/sirepo/sirepo_console.py", line 18, in main
    return pkcli.main("sirepo")
  File "/home/vagrant/src/radiasoft/pykern/pykern/pkcli/__init__.py", line 157, in main
    res = argh.dispatch(parser, argv=argv)
  File "/home/vagrant/.pyenv/versions/3.10.5/envs/py3/lib/python3.10/site-packages/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "/home/vagrant/.pyenv/versions/3.10.5/envs/py3/lib/python3.10/site-packages/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "/home/vagrant/.pyenv/versions/3.10.5/envs/py3/lib/python3.10/site-packages/argh/dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "/home/vagrant/src/radiasoft/sirepo/sirepo/pkcli/shadow.py", line 56, in run
    res = _run_beam_statistics(cfg_dir, data)
  File "/home/vagrant/src/radiasoft/sirepo/sirepo/pkcli/shadow.py", line 88, in _run_beam_statistics
    template_common.exec_parameters()
  File "/home/vagrant/src/radiasoft/sirepo/sirepo/template/template_common.py", line 358, in exec_parameters
    return pkrunpy.run_path_as_module(path or PARAMETERS_PYTHON_FILE)
  File "/home/vagrant/src/radiasoft/pykern/pykern/pkrunpy.py", line 28, in run_path_as_module
    exec(code, m.__dict__)
  File "parameters.py", line 2, in <module>
    from Shadow.ShadowPreprocessorsXraylib import prerefl, pre_mlayer, bragg
  File "/home/vagrant/.pyenv/versions/3.10.5/envs/py3/lib/python3.10/site-packages/Shadow/__init__.py", line 8, in <module>
    from Shadow.ShadowLibExtensions import OE, Source, Beam, CompoundOE, IdealLensOE
  File "/home/vagrant/.pyenv/versions/3.10.5/envs/py3/lib/python3.10/site-packages/Shadow/ShadowLibExtensions.py", line 11, in <module>
    import Shadow.ShadowLib as ShadowLib
ImportError: /home/vagrant/.pyenv/versions/3.10.5/envs/py3/lib/python3.10/site-packages/Shadow/ShadowLib.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZGVbN2v_exp

e-carlin commented 2 years ago

ml

ImportError while importing test module '/home/vagrant/src/radiasoft/sirepo/tests/template/ml_load_model_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../.pyenv/versions/3.10.5/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
../sirepo/tests/template/ml_load_model_test.py:7: in <module>
    import keras.models
E   ModuleNotFoundError: No module named 'keras'

Fix: https://github.com/radiasoft/download/pull/369

e-carlin commented 2 years ago

synergia:

_____________________________________________________________________________________________________ test_generate_python _____________________________________________________________________________________________________
Traceback (most recent call last):
  File "/home/vagrant/src/radiasoft/sirepo/tests/template/synergia_generate_test.py", line 14, in test_generate_python
    from sirepo.template import synergia
  File "/home/vagrant/src/radiasoft/sirepo/sirepo/template/synergia.py", line 17, in <module>
    from synergia import foundation
  File "/home/vagrant/.pyenv/versions/3.10.5/envs/py3/lib/python3.10/site-packages/synergia/__init__.py", line 8, in <module>
    from . import utils
  File "/home/vagrant/.pyenv/versions/3.10.5/envs/py3/lib/python3.10/site-packages/synergia/utils/__init__.py", line 1, in <module>
    from .utils import *
ImportError: libkokkoscontainers.so.3.7: cannot open shared object file: No such file or directory

Fix: https://github.com/radiasoft/download/pull/373

e-carlin commented 2 years ago

Now seeing this with synergia

Nov 03 14:47:09 118880     0 ../../sirepo/job_driver/__init__.py:296:_receive LocalDriver(a=fPe3 k=sequential u=PZDy [_Op(run, VoxO)]) error msg={'agentId': 'fPe39NLqFI2IkmSLI8fu87N0ntXuHcu2', 'opId': 'VoxO1nqrDTs3zXW4vnlW2PFOQp50jFJz', 'opName': 'error', 'reply': {'error': 'unable to parse job_cmd output', 'state': 'error', 'stdout': 'b'Kokkos::OpenMP::initialize WARNING: OMP_PROC_BIND environment variable not set
649''}}

Looks like kokkos prints warnings to stdout, which breaks job_cmd's output parsing.
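
To illustrate the failure mode: if the caller expects stdout to hold only a machine-readable result, any banner a library prints first makes the whole payload unparseable. The sketch below assumes a JSON reply purely for illustration; `parse_reply` and the sample strings are hypothetical and are not Sirepo's actual job_cmd protocol.

```python
import json


def parse_reply(stdout_text):
    """Parse a child's stdout as one JSON document (hypothetical protocol)."""
    try:
        return json.loads(stdout_text)
    except json.JSONDecodeError:
        # Same shape as the error reported in the log above.
        return {"state": "error", "error": "unable to parse job_cmd output"}


clean = '{"state": "completed"}\n'
noisy = (
    "Kokkos::OpenMP::initialize WARNING: OMP_PROC_BIND environment variable not set\n"
    + clean
)

ok = parse_reply(clean)    # parses normally
bad = parse_reply(noisy)   # the warning line makes the payload invalid
```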

robnagler commented 2 years ago

Which test or is this manual?

e-carlin commented 2 years ago

All tests that import synergia (tests/report_test.py, tests/server_test.py, tests/animation_test.py). It happens any time synergia is imported:

$ python -c 'import synergia'
Kokkos::OpenMP::initialize WARNING: OMP_PROC_BIND environment variable not set
  In general, for best performance with OpenMP 4.0 or better set OMP_PROC_BIND=spread and OMP_PLACES=threads
  For best performance with OpenMP 3.1 set OMP_PROC_BIND=true
  For unit testing set OMP_PROC_BIND=false

It checks an environment variable to disable the warnings (KOKKOS_DISABLE_WARNINGS=1). But I think I'll just patch it to default warnings to off (perl -pi -e 's/bool disable_warnings/bool disable_warnings = true/' $(find ./synergia/utils/kokkos -name '*.cpp')).
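
As an alternative to patching, the environment variable mentioned above could be set before synergia is imported or before a job is launched. A minimal sketch, assuming Kokkos honors that variable at initialization; `quiet_kokkos_env` is a hypothetical helper name.

```python
import os


def quiet_kokkos_env(base=None):
    """Return a copy of the environment with Kokkos warnings disabled.

    Assumption: Kokkos reads KOKKOS_DISABLE_WARNINGS=1, as noted above.
    """
    env = dict(os.environ if base is None else base)
    # setdefault keeps any value the caller has already chosen.
    env.setdefault("KOKKOS_DISABLE_WARNINGS", "1")
    return env


env = quiet_kokkos_env({"PATH": "/usr/bin"})
```

The result can be passed as `env=` to `subprocess.run`. The drawback compared with the patch is that every launcher has to remember to do this.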

robnagler commented 2 years ago

I see now that job_cmd has to import templates directly. There's no way around that except another subprocess which seems wrong, and redirecting stdout is not a good idea in this situation. Fixing kokkos is the right thing.

However, I think this tells us that choosing stdout for job_cmd is a bit problematic. We could have used another file handle that was opened just for this purpose. Many libraries output to stderr, so it is a bit of a surprise that kokkos outputs to stdout. Just noting...
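
The dedicated-handle idea can be sketched as follows: the parent opens a pipe, passes its write end to the child, and reads the reply from there, so whatever a library prints to stdout no longer matters. This is a POSIX-only sketch under stated assumptions; the `REPLY_FD` variable name and the JSON payload are hypothetical, not Sirepo's protocol.

```python
import json
import os
import subprocess
import sys

# Child: prints noise to stdout, writes the real reply to the inherited fd.
CHILD = r"""
import json, os
print("some library WARNING on stdout")
fd = int(os.environ["REPLY_FD"])
with os.fdopen(fd, "w") as f:
    json.dump({"state": "completed"}, f)
"""


def run_with_reply_fd():
    r, w = os.pipe()
    env = dict(os.environ, REPLY_FD=str(w))
    p = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        env=env,
        pass_fds=(w,),  # keep the write end open in the child
        stdout=subprocess.PIPE,
    )
    os.close(w)  # parent closes its copy so the read below sees EOF
    with os.fdopen(r) as f:
        reply = json.load(f)
    noise = p.communicate()[0].decode()
    return reply, noise


reply, noise = run_with_reply_fd()
```

The reply arrives intact while the warning is still captured separately, so it can be logged rather than lost.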

e-carlin commented 2 years ago

Agreed. This is the second time I've had to make a change to get around this problem...

e-carlin commented 2 years ago

Now there are problems running synergia. This is the run.log for the synergia animation test:

Traceback (most recent call last):
  File "/home/vagrant/.pyenv/versions/py3/bin/sirepo", line 33, in <module>
    sys.exit(load_entry_point('sirepo', 'console_scripts', 'sirepo')())
  File "/home/vagrant/src/radiasoft/sirepo/sirepo/sirepo_console.py", line 18, in main
    return pkcli.main("sirepo")
  File "/home/vagrant/src/radiasoft/pykern/pykern/pkcli/__init__.py", line 157, in main
    res = argh.dispatch(parser, argv=argv)
  File "/home/vagrant/.pyenv/versions/3.10.5/envs/py3/lib/python3.10/site-packages/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "/home/vagrant/.pyenv/versions/3.10.5/envs/py3/lib/python3.10/site-packages/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "/home/vagrant/.pyenv/versions/3.10.5/envs/py3/lib/python3.10/site-packages/argh/dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "/home/vagrant/src/radiasoft/sirepo/sirepo/pkcli/synergia.py", line 24, in run
    template_common.exec_parameters()
  File "/home/vagrant/src/radiasoft/sirepo/sirepo/template/template_common.py", line 358, in exec_parameters
    return pkrunpy.run_path_as_module(path or PARAMETERS_PYTHON_FILE)
  File "/home/vagrant/src/radiasoft/pykern/pykern/pkrunpy.py", line 28, in run_path_as_module
    exec(code, m.__dict__)
  File "parameters.py", line 32, in <module>
    stepper = synergia.simulation.Independent_stepper_elements(
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. synergia.simulation.simulation.Independent_stepper_elements(steps_per_element: int = 1)

Invoked with: fodo:
 quadrupole f: k1=0.07142857142000000326, l=2, yoshida_order=2, propagator_type=yoshida
 drift o: l=8
 quadrupole d: k1=-0.07142857142000000326, l=2, yoshida_order=2, propagator_type=yoshida
 drift o: l=8
, 1, 5

I don't know enough about C++, but it looks like they removed the constructors in https://github.com/fnalacceleratormodeling/synergia2/commit/62ec3b3a7403af021120469a7d65809cdd2e4bc0#diff-18bc5525ae66fbe82b4dd3de6723a8051bac62ba7da5733f91524240e7ee61c0L18

I'm going to pin us to the commit that was working on f32

e-carlin commented 1 year ago

After discussion, we decided not to pin. It is better to adapt our code to fit the new style in synergia. There are going to be enough changes that I think they should be tracked on their own: https://github.com/radiasoft/sirepo/issues/5170