populse / capsul

Collaborative Analysis Platform : Simple, Unifying, Lean

Capsul doc compilation hangs indefinitely on the 5.1 branch #358

Open ylep opened 4 months ago

ylep commented 4 months ago

When running bv_maker doc, the process sometimes hangs indefinitely, seemingly while running some IPython code:

$ ps -wwfH -u a-sac-ns-brainvisa
UID          PID    PPID  C STIME TTY          TIME CMD
a-sac-n+ 1047214 1047209  0 03:55 ?        00:00:00 /bin/sh -c CASA_BASE_DIRECTORY=/volatile/a-sac-ns-brainvisa/bbi_nightly TMPDIR=/volatile/tmp/a-sac-ns-brainvisa /volatile/a-sac-ns-brainvisa/bbi_nightly/cea-5.1-5.3/bin/casa_distro_admin bbi_daily jenkins_server='https://brainvisa.info/builds' install_thirdparty=file:///neurospin/brainvisa/thirdparty/thirdparty.json branch=5.1; CASA_BASE_DIRECTORY=/volatile/a-sac-ns-brainvisa/bbi_nightly TMPDIR=/volatile/tmp/a-sac-ns-brainvisa /volatile/a-sac-ns-brainvisa/bbi_nightly/cea-master-5.3/bin/casa_distro_admin bbi_daily jenkins_server='https://brainvisa.info/builds' install_thirdparty=file:///neurospin/brainvisa/thirdparty/thirdparty.json branch=master
a-sac-n+ 1047215 1047214  0 03:55 ?        00:00:00   python3 /volatile/a-sac-ns-brainvisa/bbi_nightly/cea-5.1-5.3/bin/casa_distro_admin bbi_daily jenkins_server=https://brainvisa.info/builds install_thirdparty=file:///neurospin/brainvisa/thirdparty/thirdparty.json branch=5.1
a-sac-n+ 1047219 1047215  0 03:55 ?        00:00:00     /usr/bin/python3 /volatile/a-sac-ns-brainvisa/bbi_nightly/cea-5.1-5.3/src/development/casa-distro/5.1/bin/casa_distro_admin bbi_daily jenkins_server=https://brainvisa.info/builds install_thirdparty=file:///neurospin/brainvisa/thirdparty/thirdparty.json branch=5.1
a-sac-n+ 1047310 1047219  0 03:55 ?        00:00:06       /usr/bin/python3 /volatile/a-sac-ns-brainvisa/bbi_nightly/cea-5.1-5.3/src/development/casa-distro/5.1/bin/casa_distro_admin bbi_daily jenkins_server=https://brainvisa.info/builds install_thirdparty=file:///neurospin/brainvisa/thirdparty/thirdparty.json branch=5.1 update_casa_distro=no
a-sac-n+ 1055157 1047310  0 04:06 ?        00:00:00         /usr/bin/python3 /volatile/a-sac-ns-brainvisa/bbi_nightly/cea-5.1-5.3/src/development/casa-distro/5.1/bin/casa_distro bv_maker name=cea-5.1-5.3 -- doc
a-sac-n+ 1055158 1055157  0 04:06 ?        00:00:00           Apptainer runtime parent
a-sac-n+ 1055176 1055158  0 04:06 ?        00:00:00             /bin/sh /.singularity.d/runscript bv_maker doc
a-sac-n+ 1055201 1055176  0 04:06 ?        00:00:00               python /casa/host/build/bin/bv_maker doc
a-sac-n+ 1055204 1055201  0 04:06 ?        00:00:00                 make -j12 -l12 doc
a-sac-n+ 1055206 1055204  0 04:06 ?        00:00:00                   make -s -f CMakeFiles/Makefile2 doc
a-sac-n+ 1055209 1055206  0 04:06 ?        00:00:00                     make -s -f CMakeFiles/Makefile2 CMakeFiles/doc.dir/all
a-sac-n+ 1055590 1055209  0 04:06 ?        00:00:00                       make -s -f build_files/capsul/CMakeFiles/capsul-sphinx.dir/build.make build_files/capsul/CMakeFiles/capsul-sphinx.dir/build
a-sac-n+ 1055596 1055590  0 04:06 ?        00:00:00                         /bin/sh -c cd /casa/host/build/build_files/capsul && ../../bin/bv_env /usr/bin/python3 -m sphinx /casa/host/src/capsul/5.1/doc/source /casa/host/build/share/doc/capsul-2.3
a-sac-n+ 1055598 1055596  0 04:06 ?        00:00:25                           /usr/bin/python3 -m sphinx /casa/host/src/capsul/5.1/doc/source /casa/host/build/share/doc/capsul-2.3
a-sac-n+ 1056816 1055598  0 04:06 ?        00:00:08                             /usr/bin/python3 -m ipykernel_launcher -f /tmp/tmpihcvgnvf.json --HistoryManager.hist_file=:memory:

Maybe the easy/lazy fix is just to disable documentation for Capsul v2, since we are abandoning this API anyway?

Environment:

ylep commented 4 months ago

The logs of that hanging Sphinx process are here: https://brainvisa.info/builds/view/branch%205.1/job/cea-5.1-5.3/4426/ (I killed the process manually after about 4 hours)

/casa/host/src/capsul/5.1/capsul/study_config/config_modules/somaworkflow_config.py:docstring of capsul.study_config.config_modules.somaworkflow_config.SomaWorkflowConfig.set_computing_resource_password:1: WARNING: duplicate object description of capsul.study_config.config_modules.somaworkflow_config.SomaWorkflowConfig.set_computing_resource_password, other instance in api/study_config, use :noindex: for one of them
/casa/host/src/capsul/5.1/doc/source/api/utils.rst:5: WARNING: No classes found for inheritance diagram

Notebook error:
DeadKernelError in tutorial/capsul_tutorial.ipynb:
Kernel died
make[3]: *** [build_files/capsul/CMakeFiles/capsul-sphinx.dir/build.make:71: build_files/capsul/CMakeFiles/capsul-sphinx] Error 2
make[2]: *** [CMakeFiles/Makefile2:11416: build_files/capsul/CMakeFiles/capsul-sphinx.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:6523: CMakeFiles/doc.dir/rule] Error 2
make: *** [Makefile:303: doc] Error 2

ylep commented 4 months ago

Also, the capsul tests timed out: https://brainvisa.info/builds/view/branch%205.1/job/cea-5.1-5.3/4430/

test_full_wf (capsul.pipeline.test.test_pipeline_workflow.TestPipelineWorkflow) ... ok
test_iter_workflow (capsul.pipeline.test.test_pipeline_workflow.TestPipelineWorkflow) ... 
================================================================================
TIMED OUT (exit code 124)
Finished: FAILURE

denisri commented 4 months ago

Yes, I have noticed that the docs fail or hang. The behaviour changed after an update of the casa-distro images, and is linked to version changes in sphinx, nbsphinx, sphinx-gallery and possibly other related packages. However, I could not find a combination of versions that works reliably now. With some versions the notebook docs fail or hang, and with others it is sphinx-gallery that causes trouble (as far as I remember). The problem may be on our side, however: notebooks and examples run Python programs which possibly don't exit cleanly because they import compiled C++ binding modules. It's quite difficult to track down. And it's even a bit worse than that: once a sphinx-gallery doc has been built, it is not rebuilt later, so building the docs may pass, or appear to be OK, but if we build them from scratch again, it fails.

Anyway, disabling the Capsul docs will not be enough: some docs also fail in aims or anatomist.
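
For the notebook part of the problem, one thing that might be worth trying is a per-cell time limit in nbsphinx, so that a stuck notebook fails the build instead of blocking it. The fragment below is only a sketch of what that could look like in a Sphinx conf.py: `nbsphinx_timeout` and `nbsphinx_execute` are nbsphinx configuration options, but the value of 600 seconds and the one-item `extensions` list are illustrative assumptions, not the actual Capsul settings. It is also not certain that a cell timeout catches every kind of hang seen here.

```python
# Fragment of a Sphinx conf.py (sketch; values are illustrative assumptions).

# Assumed for the example; a real conf.py would list more extensions.
extensions = ["nbsphinx"]

# Abort a notebook cell that runs longer than this many seconds,
# so a stuck cell becomes a build error rather than an endless wait.
nbsphinx_timeout = 600

# Alternative: do not execute notebooks at all and render the outputs
# already stored in the .ipynb files.
# nbsphinx_execute = "never"
```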

The timeout in the capsul tests is also not reproducible: it only happens occasionally. My guess is that it lies somewhere in the soma-workflow client/server communication.
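
Independently of the root cause, the doc step could be wrapped in a watchdog so that a hang ends with an error instead of needing a manual kill after several hours. Below is a minimal sketch, assuming a POSIX system; `run_with_timeout` is a hypothetical helper, not part of bv_maker or casa-distro. It starts the command in its own process group and kills the whole group on timeout, returning 124 as the coreutils `timeout` command does. Child processes that start their own session (which some kernel launchers may do) would not be reached this way.

```python
import os
import signal
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd in its own process group; kill the whole group on timeout.

    Returns the command's exit code, or 124 if the time limit was hit
    (same convention as the coreutils `timeout` command).
    """
    # start_new_session=True makes the child the leader of a new process
    # group, so descendants that stay in that group can be killed with it.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child is the group leader, so its PID is also the group ID.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the child to avoid leaving a zombie
        return 124


# Hypothetical usage (paths and the one-hour limit are placeholders):
# run_with_timeout(["python3", "-m", "sphinx", "SRC_DIR", "OUT_DIR"], 3600)
```

Killing the process group rather than only the direct child is the point of the design: a plain `subprocess.run(..., timeout=...)` would stop the top-level process but could leave its descendants running.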