Open pcanal opened 5 months ago
My two cents is we should disable the tests (and stop using Jenkins for PR builds)
My two cents is we should disable the tests
Do you mean short term while we fix the issue or do you mean just ignore the issue?
In this case, unless the tutorial (!) are showing something that the user are very unlikely to do, I think we need to go further and make sure that this same crash/problem won't happen to the user.
(and stop using Jenkins for PR builds)
I agree ... except that they seems to be probing an area of the phase space we are not testing elsewhere.
Note: if the issue is as simple as "the installed version of some dependent product is too old so there is no point in fixing the problem", we still need to fix the CMake
configuration to fail when asked to use those older version.
Do you mean short term while we fix the issue or do you mean just ignore the issue? In this case, unless the tutorial (!) are showing something that the user are very unlikely to do, I think we need to go further and make sure that this same crash/problem won't happen to the user.
If the feature does not work, and it is advertised as a public tutorial, then it should be either fixed soon (i.e. before next release imho) or we need to back down on the feature and inject some sort of warning mechanism that the feature is not really working. Same as we had to recently do for https://github.com/root-project/root/issues/15197 via https://github.com/root-project/root/pull/15198.
We cannot provide tutorials that just do not work, so I don't see a third way
This is a again the same problem which appeared before, see https://github.com/root-project/root/pull/14716 The conflict between std::regexp and pytorch jit. Either we can avoid using in pytorch or we somehow try to live with this issue hoping at some point it will be fixed in pytorch. We don't see in the new CI because pytorch is not installed in the ubuntu nodes. I will add it before retiring the old CI
Where do we still use std::regexp
?
In cppyy, and we know for sure that that usage is problematic because we see the same problem with upstream cppyy: https://github.com/wlav/cppyy/issues/227 https://github.com/root-project/root/blob/master/bindings/pyroot/cppyy/CPyCppyy/src/Converters.cxx#L61
So it's not only pytorch, but also xgboost. For the xgboost case, I could fix the crash by importing xgboost before ROOT. Maybe for pytorch that works too?
By curiosity, did we notice the clash during the CPPYY
merge/update process? If not in which other step did we see it (and/or miss seeing it when we should). I.e. I am asking in the context of improving our processes.
It didn't hit me when working on the cppyy upgrade (#14507), all tests on Jenkins and the new CI were green there.
I only noticed it a few days ago when enabling the xgboost tests to test my new development here: https://github.com/root-project/root/pull/15173
Why these TMVA tutorial failures come only now several weeks after the cppyy upgrade? I don't know. I'm just catching up with the recent discussions on Python requirements
Why these TMVA tutorial failures come only now several weeks after the cppyy upgrade?
This is a problem in itself :(. From my record (it was nice to received failure report by emails, it seems we don't get that from the new CI), that this is the first failure:
https://lcgapp-services.cern.ch/root-jenkins/job/root-pullrequests-handler/30970/console
I.e on the nightly of April 3rd, 2024 (9 days ago).
But rather than time it might also be -DCTEST_TEST_EXCLUDE_NONE=On
that uncovered it.
And indeed the Ubuntu node are build by default without the -DCTEST_TEST_EXCLUDE_NONE=On
So my understanding is that we need to:
CPPYY
's use of std::regexp or TMVA's use of PyTorch (This could be removing support for PyTorch or temporarily disabling the tutorials until the (newly created and/or this) Issue noting the conflict CPPYY/PyTorch is resolved (for example if the early load mentioned early works).Am I missing something?
I tried reducing the number of symbols that libcppyy.so
exports using -fvisibility=hidden
, combined with explicit exports of the public API functions. That works fine (is, after all, already done this way for Windows), but doesn't resolve the issue: the regex
symbols remain. I found that the standard C++ headers force __attribute__((__visibility__("default")))
, so there's no way to control this.
If I replace lib = ctypes.cdll.LoadLibrary(lib_path)
in xgboost/core.py
with lib = ctypes.CDLL(lib_path, 8)
, then that solves the issue.
Check duplicate issues.
Description
As seen on https://github.com/root-project/roottest/pull/1106#issuecomment-2050216040 and https://github.com/root-project/roottest/pull/1106#issuecomment-2050220592 there are several tutorials failures in the TMVA tutorials.
We do not see those failures in the new CI thus it could be that either the new CI is not running them (or ignoring the failures) or there is a missing protection against a bad/not-supported configuration.
This is a blocker to completely retiring the jenkins build as they currently "appear" to show failure that the new CI does not detect.
Reproducer
A sample of the failure output in case the log are deleted can be found at https://gist.github.com/pcanal/d68be81d3ea618a83fb50a11d4ea1d1d
ROOT version
master
Installation method
Jenkins/CI
Operating system
Ubuntu 22.04
Additional context
Seen on ROOT-ubuntu2004/python3. Running on root-ubuntu-2004-1.cern.ch and ROOT-ubuntu2204/nortcxxmod.Running on root-ubuntu-2204-2.cern.ch