madgraph5 / madgraph4gpu

GPU development for the Madgraph5_aMC@NLO event generator software package

Observations with the CI scripts #936

Open roiser opened 2 months ago

roiser commented 2 months ago

I'm trying to run the CI scripts from the command line, and I will use this issue to record some observations I made along the way. Please bear in mind that these scripts are meant to run in the GitHub CI, where they also run successfully. Running the same scripts from the command line is nevertheless sometimes useful, e.g. for debugging CI failures.

First observation: when running e.g. on itscrd-a100, I have the following PATH:

echo $PATH
/opt/rh/gcc-toolset-13/root/usr/bin:/usr/sue/bin:/usr/share/Modules/bin:/usr/local/cuda-12.4/bin:/usr/lib64/ccache:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/puppetlabs/bin

When I then run the testsuite_oneprocess.sh script through the different stages (codegen, before_build, build, ...), it fails in the tput_test stage because the CUDA build is not found (which is correct, since it is indeed not there). The error is:

Execute build.cuda_d_inl0_hrd0/runTest.exe
(SKIP missing build.cuda_d_inl0_hrd0/runTest.exe)
ERROR! Neither build.cuda_d_inl0_hrd0/check.exe nor build.cuda_d_inl0_hrd0/gcheck.exe was found?

However, when I remove the cuda/nvcc location from the PATH, it runs through correctly (skipping CUDA, of course). I would suggest that the scripts check for this case.
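A minimal sketch of the suggested check, assuming (from the observation above, not from reading the scripts) that the tput_test stage expects a CUDA build whenever nvcc is found on PATH. The helper names `cuda_build_available` and `strip_cuda_from_path` are hypothetical and do not exist in the madgraph4gpu CI scripts; only the directory and executable names come from the log above.

```shell
#!/bin/bash
# Hypothetical helpers, NOT part of the madgraph4gpu CI scripts.

# Succeed only if nvcc is on PATH AND the given CUDA build directory
# contains one of the executables the tput_test stage looks for
# (check.exe or gcheck.exe, as in the error message above).
cuda_build_available() {
  local builddir="$1"   # e.g. build.cuda_d_inl0_hrd0
  command -v nvcc > /dev/null 2>&1 || return 1
  [ -x "${builddir}/check.exe" ] || [ -x "${builddir}/gcheck.exe" ]
}

# Print the given PATH string with every entry containing "cuda" removed,
# mimicking the manual workaround of dropping the cuda/nvcc location.
strip_cuda_from_path() {
  local IFS=':' entry out=''
  for entry in $1; do
    case "$entry" in
      *cuda*) ;;                              # drop CUDA entries
      *) out="${out:+${out}:}${entry}" ;;     # keep everything else, in order
    esac
  done
  printf '%s\n' "$out"
}
```

A stage could then call e.g. `cuda_build_available build.cuda_d_inl0_hrd0 || echo "SKIP cuda"` instead of failing, while the manual workaround corresponds to `export PATH="$(strip_cuda_from_path "$PATH")"`. Matching on the substring `cuda` is deliberately crude and would also drop unrelated directories whose names happen to contain it.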

roiser commented 2 months ago

Another observation: when running the "tmad_test" stage multiple times (again from the command line), the input for the madevent* executable is concatenated n times, once per run. It doesn't seem to hurt, but it's also not correct. See e.g. files like input_gg_tt_none.
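If the input file is assembled by appending with `>>` (an assumption; I have not checked how the tmad scripts actually generate it), writing it with `>` instead makes repeated runs idempotent. A minimal sketch with a hypothetical helper `write_input_file`:

```shell
#!/bin/bash
# Hypothetical helper, NOT part of the madgraph4gpu CI scripts.
# Write each remaining argument as one line of the file named by $1.
# Using ">" truncates the file first, so re-running the stage yields the
# same file instead of appending another copy of the input.
write_input_file() {
  local file="$1"
  shift
  printf '%s\n' "$@" > "$file"
}
```

Calling it twice with the same lines leaves the file containing a single copy of them, whereas appending with `>>` would leave two.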

roiser commented 2 months ago

Sorry, one last one; I'm keeping them separate so we can discuss each individually. Again, this is something I don't think I had spotted before. For the "tmad_test" stage one needs to set FPTYPE=d|f|m to run the test, while the other stages do not need it. I would suggest either handling this uniformly across all stages, or checking whether the variable is set at this stage and reporting an error if it is not.
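The second option (checking the variable at this stage) could look like the following minimal sketch. `require_fptype` is a hypothetical helper; the accepted values d, f and m are taken from the comment above.

```shell
#!/bin/bash
# Hypothetical guard, NOT part of the madgraph4gpu CI scripts.
# Succeed if FPTYPE is one of d, f, m; otherwise print a clear error
# on stderr and return non-zero, so the caller can abort early.
require_fptype() {
  case "${FPTYPE:-}" in
    d|f|m) return 0 ;;
    "") echo "ERROR! FPTYPE is not set (expected d, f or m)" >&2; return 1 ;;
    *)  echo "ERROR! Invalid FPTYPE='${FPTYPE}' (expected d, f or m)" >&2; return 1 ;;
  esac
}
```

The tmad_test stage could then call `require_fptype || exit 1` before running the test.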

valassi commented 2 months ago

The "CI" scripts are meant for the CI.

For 3 years now there have been other "manual" scripts that I prepared, but so far no one other than me has wanted to look at them.

Are you running your tests in a CI or locally/manually? As far as I know we have no A100 CI (do we?). So I guess you are running the 'wrong' scripts.

roiser commented 2 months ago

I'm running the scripts from the command line. Yes, everything works fine in the CI. Running them manually has been a very convenient way for me to debug CI failures in the past. No worries if there are no fixes for the points above; they are understood now.

valassi commented 2 months ago

Can this be closed then?

roiser commented 2 months ago

No, please leave it open, just for my reference. Thanks!