Open lgoettgens opened 1 month ago
The comment could in principle refer to any of the 150 packages shipped with GAP and being loaded. But they all should have their manual as a PDF bundled, except for OscarInterface and JuliaInterface.
This message shows up when something calls ValidatePackageInfo
on a package's metadata. I'll submit a PR to avoid this warning for OscarInterface
But I wonder: why is it being shown at all? Why / what is validating GAP package metadata here? And is it really OscarInterface it complains about, or something else? Perhaps @ThomasBreuer has an idea?
maybe doing something similar to https://github.com/oscar-system/Oscar.jl/pull/3968 for JuliaInterface (and JuliaExperimental) gets rid of it? (But that of course needs https://github.com/oscar-system/Oscar.jl/pull/3688 to be merged before we can see any changes here in CI)
Code in GAP's PackageManager calls ValidatePackageInfo
in several places, I guess this is the source of the output (without having verified this).
One of the problems is that ValidatePackageInfo
calls Print
if it finds something to complain about; this function should have a quiet
mode, and PackageManager
should pass its own quiet
flag on to ValidatePackageInfo
.
(I do not understand why we did not get these messages earlier.)
After some digging around I now have the following additional information:
macOS-latest
nor macOS-13
or macOS-12
, so the issue seems to be independent of the host architecture.So @ThomasBreuer and I think that this is somehow due to the way that the RPTU runner re-uses things from previous runs.
@aaruni96 will look into properly separating and cleaning up the Julia Homes on the RPTU runner when he is back in KL. Let's hope that this then resolves this issue
@lgoettgens @ThomasBreuer so which things are re-used? Is this about a git clone of a dev version of recog sticking around?
If so it could probably be solved by doing rm -rf ~/.gap ~/.julia/gaproot
at the start of our CI workflows (at least when they are running on the self-hosted runners).
Or is there anything else you suspect might be relevant?
This unfortunately won't work, as it will introduce race conditions between the different runners on the self-hosted machine. But combining this together with separating the JULIA_HOMEs of the different runners is exactly what @aaruni96 wanted to try
@aaruni96 already separated them now
I just checked the latest CI run on master (which was 3 days ago), and this doesn't have these prints anymore. See https://github.com/oscar-system/Oscar.jl/actions/runs/10130857450/job/28149030285#step:9:352. Thus I think this issue can be closed
which was 3 days ago
I made the change last evening, and re ran just the short test, to make sure nothing is broken as a side effect.
But the long test appears to have failed "one hour ago" with a similar (but not the same) error ( https://github.com/oscar-system/Oscar.jl/actions/runs/10130857450/job/28149030653#step:9:240 ). You can see that its using artefacts from the runner specific depot
Error, file "/Users/aaruni/Desktop/oscar-runners/runner-4/julia-depot/scratchspaces/c863536\
a-3901-11e9-33e7-d5cd0df7b904/gap_14920021487703909718_1.10/pkg/gapdoc/lib/GAP\
Doc2HTML.gi" must exist and be readable
This might be related to the same problem ?
Note, that we have just done a separation of the depots, there is still no "cleanup" phase which removes stuff for a fresh start.
Github runners do allow Pre / Post hook scripts ( https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/running-scripts-before-or-after-a-job ), so, a decision needs to be made about what we should do in the cleanup phase, and then I can work on making it happen.
I am surprised to see this error about GAPDoc2HTML.gi
missing -- the suspected cause for this (race condition) should be eliminated now that the runner have separate depots. I just had a look at the machine and the file is there now...
But this suggests that in addition to the two directories I already suggested, perhaps we should also rm -rf JULIADEPOT/scratchspaces
?
I think we should just clear the complete JULIA_DEPOT and instead use the julia-actions/cache action again. This should cache exactly those parts that are safe to keep between runs.
I disagree. That wastes many dozens of gigabytes of bandwidth every day.
I disagree. That wastes many dozens of gigabytes of bandwidth every day.
True...
So removing .gap
, .julia/gaproot
, and .julia/scratchspaces
(or whatever they are called in the multi-depot setup) in a job hook seems to be a good compromise
Seen on a recent master, e.g. in https://github.com/oscar-system/Oscar.jl/actions/runs/10066028282/job/27826455858#step:9:239. It seems to only occur on the macOS runners.
The only reference to
PDFFile
I found in this repo is https://github.com/oscar-system/Oscar.jl/blob/8163ba8856c7c5d7643c4f7c78a082eabbe09575/gap/OscarInterface/PackageInfo.g#L71 so I suspect the GAP interface to be somehow the culprit of this. Maybe @fingolfin or @ThomasBreuer have some insight into this?More breadcrumbs: