libsdl-org / SDL

Simple Directmedia Layer
https://libsdl.org
zlib License
9.46k stars 1.76k forks source link

Why does testfilesystem timeout on the "Windows (clang32)" GitHub Action? #8214

Closed icculus closed 12 months ago

icculus commented 1 year ago

Lots of times commits and PRs are fine, but look like they have a build issue to resolve, because the "Windows (clang32)" build target is timing out in the testfilesystem test:

The following tests FAILED:
     11 - testfilesystem (Timeout)

This isn't a general Windows issue, as all the other Windows builders don't fail. It's only the clang32 target.

Let's figure out what's going wrong there.

icculus commented 1 year ago

(Also, sometimes rerunning the failed job will make the test succeed, so it doesn't timeout every time.)

madebr commented 1 year ago

I noticed this too here: https://github.com/libsdl-org/SDL/issues/8002

icculus commented 1 year ago

I'm taking a run at this to see if I can figure out what's going wrong.

icculus commented 1 year ago

This particular run passed, but throwing in some debug logging, it's pretty clear what made it upset was...

    pref_path = SDL_GetPrefPath("libsdl", "test_filesystem");

...it took over a second to return in this case. Everything else took 1 or less milliseconds.

2023-09-07T22:19:00.8134496Z 11: INFO: STOPWATCH: Step 'Startup' took 0 ticks
2023-09-07T22:19:00.8135282Z 11: INFO: STOPWATCH: Step 'SDLTest_CommonCreateState' took 0 ticks
2023-09-07T22:19:00.8135851Z 11: INFO: STOPWATCH: Step 'SDL_LogSetPriority' took 0 ticks
2023-09-07T22:19:00.8136666Z 11: INFO: STOPWATCH: Step 'SDLTest_CommonDefaultArgs' took 0 ticks
2023-09-07T22:19:00.8137187Z 11: INFO: STOPWATCH: Step 'SDL_Init' took 0 ticks
2023-09-07T22:19:00.8137878Z 11: INFO: STOPWATCH: Step 'SDL_GetBasePath' took 0 ticks
2023-09-07T22:19:00.8138447Z 11: INFO: base path: 'D:\a\SDL\SDL\build\test\'
2023-09-07T22:19:01.8733860Z 11: INFO: STOPWATCH: Step 'SDL_GetPrefPath("libsdl")' took 1059 ticks
2023-09-07T22:19:01.8736042Z 11: INFO: pref path: 'C:\Users\runneradmin\AppData\Roaming\libsdl\test_filesystem\'
2023-09-07T22:19:01.8736895Z 11: INFO: STOPWATCH: Step 'SDL_GetPrefPath(NULL)' took 1 ticks
2023-09-07T22:19:01.8737608Z 11: INFO: pref path: 'C:\Users\runneradmin\AppData\Roaming\test_filesystem\'
2023-09-07T22:19:02.0284509Z 11/19 Test #11: testfilesystem ...................   Passed    1.24 sec
icculus commented 1 year ago

It's the call to SHGetFolderPathW inside of SDL_GetPrefPath taking all the time. I'm not really sure how we would fix this in general.

So mingw 32-bit builds and clang 32-bit builds are both doing this...their 64-bit equivalents are not (that API call takes 1ms or less, afaict). MSVC builds aren't running the tests at all to compare, but it doesn't look Clang-specific or, I don't know, a calling convention issue in the headers or whatever.

I'm wondering if this function ends up pulling in an enormous amount of 32-bit system crud, since Shell32.dll is...the Windows Shell...and that stuff was already loaded in RAM for 64-bit processes on 64-bit build machines.

We could try putting a call to SHGetFolderPathW in something earlier and see if it makes testfilesystem magically fast on 32-bit targets.

Obviously it solves the problem within the same process, as testfilesystem calls it twice and the second call is basically immediate, but I suspect it'll solve it across the board if we force the OS to load that stuff before we start testing.

icculus commented 1 year ago

Yeah, I bet that's what it is. I stuck this test program into CMakeLists.txt, somewhere before testfilesystem was supposed to run...

#include <SDL3/SDL.h>
#include <SDL3/SDL_main.h>

int main(int argc, char *argv[])
{
    SDL_Init(0);
    SDL_GetPrefPath("libsdl", "test_filesystem");
    SDL_Quit();
    return 0;
}

...and it took seven and a half seconds!...

2023-09-07T23:49:26.7441408Z 5: Test command: D:\a\SDL\SDL\build\test\githubcache.exe
2023-09-07T23:49:26.7441911Z 5: Working Directory: D:/a/SDL/SDL/build/test
2023-09-07T23:49:26.7442395Z 5: Environment variables: 
2023-09-07T23:49:26.7442777Z 5:  SDL_AUDIO_DRIVER=dummy
2023-09-07T23:49:26.7443148Z 5:  SDL_VIDEO_DRIVER=dummy
2023-09-07T23:49:26.7443497Z 5:  PATH=D:/a/SDL/SDL/build
2023-09-07T23:49:26.7444109Z 5: Test timeout computed to be: 10
2023-09-07T23:49:32.7499406Z 5: INFO: STOPWATCH: Step 'SHGetFolderPathW' took 5984 ticks
2023-09-07T23:49:34.2188391Z  5/20 Test  #5: githubcache ......................   Passed    7.47 sec

...but then testfilesystem took 0.03 seconds later...

2023-09-07T23:49:34.3870912Z       Start 12: testfilesystem
2023-09-07T23:49:34.3871929Z 
2023-09-07T23:49:34.3872496Z 12: Test command: D:\a\SDL\SDL\build\test\testfilesystem.exe
2023-09-07T23:49:34.3872936Z 12: Working Directory: D:/a/SDL/SDL/build/test
2023-09-07T23:49:34.3873294Z 12: Environment variables: 
2023-09-07T23:49:34.3873625Z 12:  SDL_AUDIO_DRIVER=dummy
2023-09-07T23:49:34.3873970Z 12:  SDL_VIDEO_DRIVER=dummy
2023-09-07T23:49:34.3874287Z 12:  PATH=D:/a/SDL/SDL/build
2023-09-07T23:49:34.3874633Z 12: Test timeout computed to be: 10
2023-09-07T23:49:34.4088576Z 12: INFO: STOPWATCH: Step 'Startup' took 0 ticks
2023-09-07T23:49:34.4089311Z 12: INFO: STOPWATCH: Step 'SDLTest_CommonCreateState' took 0 ticks
2023-09-07T23:49:34.4089893Z 12: INFO: STOPWATCH: Step 'SDL_LogSetPriority' took 0 ticks
2023-09-07T23:49:34.4090442Z 12: INFO: STOPWATCH: Step 'SDLTest_CommonDefaultArgs' took 0 ticks
2023-09-07T23:49:34.4090951Z 12: INFO: STOPWATCH: Step 'SDL_Init' took 0 ticks
2023-09-07T23:49:34.4091423Z 12: INFO: STOPWATCH: Step 'SDL_GetBasePath' took 0 ticks
2023-09-07T23:49:34.4091907Z 12: INFO: base path: 'D:\a\SDL\SDL\build\test\'
2023-09-07T23:49:34.4130295Z 12: INFO: STOPWATCH: Step 'SHGetFolderPathW' took 4 ticks
2023-09-07T23:49:34.4132134Z 12: INFO: STOPWATCH: Step 'SDL_GetPrefPath("libsdl")' took 4 ticks
2023-09-07T23:49:34.4133707Z 12: INFO: pref path: 'C:\Users\runneradmin\AppData\Roaming\libsdl\test_filesystem\'
2023-09-07T23:49:34.4134313Z 12: INFO: STOPWATCH: Step 'SHGetFolderPathW' took 0 ticks
2023-09-07T23:49:34.4134906Z 12: INFO: STOPWATCH: Step 'SDL_GetPrefPath(NULL)' took 0 ticks
2023-09-07T23:49:34.4135491Z 12: INFO: pref path: 'C:\Users\runneradmin\AppData\Roaming\test_filesystem\'
2023-09-07T23:49:34.4160598Z 12/20 Test #12: testfilesystem ...................   Passed    0.03 sec

So ideally we find a better way to coerce a 32-bit program to call this function right at startup, but only on 32-bit Windows builds that also run tests, preferably in parallel to everything else, so everything is ready by the time the tests arrive and we don't waste time and resources loading stuff when it won't be needed.

I wonder if there's some 32-bit .exe on the 64-bit Windows install that we can coerce to do this for us, right in the .yml file...

ericoporto commented 12 months ago

MSVC builds aren't running the tests at all to compare,

Wait, why MSVC isn't being tested?

madebr commented 12 months ago

Wait, why MSVC isn't being tested?

Since bd49696133eded6f441b20391b5bb4376b3834d8, we're building SDL on MSVC as a subproject. The new top project does not do enable_testing(), causing CTest to not find any tests.

https://github.com/libsdl-org/SDL/pull/7466 is fixing this.

madebr commented 12 months ago

@icculus x86 clang-cl testaudioinfo also times out. See https://github.com/libsdl-org/SDL/actions/runs/6291006715/job/17078961220#step:6:29

testfilesystem_pre took 33s(!) to finish testaudioinfo timed out, while testfilesystem_pre was running in parallel.

icculus commented 12 months ago

Should we just set the timeout for everything on 32-bit Windows to five minutes? So when we get hit by this on future things, they'll just work, and if something is in a legitimate infinite loop, we'll eventually find out anyhow?

madebr commented 12 months ago

There's something else going on, I think. See this output:

13: INFO: SDL_GetPrefPath took 15270ms
 2/21 Test #13: testfilesystem_pre ...............   Passed   33.63 sec

That means SDL_Quit() took ~18s to finish.

icculus commented 12 months ago

It could just be the virtual machine's physical hardware is heavily overloaded, depending on what else it might be building across our project and the rest of GitHub.

We could pay for dedicated builders but otherwise we might just be at the mercy of system load beyond our control, which is exasperated by 32-bit system libraries not being resident at start.

We could make everything depend on a testfilesystem_pre thing that just initializes everything, calls a few things and quits. But I think an obscenely large timeout is probably the better solution at this point.

madebr commented 12 months ago

We could make everything depend on a testfilesystem_pre thing that just initializes everything, calls a few things and quits. But I think an obscenely large timeout is probably the better solution at this point.

I tried that suggestion and the tests succeed (for now).

slouken commented 12 months ago

Does that mean we can remove testfilesystem_pre?

madebr commented 12 months ago

Does that mean we can remove testfilesystem_pre?

It lives on in pretest :). All tests depend on pretest.

The tests are run in parallel (-j2), so we still have a time gain there.

madebr commented 12 months ago

Closing this again. https://github.com/libsdl-org/SDL/pull/7466 makes every test depend on pretest (on 32-bit Windows)

madebr commented 11 months ago

I (now) know you can't do SDL_GetTicks() after SDL_Quit() https://github.com/libsdl-org/SDL/blob/9667aa18e6c29885d8cd45efc2cf8bcb97270d7d/test/pretest.c#L33-L36 but...

I just got this ci output:

13: Test timeout computed to be: 60
13: INFO: SDL_GetPrefPath took 12580ms
13: INFO: SDL_Quit took 18446744073709539036ms
 1/21 Test #13: pretest ..........................***Timeout  60.67 sec

So is there another delay after main/SDL_main finishes? SDL_RunApp only does HeapFree after the main function.