shader-slang / slang

Making it easier to work with shaders
MIT License

Running multi-threaded vkcts in slang mode produces errors #3739

Open dzysk opened 3 months ago

dzysk commented 3 months ago

When I run vkcts multi-threaded in slang mode, I see random failures in tests that pass when run serially. When I disable slang and run vkcts multi-threaded, the errors do not occur.

The errors are all similar to: "ERROR: Got non-white pixels on sub-case 1"

The command I run is below. Each thread gets a different test list, and I increment the number in the log file name and the cache file name by one for each thread:

deqp-vk.exe --deqp-archive-dir=. --deqp-caselist-file=parallel1.txt --deqp-log-images=disable --deqp-log-shader-sources=disable --deqp-log-flush=disable --deqp-log-filename=TestResults1.qpa --deqp-shadercache-filename=shadercache1.bin
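For reference, the per-thread naming described above could be generated with a small helper. This is only a sketch: the function name `build_deqp_command` is hypothetical, and the flags are copied verbatim from the command in this comment.

```python
def build_deqp_command(index, exe="deqp-vk.exe"):
    """Return the argument list for worker number `index` (1-based).

    Only the case list, log file and shader cache names vary per worker;
    every other flag is identical to the command quoted in the issue.
    """
    return [
        exe,
        "--deqp-archive-dir=.",
        f"--deqp-caselist-file=parallel{index}.txt",
        "--deqp-log-images=disable",
        "--deqp-log-shader-sources=disable",
        "--deqp-log-flush=disable",
        f"--deqp-log-filename=TestResults{index}.qpa",
        f"--deqp-shadercache-filename=shadercache{index}.bin",
    ]


if __name__ == "__main__":
    # Print the five command lines that correspond to parallel1.txt .. parallel5.txt.
    for worker in range(1, 6):
        print(" ".join(build_deqp_command(worker)))
```

Giving every worker its own log and shader cache file keeps the processes from writing to the same file on disk.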

pmistryNV commented 3 months ago

This will take around 3 weeks of work.

  1. There are a few global static variables that need to be added to a context so that the code becomes thread safe.
  2. Make sure the test_server is able to process multiple queries simultaneously. It might need a different kind of initialization. @csyonghe can the test_server handle multiple queries?
  3. We will have to handle a test_server crash across multiple threads.
  4. Test the complete list and make sure there are no issues. This takes the longest, as the total time to test is high.

Approximately 3 weeks of effort including testing.

csyonghe commented 3 months ago

Instead of having the test server handle multiple requests in parallel, the right thing to do is to launch multiple test servers and dispatch requests to them independently. I don't think it will take three weeks if we go this route. The slang-test infrastructure used this technique to run tests in parallel without getting into the mess of thread-proofing the rest of the code.

csyonghe commented 3 months ago

This can be done by having each CTS test thread create its own test server and handle restarting that test server independently in that thread. There should be no thread syncing required.
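A rough sketch of that shape, in Python for brevity. Everything here is an assumption made for illustration: `ServerSession`, `run_worker` and `run_parallel` are hypothetical names, and the child process is a stand-in echo script, not the real test server or its protocol. The point is only that each thread owns one server process, restarts it on its own, and shares no state with other threads.

```python
import subprocess
import sys
import threading

# Stand-in for the real test server (assumption): echoes each request line back.
ECHO_SERVER = [
    sys.executable, "-u", "-c",
    "import sys\nfor line in sys.stdin:\n    print('ok ' + line.strip(), flush=True)",
]


class ServerSession:
    """One server process, owned and restarted by exactly one worker thread."""

    def __init__(self, command):
        self.command = command
        self.process = None
        self.restarts = 0

    def _ensure_running(self):
        # Start the server on first use, or restart it if it has exited.
        if self.process is None or self.process.poll() is not None:
            if self.process is not None:
                self.restarts += 1
            self.process = subprocess.Popen(
                self.command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
            )

    def request(self, payload):
        # Send one line and read one line back.
        self._ensure_running()
        self.process.stdin.write(payload + "\n")
        self.process.stdin.flush()
        return self.process.stdout.readline().strip()

    def close(self):
        if self.process is not None and self.process.poll() is None:
            self.process.stdin.close()  # end of input lets the stand-in server exit
            self.process.wait()
            self.process.stdout.close()


def run_worker(cases, results, command):
    # `results` belongs to this thread alone, so no locking is needed.
    session = ServerSession(command)
    try:
        for case in cases:
            results[case] = session.request(case)
    finally:
        session.close()


def run_parallel(shards, command=ECHO_SERVER):
    """Run each shard of cases on its own thread with its own server process."""
    results = [{} for _ in shards]
    threads = [
        threading.Thread(target=run_worker, args=(shard, result, command))
        for shard, result in zip(shards, results)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Note that this sketch only notices a dead server before the next request; a crash in the middle of a request would need extra handling (for example, treating an empty reply as a failure and retrying).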

dzysk commented 2 months ago

Attaching the test lists that I used, just for reference: parallel1.txt, parallel2.txt, parallel3.txt, parallel4.txt, parallel5.txt. I'm not sure if this is the best way to run in parallel. It's just the way I set it up.
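For anyone who wants to produce similar lists, one possible way to split a single case list into several files is a round-robin split. This is a sketch only: the function names are hypothetical, the attached lists were not necessarily produced this way, and the `parallel{n}.txt` naming simply mirrors the attachments.

```python
from pathlib import Path


def split_caselist(cases, shard_count):
    """Distribute test case names across `shard_count` lists, round-robin."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    shards = [[] for _ in range(shard_count)]
    for position, case in enumerate(cases):
        shards[position % shard_count].append(case)
    return shards


def write_shards(shards, directory="."):
    """Write each shard to parallel1.txt, parallel2.txt, ... and return the paths."""
    paths = []
    for number, shard in enumerate(shards, start=1):
        path = Path(directory) / f"parallel{number}.txt"
        path.write_text("".join(case + "\n" for case in shard))
        paths.append(path)
    return paths
```

Round-robin keeps the lists close to equal in length, though it does not balance actual run time if some tests are much slower than others.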

pmistryNV commented 2 months ago

CTS has an option --deqp-fraction= that can run parallel threads. I am investigating it.

swoods-nv commented 2 months ago

It sounds like this may be resolved; Pankaj to confirm when he is back.

pmistryNV commented 2 months ago

The issue happens when multiple processes are launched simultaneously. It seems to be an issue with the Windows mechanism for registering a mutex. The issue can be worked around by launching each process after some delay.
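A minimal sketch of that workaround, assuming the processes are started from a Python launcher script. The function name `launch_staggered` is hypothetical, and the delay length is not specified in this thread, so it is left as a parameter to be tuned.

```python
import subprocess
import time


def launch_staggered(commands, delay_seconds, sleep=time.sleep, spawn=subprocess.Popen):
    """Start each command in turn, pausing `delay_seconds` between launches.

    `sleep` and `spawn` are injectable so the launch order can be checked
    without starting real processes.
    """
    processes = []
    for position, command in enumerate(commands):
        if position > 0:
            sleep(delay_seconds)  # no delay is needed before the first launch
        processes.append(spawn(command))
    return processes
```

With real commands, the caller would wait on the returned process objects once all of them have been launched, for example by calling `wait()` on each.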