vrurg / raku-Test-Async

Asynchronous, thread-safe testing and more!
Artistic License 2.0

Testing by zef installation is taking too long on two of my Debian hosts #2

Closed tbrowder closed 10 months ago

tbrowder commented 11 months ago

I then gave up waiting and aborted the attempt

I tried "zef install --force-install Test::Async" and got the same behavior, and gave up and aborted the process.

Both my hosts are reasonably capable, lots of memory and SSD drives.

vrurg commented 11 months ago

What stage does it hang in? Is it testing? If so, would it be possible for you to clone the repo and try prove6 -I. -j1 to see which test is the last to succeed?

Overall, Test::Async does a lot of testing, and some tests are really slow. Leaving the Apple Silicon M2 aside, my Linux box runs on an Intel Xeon E5-2690 v4, 2.6GHz. A full test run takes ~2min.

tbrowder commented 11 months ago

Yes, I will do that.

UPDATE

I ran

prove6 -I. -j1

It hung at test 150. I killed it after 3 minutes.

I ran it again and noted it took about 35 seconds to get to test 150.

Want me to wait longer? My OS is Debian 11.

vrurg commented 11 months ago

Is 150 the last one visible? Then 160 is the culprit. Either way, can you try running both manually with raku -I. t/<test-name>? Perhaps that would help to pinpoint the exact location where it happens.

Testing can be really slow at some moments. First, because Test::Async is generally slower than the standard Test (concurrency support is costly). Second, because 190, for example, is a probability test that starts thousands of threads.

Another point: if you have the STRESS_TESTING environment variable set, 170-heavily-concurrent.rakutest would be run, which might be slow as well.

But with prove6 -j1, if either 170 or 190 were somehow involved, the last passed test would be 160 or 180, respectively. So currently I'm totally in limbo as to what's going on.

vrurg commented 11 months ago

Oh, and another crazy thought has just crossed my mind. What version does it try to install?

As to the repo clone, try git pull. It shouldn't be the case, but the main branch could have lagged behind v0.1, where I was making the latest changes.

tbrowder commented 11 months ago

Okey dokey, wilco.

tbrowder commented 11 months ago

First I resynced my fork of Test::Async on GitHub. Then I pulled the main branch into my local clone.

I tried all tests individually, starting with 150, which tested okay. 160 hung as you said, after printing the 1..5. I waited 2+ minutes and killed it. I set "export STRESS_TESTING=1" and ran 170; it took just a few seconds. The rest of the tests all ran quickly.

I saw no errors but now I'm running "zef test --debug ."

Still hung after 150 with no message.

Any more ideas? Any system package I might be missing?

vrurg commented 11 months ago

There is nothing in 160 to make it that slow. So there is either a deadlock or deep recursion. What version of Rakudo do you have? Otherwise I currently don't have the slightest idea of what it might be.

The things I'd try all come down to running the test manually with various debugging measures. First of all, it'd make sense to disable all :parallel and :random adverbs in the test itself. This should make it possible to pinpoint the exact subtest that fails. Once that is known, I can only think of the normal debugging procedures: debug prints and all the other techniques we have.

tbrowder commented 11 months ago

Raku v2023.10.

tbrowder commented 11 months ago

Testing 160 alone:

I first tried commenting out all :random and :parallel adverbs, all okay.

Then turned on all :random, all okay.

Then I turned on all individual test/subtest :parallel adverbs, all okay.

Then I turned on the universal :parallel at the top (all other :parallel off), and it halts again.

tbrowder commented 11 months ago

Weird. My host has 16 cores. When RAKUDO_MAX_THREADS=16, test 160 halts. When I set it to ANY other value, no halts! That is a holdover env var from a long time ago. I'm going to check my other host. That env var has the same setting there, but I don't remember the number of cores.

tbrowder commented 11 months ago

That host has 8 cores. Max threads is 16. It halts at test 160. When I set max threads to 64, the test passes!

vrurg commented 11 months ago

What if you set the max to 8? My guess is that it would freeze again.

At least it gives me a clue. A deadlock is possible if there are threads waiting for an event (say, a kept promise) in a blocked state (which happens, for example, inside a lock), but the task that would actually keep the promise can't be started because the thread pool is already exhausted.
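
To make that failure mode concrete, here is a minimal sketch of thread-pool starvation. It is written in Python rather than Raku and is not the Test::Async code or Rakudo's scheduler; the function name all_promises_kept, the Event standing in for a promise, and the timeout are assumptions made for illustration only. Every waiter occupies a pool worker, queues the task that would release it on the same pool, and then blocks. When the waiters already hold every worker, that task never gets a thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def all_promises_kept(pool_size: int, waiters: int, timeout: float = 0.5) -> bool:
    """Return True if every waiter saw its event set before the timeout.

    Hypothetical illustration only: the Event plays the role of a promise,
    and the timeout replaces what would otherwise be an endless hang.
    """
    start = threading.Event()

    with ThreadPoolExecutor(max_workers=pool_size) as pool:

        def waiter() -> bool:
            start.wait()               # hold this worker until all waiters are submitted
            kept = threading.Event()   # stands in for the promise
            pool.submit(kept.set)      # the task that would keep the promise
            return kept.wait(timeout)  # block the worker while waiting for it

        futures = [pool.submit(waiter) for _ in range(waiters)]
        start.set()
        return all(f.result() for f in futures)


if __name__ == "__main__":
    print(all_promises_kept(pool_size=4, waiters=4))  # pool exhausted by the waiters
    print(all_promises_kept(pool_size=8, waiters=4))  # spare workers can run the keepers
```

With as many waiters as workers, each wait only ends by timing out, which is the point where a wait without a timeout would hang forever. A pool with spare workers lets the queued tasks run, so the waits return promptly.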

vrurg commented 10 months ago

The latest release should have solved this issue.