rust3ds / pthread-3ds

PThread implementation for Nintendo 3DS Horizon OS targets. Keep in mind that Horizon OS uses a cooperative, and not preemptive, threading model.
Apache License 2.0
12 stars 7 forks source link

Recent changes seem to cause a deadlock when thread spawns #30

Closed ian-h-chamberlain closed 4 months ago

ian-h-chamberlain commented 4 months ago

I noticed this while trying to make some other changes to ctru-rs — I think because we are pulling in the new pthread-3ds changes, something in the test runner isn't working anymore and the tests hang forever, e.g. https://github.com/rust3ds/ctru-rs/actions/runs/9009948051/job/24755098299?pr=187

Comparing to the most recent successful run, this is the diff of compiler output:

4a5
> Compiling bindgen v0.65.1
7a9
> Compiling cc v1.0.97
14a17
> Compiling doxygen-rs v0.4.2
20a24
> Compiling futures-macro v0.3.30
38a43,44
> Compiling phf v0.11.2
> Compiling phf_macros v0.11.2
41a48,49
> Compiling prettyplease v0.2.20
> Compiling proc-macro2 v1.0.82
43c51,52
< Compiling pthread-3ds v0.1.0 (https://github.com/rust3ds/pthread-3ds.git#c885d8cd)
---
> Compiling pthread-3ds v0.1.0 (https://github.com/rust3ds/pthread-3ds.git#b15d26df)
> Compiling quote v1.0.36
47a57
> Compiling serde v1.0.201
55a66
> Compiling syn v2.0.61
63a75,76
> Compiling tokio-macros v2.2.0
> Compiling toml v0.5.11

Most of the other changes, as far as I can tell, are just build dependencies and I wouldn't expect them to affect the unit tests, so pthread having recent changes is my prime suspect.

Maybe #19 had some side effect on spawning new threads somehow? Otherwise, I'm not sure what might have changed, but this seems like somewhere to start.

Meanwhile, I'm going to try checking in Cargo.lock with the old hash (c885d8cd) and see if that helps at all. If not, we can close this issue / move it to ctru-rs to try and track it down there.

ian-h-chamberlain commented 4 months ago

Ok, pinning the older hash passed CI again, so I think that's pretty strong evidence it wasn't from something else unrelated: https://github.com/rust3ds/ctru-rs/actions/runs/9010075712/job/24755466567?pr=187

Meziu commented 4 months ago

A deadlock it seems indeed. It's very weird though, the code pushed in #19 should only have effect once a thread is detached (which may still be the case for when a test ends its run).

I want to note two things:

  1. It might be related to the spinlock that handles the TLS variables, but it's hard for me to understand why the TLS variables would cause a deadlock (there shouldn't be anything used?).
  2. The tests are run "concurrently" by default, but it's a bit hard to say whether they should. Tests aren't written with cooperative concurrency in mind, so they end up running sequentially either way.
ian-h-chamberlain commented 4 months ago

Hmm, initial thoughts from looking at the changes:

  1. If any of the destructors call some pthread_key_* function, the KEYS spinlock could be held already and deadlock would occur. Maybe it would be safer to collect all the destructor pointers first, then release the lock before actually calling the destructors?
  2. I think the test runner should be avoiding concurrency entirely since it sets the number of threads to 1: https://github.com/rust3ds/ctru-rs/blob/master/test-runner/src/lib.rs#L56 It's possible that there are still some issues that could arise from having the main thread waiting on the individual test thread, but I think it's necessary to handle panics appropriately during tests and stuff like that. Without writing a whole new test runner from scratching I'm not sure how much we can do about this in particular (and up until now, it's seemed to work okay?)