rust3ds / pthread-3ds

PThread implementation for Nintendo 3DS Horizon OS targets. Keep in mind that Horizon OS uses a cooperative, and not preemptive, threading model.
Apache License 2.0
12 stars 7 forks source link

Use struct pointer as pthread_t #13

Closed ian-h-chamberlain closed 2 years ago

ian-h-chamberlain commented 2 years ago

https://github.com/Meziu/ctru-rs/issues/48

This allows us to hold the OS thread ID as well as the ctru Thread, which makes lookups for schedparam work a bit nicer. One downside is that pthread_self leaks the struct when called...

I considered trying to stuff the whole PThread struct into the pthread_t (if it were a c_ulonglong instead of c_ulong), but it requires mem::transmute, and I tried it in Miri and it was flagged as Undefined Behavior (and also didn't seem to work), so ... probably not the best idea.

Open to ideas for avoiding the leak, although arguably leaking a 64-bit struct once in a while is not so bad compared to the zombie processing getting left around by svcGetThreadList.

@Meziu @AzureMarker

AzureMarker commented 2 years ago

I think I have an alternative design for keeping track of these PThread objects which would fix the leak issue...

AzureMarker commented 2 years ago

@ian-h-chamberlain Idea: Do something similar to the thread local keys implementation by having a RwLock<BTreeMap<u32, PThread>, spin::Yield> and set pthread_t to u32. Then store the u32 for that thread in a thread local (to make pthread_self work). The u32 values can come from an AtomicUsize like in the thread keys impl.

Edit: could hold the current thread's key in a Cell

This introduces some synchronization, but as long as we drop the locks at the right times (we can make Pthread Copy and drop right after we read the map) I don't think it will need to wait for locks much at all.

Edit: by the way, we also should think about running the thread local destructors on thread deletion...

ian-h-chamberlain commented 2 years ago

Update: I tried a variant of the transmute for stuffing the whole struct into an int again, and it seems to pass Miri validation now? I don't think I was using Box before for the pointer field, so that might have been the real issue.

This seems kinda gnarly but I think it avoids the leaking issue, and shouldn't require any allocation or whatever. I still haven't been able to find any resources that definitively declare this as undefined behavior, and Miri accepts it, so maybe it's okay?

I am a bit hesitant to use thread locals and stuff like that since it feels like reimplementing what's already there in libctru, so if we can use their impl instead I think that would be preferable.

Re: destructors – yes, that's a good point and I will see if I can do that as part of my next push, seems fairly straightforward, I think.


Side note: from reading comments on https://github.com/rust-lang/rust/issues/29594 and https://github.com/rust-lang/rust/issues/91659 – does #[thread_local] actually do anything on this platform? I didn't have a chance to look at disassembly or anything so I'm wondering if it actually compiles down as expected. If so, we should probably mark the target as #[cfg(target_thread_local)] in rustc.

Meziu commented 2 years ago

Side note: from reading comments on rust-lang/rust#29594 and rust-lang/rust#91659 – does #[thread_local] actually do anything on this platform? I didn't have a chance to look at disassembly or anything so I'm wondering if it actually compiles down as expected. If so, we should probably mark the target as #[cfg(target_thread_local)] in rustc.

I am pretty sure it does work (or at least, the thread_local! macro does). We had problems with thread storage in the past, but it should work now.

ian-h-chamberlain commented 2 years ago

I am pretty sure it does work (or at least, the thread_local! macro does). We had problems with thread storage in the past, but it should work now.

My understanding, though limited, is that thread_local! uses emulation on unsupported targets (i.e. calling the pthread_key_* etc. functions implemented in this crate), but uses #[thread_local] in its implementation on supported targets. #[thread_local] seems to lower to an LLVM attribute or something like that on those targets, but I'm not sure how we can tell if that's supported on our target, other than perhaps comparing disassembled code or something like that. Perhaps @AzureMarker knows more from working on the thread local implementation...

Meziu commented 2 years ago

I am pretty sure it does work (or at least, the thread_local! macro does). We had problems with thread storage in the past, but it should work now.

My understanding, though limited, is that thread_local! uses emulation on unsupported targets (i.e. calling the pthread_key_* etc. functions implemented in this crate), but uses #[thread_local] in its implementation on supported targets. #[thread_local] seems to lower to an LLVM attribute or something like that on those targets, but I'm not sure how we can tell if that's supported on our target, other than perhaps comparing disassembled code or something like that. Perhaps @AzureMarker knows more from working on the thread local implementation...

There is the has_thread_local option for the target spec, which I also remember using at one point. I don’t know if that changes anything though, or it it is just a target flag for conditional code.

Edit: we are using a raw #[thread_local] call for the implementation of keys itself https://github.com/Meziu/pthread-3ds/blob/90bd4d00e371ea5dbd8487c5a63d3d75d667f796/src/lib.rs#L689 So I guess it works(?)

AzureMarker commented 2 years ago

Yeah I'm pretty sure that annotation does work. I think the thread local example verifies that.

AzureMarker commented 2 years ago

If it's acceptable to use a bigger type for pthread_t then that might be fine. Safety wise it's ok as long as we make that static size assertion.

Meziu commented 2 years ago

If it's acceptable to use a bigger type for pthread_t then that might be fine. Safety wise it's ok as long as we make that static size assertion.

I honestly would like not changing the pthread_t size. Aren't pointers on 3DS 32bit anyways? It should fit in the ulong type.

ian-h-chamberlain commented 2 years ago

Yes, it would require making pthread_t larger than a pointer... There seems to be some precedent (based on this pthread header, I think) for something like this although most pthread_t definitions I found in libc are simply a pointer type, c_ulong, or similar, not a c_longlong or equivalent (which is what we would need in this case).

As far as I know pthread_t is meant to be treated as an opaque type to any callers, so I think any of these options are probably fine, but if there are serious reservations about increasing the size then I suppose we would need to do something like the thread local keys.

Meziu commented 2 years ago

While it is true it is meant as an opaque type, of which one shouldn’t assume the size or value, it also isn’t meant to be a storage. Under normal circumstances it is either an unique index or a pointer. I believe we should try to follow that rule in any case. Especially because we wouldn’t be limited by the amount of additional data we can store. The only real issue are memory leaks.

ian-h-chamberlain commented 2 years ago

Ok, I've come up with something that I think addresses the leak, without adding much complexity or changing the pointer size. This does a little duplication of the PThread struct, but I think it should be cheap enough and does get properly destructed on thread exit, with the exception of the main thread.

Also added some basic code to run thread-local destructors, in a similar manner to THREAD_SELF.

AzureMarker commented 2 years ago

I took a quick glance but I'll review more soon. Very interesting approach. (I finally got my big desktop working again, after 2 months. PSU and GPU died over the holidays. Now I have 32 cores to compile with instead of my laptop's 4 :smile:)

Meziu commented 2 years ago

I took a quick glance but I'll review more soon. Very interesting approach. (I finally got my big desktop working again, after 2 months. PSU and GPU died over the holidays. Now I have 32 cores to compile with instead of my laptop's 4 smile)

I have a 5th gen i5 and with integrated graphics. My laptop dies if I compile Rust while having an open VsCode window (though that may be more of a problem with VsCode rather than Rust).

Edit: One day I will upgrade, maybe.

ian-h-chamberlain commented 2 years ago

Ok, based on the discussion and some local testing, I think it's fair to say this approach doesn't quite work in every scenario.

I won't have much time in the next two weeks to try the suggested approach (still traveling) so I am going to close the PR for now. I can revisit when I get back from my trip, or if someone else wants to take a stab at it in the meantime, then feel free. The BTreeMap design seems like it should work similar to how thread-locals do, so that's probably worth pursuing.

AzureMarker commented 2 years ago

Thanks for working on this and experimenting @ian-h-chamberlain! I'll take a stab at it and see how things go.