Closed: ian-h-chamberlain closed this 2 years ago
I think I have an alternative design for keeping track of these PThread objects which would fix the leak issue...
@ian-h-chamberlain Idea: Do something similar to the thread local keys implementation by having a `RwLock<BTreeMap<u32, PThread>, spin::Yield>` and setting `pthread_t` to `u32`. Then store the `u32` for that thread in a thread local (to make `pthread_self` work). The `u32` values can come from an `AtomicUsize` like in the thread keys impl.

Edit: could hold the current thread's key in a `Cell`.

This introduces some synchronization, but as long as we drop the locks at the right times (we can make `PThread` `Copy` and drop the guard right after we read the map), I don't think it will need to wait for locks much at all.

Edit: by the way, we should also think about running the thread local destructors on thread deletion...
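A minimal sketch of that registry design, using `std::sync::RwLock` as a stand-in for the `spin`-based lock the crate would actually need (no std on the target), and a hypothetical `PThread` struct (the real one holds the ctru `Thread` handle):

```rust
use std::cell::Cell;
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

// Hypothetical stand-in for the crate's PThread; Copy so lookups can
// return it by value and drop the read guard immediately.
#[derive(Clone, Copy, Debug, PartialEq)]
struct PThread {
    os_thread_id: u32,
}

// Global registry mapping pthread_t (a u32 key) to per-thread data.
static THREADS: RwLock<BTreeMap<u32, PThread>> = RwLock::new(BTreeMap::new());
// Monotonic key source, like in the thread-keys implementation.
static NEXT_KEY: AtomicUsize = AtomicUsize::new(1);

thread_local! {
    // Current thread's key, so pthread_self can find itself.
    static SELF_KEY: Cell<u32> = Cell::new(0);
}

fn register_thread(info: PThread) -> u32 {
    let key = NEXT_KEY.fetch_add(1, Ordering::Relaxed) as u32;
    THREADS.write().unwrap().insert(key, info);
    SELF_KEY.with(|k| k.set(key));
    key
}

fn pthread_self_key() -> u32 {
    SELF_KEY.with(|k| k.get())
}

fn lookup(key: u32) -> Option<PThread> {
    // The write/read guard is a temporary here, so it is dropped as soon
    // as the copy is made; no lock is held across other pthread calls.
    THREADS.read().unwrap().get(&key).copied()
}

fn unregister_thread(key: u32) {
    // Removing the entry on thread exit is what fixes the leak.
    THREADS.write().unwrap().remove(&key);
}
```

Since `PThread` is `Copy`, every lock is held only for the duration of a single map operation, which is why contention should stay low.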
Update: I tried a variant of the transmute for stuffing the whole struct into an int again, and it seems to pass Miri validation now? I don't think I was using `Box` before for the pointer field, so that might have been the real issue.

This seems kinda gnarly, but I think it avoids the leaking issue, and shouldn't require any allocation or whatever. I still haven't been able to find any resources that definitively declare this as undefined behavior, and Miri accepts it, so maybe it's okay?
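For illustration, the "stuff the struct into an int" idea with a static size assertion, assuming a hypothetical `Copy`, `repr(C)` struct of plain integer fields that is exactly the size of a `u64` (the real `PThread` fields differ):

```rust
use std::mem;

// Hypothetical stand-in for PThread; the real struct's fields differ.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C)]
struct PThread {
    os_thread_id: u32,
    flags: u32,
}

// Static size assertion: refuse to compile if PThread ever stops
// fitting in a 64-bit pthread_t.
const _: () = assert!(mem::size_of::<PThread>() == mem::size_of::<u64>());

fn to_pthread_t(thread: PThread) -> u64 {
    // SAFETY: sizes match per the assertion above, and every bit
    // pattern of these plain integer fields is valid in both directions.
    unsafe { mem::transmute(thread) }
}

fn from_pthread_t(raw: u64) -> PThread {
    // SAFETY: raw must have come from to_pthread_t.
    unsafe { mem::transmute(raw) }
}
```

Whether this is sound for a struct containing a pointer field is exactly the open question here; Miri accepting it is evidence, not a guarantee.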
I am a bit hesitant to use thread locals and stuff like that since it feels like reimplementing what's already there in libctru, so if we can use their impl instead I think that would be preferable.
Re: destructors – yes, that's a good point and I will see if I can do that as part of my next push, seems fairly straightforward, I think.
Side note: from reading comments on https://github.com/rust-lang/rust/issues/29594 and https://github.com/rust-lang/rust/issues/91659 – does `#[thread_local]` actually do anything on this platform? I didn't have a chance to look at disassembly or anything, so I'm wondering if it actually compiles down as expected. If so, we should probably mark the target as `#[cfg(target_thread_local)]` in rustc.
I am pretty sure it does work (or at least, the `thread_local!` macro does). We had problems with thread storage in the past, but it should work now.
My understanding, though limited, is that `thread_local!` uses emulation on unsupported targets (i.e. calling the `pthread_key_*` etc. functions implemented in this crate), but uses `#[thread_local]` in its implementation on supported targets. `#[thread_local]` seems to lower to an LLVM attribute or something like that on those targets, but I'm not sure how we can tell if that's supported on our target, other than perhaps comparing disassembled code or something like that. Perhaps @AzureMarker knows more from working on the thread local implementation...
There is the `has_thread_local` option for the target spec, which I also remember using at one point. I don't know if that changes anything though, or if it is just a target flag for conditional code.
Edit: we are using a raw `#[thread_local]` static for the implementation of keys itself: https://github.com/Meziu/pthread-3ds/blob/90bd4d00e371ea5dbd8487c5a63d3d75d667f796/src/lib.rs#L689

So I guess it works(?)
Yeah I'm pretty sure that annotation does work. I think the thread local example verifies that.
If it's acceptable to use a bigger type for `pthread_t`, then that might be fine. Safety-wise it's ok as long as we make that static size assertion.
I honestly would prefer not to change the `pthread_t` size. Aren't pointers on the 3DS 32-bit anyway? It should fit in the `ulong` type.
Yes, it would require making `pthread_t` larger than a pointer... There seems to be some precedent (based on this pthread header, I think) for something like this, although most `pthread_t` definitions I found in libc are simply a pointer type, `c_ulong`, or similar, not a `c_longlong` or equivalent (which is what we would need in this case).
As far as I know, `pthread_t` is meant to be treated as an opaque type by any callers, so I think any of these options are probably fine, but if there are serious reservations about increasing the size, then I suppose we would need to do something like the thread local keys.
While it is true that it is meant as an opaque type, whose size or value one shouldn't assume, it also isn't meant to be storage. Under normal circumstances it is either a unique index or a pointer. I believe we should try to follow that rule in any case, especially because otherwise we wouldn't be limited in the amount of additional data we could store. The only real issue is memory leaks.
Ok, I've come up with something that I think addresses the leak without adding much complexity or changing the pointer size. This does a little duplication of the `PThread` struct, but I think it should be cheap enough, and it does get properly destructed on thread exit, with the exception of the main thread.

Also added some basic code to run thread-local destructors, in a similar manner to `THREAD_SELF`.
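As a rough sketch of running thread-local key destructors on thread exit (all names here are hypothetical; the crate's actual key table and `THREAD_SELF` machinery differ), following POSIX-style destructor semantics:

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

type Destructor = fn(*mut u8);

// Hypothetical key -> destructor table, filled by pthread_key_create.
static DESTRUCTORS: Mutex<BTreeMap<u32, Destructor>> = Mutex::new(BTreeMap::new());

thread_local! {
    // Hypothetical per-thread values stored via pthread_setspecific.
    static VALUES: RefCell<BTreeMap<u32, *mut u8>> = RefCell::new(BTreeMap::new());
}

// Example destructor that frees a boxed value and counts its runs.
static DTOR_RUNS: AtomicUsize = AtomicUsize::new(0);
fn free_boxed_u8(value: *mut u8) {
    DTOR_RUNS.fetch_add(1, Ordering::SeqCst);
    // SAFETY: the value was created by Box::into_raw(Box::new(..)).
    unsafe { drop(Box::from_raw(value)) };
}

// Called on thread exit: invoke each registered destructor on the
// thread's non-null values, then null them out. Note the VALUES borrow
// is held across the destructor call here; a real implementation must
// be careful about destructors that touch thread-local state.
fn run_destructors() {
    VALUES.with(|values| {
        for (key, value) in values.borrow_mut().iter_mut() {
            if !value.is_null() {
                if let Some(dtor) = DESTRUCTORS.lock().unwrap().get(key).copied() {
                    dtor(std::mem::replace(value, ptr::null_mut()));
                }
            }
        }
    });
}
```

POSIX additionally re-runs the loop (up to `PTHREAD_DESTRUCTOR_ITERATIONS` times) if a destructor sets new values; the sketch omits that.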
I took a quick glance but I'll review more soon. Very interesting approach. (I finally got my big desktop working again, after 2 months. PSU and GPU died over the holidays. Now I have 32 cores to compile with instead of my laptop's 4 :smile:)
I have a 5th-gen i5 with integrated graphics. My laptop dies if I compile Rust while having an open VS Code window (though that may be more of a problem with VS Code than with Rust).

Edit: One day I will upgrade, maybe.
Ok, based on the discussion and some local testing, I think it's fair to say this approach doesn't quite work in every scenario.

I won't have much time in the next two weeks to try the suggested approach (still traveling), so I am going to close the PR for now. I can revisit when I get back from my trip, or if someone else wants to take a stab at it in the meantime, feel free. The `BTreeMap` design seems like it should work similarly to how thread-locals do, so that's probably worth pursuing.
Thanks for working on this and experimenting @ian-h-chamberlain! I'll take a stab at it and see how things go.
https://github.com/Meziu/ctru-rs/issues/48
This allows us to hold the OS thread ID as well as the ctru `Thread`, which makes lookups for schedparam work a bit nicer. One downside is that `pthread_self` leaks the struct when called...

I considered trying to stuff the whole `PThread` struct into the `pthread_t` (if it were a `c_ulonglong` instead of `c_ulong`), but it requires `mem::transmute`, and I tried it in Miri and it was flagged as Undefined Behavior (and also didn't seem to work), so... probably not the best idea.

Open to ideas for avoiding the leak, although arguably leaking a 64-bit struct once in a while is not so bad compared to the zombie processes getting left around by `svcGetThreadList`.

@Meziu @AzureMarker