rust-lang / libs-team

The home of the library team
Apache License 2.0
127 stars 19 forks source link

Add Special Thread Builder Functions for the Nintendo 3DS #71

Closed AzureMarker closed 1 year ago

AzureMarker commented 2 years ago

Proposal

Problem statement

(some background first)

The Nintendo 3DS is supported as a Tier 3 target (armv6k-nintendo-3ds). Recently, basic std support was merged in https://github.com/rust-lang/rust/pull/95897. There is now an open PR to enable std::thread support: https://github.com/rust-lang/rust/pull/98514.

The Nintendo 3DS has a few physical cores that can be used by user applications (2-4 depending on the model^1). One of these cores is preemptive (primarily used by the OS, named system core or syscore), and the rest are cooperative (known as app cores). Threads live on the core they were created on, and don't move across cores. See https://www.3dbrew.org/wiki/Multi-threading.

Now coming to the real problem statement: The API used to spawn threads on the 3DS requires the core/processor where the thread will live, as well as the thread priority. The processor ID can't be changed after the thread has been created[^2], though the priority technically can be (but it should be set at thread creation time). The current Rust thread builder API doesn't provide a way to set the processor ID or priority during thread creation.

[^2]: Technically there are APIs which seem to provide this functionality (setting/getting thread core affinity, fields in the kernel's thread representation), but they don't seem to work. Either way, threads don't seem to automatically load balance.

Motivation, use-cases

Setting the processor ID is important to 3DS applications. To use the system core, which is preemptive, the correct processor ID needs to be passed during thread creation. To spawn threads on a different core than the main thread, the program needs a way to set the processor ID to use.

There is a processor ID value that tells the 3DS to spawn the thread on the "default" core for the application, but if we use this value, we limit the application to just one core (and it can't use the preemptive system core).

Setting the priority is also important to 3DS applications. Since most of the cores are cooperative, the priority value tells the 3DS what order to executes threads in when a yield happens. Having the wrong priority value could mean lower performance (less opportunities to run the thread). There is an API to get the current thread's priority, which can be used to set new threads to the current thread's priority, so this isn't as big of an issue as the processor ID section. However, since the priority is passed in a thread creation we should have an API to let the user take advantage of this.

Solution sketches

To work around the missing APIs for setting processor ID and priority during thread creation, the std::thread PR adds a 3DS target specific extension trait std::os::horizon::thread::BuilderExt behind a feature flag which adds support for setting the processor ID and priority:

/// Extensions on [`std::thread::Builder`] for the Nintendo 3DS.
///
/// This trait is sealed: it cannot be implemented outside the standard library.
/// This is so that future additional methods are not breaking changes.
///
/// [`std::thread::Builder`]: crate::thread::Builder
pub trait BuilderExt: Sized + Sealed {
    /// Sets the priority level for the new thread.
    ///
    /// Low values gives the thread higher priority. For userland apps, this has
    /// to be within the range of 0x18 to 0x3F inclusive. The main thread
    /// usually has a priority of 0x30, but not always.
    fn priority(self, priority: i32) -> Self;

    /// Sets the ID of the processor the thread should be run on. Threads on the
    /// 3DS are only preemptive if they are on the system core. Otherwise they
    /// are cooperative (must yield to let other threads run).
    ///
    /// Processor IDs are labeled starting from 0. On Old3DS it must be <2, and
    /// on New3DS it must be <4. Pass -1 to execute the thread on all CPUs and
    /// -2 to execute the thread on the default CPU (set in the application's
    /// Exheader).
    ///
    /// * Processor #0 is the application core. It is always possible to create
    ///   a thread on this core.
    /// * Processor #1 is the system core. If the CPU time limit is set, it is
    ///   possible to create a single thread on this core.
    /// * Processor #2 is New3DS exclusive. Normal applications can create
    ///   threads on this core only if the built application has proper external setup.
    /// * Processor #3 is New3DS exclusive. Normal applications cannot create
    ///   threads on this core.
    fn processor_id(self, processor_id: i32) -> Self;
}

To implement this, the Builder struct gets a new crate-private native_options field whose type is platform specific:

// thread/mod.rs
pub struct Builder {
    // [..]

    // Other OS-specific fields. These can be set via extension traits, usually
    // found in std::os.
    pub(crate) native_options: imp::BuilderOptions,
}

// sys/unix/thread.rs
pub struct BuilderOptions {
    /// The spawned thread's priority value
    #[cfg(target_os = "horizon")]
    pub(crate) priority: Option<i32>,
    /// The processor to spawn the thread on. See [`os::horizon::thread::BuilderExt`].
    #[cfg(target_os = "horizon")]
    pub(crate) processor_id: Option<i32>,
}

// other platforms
pub struct BuilderOptions;

// os/horizon/thread.rs
impl BuilderExt for crate::thread::Builder {
    fn priority(mut self, priority: i32) -> Self {
        self.native_options.priority = Some(priority);
        self
    }

    fn processor_id(mut self, processor_id: i32) -> Self {
        self.native_options.processor_id = Some(processor_id);
        self
    }
}

Since this is a crate/std private field, there is no user facing change to the thread builder outside of the 3DS target. The native_options fields are used during thread creation, for example in sys/unix/thread.rs Thread::new:

impl Thread {
    pub unsafe fn new(
        stack: usize,
        p: Box<dyn FnOnce()>,
        #[allow(unused)] native_options: BuilderOptions,
    ) -> io::Result<Thread> {
        // [..]
    }
}

Alternatively, we could add this API to all platforms, but I don't think most platforms let you set things like the processor ID during thread creation, so this alternative solution probably wouldn't work.

Links and related work

std::thread PR for the 3DS, which implements the proposed solution behind a feature flag: https://github.com/rust-lang/rust/pull/98514

What happens now?

This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.

joshtriplett commented 2 years ago

I'd like to propose changing this in one of two ways. One would be simple but non-portable, the other has the possibility of being a portable API useful in many cases.

The simple approach: if we add these in their current form, we should name them something Horizon-specific, so that their names don't conflict with other methods we might add for portable versions of comparable concepts. For instance, horizon_processor_id.

However, the portable version I'd love to see:

Could we design an abstraction for setting thread (and process, though Horizon may not have that) affinity? On Horizon, the only valid affinities would be single-CPU affinities, Horizon would define constants for those single CPUs, and the default affinity if not set could be the "default core" mentioned above. Other systems would support multi-CPU affinities, and the default would be "all CPUs".

Similarly, we could have an abstraction for setting thread (and process) priority. The valid range of priority values, and the default, would need to be target-specific, though we could provide constants for them. (Things like "realtime priority" or "idle priority" might need to be target-specific extensions, but we could map this to nice-level on POSIX.)

ian-h-chamberlain commented 2 years ago

However, the portable version I'd love to see:

Could we design an abstraction for setting thread (and process, though Horizon may not have that) affinity? On Horizon, the only valid affinities would be single-CPU affinities, Horizon would define constants for those single CPUs, and the default affinity if not set could be the "default core" mentioned above. Other systems would support multi-CPU affinities, and the default would be "all CPUs".

Similarly, we could have an abstraction for setting thread (and process) priority. The valid range of priority values, and the default, would need to be target-specific, though we could provide constants for them. (Things like "realtime priority" or "idle priority" might need to be target-specific extensions, but we could map this to nice-level on POSIX.)

@joshtriplett what do you think about this alternative design? Nothing set in stone, but kind of a sketch of the more portable API I think you had in mind. @AzureMarker @Meziu also curious for your feedback on this as well:

Public APIs

std::thread

/// CPU priority of a thread. Use `std::os::*::thread` to create a Priority.
pub struct Priority(pub(crate) imp::Priority);

/// CPU affinity (which CPU to run on) for a thread.
/// Use `std::os::*::thread` to create Affinity.
pub struct Affinity(pub(crate) imp::Affinity);

pub struct Builder {
    // ...
    priority: Option<Priority>,
    affinity: Option<Affinity>,
}

impl Builder {
    /// Set the priority of the thread to be spawned.
    pub fn priority(mut self, priority: Priority) -> Self;

    /// Set the affinity of the thread to be spawned.
    pub fn affinity(mut self, affinity: Affinity) -> Self;

    fn spawn_unchecked(...) -> io::Result<JoinInner<T>> {
        // ...
        Ok(JoinInner {
            native: imp::Thread::new(
                // ...
                self.priority,
                self.affinity,
            )?
        })
    }
}

A possible future extension, if the platforms support it. I think this could simply return Err on platforms where spawn-time settings are supported but runtime settings aren't:

impl JoinHandle {
    /// Sets the priority of a running thread.
    /// Returns Err on unsupported platforms.
    pub fn set_priority(mut self, priority: Priority) -> io::Result<()>;

    /// Sets the affinity of a running thread.
    /// Returns Err on unsupported platforms.
    pub fn set_affinity(mut self, priority: Affinity) -> io::Result<()>;
}

std::os::horizon::thread

pub use crate::thread::{Affinity, Priority};

use crate::sys::unix::thread as imp;

// Checks for valid priority values, or initializes `imp::Priority` struct, etc.
// Could in theory take different args on different platforms, I guess?
pub fn priority(priority: i32) -> Option<Priority>;

// This could return a `Priority` instead, but that might make
// it tricky to do math, etc. on it (maybe still possible with extension traits?).
// This API might be more flexible anyway since it will be OS-dependent, but if it 
// were to be in `std::thread` then it would have to return a `Priority`, and maybe would
// be a method on `std::thread::Thread`.
pub fn current_priority() -> i32;

// Checks for valid processor ID, etc.
pub fn affinity(processor_id: i32) -> Option<Affinity>;

// predefined constants with special OS-specific meaning. These *could* be in
// `std::thread`, if they are universal enough, but would probably need a lot
// more `#[cfg]`.
pub const ALL_CPUS: Affinity = Affinity(imp::Affinity(-1));
pub const DEFAULT_CPU: Affinity = Affinity(imp::Affinity(-2));

std::os::*

Since there is no way to construct a Priority or Affinity on these platforms, it's impossible to use the API until it's implemented.

Internal implementation (imp)

std::sys::unix::thread

#[cfg(target_os = "horizon")]
pub struct Priority(pub libc::c_int); // or maybe even type alias, etc., as needed
#[cfg(target_os = "horizon")]
pub struct Affinity(pub libc::c_int);

#[cfg(not(target_os = "horizon"))]
pub struct Priority(());
#[cfg(not(target_os = "horizon"))]
pub struct Affinity(());

impl Thread {
    pub unsafe fn new(
        ...,
        // these two could also be wrapped in a struct like NativeOptions
        priority: Option<Priority>,
        affinity: Option<Affinity>,
    ) -> io::Result<Thread> {
        // make pthread_setschedparam calls, etc. to set priority/affinity.
    }
}

std::sys::* (can be updated as implemented)

pub struct Priority(());
pub struct Affinity(());

impl Thread {
    pub unsafe fn new(
        ...,
        _: Priority,
        _: Affinity,
    ) -> io::Result<Thread>;
}

I think this satisfies the use case we have for Horizon, and also would be portable enough to extend to other platforms as needed (nice values, whatever Windows might have, etc.) Is it worth pursuing a functional implementation of this with Horizon as the guinea pig (since https://github.com/rust-lang/rust/pull/98514 already has much of the logic in place to use this)?

AzureMarker commented 2 years ago

That looks like a good start to me. The only concerns I have are:

  1. The std::os::horizon::thread function names could be bikeshed. The conversion functions like priority seem odd, maybe instead we can use TryFrom?
  2. It seems like an anti-pattern to have OS-specific details in std::sys::unix::thread. I know the Priority and Affinity types are defined there so std::thread can refer to them through imp, but maybe there's a better way.
  3. Without a NativeOptions-like struct, it looks really hard for another OS to pass through a third parameter. But if we keep the NativeOptions struct they can just cfg in a new field. Also, it only increases the number of params by 1 instead of 2 (or N).
joshtriplett commented 2 years ago

This design looks largely reasonable to me.

I'd love to see this extended to Linux, using CPU sets, to make sure that the interface works just as well for affinities consisting of multiple CPUs.

I agree that the OS-specific details shouldn't be in unix; ideally they'd largely be in the os-specific module.

@AzureMarker I don't think this should use TryFrom impls. Not least of which because those can't be scoped, so they'd always be available on Horizon, but not necessarily on other OSes.

ian-h-chamberlain commented 2 years ago

I agree that the OS-specific details shouldn't be in unix; ideally they'd largely be in the os-specific module.

The main reason I had it like this was that the existing sys::unix::thread (for example) module doesn't seem to have any OS-specific submodules, but just uses #[cfg] inline to switch between implementations of things when working with threads. Do you think it would make sense to move some of that stuff into sys::unix::os instead then, or would we be creating a new set of sys::unix::thread::{} modules for OS-specific logic? Or do you mean move the functionality entirely into std::os::*::thread somehow?

I'd love to see this extended to Linux, using CPU sets, to make sure that the interface works just as well for affinities consisting of multiple CPUs.

I think this is possible, but I can work on a proof of concept to show that it actually is viable this way.


Regarding TryFrom, it might be feasible to have something like this:

std::os::horizon::thread

use crate::sys::unix::thread as imp;

pub struct Priority(imp::Priority);

impl TryFrom<i32> for Priority {}

impl From<Priority> for crate::thread::Priority {}

The downside would be another layer of indirection (now we have os::*::thread::Priority, sys::*::thread::Priority, and std::thread::Priority), but I think it's hard to avoid without either exposing sys::*::thread::Priority or having std::thread depend on types in std::os::*, neither of which seems desirable at first glance...

I suppose to support CPU sets etc., this kind of thing might be necessary anyway?

ian-h-chamberlain commented 2 years ago

I've opened a draft roughly implementing some of my proposed changes in https://github.com/rust-lang/rust/pull/101222

I ended up just making sys::* types aliases of os::* types as that seemed like the least annoying for converting between types.

There's a lot of cleanup and stuff that would be needed if this proposal is accepted, but I think this shows that it can work at least for these platforms. It might be good to get some opinions from people who are more familiar with other platforms like Windows to make sure this would be feasible for those as well?

AzureMarker commented 1 year ago

Closing in favor of #195