rust-lang / libs-team

The home of the library team

Make safe most intrinsics that neither access memory nor impact processor state #243

Closed: dead-claudia closed this issue 1 year ago

dead-claudia commented 1 year ago

Proposal

Problem statement

Currently, any time you want to use vector intrinsics directly, you have to resort to unsafe blocks. The safe alternative, std::simd, is still very much unstable, so it isn't accessible to most users yet. It also doesn't cover some features, such as AVX-512's result masking (which saves a lot of instructions in some niche cases).

Motivating examples or use cases

I've got this code lying around in an experiment:

use std::arch::x86_64::*;

#[repr(C)]
#[derive(Clone, Copy)]
pub struct Color(__m128);

impl Color {
  #[inline]
  pub fn new(a: f32, r: f32, g: f32, b: f32) -> Color {
    unsafe { Color(_mm_setr_ps(a, r, g, b)) }
  }

  #[inline]
  pub fn a(self) -> f32 {
    f32::from_bits(unsafe { _mm_extract_ps::<0>(self.0) as u32 })
  }

  #[inline]
  pub fn r(self) -> f32 {
    f32::from_bits(unsafe { _mm_extract_ps::<1>(self.0) as u32 })
  }

  #[inline]
  pub fn g(self) -> f32 {
    f32::from_bits(unsafe { _mm_extract_ps::<2>(self.0) as u32 })
  }

  #[inline]
  pub fn b(self) -> f32 {
    f32::from_bits(unsafe { _mm_extract_ps::<3>(self.0) as u32 })
  }
}

#[inline]
fn with_alpha(d: __m128, s: __m128) -> __m128 {
  unsafe { _mm_blend_ps::<0b0001>(d, s) }
}

#[derive(Clone, Copy)]
pub struct Composite {
  a1: f32,
  a2: f32,
  b1: f32,
  b2: f32,
}

fn color_clamp(c: Color) -> Color {
  Color(unsafe { _mm_max_ps(_mm_set1_ps(0.0), _mm_min_ps(_mm_set1_ps(1.0), c.0)) })
}

impl Composite {
  pub fn compile(data: u8) -> Option<Composite> {
    fn extract(data: u8, offset_right: i8) -> Option<f32> {
      let value = ((data as i8) << offset_right) >> 6;
      (value != -2).then_some(value as f32)
    }

    Some(Composite {
      a1: extract(data, 6)?,
      a2: extract(data, 4)?,
      b1: extract(data, 2)?,
      b2: extract(data, 0)?,
    })
  }

  pub fn execute(&self, d: &mut Color, s: Color) {
    unsafe {
      let fr = s.a() * d.a() * (self.a2 as f32) + (self.a1 as f32);
      let fr = with_alpha(_mm_set1_ps(fr), _mm_set1_ps(1.0));
      let fd = d.a() * s.a() * (self.b2 as f32) + (self.b1 as f32);
      let fd = with_alpha(_mm_set1_ps(fd), _mm_set1_ps(1.0 - s.a()));
      *d = color_clamp(Color(_mm_add_ps(_mm_mul_ps(d.0, fd), _mm_mul_ps(fr, s.0))));
    }
  }
}

None of those unsafe blocks are truly dealing with anything unsafe as per the book, the unsafe code guidelines, or the reference manual.

Solution sketch

For each architecture, take the intrinsics that neither read nor modify memory or persistent processor state, make them safe, and wrap their inner contents in an unsafe block as needed.

For x86-64, this equates to roughly the following:

I may have missed a thing or two - I didn't scan core::arch::x86_64 too closely.
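As a rough illustration of the shape being proposed (not the actual core::arch API), the sketch below wraps a few SSE intrinsics in safe functions and rewrites color_clamp without any unsafe at the call site. The module name safe_sse, its function names, and clamp_unit are all hypothetical. The sketch assumes SSE is enabled at compile time, which is what the cfg gate checks; whether that assumption is enough for soundness is exactly what the discussion further down is about.

```rust
// Hypothetical safe wrappers; none of these names exist in core::arch.
// The module is only compiled when SSE is statically enabled for the build.
#[cfg(all(target_arch = "x86_64", target_feature = "sse"))]
mod safe_sse {
    use std::arch::x86_64::*;

    #[inline]
    pub fn set1_ps(x: f32) -> __m128 {
        // SAFETY (assumed): SSE is enabled for the whole build, and the
        // intrinsic neither reads nor writes memory.
        unsafe { _mm_set1_ps(x) }
    }

    #[inline]
    pub fn setr_ps(a: f32, b: f32, c: f32, d: f32) -> __m128 {
        // SAFETY (assumed): same reasoning as `set1_ps`.
        unsafe { _mm_setr_ps(a, b, c, d) }
    }

    #[inline]
    pub fn min_ps(a: __m128, b: __m128) -> __m128 {
        // SAFETY (assumed): same reasoning as `set1_ps`.
        unsafe { _mm_min_ps(a, b) }
    }

    #[inline]
    pub fn max_ps(a: __m128, b: __m128) -> __m128 {
        // SAFETY (assumed): same reasoning as `set1_ps`.
        unsafe { _mm_max_ps(a, b) }
    }

    #[inline]
    pub fn to_array(v: __m128) -> [f32; 4] {
        // SAFETY: both types are 16 bytes and have no invalid bit patterns.
        unsafe { std::mem::transmute::<__m128, [f32; 4]>(v) }
    }
}

/// Clamps each lane to [0.0, 1.0], mirroring `color_clamp` above,
/// but with no `unsafe` in the body.
#[cfg(all(target_arch = "x86_64", target_feature = "sse"))]
pub fn clamp_unit(v: [f32; 4]) -> [f32; 4] {
    use safe_sse::*;
    let x = setr_ps(v[0], v[1], v[2], v[3]);
    to_array(max_ps(set1_ps(0.0), min_ps(set1_ps(1.0), x)))
}

/// Scalar fallback so the sketch also builds on other targets.
#[cfg(not(all(target_arch = "x86_64", target_feature = "sse")))]
pub fn clamp_unit(v: [f32; 4]) -> [f32; 4] {
    v.map(|x| x.clamp(0.0, 1.0))
}
```

Calling clamp_unit([-0.5, 0.25, 1.5, 1.0]) clamps the first lane up to 0.0 and the third lane down to 1.0, leaving the other two unchanged.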

Alternatives

Do nothing and just focus on std::simd. This is workable, but see my note on AVX-512's result masking for why this isn't helpful in and of itself. (I didn't include an example for that here, but it wouldn't be hard for me to provide one.)

Links and related work

What happens now?

This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

Second, if there's a concrete solution:

BurntSushi commented 1 year ago

I don't see anything addressing the safety requirements as documented in core::arch: https://doc.rust-lang.org/core/arch/index.html#overview

All of these are already guarded with #[target_feature] as appropriate, thus avoiding undefined behavior on that front.

This seems quite confused. Firstly, I don't see any uses of #[target_feature] in your code sample (other than calling vendor intrinsics that are annotated with it). Secondly, and more importantly, #[target_feature] doesn't discharge safety requirements, it introduces them. You can't annotate a safe function with #[target_feature]. It has to be marked unsafe. And that has nothing to do with "access memory nor impact processor state." It's marked unsafe because it's undefined behavior to execute code that isn't supported by the CPU. In the best case you'll get SIGILL. This was explicitly discussed in the RFC that introduced #[target_feature]. So if you want to make #[target_feature] safe to use, you have to address what the RFC said by either not making it undefined behavior (which I would assume would require figuring it out at the code generator level, i.e., LLVM) or finding some other way to mitigate it.
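To make the point above concrete, here is a minimal sketch of the usual pattern: the #[target_feature] function is declared unsafe, and a safe wrapper discharges the requirement with a runtime CPU check before calling it. The function names with_alpha_sse41 and with_alpha are hypothetical (the latter borrows its name from the example in the proposal), and the array-based signature is an assumption made to keep the sketch portable.

```rust
/// SSE4.1 path. It is `unsafe` because executing it on a CPU without
/// SSE4.1 is undefined behavior; the caller must rule that out.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.1")]
unsafe fn with_alpha_sse41(d: [f32; 4], s: [f32; 4]) -> [f32; 4] {
    use std::arch::x86_64::*;
    let vd = _mm_setr_ps(d[0], d[1], d[2], d[3]);
    let vs = _mm_setr_ps(s[0], s[1], s[2], s[3]);
    // Bit 0 of the mask is set, so lane 0 comes from `vs`;
    // lanes 1 to 3 come from `vd`.
    let blended = _mm_blend_ps::<0b0001>(vd, vs);
    // SAFETY: both types are 16 bytes and have no invalid bit patterns.
    unsafe { std::mem::transmute::<__m128, [f32; 4]>(blended) }
}

/// Safe entry point: checks the CPU at run time, then dispatches.
pub fn with_alpha(d: [f32; 4], s: [f32; 4]) -> [f32; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse4.1") {
            // SAFETY: the check above confirms this CPU supports SSE4.1.
            return unsafe { with_alpha_sse41(d, s) };
        }
    }
    // Scalar fallback for CPUs (or targets) without SSE4.1.
    [s[0], d[1], d[2], d[3]]
}
```

The unsafe block in with_alpha is where the safety requirement is actually discharged: the runtime check is the justification, not the attribute on the callee.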

dead-claudia commented 1 year ago

I don't see anything addressing the safety requirements as documented in core::arch: doc.rust-lang.org/core/arch/index.html#overview

All of these are already guarded with #[target_feature] as appropriate, thus avoiding undefined behavior on that front.

This seems quite confused. Firstly, I don't see any uses of #[target_feature] in your code sample (other than calling vendor intrinsics that are annotated with it).

@BurntSushi Apologies for the (very) sloppy imprecision here. I meant that they're only accessible when you either specify #[target_feature] or explicitly opt into them through -C target-cpu/-C target-feature. And I was proposing making them safe when using the latter.

The way the rest of your stuff reads, I'd be better served making an RFC instead for this as there's lower-level language kinks to work out, so I'll close this ACP.
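For readers unfamiliar with the compile-time opt-in mentioned above, the following sketch shows how code can be selected based on features enabled for the whole build (for example with -C target-feature=+sse4.1). The function name lane1 is hypothetical; it mirrors the Color::r accessor from the proposal. This only illustrates the gating mechanism as it works today, with the unsafe block still required, and is not a claim about how the proposed change would be implemented.

```rust
/// SSE4.1 version, compiled only when the build statically enables
/// SSE4.1 (e.g. `-C target-feature=+sse4.1` or a suitable `-C target-cpu`).
#[cfg(all(target_arch = "x86_64", target_feature = "sse4.1"))]
pub fn lane1(v: [f32; 4]) -> f32 {
    use std::arch::x86_64::*;
    // SAFETY (assumed): with SSE4.1 enabled for the whole build, the
    // compiler may already emit SSE4.1 instructions anywhere, so the
    // binary as a whole requires a CPU that supports it.
    unsafe {
        let x = _mm_setr_ps(v[0], v[1], v[2], v[3]);
        // `_mm_extract_ps` returns the raw bits of the lane as an i32.
        f32::from_bits(_mm_extract_ps::<1>(x) as u32)
    }
}

/// Fallback used when SSE4.1 is not enabled at compile time.
#[cfg(not(all(target_arch = "x86_64", target_feature = "sse4.1")))]
pub fn lane1(v: [f32; 4]) -> f32 {
    v[1]
}

/// True when the SSE4.1 version above was the one compiled in.
pub const USES_SSE41: bool =
    cfg!(all(target_arch = "x86_64", target_feature = "sse4.1"));
```

Unlike the runtime-detection approach, the choice here is fixed when the crate is built, so a default x86-64 build (which does not enable SSE4.1) takes the fallback.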

BurntSushi commented 1 year ago

I meant that they're only accessible when you either specify #[target_feature] or explicitly opt into them through -C target-cpu/-C target-feature. And I was proposing making them safe when using the latter.

I would definitely suggest a pre-RFC first. Firstly, because I still don't quite understand these sentences. Secondly, because compile-time CPU feature selection is somewhat less compelling (although perhaps that's changing with microarchitecture levels). Thirdly, because safe_arch exists, which looks like important prior art here.