bpecsek commented 3 years ago

(define-compiler-macro ,name (arg &rest more-args)
           (cond ((null more-args)
                  `(,',two-arg-fn ,',neutral-element (,',simd-type ,(first args))));;; should be (,',simd-type ,arg)
                 (t (reduce
                     (lambda (a b) `(,',two-arg-fn a (,',simd-type ,b)));;;; should be ,a or (,',simd-type ,a)
                     args;;;;;;;;; should be more-args
                     :initial-value `(,',simd-type ,arg)))))

compilation policy using env might be beneficial to use

I propose, as a quick fix the following:

(defmacro define-nary-wrapper* (name simd-type two-arg-fn neutral-element)
  (export name)
  (if (and (value-record-name-p simd-type)
           (instruction-record-name-p two-arg-fn))
      `(progn
         (define-inline ,name (arg &rest more-args)
           (if (null more-args) (,two-arg-fn ,neutral-element (,simd-type arg))
               (let ((acc (,simd-type arg)))
                 (declare (,simd-type acc))
                 (loop for arg in more-args do
                   (setf acc (,two-arg-fn acc (,simd-type arg))))
                 acc)))
         (define-compiler-macro ,name (arg &rest more-args &environment env)
           (if (> (sb-c::policy-quality (slot-value env 'sb-c::%policy) 'speed)
                  (sb-c::policy-quality (slot-value env 'sb-c::%policy) 'space))
               (cond ((null more-args)
                      `(,',two-arg-fn ,',neutral-element
                                      (,',simd-type ,arg)))
                 (t (reduce
                     (lambda (a b) `(,',two-arg-fn (,',simd-type ,a)
                                                   (,',simd-type ,b)))
                     more-args
                     :initial-value `(,',simd-type ,arg))))
               )))
      `(defun ,name (arg &rest more-args)
         (declare (ignore arg more-args))
         (error "The function ~S is not available on this platform."
                ',name))))

I also would like to add u64.2, u64.4, u32.4 and u32.8 instructions in tables.lisp and wrappers.lisp.

Missing integer types should added in SBCL first to enable us to implement it.

I also have a few higher lavel functions that I van contribute.

marcoheisig commented 3 years ago

(I just formatted the code snippets you posted - turns out I am allowed to edit your comments)

Now I recall - I lost my motivation last year because I realized I need to do some SBCL hacking. But maybe I should just bite the bullet and do that.

I will have a close look at this right after ELS.

bpecsek commented 3 years ago

Hi Marco,

I am doing some benchmarking and your f64.4-row-major-aref seems to be much faster then the f64.4-refs I have defined.

However, for some reason the f64.4-aref are very slow.

Unfortunately, I can't make the (setf (f64.4-aref u 0) (make-f64.4 1 2 3 4)) to work

It says: The function SB-SIMD::F64.4-HAIRY-DATA-VECTOR-SET is undefined. [Condition of type UNDEFINED-FUNCTION] and I cant figure out what's wrong.

If I do (setf (f64.4-row-major-aref v 0) (make-f64.4 1 2 3 4)) I get The value

<SB-EXT:SIMD-PACK-256 3FF0000000000000 4000000000000000 4008000000000000 4010000000000000>

is not of type DOUBLE-FLOAT when setting an element of (ARRAY DOUBLE-FLOAT) [Condition of type TYPE-ERROR]

I would appreciate your help when you are back from ELS.

marcoheisig commented 3 years ago

Hi Bela,

aref is quite a bit more complicated that row-major-aref, so I am not surprised there are issues. But I will look into it tomorrow.

And thank you very much for your recent changes - I already had a look and intend to merge them tomorrow.

bpecsek commented 3 years ago

Hi Marco,

Thanks for merging my codes.

I am just wondering though that we are using the clearest naming convention.

I was looking earlier at what Shubhamkar is using in Numerical and also Agner Fog's great Vec library and just wondering if it wouldn't be clearer to use something similar to make the library easier for someone coming from C/C++ world.

Something like: 16c, 16uc, 8s, 8us, 4i, 4ui, 2q, 2qi for SSE 32c, 32uc, 16s, 16s ....etc fro AVX or the other way around like c16, uc16, s8, 8us, f4, d2, f8, d4, etc like Shubhamkar use in numerical

So something like this (though I can also see that advantage of f32.4 projecting more information so it might be better to keep it): f4 for f32.4 d2 for f64.2 f8 for f64.8 d4 for f64.4

The proposed one is also shorter. I don't know. Just a thought now in this early stage.

marcoheisig commented 3 years ago

You are correct, it would be nice to have a naming convention that resembles other libraries. But my reasoning for introducing new names is that Common Lisp offers much more flexibility when naming things. Common Lisp symbols can include hyphens, dots, and whatnot. So it would be a shame to restrict sb-simd to names that are compatible with C. I am very happy with the current naming convention because it makes both the element type, the number of values in the pack, and the number of bits very clear. Any new convention should be similarly clear and consistent before I'd consider a switch.

The thing that worries me much more than the names of the packs is how to generalize this library to ARM and RISCV. These architectures have an instruction set that mostly avoids specifying a fixed vector length. I'm not sure how to handle that gracefully. But then again, I haven't given it much thought, yet.

But thanks for sharing your thoughts (and your code)! You have already motivated to put additional effort into this library. I just fixed the completely broken arithmetic operations on integers, and I am trying to fix the SIGILL on non-AVX2 systems. I'll push these changes soon.

bpecsek commented 3 years ago

Hi Marco,

Actually, I have had some more thought after I'd written my last comment and and I'm with you on the naming.

It does indeed give more insight that should be worthwhile.

Next, wee need the missing signed integer support in SBCL.

I have contacted Stas already and also tried to add at least signed 32 and 64 bit integers but no luck yet. I am not comfortable enough into SBCL internals yet.

marcoheisig commented 3 years ago

Fun fact - I wanted to add the remaining integer types to SBCL myself, but I ran out of free time at that point. And maybe going directly after the additional integer types was not that smart anyway. I think finishing the SIMD instructions for floats will already make a lot of people extremely happy.

Right now, I am splitting the instruction table into one table per instruction set (SSE, SSE2, AVX, AVX2, ...). That makes it easier to selectively enable only those instructions that are actually available. I'll keep you updated.

By the way, have you read my other remarks on your pull request (https://github.com/marcoheisig/sb-simd/pull/5#issuecomment-835890925)? I'd like to hear what you think about these points.

bpecsek commented 3 years ago

I think it is a good call. Not yet.

bpecsek commented 3 years ago

One of the huge advantages for integers would be in image processing. Having 16 or 32 8-bit integers to represent rgb colors and process them parallel would be a major win for sure. Looking forward to seeing your commits with the fixies soon. Thanks

bpecsek commented 3 years ago

Hi Marco,

Could you please check when you have the time why the (setf (f64.4-row-major-aref .. and (setf (f64.4-aref fails

I couldn't make them to work

When I do (setf (f64.4-aref u 0) (make-f64.4 1 2 3 4)) I get

The function SB-SIMD::F64.4-HAIRY-DATA-VECTOR-SET is undefined. [Condition of type UNDEFINED-FUNCTION] and I cant figure out what's wrong.

When I do (setf (f64.4-row-major-aref v 0) (make-f64.4 1 2 3 4)) I get The value

<SB-EXT:SIMD-PACK-256 3FF0000000000000 4000000000000000 4008000000000000 4010000000000000>

is not of type DOUBLE-FLOAT when setting an element of (ARRAY DOUBLE-FLOAT) [Condition of type TYPE-ERROR]

Thanks

marcoheisig commented 3 years ago

Progress:

I just finished separating the instructions into separate tables for each instruction set (979d89b7a85b05c8c9ede0dc3ad781326d0126be).
I found (and fixed) a bug on platforms that don't support AVX2: https://bugs.launchpad.net/sbcl/+bug/1928097
I found some erroneous instruction definitions. In particular, the handling of integer SIMD packs is a mess.

Now to answer your question: SBCL calls XYZ-hairy-data-vector-set whenever it prove that the supplied array is a simple array with known element type. And we simply haven't implemented XYZ-hairy-data-vector-set, yet. Also your other problem is most certainly a stupid bug in the way we handle arrays. That part of the code really needs some love. Maybe I can tackle that tomorrow.

marcoheisig commented 3 years ago

The sb-simd package should also export the remaining symbols now. Let me know if I missed some.

bpecsek commented 3 years ago

Thanks Marco. The X.Y-ref's are not exported either. Export is missing from aref.lisp. My bad. Please also note that as far as I know the movshdup, movsldup and movddup are sse3 instructions not ssse3.

marcoheisig commented 3 years ago

Aref is next on my list, don't worry.

And thanks for spotting the sse3<->ssse3 confusion.

bpecsek commented 3 years ago

Hi Marco,

I am trying to define left and right shift operators (u32.4-shiftl u64.4-shiftr or u32.4-ash to make it more lispy or whatever) but I can't make them to work. They are the AVX vpslldq vpsrldq and SSE2 pslldq psrldq 's but I don't know what should be the second argument record type for them. I have tries u16 s16 u8 s8 etc the lot but it won't compile.

Any clue?

marcoheisig commented 3 years ago

I just defined a bunch of shift instructions in b2f7d61a2359012444a0b7aaa1b8f23b816a03b1.

The trouble is that these instructions require a second argument that is also a simd pack. I also played around with scalar types like u16, s16, u8 etc., and figured out that I was declaring the wrong primitive type for them. That should be fixed, too.

I also made sure that AVX instructions replace SSE instructions of the same name if and only if they are supported by the host. So ideally, systems with AVX should never emit SSE instructions and we avoid most of the context switching vzeroupper mess.

marcoheisig / sb-simd

bug in define-nary-wrapper* macro in define-compiler-macro part #4

<SB-EXT:SIMD-PACK-256 3FF0000000000000 4000000000000000 4008000000000000 4010000000000000>

<SB-EXT:SIMD-PACK-256 3FF0000000000000 4000000000000000 4008000000000000 4010000000000000>