Code size may be an issue, at least on ARM SBCL

stacksmith commented 7 years ago

It appears that cl-autowrap and plus-c accessors are compiled pretty inefficiently on SBCL. Memory is a concern on small ARM machines, premature optimization aside.

A simple test (perhaps not the most revealing, but...) shows roughly a factor of 4 increase in code size compared to baseline cffi:mem-ref and allocation. No optimization was attempted other than declaring the return values as fixnums where SBCL's compiler notes asked for it.

This is not a complaint, just a heads up.

;; 696 bytes
(defun case1 ()
  (cffi:with-foreign-objects ((width :uint32) (height :uint32))
    (graphics-get-display-size 0 width height)
    (values (cffi:mem-ref width :uint32)
            (cffi:mem-ref height :uint32))))

;; 416 bytes
(defun case1o ()
  (declare (optimize (speed 3) (safety 0) (debug 0)))
  (cffi:with-foreign-objects ((width :uint32) (height :uint32))
    (graphics-get-display-size 0 width height)
    (values (the fixnum (cffi:mem-ref width :uint32))
            (the fixnum (cffi:mem-ref height :uint32)))))

;; 2312 bytes
(defun case2 ()
  (c-with ((width uint32) (height uint32))
    (graphics-get-display-size 0 (width &) (height &))
    (values width height)))

;; 2296 bytes
(defun case2o ()
  (declare (optimize (speed 3) (safety 0) (debug 0)))
  (c-with ((width uint32) (height uint32))
    (graphics-get-display-size 0 (width &) (height &))
    (values width height)))

;; 2012 bytes
(defun case3 ()
  (with-many-alloc ((width :int) (height :int))
    (graphics-get-display-size 0 width height)
    (values (c-ref width :int) (c-ref height :int))))

;; 1652 bytes
(defun case3o ()
  (declare (optimize (speed 3) (safety 0) (debug 0)))
  (with-many-alloc ((width :int) (height :int))
    (graphics-get-display-size 0 width height)
    (values (the fixnum (c-ref width :int))
            (the fixnum (c-ref height :int)))))
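The thread doesn't say how the byte counts were measured; an assumption is that they come from the "Size:" line that SBCL's DISASSEMBLE prints for a compiled function. Under that assumption, a minimal sketch of a helper (CODE-SIZE and ADD2 are hypothetical names, not part of cl-autowrap or CFFI) that pulls the figure out so cases like the ones above can be compared programmatically:

```lisp
;; Sketch, SBCL-specific: assumes DISASSEMBLE output contains a line
;; such as "; Size: 696 bytes. Origin: #x...".
(defun code-size (fname)
  "Return the byte count from the first \"Size:\" line that DISASSEMBLE
prints for FNAME, or NIL if no such line is found."
  (let* ((text (with-output-to-string (*standard-output*)
                 ;; DISASSEMBLE writes to *STANDARD-OUTPUT* by default,
                 ;; so rebinding it captures the listing as a string.
                 (disassemble fname)))
         (marker "Size: ")
         (pos (search marker text)))
    (when pos
      ;; Read the integer right after the marker, ignoring " bytes. ..."
      (parse-integer text :start (+ pos (length marker)) :junk-allowed t))))

;; Hypothetical usage on a trivial function (SBCL compiles DEFUNs natively):
(defun add2 (x) (+ x 2))
(print (code-size 'add2))
```

Only the first "Size:" line is read, so a function whose listing spans several code segments would be undercounted; for simple functions like the cases above that should not matter.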

rpav commented 7 years ago

It's certainly possible; most of the refs are likely inlined pointer math. There may be other (undesirable) inlining as well for wrappers, where applicable.

I also recently changed non-variadic functions back to being actual Lisp functions, whereas I believe CFFI still expands foreign calls inline. I'm not sure how, or whether, that affects this, but you might want to measure without the function call and see what difference it makes, if any.

However, the worse news is that this might pale in comparison to the retained FFI data loaded from the spec. Both can probably be optimized in theory, but that could very well take a fair bit of work.

I'd imagine fixing this would be easier once the cause is found along with a viable solution. It's odd that c-with and c-ref differ so much; I thought they were ultimately the same. I'll have to take a look at the expansions again.

stacksmith commented 7 years ago

It seems that it's mostly the allocators and accessors. For instance, in case3, removing the actual function call reduces the size from 2012 bytes to 1876. For comparison, the CFFI allocation, deallocation, and conversion of two ints takes 368 bytes; both figures include the usual function overhead, of course.

I haven't had a chance to look at the x86 code; SBCL is probably better optimized there. And I've embraced Lisp's lack of interest in things such as code size, given that even small machines sport a gig of RAM. I do come from homespun Forth-like compilers, where I'd be annoyed if similar code took more than 20 bytes.

rpav commented 7 years ago

I did take a glance at the disassembly on x86_64 and it's similar, but I'm not really sure what's going on either. My knowledge of SBCL internals is obviously extremely limited.

rpav/cl-autowrap, issue #81