jetmonk / cl-libserialport

A Common Lisp interface to the libserialport serial port library from sigrok
MIT License
5 stars 0 forks source link

Gets unresponsive when running long term #2

Open mdbergmann opened 2 years ago

mdbergmann commented 2 years ago

Hi.

I use(d) libserialport to query a service repeatedly every 30 seconds. Running this for days made the lib freeze after ~2 days. I don't have details why, but maybe there is a memory leak? I have also tried cserial-port, which is more rudimentary, but this has no problems running longer. So it has to be something in cl-libserialport or libserialport that makes it refuse to work long term.

Regards

jetmonk commented 2 years ago

I assume you mean a memory leak in the foreign heap, not accumulating un'gced things on the lisp heap.

You probably know that serial ports are not gc'ed automatically, and you are probably using one open serial port. Otherwise, one needs to use (shutdown-serial-port) to free the foreign buffers, and (destroy-event-set) if using event-sets.

Could you describe your use in some more detail? Do you have some minimal test case? What machine are you using (eg RPi might run out of memory sooner, and it might be harder to replicate on a bigger machine)? Are you using event-sets?

I think the only foreign objects allocated are serial-port-port-ptr (the foreign pointer to the allocated port) and serial-port-fbuf (a foreign buffer), and these are de-allocated in (shutdown-serial-port). If you're using one serial-port object the whole time, then they should be allocated once.

The macro with-serial-port-buffer sometimes allocates a larger buffer, but it is inside (cff:with-foreign-object ...), which allocates memory with a dynamic extent (stack if possible). How big are your data reads/writes ? Perhaps I should not be using cffi:with-foreign-object if the buffer is too big for the stack.

The foreign function malloc_stats() might tell you if the foreign heap is shrinking, so it might be useful to make it print every hour.

If you're interested, you could try swapping the following definition into libserialport.lisp, which allocates the oversized temporary buffer on the heap instead of the stack, but it has a chance of making a difference only if your reads/writes are > +serial-buff-size+ = 256.

(defmacro with-serial-port-buffer ((varname serial-port nbytes) &body body)
  (let ((body-func-name (gensym "WITH-SERIAL-PORT-BUFFER-BODY"))
    (tmpbufname (gensym "WITH-SERIAL-PORT-BUFFER-TMP"))
    (nbytes-name (gensym "NBYTES")))
    ;; use fixed buffer, or a larger buffer if needed
    `(let ((,nbytes-name ,nbytes))
       (flet ((,body-func-name (,varname)
        ,@body))
     (if (<= ,nbytes-name +serial-buff-size+)
         (,body-func-name (serial-port-fbuf ,serial-port))
         (progn ;; nbytes too big, so allocate on foreign heap
           (let ((,tmpbufname nil))
         (unwind-protect
              (progn
            (setf ,tmpbufname (cffi:foreign-alloc :unsigned-char :count ,nbytes-name))
            (,body-func-name ,tmpbufname))
           (progn
             (cffi:foreign-free ,tmpbufname))))))))))
mdbergmann commented 2 years ago

I'm not sure. I didn't look throught the code.

But in a nutshell, you have opened a serial device once, then in a loop you do:

(libserialport:serial-read-octets-until
   port
   #\}
   :timeout timeout)

and it'll break eventually.

I've tried without the timeout, doing a sleep, but that didn't help.

jetmonk commented 2 years ago

In read-octets (called by serial-read-octets-until) I discovered a non-local exit from a BLOCK containing a cffi dynamic memory allocation. So I put the memory allocation put outside the BLOCK, and pushed the change.

This might have caused issues in a Lisp that doesn't stack-allocate this memory, if the non-local exit prevents a foreign-free. But I don't think this is the issue because it just reads one byte per call to read-octets, so it uses the internal buffer and does't allocate a larger temporary foreign buffer. And I suspect cffi would have put an unwind-protect in a non-stack-allocated case, handling cleanup in a non-local exit.

If you decide to try it again, please let me know what happens. Also, what lisp and machine are you using? This is tested on SBCL on OSX (but none of it is OSX specific).

mdbergmann commented 2 years ago

I'd retry. I could replicate that on CCL v1.5 on an older Mac OSX 10.5, and on SBCL 2.1 on arm64 on a Pi4.