jackaudio / jack2

jack2 codebase
GNU General Public License v2.0
2.22k stars 376 forks source link

Ringbuffers are not thread safe #388

Closed nicholashaydensmith closed 5 years ago

nicholashaydensmith commented 6 years ago

The ringbuffer code is not thread safe because variables write_ptr and read_ptr are read from and written to by different threads. Not only are the size_t variables themselves subject to data hazard, the compiler and processor are free to reorder any of the operations inside of jack_ringbuffer_read and jack_ringbuffer_write because there is zero fencing. Consider the following scenario:

Producer thread: calls jack_ringbuffer_write and updates write_ptr before we actually memcpy any data and gets context switched out. Consumer thread: calls jack_ringbuffer_read and thinks there is data to be consumed when in fact the producer thread hasn't actually written anything to memory.

The volatile keyword does not make anything threadsafe and shouldn't be used in this context. volatile is really only useful for memory mapped IO and is never appropriate to use as a means for inter-thread communication.

Instead of using volatile, the write_ptr and read_ptr variables should be made atomic and use proper memory order semantics to ensure thread safety, https://en.cppreference.com/w/cpp/atomic/memory_order.

99.9% of the time I bet the code, as it stands, works perfectly fine so I understand this is probably not a super high priority issue. Just thought I'd raise this issue for correctness purposes. I would submit a pull request, but it would violate the terms and agreement with my current employer.

falkTX commented 6 years ago

I am aware of this "issue", but tbh it seems all just theory. Before trying to fix this, we need to have a test case that proves the bad behavior.

kmatheussen commented 6 years ago

Hopefully it's only in theory that it would misbehave, at least on intel CPUs. But it's undoubtedly a real problem is if you use tsan, so this must be fixed.

Here's a disussion about it from a couple of years ago: http://linux-audio.4202.n7.nabble.com/Realtime-inter-thread-communication-td99157.html

I did supply a patch for the ringbuffer in that thread, but I've lost access to the account where it was uploaded. Anyway, it's an easy fix.

kmatheussen commented 6 years ago

The volatile keyword does not make anything threadsafe and shouldn't be used in this context. volatile is really only useful for memory mapped IO and is never appropriate to use as a means for inter-thread communication.

Actually, the microsoft c compiler guarantee instructions on volatile variables not to be reordered: https://preshing.com/20141024/my-multicore-talk-at-cppcon-2014/ https://msdn.microsoft.com/en-us/library/12a04hfd.aspx Seems like Microsoft did the right thing there.

jcelerier commented 6 years ago

Before trying to fix this, we need to have a test case that proves the bad behavior.

I had similar code in my codebase that worked fine and then failed when compiling at high optimisation levels with LTO, since the compiler was able to "assume" that because there is no actual protection, then writing from two threads would be undefined behaviour, and since undefined behaviour cannot ever happen from the point of view of the compiler, the only logical alternative was to assume that everything would happen on a single thread.

Actual writes were hence eschewed and everything would stay in CPU caches for who knows how long (in practice write synchronization didn't happen at all from what I remember). Using atomic\<T> instead of volatile T solved it.

falkTX commented 6 years ago

Can you create a similar case in some small code?

In any case, multiple writers to a single ringbuffer does not sound like a good idea to me, even with memory fences and other techniques.

jcelerier commented 6 years ago

the simplest example code that reproduces this kind of problems would be this :

#include <thread>
#include <iostream>
int main()
{
    int i = 0;
    volatile int* p = &i;

    std::thread t([p] { 
        (*p)++;
    });

    while(i == 0) ;

    std::cerr << i << std::endl;   

    t.join();
}

for me with gcc 8, it works fine at -O0 / -O1 but stays in an infinite loop at -O2 / -O3

falkTX commented 6 years ago

I know about that case, the optimization makes the final binary skip the while loop. The "issue" there is the local variable.

A ringbuffer is quite different, the class variables in there are not local scope variables like in the example.

I am yet to see working (proof of mis-write/read) test code against it.

jpcima commented 5 years ago

I've tried porting the robustness check from my own ring-buffer library to test. I send a sequence of integers and I check on the receiving side.

Apparently, I am able to reproduce a data race problem. Without a forced delay following read or write, I have a failure and a report by thread sanitizer.

test_case.cc.gz test_case.txt.gz

Hope this helps.

jackdmp version 1.9.12 tmpdir /dev/shm protocol 8

7890 commented 5 years ago

Hi, I think you are using the ringbuffer in a wrong way.

When you use jack_ringbuffer_write() and it was successful, there's no need to call jack_ringbuffer_write_advance(). The latter is only useful when the pointers were retrieved via jack_ringbuffer_get_write_vector().

Now if you use jack_ringbuffer_write() and count wasn't equal, you'll already have written a portion of data to your buffer and have messed up the event boundaries from there on.

I've slightly modified your testcase:

test_case.cc.txt

It runs without errors with delay=0. Greetings

jpcima commented 5 years ago

Tried and you're right @7890, it was a misuse and the test case seems to work robustly now. While thread sanitizer still emits its complaint, it ran without error on my side.

7890 commented 5 years ago

Excellent :)

7890 commented 5 years ago

Please re-open if this poses an "real" issue with a reproducible case.

Krasjet commented 3 years ago

Hey, I want to give some updates about this issue. I am able to consistently reproduce this issue on a raspberry pi 4 using the test cases from https://github.com/jackaudio/jack2/issues/388#issuecomment-453283106. It appears that the relaxed memory ordering on ARM platforms does pose a problem to the current implementation.

I wonder if anyone is able to reproduce this issue on Apple Silicon?

Krasjet commented 3 years ago

Here is a sample output:

...
---------------- 418756/1000000 ----------------
---------------- 418757/1000000 ----------------
message (0) != expected (82)

I have deleted all the print statements between each iteration to speed up the testing.

Krasjet commented 3 years ago

Here is the test program I'm using, with some minor cleanups

//          Copyright Jean Pierre Cimalando 2018.
// Distributed under the Boost Software License, Version 1.0.
//    (See accompanying file LICENSE or copy at
//          http://www.boost.org/LICENSE_1_0.txt)

// g++ -O2 -o test_case test_case.cc -ljack -lpthread

#include <jack/ringbuffer.h>
#include <memory>
#include <thread>
#include <stdio.h>
#include <stdlib.h>
#include <sched.h>

jack_ringbuffer_t *ring_buffer;
size_t capacity = 1024;
size_t message_count = 100;
size_t message_size = 100;

void thread1()
{
  jack_ringbuffer_t *rb = ::ring_buffer;
  const size_t message_size = ::message_size;
  const size_t message_count = ::message_count;
  std::unique_ptr<uint8_t[]> message_buffer(new uint8_t[message_size]);

  for (size_t i = 0; i < message_count;) {
    *(size_t *)message_buffer.get() = i;

    size_t can_write=jack_ringbuffer_write_space(rb);
    size_t count=0;
    if (can_write>=message_size) {
      count = jack_ringbuffer_write(rb, (const char *)message_buffer.get(), message_size);
    }
    if (count == message_size)
      ++i;
    else
      sched_yield();
  }
}

void thread2()
{
  jack_ringbuffer_t *rb = ::ring_buffer;
  const size_t message_size = ::message_size;
  const size_t message_count = ::message_count;
  std::unique_ptr<uint8_t[]> message_buffer(new uint8_t[message_size]);

  for (size_t i = 0; i < message_count;) {
    size_t can_read=jack_ringbuffer_read_space(rb);
    size_t count=0;
    if (can_read>=message_size) {
      count = jack_ringbuffer_read(rb, (char *)message_buffer.get(), message_size);
    }

    if (count == message_size) {
      size_t msg = *(size_t *)message_buffer.get();
      if (msg != i) {
        printf("message (%zu) != expected (%zu)\n", msg, i);
        exit(1);
      }
      ++i;
    }
  }
}

int main()
{
  size_t ntimes = 1000000;
  for (size_t i = 0; i < ntimes; ++i)
  {
    printf("---------------- %4zu/%4zu ----------------\n", i + 1, ntimes);
    jack_ringbuffer_t *rb = ::ring_buffer = jack_ringbuffer_create(::capacity);
    std::thread t1(thread1);
    std::thread t2(thread2);
    t1.join();
    t2.join();
    ::ring_buffer = nullptr;
    delete rb;
  }
  printf("-------------------------------------------\n");
  printf("success!\n");

  return 0;
}