Closed nicholashaydensmith closed 5 years ago
I am aware of this "issue", but tbh it all seems to be just theory. Before trying to fix this, we need a test case that proves the bad behavior.
Hopefully it's only in theory that it would misbehave, at least on Intel CPUs. But it's undoubtedly a real problem if you use tsan, so this must be fixed.
Here's a discussion about it from a couple of years ago: http://linux-audio.4202.n7.nabble.com/Realtime-inter-thread-communication-td99157.html
I did supply a patch for the ringbuffer in that thread, but I've lost access to the account where it was uploaded. Anyway, it's an easy fix.
The volatile keyword does not make anything threadsafe and shouldn't be used in this context. volatile is really only useful for memory mapped IO and is never appropriate to use as a means for inter-thread communication.
Actually, the Microsoft C compiler guarantees that instructions on volatile variables are not reordered: https://preshing.com/20141024/my-multicore-talk-at-cppcon-2014/ https://msdn.microsoft.com/en-us/library/12a04hfd.aspx Seems like Microsoft did the right thing there.
Before trying to fix this, we need to have a test case that proves the bad behavior.
I had similar code in my codebase that worked fine, then failed when compiled at high optimisation levels with LTO. The compiler was able to "assume" that, because there is no actual protection, writing from two threads would be undefined behaviour; and since undefined behaviour can never happen from the compiler's point of view, the only logical alternative was to assume that everything happens on a single thread.
Actual writes were hence eschewed and everything stayed in CPU caches for who knows how long (in practice, write synchronization didn't happen at all from what I remember). Using `atomic<T>` instead of `volatile T` solved it.
Can you create a similar case in some small code?
In any case, multiple writers to a single ringbuffer does not sound like a good idea to me, even with memory fences and other techniques.
The simplest example code that reproduces this kind of problem would be this:

```cpp
#include <thread>
#include <iostream>

int main()
{
    int i = 0;
    volatile int* p = &i;
    std::thread t([p] {
        (*p)++;
    });
    while (i == 0) ;
    std::cerr << i << std::endl;
    t.join();
}
```
For me with GCC 8, it works fine at -O0 / -O1 but stays in an infinite loop at -O2 / -O3.
I know about that case; the optimization makes the final binary skip the while loop. The "issue" there is the local variable.
A ringbuffer is quite different: the member variables in there are not local-scope variables like in the example.
I have yet to see working test code (a proof of mis-write/mis-read) against it.
I've tried porting the robustness check from my own ring-buffer library to test this. I send a sequence of integers and check them on the receiving side.
Apparently, I am able to reproduce a data race problem. Without a forced delay following each read or write, I get a failure and a report from thread sanitizer.
test_case.cc.gz test_case.txt.gz
Hope this helps.
jackdmp version 1.9.12 tmpdir /dev/shm protocol 8
Hi, I think you are using the ringbuffer in the wrong way.
When you use jack_ringbuffer_write() and it was successful, there's no need to call jack_ringbuffer_write_advance(). The latter is only useful when the pointers were retrieved via jack_ringbuffer_get_write_vector().
Now if you use jack_ringbuffer_write() and the returned count wasn't equal to the requested size, you'll already have written a portion of the data to your buffer and messed up the event boundaries from there on.
I've slightly modified your testcase:
It runs without errors with delay=0. Greetings
I tried it and you're right @7890: it was a misuse, and the test case now seems to work robustly. While thread sanitizer still emits its complaint, it ran without error on my side.
Excellent :)
Please re-open if this poses a "real" issue with a reproducible case.
Hey, I want to give some updates about this issue. I am able to consistently reproduce this issue on a raspberry pi 4 using the test cases from https://github.com/jackaudio/jack2/issues/388#issuecomment-453283106. It appears that the relaxed memory ordering on ARM platforms does pose a problem to the current implementation.
I wonder if anyone is able to reproduce this issue on Apple Silicon?
Here is a sample output:
```
...
---------------- 418756/1000000 ----------------
---------------- 418757/1000000 ----------------
message (0) != expected (82)
```
I have deleted all the print statements between each iteration to speed up the testing.
Here is the test program I'm using, with some minor cleanups:
```cpp
// Copyright Jean Pierre Cimalando 2018.
// Distributed under the Boost Software License, Version 1.0.
// (See accompanying file LICENSE or copy at
// http://www.boost.org/LICENSE_1_0.txt)
//
// g++ -O2 -o test_case test_case.cc -ljack -lpthread

#include <jack/ringbuffer.h>
#include <memory>
#include <thread>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sched.h>

jack_ringbuffer_t *ring_buffer;
size_t capacity = 1024;
size_t message_count = 100;
size_t message_size = 100;

void thread1()
{
    jack_ringbuffer_t *rb = ::ring_buffer;
    const size_t message_size = ::message_size;
    const size_t message_count = ::message_count;
    std::unique_ptr<uint8_t[]> message_buffer(new uint8_t[message_size]);
    for (size_t i = 0; i < message_count;) {
        *(size_t *)message_buffer.get() = i;
        size_t can_write = jack_ringbuffer_write_space(rb);
        size_t count = 0;
        if (can_write >= message_size) {
            count = jack_ringbuffer_write(rb, (const char *)message_buffer.get(), message_size);
        }
        if (count == message_size)
            ++i;
        else
            sched_yield();
    }
}

void thread2()
{
    jack_ringbuffer_t *rb = ::ring_buffer;
    const size_t message_size = ::message_size;
    const size_t message_count = ::message_count;
    std::unique_ptr<uint8_t[]> message_buffer(new uint8_t[message_size]);
    for (size_t i = 0; i < message_count;) {
        size_t can_read = jack_ringbuffer_read_space(rb);
        size_t count = 0;
        if (can_read >= message_size) {
            count = jack_ringbuffer_read(rb, (char *)message_buffer.get(), message_size);
        }
        if (count == message_size) {
            size_t msg = *(size_t *)message_buffer.get();
            if (msg != i) {
                printf("message (%zu) != expected (%zu)\n", msg, i);
                exit(1);
            }
            ++i;
        }
    }
}

int main()
{
    size_t ntimes = 1000000;
    for (size_t i = 0; i < ntimes; ++i)
    {
        printf("---------------- %4zu/%4zu ----------------\n", i + 1, ntimes);
        jack_ringbuffer_t *rb = ::ring_buffer = jack_ringbuffer_create(::capacity);
        std::thread t1(thread1);
        std::thread t2(thread2);
        t1.join();
        t2.join();
        ::ring_buffer = nullptr;
        jack_ringbuffer_free(rb); // was `delete rb`, which is not valid for a malloc-allocated C struct
    }
    printf("-------------------------------------------\n");
    printf("success!\n");
    return 0;
}
```
The ringbuffer code is not thread safe because the variables `write_ptr` and `read_ptr` are read from and written to by different threads. Not only are the `size_t` variables themselves subject to a data hazard, the compiler and processor are free to reorder any of the operations inside of `jack_ringbuffer_read` and `jack_ringbuffer_write` because there is zero fencing. Consider the following scenario:

Producer thread: calls `jack_ringbuffer_write` and updates `write_ptr` before we actually memcpy any data, and gets context switched out.
Consumer thread: calls `jack_ringbuffer_read` and thinks there is data to be consumed when in fact the producer thread hasn't actually written anything to memory.

The `volatile` keyword does not make anything threadsafe and shouldn't be used in this context. `volatile` is really only useful for memory-mapped IO and is never appropriate to use as a means for inter-thread communication.

Instead of using `volatile`, the `write_ptr` and `read_ptr` variables should be made atomic and use proper memory order semantics to ensure thread safety: https://en.cppreference.com/w/cpp/atomic/memory_order

99.9% of the time I bet the code, as it stands, works perfectly fine, so I understand this is probably not a super high priority issue. Just thought I'd raise this issue for correctness purposes. I would submit a pull request, but it would violate the terms and agreement with my current employer.