rigtorp / SPSCQueue

A bounded single-producer single-consumer wait-free and lock-free queue written in C++11
MIT License
907 stars 129 forks source link

Assertion failed #35

Open armintoepfer opened 2 years ago

armintoepfer commented 2 years ago

Hi!

I've been using SPSCQueue for some time and just updated to the latest version and now hit this:

SPSCQueue.hpp:188: void Rigtorp::SPSCQueue<T, Allocator>::pop() [with T = PacBio::CCS::Chunk; Allocator = std::allocator<PacBio::CCS::Chunk>]: Assertion `writeIdx_.load(std::memory_order_acquire) != readIdx' failed.

Any advice? Thank you!

galchinsky commented 2 years ago

To reproduce:

#include <complex>
#include <array>
#include "SPSCQueue.h"
int main() {
    rigtorp::SPSCQueue< int > queue(16);
    for (int j = 0; j < 6; ++j) {
        queue.emplace(0);
    }
    for (int j = 0; j < 3; ++j) {
        queue.pop();
    }
}

If I dump indexes after each operation this way

template<typename T>
void dump(T& q) {
    std::cout << q.writeIdx_ << " " << q.writeIdxCache_ << " " << q.readIdx_ << " " << q.readIdxCache_ << std::endl;
}

It is clear this is caused by the diverged non-atomic caches and the real values:

1 0 0 0 # push
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 0
6 0 0 0
6 0 1 0 # pop
6 0 2 0
6 0 3 0
6 0 3 0

If ensure the equality between the caches and the indexes in the end of the pop call, the problem disappears. Sure it's just a dirty fix because I wouldn't like to dig into the logic.

    readIdxCache_ = nextReadIdx;
    writeIdxCache_ = writeIdx_;
Philippe91 commented 2 years ago

@galchinsky On which CPU are you running? For example, Apple Silicon CPU behaves a bit differently than Intel, as I could experiment (SPSCQueue was not involved).

galchinsky commented 2 years ago

@Philippe91 it's a simple single-threaded example with a few push and a few pop operations, I don't think CPU can make a difference here.

usefulcat commented 1 year ago

I submitted a PR to fix this, see PR "Bug fix for #35".

Philippe91 commented 1 year ago

I submitted a PR to fix this, see PR "Bug fix for #35".

This solves the problem, apparently.

This being said, the logic behind the cached indexes system introduced by @rigtorp is challenging to grasp in my mind. It gives me an insecure feeling, reinforced by the presence of this bug, introduced 18 months ago, and to which the author has never reacted since it was reported ten months ago.

I am considering reverting to the implementation prior 3a0507a193b0573d87823759d5c8c1230598bfbc.

galchinsky commented 1 year ago

I just moved to Facebook's queue, because who knows are there any other bugs like that

jz315 commented 1 year ago

I have the same problem.

#include <iostream>
#include "SPSCQueue.h"
using namespace rigtorp;
int main(int argc, char* argv[]) {
    SPSCQueue<int> q(10000);

    q.push(1);
    q.pop();

    return 0;
}

Output: Assertion failed: writeIdx_.load(std::memory_order_acquire) != readIdx, file D:\Programs\Test\test1\SPSCQueue.h, line 184

If add q.front(); before q.pop();, the code will run properly.

jz315 commented 1 year ago

And function front() has some bug.

RIGTORP_NODISCARD T* front() noexcept {
            auto const readIdx = readIdx_.load(std::memory_order_relaxed);
            if (readIdx == writeIdxCache_) {
                writeIdxCache_ = writeIdx_.load(std::memory_order_acquire);
                if (writeIdxCache_ == readIdx) {
                    return nullptr;
                }
            }
            return &slots_[readIdx + kPadding];
        }

If you run the pop() function first, readIdx_ will change from 0 to 1, while the initial value of writeIdxCache_ is 0, so they will be unequal and writeIdxCache_ will not update until readIdx_ reaches the end of the array and becomes 0 again. During this time, the front() function will always return an object regardless of whether the queue is empty.

vetribox commented 1 year ago

It's just an assertion fail in the class destructor because writeIdxCache_ value doesn't reflect writeIdx_ as it is only updated in the *front() member function when writeIdxCache_ == readIdx_.load(std::memory_order_relaxed) and only if it is so writeIdxCache_ gets updated, hence the fault. The sane fix is to check if size() > 0 in the class destructor loop.

Because you weren't supposed to call *front() when the size() is zero and that's what the class destructor does, it failed to check bounds as it should be.

rigtorp commented 1 year ago

There seems to be a misunderstanding of the pre-conditions of calling pop(). I updated the documentation here: https://github.com/rigtorp/SPSCQueue/commit/40661138be2245d7d52854e4fb3b7f6bb4bf02ad

This is a concurrent queue intended to be used by two threads. In that case the only way for the reader thread to know if there is more data to read is by calling front(). Only once it returns a non-nullptr can you call pop().

This code invokes undefined behavior (and triggers the assert):

SPSCQueue<int> q(10000);
q.push(1);
q.pop();

This code correctly checks that the queue is not empty:

SPSCQueue<int> q(10000);
q.push(1);
if (q.front()) {
   q.pop();
}

Since you are not reading front(), you are not communicating any data to the other thread, so I don't understand why you even need the queue?

galchinsky commented 1 year ago

I don't remember the details (even where the assertion got triggered), but here are 3 chunks of code I had before moving to the facebook's queue, and checking for front after checking for emptiness or size doesn't seem obvious to me.

            while (!queue.empty()) {
                lapper.put_sample(*queue.front());
                queue.pop();
            }

another one

           while (queue.size() > 4) {
                queue.pop();
            }

and

            while (queue.front()) {
                DBOUT("$" << queue.writeIdx_ << " " << queue.writeIdxCache_ << " " << queue.readIdx_ << " " << queue.readIdxCache_);
                queue.pop();
            }

Apparently, given there are debug outputs, it was the last one that triggered the assertion. Pushing was being done this way, in a separate thread:

               bool success = queue.try_push(workbuf);
                if (success) {
                    last_timestamp = buf.timestamp;
                }
rigtorp commented 1 year ago

Yes, those first two are invalid and will break the queue invariants.

I really should never have used the front() and pop() APi from std::queue. Instead they should be a single transaction try_pop() that can return a value. Or a zero-copy version that visits the value in-place using a lambda.

On Mon, Sep 25, 2023 at 5:43 AM Dmitry Galchinsky @.***> wrote:

I don't remember the details (even where the assertion got triggered), but here are 3 chunks of code I had before moving to the facebook's queue, and checking for front after checking for emptiness or size doesn't seem obvious to me.

        while (!queue.empty()) {
            lapper.put_sample(*queue.front());
            queue.pop();
        }

another one

       while (queue.size() > 4) {
            queue.pop();
        }

and

        while (queue.front()) {
            DBOUT("$" << queue.writeIdx_ << " " << queue.writeIdxCache_ << " " << queue.readIdx_ << " " << queue.readIdxCache_);
            queue.pop();
        }

Apparently, given there are debug outputs, it was the last one that triggered the assertion. Pushing was being done this way, in a separate thread:

           bool success = queue.try_push(workbuf);
            if (success) {
                last_timestamp = buf.timestamp;
            }

— Reply to this email directly, view it on GitHub https://github.com/rigtorp/SPSCQueue/issues/35#issuecomment-1733414452, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABLO26TGO4YASZ6VM3UXY3X4FN55ANCNFSM5SFF52EQ . You are receiving this because you were mentioned.Message ID: @.***>