Memory free/access errors with concurrent independent dataframes.

mdumitrean commented 4 months ago

I am creating 11,000 DataFrames; they are populated by a thread pool of 24 threads and stored in a dictionary.

I am hitting a SIGSEGV or an abort deep inside DataFrame. I don't understand what's going on here and would really appreciate it if you could help me out.

DataFrameConcurrencyTests.txt

hosseinmoein commented 4 months ago

First, I am not sure what you are doing with your thread pool. DataFrame was meant to work only with its own thread pool. Second, I don't know what kind of machine you have, but you are creating 264k threads. You must have a NASA-size machine to tolerate that many threads :-)

Please read the DataFrame documentation from beginning to end, with emphasis on the multithreading section; it is not that long.

After that, if you still have a problem, send me the stack trace. I might be able to give you a hint.

mdumitrean commented 4 months ago

I did read the documentation; I must have missed or misunderstood something. These are 24 independent threads created by my thread pool. When I debug this, I see only the 24 worker threads that I'm launching, but I am creating 11k individual DataFrames, and I have not enabled multithreaded behavior inside any DataFrame object. The threads represent database connections reading data (24 connections at once). The backtrace is long and goes deep inside DataFrame. The code should compile and then crash; confirmed with g++-14 and g++-13. Best, Marius

mdumitrean commented 4 months ago

dataframe_sigsegv.txt

I've attached the backtrace along with the 24 threads being created.

The ThreadPool class is included for convenience; it is a widely used implementation from here: https://github.com/progschj/ThreadPool

hosseinmoein commented 4 months ago

OK, I suggest you read the multithreading section of the documentation again. You are using DataFrame in a multithreaded environment, so you must call set_lock() and provide a spin lock for DataFrame. Read the docs and look at the code samples provided.

mdumitrean commented 4 months ago

Dear Hossein, I guess I didn't include the spin-lock version of the code in the example, but I had read the docs and tried adding a spin lock. Calling set_lock() with a spin lock around the data load into the DataFrame didn't help at all.

Best, Marius

hosseinmoein commented 4 months ago

From the stack trace you provided, what I can surmise is that the threads are overwriting each other in the static members of the hetero vectors.
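A deliberately simplified, hypothetical model may help show why DataFrames that never share an object can still collide. `ToyHeteroVector` and `fill_independent_columns()` are made-up names, `std::mutex` stands in for `hmdf::SpinLock`, and the storage layout (one static map per element type, keyed by the owning object's address) is an assumption modeled on the comment above, not DataFrame's actual code. Because every instance inserts into and erases from the same static `std::unordered_map`, concurrent use of separate objects is a data race unless a library-wide lock is installed.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified model (not DataFrame's real code): elements live in
// one static map per element type, keyed by the owning object's address, so
// "independent" objects all mutate the same std::unordered_map.
class ToyHeteroVector {
public:
    ToyHeteroVector() = default;
    ToyHeteroVector(const ToyHeteroVector &) = delete;  // storage is keyed by address
    ToyHeteroVector &operator=(const ToyHeteroVector &) = delete;

    // Library-wide hook: it only stores a pointer and locks nothing by itself.
    static void set_lock(std::mutex *m) { lock_ = m; }
    static void remove_lock() { lock_ = nullptr; }

    template <typename T>
    void push_back(const T &v) {
        Guard g;  // every write touches the shared static map
        auto [it, inserted] = storage<T>().try_emplace(this);
        if (inserted) {  // remember how to erase this type's entry on destruction
            Eraser e = [](const ToyHeteroVector *p) { storage<T>().erase(p); };
            erasers_.push_back(e);
        }
        it->second.push_back(v);
    }

    template <typename T>
    std::size_t size() const {
        Guard g;  // reads must not overlap another thread's insert or rehash
        const auto &store = storage<T>();
        auto it = store.find(this);
        return it == store.end() ? 0 : it->second.size();
    }

    ~ToyHeteroVector() {
        Guard g;  // destruction also mutates the shared static maps
        for (Eraser erase : erasers_) erase(this);
    }

private:
    using Eraser = void (*)(const ToyHeteroVector *);

    inline static std::mutex *lock_ = nullptr;

    // Locks whichever mutex is installed when the guard is created, if any.
    struct Guard {
        std::mutex *m;
        Guard() : m(lock_) { if (m) m->lock(); }
        ~Guard() { if (m) m->unlock(); }
    };

    template <typename T>
    static std::unordered_map<const ToyHeteroVector *, std::vector<T>> &storage() {
        static std::unordered_map<const ToyHeteroVector *, std::vector<T>> s;
        return s;
    }

    std::vector<Eraser> erasers_;
};

// Each worker creates and fills its own object; no object is shared.
std::size_t fill_independent_columns(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    std::mutex m;                   // stands in for hmdf::SpinLock
    ToyHeteroVector::set_lock(&m);  // install once, before any worker starts
    std::vector<std::size_t> sizes(static_cast<std::size_t>(threads));
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&sizes, t, per_thread] {
            ToyHeteroVector col;    // private to this worker
            for (int i = 0; i < per_thread; ++i) col.push_back<double>(i);
            sizes[static_cast<std::size_t>(t)] = col.size<double>();
        });
    }
    for (auto &th : pool) th.join();
    ToyHeteroVector::remove_lock(); // uninstall only after every worker has joined
    std::size_t total = 0;
    for (std::size_t s : sizes) total += s;
    return total;
}
```

In this sketch the lock is installed once before any worker starts and removed only after every worker has joined, so the hook never changes while a guarded operation can be in flight.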

hosseinmoein commented 4 months ago

Maybe provide a snippet of the code showing where you set the lock and how you define it.

mdumitrean commented 4 months ago

class MinuteObservationFetch {
private:
    hmdf::SpinLock spinLock;
public:
    void createRandom(string t) {
        ....
        hmdf::StdDataFrame<uint64_t>::set_lock(&spinLock);
        MinuteObservationFetchTicker ticker(t);  // copy, not move: t is still used as the map key below
        ticker.loadData(std::move(id_vec), std::move(name_vec), std::move(unix_vec), std::move(date_vec), std::move(symbol_vec),
                        std::move(open_vec), std::move(high_vec), std::move(low_vec), std::move(close_vec), std::move(volume_vec),
                        std::move(volume_usd_vec));
        insertInMap(t, std::move(ticker));
        hmdf::StdDataFrame<uint64_t>::remove_lock();
        ....

Makes no difference. Tried many variations.

mdumitrean commented 4 months ago

Oh wow.

I think I got it; it's finally no longer dying. The lock has to be installed not around the creation of, and the writes to, each individual DataFrame, but around the entire thread-pool run:

    void createRandom(string t) {
        vector<uint64_t> id_vec;
        vector<string> name_vec;
        vector<uint64_t> unix_vec;
        vector<string> date_vec;
        vector<string> symbol_vec;
        vector<double> open_vec;
        vector<double> high_vec;
        vector<double> low_vec;
        vector<double> close_vec;
        vector<double> volume_vec;
        vector<double> volume_usd_vec;
        for (int j = 0; j < 20000; j++) {
            id_vec.push_back(j);
            name_vec.emplace_back("NAME" + to_string(j));
            unix_vec.push_back(j);
            date_vec.emplace_back("DATE" + to_string(j));
            symbol_vec.emplace_back("SYMBOL" + to_string(j));
            open_vec.push_back(j);
            high_vec.push_back(j);
            low_vec.push_back(j);
            close_vec.push_back(j);
            volume_vec.push_back(j);
            volume_usd_vec.push_back(j);
        }

        MinuteObservationFetchTicker ticker(t);  // copy, not move: t is still used as the map key below
        ticker.loadData(std::move(id_vec), std::move(name_vec), std::move(unix_vec), std::move(date_vec), std::move(symbol_vec),
                        std::move(open_vec), std::move(high_vec), std::move(low_vec), std::move(close_vec), std::move(volume_vec),
                        std::move(volume_usd_vec));
        insertInMap(t, std::move(ticker));
    }

void test1() {
    cout << "Test1" << endl;
    MinuteObservationFetch fetch;
    hmdf::SpinLock spinLock;
    hmdf::StdDataFrame<uint64_t>::set_lock(&spinLock);
    fetch.populate(128);
    hmdf::StdDataFrame<uint64_t>::remove_lock();

    cout << "Test1 done" << endl;
}
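One plausible reason the per-task placement in the earlier snippet makes no difference, assuming (this is not confirmed in the thread) that `set_lock()` only records the lock pointer in a static member shared by all threads: one worker's `remove_lock()` uninstalls the hook for every other worker that is still loading data. The sketch below replays that interleaving deterministically on a single thread; `LockHook` and `replay_per_task_pattern()` are hypothetical names, not DataFrame's API.

```cpp
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a library-wide hook such as set_lock()/remove_lock():
// a single static pointer that every thread reads before each guarded operation.
struct LockHook {
    inline static std::mutex *lock = nullptr;
    static void set_lock(std::mutex *m) { lock = m; }
    static void remove_lock() { lock = nullptr; }
};

// Replays, one step at a time on a single thread, an interleaving that the
// per-task placement (set_lock ... remove_lock inside every task) permits when
// two workers overlap. Records whether each data write would see a lock.
std::vector<std::string> replay_per_task_pattern() {
    std::mutex member_lock;  // one lock object shared by all tasks, as in the snippet
    std::vector<std::string> log;
    auto load_data = [&log](const std::string &who) {
        log.push_back(who + (LockHook::lock != nullptr ? ": guarded" : ": UNGUARDED"));
    };

    LockHook::set_lock(&member_lock);  // worker A begins its task
    LockHook::set_lock(&member_lock);  // worker B begins its task
    load_data("A");                    // A writes while the hook is installed
    LockHook::remove_lock();           // A finishes and uninstalls the hook for everyone
    load_data("B");                    // B is still writing, now with no lock at all
    LockHook::remove_lock();           // B finishes
    return log;
}
```

Here the log comes back as `A: guarded` followed by `B: UNGUARDED`. Installing the lock once around `fetch.populate(128)`, as `test1()` does, closes this particular window, but only if `populate()` waits for every task to finish before it returns, which the posted code does not show.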

mdumitrean commented 4 months ago

I spoke too soon. The DataFrame still eventually got corrupted (just a lot less often). I have gone ahead and switched to plain vectors, and I have had no more issues. I wish I could have used the DataFrame library, but it's just too sensitive, and I couldn't figure out how to get it to work reliably in a high-performance, heavily multithreaded environment (128 threads).

hosseinmoein commented 4 months ago

If you happen to have one of those stack traces, please post it here.

hosseinmoein / DataFrame

Memory free/access errors with concurrent independent dataframes. #312