ketiltrout / getdata

The GetData Project is the reference implementation of the Dirfile Standards, a filesystem-based, column-oriented database format for time-ordered binary data.
http://getdata.sourceforge.net/
GNU Lesser General Public License v2.1

SIE encoding fails in threaded reads #18

Status: Open. Opened by merny93 1 week ago.


It appears that SIE-encoded fields cannot be read safely from multiple threads. I first came across this problem through the Python bindings, but I have reproduced it in C using the following code:

#include <getdata.h>
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

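/* Each thread repeatedly reads the first 1000 frames of field "test"
 * from the DIRFILE that main() shares with both threads. */
void* read_data(void* arg) {
    DIRFILE* df = (DIRFILE*)arg;
    double* data = (double*)malloc(1000 * sizeof(double));
    size_t res = 0;   /* gd_getdata() returns the number of samples read */
    if (data == NULL)
        return NULL;
    for (int i = 0; i < 100; i++)
    {
        res = gd_getdata(df, "test", 0, 0, 1000, 0, GD_FLOAT64, data);
    }

    printf("res = %zu\n", res);
    free(data);
    return NULL;
}

int main(int argc, char **argv)
{
    /* gd_open() may return a DIRFILE in an error state, so check
     * gd_error() as well as NULL */
    DIRFILE* df = gd_open("data", GD_RDONLY);
    if (df == NULL || gd_error(df) != GD_E_OK) {
        fprintf(stderr, "Failed to open dirfile\n");
        return 1;
    }

    pthread_t thread1, thread2;

    /* Both threads read through the same DIRFILE handle */
    pthread_create(&thread1, NULL, read_data, (void*)df);
    pthread_create(&thread2, NULL, read_data, (void*)df);

    pthread_join(thread1, NULL);
    pthread_join(thread2, NULL);

    gd_close(df);
    return 0;
}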

and a dirfile written by the Python bindings, with the following format file:

# This is a dirfile format file.
# It was written using version 0.11.0 of the GetData Library.
# Written on Tue Nov  5 15:46:14 2024 UTC by simon.

/VERSION 10
/ENDIAN little
/PROTECT none
/ENCODING sie
test RAW FLOAT64 1
/REFERENCE test

I consistently get segfaults in the fread() call in _GD_Advance (sie.c), or double-free errors at free(databuffer) in _GD_DoRaw.

These errors do not occur with unencoded fields, so I assume the problem lies in the SIE encoding, although I have only had a very cursory look at the code.
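Since the reproducer shares one DIRFILE between two threads, a caller-side workaround until the cause is found would be to serialize every gd_getdata() call on that handle with a mutex. The sketch below is self-contained and does not link against GetData: `struct shared_reader`, `reader_read()` and `run_two_readers()` are hypothetical stand-ins for a DIRFILE, gd_getdata() and the reproducer's main(). They model a handle whose reads mutate a shared cursor and scratch buffer, which is the suspected failure mode but has not been confirmed from sie.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_SAMPLES 1000
#define DEMO_PASSES 100

/* Hypothetical stand-in for a DIRFILE: each read mutates shared state
 * (a cursor and a scratch buffer), as a sequential decoder might. */
struct shared_reader {
    pthread_mutex_t lock;  /* guards position and scratch */
    long position;         /* hypothetical decode cursor */
    double *scratch;       /* hypothetical per-read buffer */
};

/* NOT thread-safe: two concurrent callers could free the same scratch
 * buffer twice or interleave updates to position. */
static size_t reader_read_unlocked(struct shared_reader *r, size_t n, double *out) {
    free(r->scratch);
    r->scratch = malloc(n * sizeof *r->scratch);
    if (r->scratch == NULL)
        return 0;
    r->position = 0;                       /* rewind to the first sample */
    for (size_t i = 0; i < n; i++)
        r->scratch[i] = (double)r->position++;
    memcpy(out, r->scratch, n * sizeof *out);
    return n;
}

/* Locked wrapper: the same pattern would wrap gd_getdata() on a shared DIRFILE. */
static size_t reader_read(struct shared_reader *r, size_t n, double *out) {
    assert(r != NULL && out != NULL);
    pthread_mutex_lock(&r->lock);
    size_t got = reader_read_unlocked(r, n, out);
    pthread_mutex_unlock(&r->lock);
    return got;
}

struct job {
    struct shared_reader *reader;
    long bad_reads;
};

/* Mirrors read_data(): read the same range repeatedly, counting reads
 * that come back short or with the wrong values. */
static void *worker(void *arg) {
    struct job *job = arg;
    double *data = malloc(DEMO_SAMPLES * sizeof *data);
    if (data == NULL) {
        job->bad_reads = -1;
        return NULL;
    }
    for (int pass = 0; pass < DEMO_PASSES; pass++) {
        size_t got = reader_read(job->reader, DEMO_SAMPLES, data);
        int ok = (got == DEMO_SAMPLES);
        for (size_t k = 0; ok && k < DEMO_SAMPLES; k++)
            ok = (data[k] == (double)k);
        if (!ok)
            job->bad_reads++;
    }
    free(data);
    return NULL;
}

/* Two threads share one reader, as the reproducer shares one DIRFILE.
 * Returns the total number of bad reads, or -1 if setup failed. */
long run_two_readers(void) {
    struct shared_reader r = { .position = 0, .scratch = NULL };
    if (pthread_mutex_init(&r.lock, NULL) != 0)
        return -1;
    struct job jobs[2] = { { &r, 0 }, { &r, 0 } };
    pthread_t threads[2];
    int started = 0;
    while (started < 2 &&
           pthread_create(&threads[started], NULL, worker, &jobs[started]) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    pthread_mutex_destroy(&r.lock);
    free(r.scratch);
    if (started < 2 || jobs[0].bad_reads < 0 || jobs[1].bad_reads < 0)
        return -1;
    return jobs[0].bad_reads + jobs[1].bad_reads;
}
```

In the reproducer itself, the lock would sit around the gd_getdata() call in read_data(). This gives up parallel reads on the shared handle, and whether GetData also keeps process-wide state that a per-handle lock would not protect is not established here.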
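A caller-side workaround that avoids sharing altogether is to have each thread call gd_open() itself, so that each thread gets its own DIRFILE and, presumably, its own open file handles. This assumes the crash comes from state stored in the shared DIRFILE rather than in process-wide globals, which has not been verified. The self-contained sketch below shows the underlying pattern with plain stdio rather than GetData: each thread opens a private FILE* on the same file, so the position advanced by fread() is never shared between threads. `per_thread_reader()` and `run_per_thread_handles()` are hypothetical names, and the test file is a temporary one created with mkstemp().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define READ_SAMPLES 1000
#define READ_PASSES 100

/* Per-thread argument: which file to open and a counter for bad reads. */
struct per_thread_job {
    const char *path;
    long bad_reads;
};

/* Each thread opens its OWN FILE*, so the position advanced by fread()
 * is private to it (the analogue of one gd_open() per thread). */
static void *per_thread_reader(void *arg) {
    struct per_thread_job *job = arg;
    FILE *fp = fopen(job->path, "rb");
    if (fp == NULL) {
        job->bad_reads = -1;
        return NULL;
    }
    for (int pass = 0; pass < READ_PASSES; pass++) {
        rewind(fp);                     /* start each pass at sample 0 */
        for (long k = 0; k < READ_SAMPLES; k++) {
            double value;
            if (fread(&value, sizeof value, 1, fp) != 1 || value != (double)k) {
                job->bad_reads++;       /* short read or wrong sample */
                break;
            }
        }
    }
    fclose(fp);
    return NULL;
}

/* Writes samples 0..READ_SAMPLES-1 to a temporary file, reads it from two
 * threads with private handles, and returns the total number of bad reads
 * (or -1 if setup failed). */
long run_per_thread_handles(void) {
    char path[] = "/tmp/per_thread_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    FILE *out = fdopen(fd, "wb");
    if (out == NULL) {
        close(fd);
        unlink(path);
        return -1;
    }
    for (long k = 0; k < READ_SAMPLES; k++) {
        double v = (double)k;
        fwrite(&v, sizeof v, 1, out);
    }
    if (fclose(out) != 0) {             /* flushes buffered writes */
        unlink(path);
        return -1;
    }

    struct per_thread_job jobs[2] = { { path, 0 }, { path, 0 } };
    pthread_t threads[2];
    int started = 0;
    while (started < 2 &&
           pthread_create(&threads[started], NULL, per_thread_reader, &jobs[started]) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    unlink(path);

    if (started < 2 || jobs[0].bad_reads < 0 || jobs[1].bad_reads < 0)
        return -1;
    return jobs[0].bad_reads + jobs[1].bad_reads;
}
```

Applied to the reproducer, this would mean passing the dirfile path to read_data() and calling gd_open() and gd_close() inside each thread instead of in main(). It costs one set of open handles per thread but needs no locking on the caller's side.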