man-group / ArcticDB

ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.
http://arcticdb.io

Numpy array test occasionally raises segfault #1186

Open vasil-pashov opened 11 months ago

vasil-pashov commented 11 months ago

I think we have to sort out the GIL handling properly. This test,

    @pytest.mark.parametrize('n', range(1000))
    def test_append_empty_arrays_to_column(self, lmdb_version_store, array_type, n):
        df = pd.DataFrame({"col1": [np.array([1, 2, 3]).astype(array_type)]})
        lmdb_version_store.write("test_append_to_colum_with_empty_array", df)
        df_to_append = pd.DataFrame({"col1": [np.array([])]})
        lmdb_version_store.append("test_append_to_colum_with_empty_array", df_to_append)
        df_out = lmdb_version_store.read("test_append_to_colum_with_empty_array")
        df_target = pd.concat([df, df_to_append], ignore_index=True)
        assert_frame_equal(df_target, df_out.data)

that you added segfaults occasionally (hence the 1000 runs) in various places in Cython, indicating corruption of Python's data structures. I've seen the test segfault when trying to run Python imports, or allocate tuples, for instance.

We've also seen it fail in the CI.
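
Since the crash is intermittent, one way to measure how often it happens is to run the test in a fresh interpreter many times and tally how each run ends. This is only a minimal sketch, assuming a POSIX system (where a negative return code means the child process was killed by a signal). `describe_exit` and `run_repeatedly` are hypothetical helper names, not part of ArcticDB or pytest.

```python
import signal
import subprocess
import sys
from typing import Dict, Sequence


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable outcome."""
    if returncode == 0:
        return "passed"
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative code.
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by signal {-returncode}"
    return f"failed with exit code {returncode}"


def run_repeatedly(cmd: Sequence[str], runs: int) -> Dict[str, int]:
    """Run cmd in a fresh process `runs` times and count each outcome."""
    outcomes: Dict[str, int] = {}
    for _ in range(runs):
        result = subprocess.run(list(cmd), capture_output=True)
        key = describe_exit(result.returncode)
        outcomes[key] = outcomes.get(key, 0) + 1
    return outcomes


if __name__ == "__main__":
    # Trivial command used here so the sketch runs anywhere; in practice the
    # command would be a pytest invocation selecting the test above.
    print(run_repeatedly([sys.executable, "-c", "pass"], runs=3))
```

For the test above, the command would be something like `[sys.executable, "-X", "faulthandler", "-m", "pytest", "<path to the test file>"]`, with the path filled in for the local checkout. A `killed by SIGSEGV` entry in the tally then counts the segfaulting runs separately from ordinary assertion failures.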

_Originally posted by @poodlewars in https://github.com/man-group/ArcticDB/pull/819#discussion_r1434311648_

Duplicate of: #1176

alexowens90 commented 10 months ago

Correct fix is probably #269