man-group / ArcticDB

ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.
http://arcticdb.io

[Possibly MacOS issue] read data by arcticdb crashes the jupyter kernel #523

Closed zzxfriend closed 1 year ago

zzxfriend commented 1 year ago

Describe the bug

When I run read_batch on my Mac to fetch data, it always crashes the Jupyter kernel. I tried running it in the terminal, and it raises the error shown in the second screenshot. I also tried reading data from AWS, and it works well, so I guess it's a macOS issue.

[Screenshots: the Jupyter kernel crash and the terminal malloc error]

Steps/Code to Reproduce

import functools

import pandas as pd
from arcticdb import Arctic, QueryBuilder

# ACData and get_vbt_data are defined elsewhere in the reporter's codebase.
ac = Arctic("lmdb:////Users/zhangmessi/Documents/Mycode/quantrade/arcdata")

def get_factors(
        cls,
        libname,
        symbols,
        universe=None,
        start_time=None,
        end_time=None,
        datelist=None,
        add_condi=None,
        columns=None,
        return_format=None,
        source="ac",
        datecol="datetime",
        codecol="instrument",
    ):
        if source == "ac":
            lib = ACData.ac.get_library(libname)

        else:
            lib = ACData.s3ac.get_library(libname)

        q = QueryBuilder()
        qs = []
        if universe is not None:
            if not isinstance(universe, list):
                universe = [universe]

            q1 = eval(f"q['{codecol}'].isin(universe)")
            qs.append(q1)

        if start_time is not None:
            start_time = pd.to_datetime(start_time)
            q5 = eval(f"q['{datecol}'] >= start_time")
            qs.append(q5)

        if end_time is not None:
            end_time = pd.to_datetime(end_time)
            q6 = eval(f"q['{datecol}'] <= end_time")
            qs.append(q6)

        if datelist is not None:
            datelist = [pd.to_datetime(x) for x in datelist]
            q7 = eval(f"q['{datecol}'].isin(datelist)")
            qs.append(q7)

        if add_condi is not None:
            if not isinstance(add_condi, list):
                add_condi = [add_condi]
            for condi in add_condi:
                q_add = eval(condi)
                qs.append(q_add)

        if len(qs) > 0:
            lens = len(qs)
            condis = []
            for i in range(lens):
                condis.append(f"qs[{i}]")
            condis = " & ".join(condis)
            q = QueryBuilder()
            q = q[eval(condis)]

        def merge_datas(x, datecol, codecol):
            data = x.data
            symbol = x.symbol
            data = data.set_index([datecol, codecol])
            data.columns = [symbol]
            return data

        def merge_dfs(df1, df2):
            return pd.merge(df1, df2, how="outer", left_index=True, right_index=True)

        if isinstance(symbols, list):
            datas = lib.read_batch(symbols=symbols, query_builder=q)
            if return_format in ["df", "vbt"]:
                datas = [
                    merge_datas(x, datecol=datecol, codecol=codecol)
                    for x in datas
                    if len(x.data) > 0
                ]
                datas = functools.reduce(merge_dfs, datas)
                datas.index.names = ["datetime", "instrument"]
                if return_format == "vbt":
                    datas = get_vbt_data(datas)

        else:
            datas = lib.read(symbol=symbols, query_builder=q, columns=columns)
            datas = datas.data
            cols = datas.columns
            indexcol = []
            if datecol in cols:
                indexcol.append(datecol)
            if codecol in cols:
                indexcol.append(codecol)
            if len(indexcol) > 0:
                datas = datas.set_index(indexcol)

        return datas
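As an aside on the repro code above: the string joining plus eval used to combine the collected conditions at the end of get_factors can be replaced with functools.reduce. A minimal sketch, using a stand-in Clause class (illustrative only, not part of ArcticDB; real QueryBuilder clauses also support the `&` operator):

```python
import functools
import operator

# Stand-in for a query clause that supports combination with `&`.
class Clause:
    def __init__(self, name):
        self.name = name

    def __and__(self, other):
        return Clause(f"({self.name} & {other.name})")

qs = [Clause("q1"), Clause("q5"), Clause("q6")]

# Equivalent to building the string "qs[0] & qs[1] & qs[2]" and eval'ing it:
combined = functools.reduce(operator.and_, qs)
print(combined.name)  # → "((q1 & q5) & q6)"
```

This keeps the same left-to-right combination order as the eval-based version while avoiding eval entirely.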

Expected Results

Kernel crash log:
09:19:15.976 [info] Kernel acknowledged execution of cell 8 @ 1688001555966
09:19:17.158 [error] Disposing session as kernel process died ExitCode: undefined, Reason: 
09:19:17.159 [info] Dispose Kernel process 59835.
09:19:17.159 [error] Raw kernel process exited code: undefined
09:19:17.160 [error] Error in waiting for cell to complete Error: Canceled future for execute_request message before replies were done
    at t.KernelShellFutureHandler.dispose (~/.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/out/extension.node.js:2:32375)
    at ~/.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/out/extension.node.js:2:51427
    at Map.forEach (<anonymous>)
    at y._clearKernelState (~/.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/out/extension.node.js:2:51412)
    at y.dispose (~/.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/out/extension.node.js:2:44894)
    at ~/.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/out/extension.node.js:24:112498
    at ne (~/.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/out/extension.node.js:2:1586779)
    at cy.dispose (~/.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/out/extension.node.js:24:112474)
    at uy.dispose (~/.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/out/extension.node.js:24:119757)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
09:19:17.161 [warn] Cell completed with errors {
  message: 'Canceled future for execute_request message before replies were done'
}
09:19:17.162 [info] End cell 8 execution @ 1688001557162, started @ 1688001555966, elapsed time = 1.196s
09:19:17.162 [warn] Cancel all remaining cells due to cancellation or failure in execution
09:19:17.227 [info] End cell 8 execution @ undefined, started @ undefined, elapsed time = 0s

If run in the terminal, it raises:

python(58896,0x306df0000) malloc: Double free of object 0x7f8dde0579a0
python(58896,0x306df0000) malloc: *** set a breakpoint in malloc_error_break to debug
[1]    58896 abort      /Users/zhangmessi/opt/anaconda3/envs/quant_env/bin/python 
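As a general diagnostic for native crashes like this (not specific to ArcticDB): the double free aborts the interpreter before any Python traceback appears, but enabling faulthandler before the failing read makes Python dump per-thread tracebacks when it receives a fatal signal such as the SIGABRT raised by malloc, which can help narrow down which call crashed:

```python
import faulthandler
import sys

# Register handlers for fatal signals (SIGSEGV, SIGABRT, SIGBUS, ...)
# so a Python-level traceback is printed to stderr before the process dies.
faulthandler.enable(file=sys.stderr, all_threads=True)

print(faulthandler.is_enabled())  # → True
```

Run this at the top of the repro script before calling read_batch.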

OS, Python Version and ArcticDB Version

Python: 3.9.16 (main, May 15 2023, 18:51:40) [Clang 14.0.6]
OS: macOS-10.16-x86_64-i386-64bit
ArcticDB: 1.3.0

Backend storage used

LMDB

Additional Context

No response

mehertz commented 1 year ago

Hi zzxfriend, when you say on AWS, do you mean in a Linux container?

That would point to it being a MacOS issue, yes!

zzxfriend commented 1 year ago

Yes. I'm not sure whether Windows has a similar problem, but this crash happens too often for ArcticDB to be usable. Is there any solution?

jamesmunro commented 1 year ago

@zzxfriend Hi. Are you able to retry this with the most recent ArcticDB? We've solved a number of LMDB issues.
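Before retrying, it is worth confirming which ArcticDB version is actually installed in the environment (the report was against 1.3.0). A small, generic check using the standard library:

```python
from importlib import metadata

# Print the installed ArcticDB version, or a note if it is absent
# from this environment (PackageNotFoundError in that case).
try:
    print(metadata.version("arcticdb"))
except metadata.PackageNotFoundError:
    print("arcticdb not installed")
```

Upgrading with pip (`pip install --upgrade arcticdb`) and re-running the check confirms the retry is against the latest release.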

poodlewars commented 1 year ago

I'm going to resolve this due to inactivity, and because we've made a lot of LMDB fixes that resolve very similar issues. Please feel free to re-open if this is still an issue with latest ArcticDB @zzxfriend - ideally with a repro that we can execute.