ldbc / ldbc_snb_bi

Reference implementations for the LDBC Social Network Benchmark's Business Intelligence (BI) workload
https://ldbcouncil.org/benchmarks/snb-bi
Apache License 2.0

Tigergraph benchmarks fail with "KeyError" exception #67

Closed pgrabusz closed 2 years ago

pgrabusz commented 2 years ago

Error while running tigergraph/benchmark.py:

Traceback (most recent call last):
  File "benchmark.py", line 35, in <module>
    duration = run_batch_update(batch_date, args)
  File "/mnt/nvme0n1/pgrabusz/ldbc_snb_bi-main/tigergraph/batches.py", line 95, in run_batch_update
    result, duration = run_query(f'del_{vertex}', {'file':str(docker_path/fp.name), 'header':args.header}, args.endpoint)
  File "/mnt/nvme0n1/pgrabusz/ldbc_snb_bi-main/tigergraph/batches.py", line 35, in run_query
    return response['results'][0]['result'], duration
KeyError: 'results'

Setup: 3-node TigerGraph cluster (bare metal, outside Docker)

Steps to reproduce:

<<BASE_PATH>> is a path on the main node (like /home/root/benchmarks)
<<BASE_NODE>> is the main node's IP address

All steps run as root:

=================
--- DATA GEN: ---
=================

Download repo
    wget https://github.com/ldbc/ldbc_snb_datagen_spark/archive/refs/heads/main.zip
        commit hash: c1438ce36d9d7baa070978512965d4e043aaa123
    unzip main.zip
    cd ldbc_snb_datagen_spark-main

Build the project
    tools/build.sh
    mvn version: Apache Maven 3.6.3
    java version: openjdk 11.0.15 2022-04-19
                  OpenJDK Runtime Environment (build 11.0.15+10-Ubuntu-0ubuntu0.20.04.1)
                  OpenJDK 64-Bit Server VM (build 11.0.15+10-Ubuntu-0ubuntu0.20.04.1, mixed mode, sharing)

Install Python tools
    python3 -m virtualenv .venv
        python version: 3.8.10
    . .venv/bin/activate
    pip install -U pip
        pip 22.1 (python 3.8)
    pip install ./tools

If this is not the first run, remove leftover Spark metadata first:
    find $TG_DATA_DIR -name _SUCCESS -delete
    find $TG_DATA_DIR -name '*.crc' -delete

Run data gen
    export SPARK_HOME=<<BASE_PATH>>/spark-3.1.2-bin-hadoop3.2
    export PATH="$SPARK_HOME/bin":"$PATH"
    export PLATFORM_VERSION=2.12_spark3.1
    export DATAGEN_VERSION=0.5.0-SNAPSHOT
    export SF=1

    rm -rf out-sf${SF}/
    tools/run.py \
        --cores $(nproc) \
        --memory 8G \
        ./target/ldbc_snb_datagen_${PLATFORM_VERSION}-${DATAGEN_VERSION}.jar \
        -- \
        --format csv \
        --scale-factor ${SF} \
        --explode-edges \
        --mode bi \
        --output-dir out-sf${SF}/ \
        --generate-factors
        # --format-options compression=gzip

generated data:
    generator runs for about 4 min for SF1
    <<BASE_PATH>>/ldbc_snb_datagen_spark-main/out-sf1

====================
--- BI PARAMGEN: ---
====================

repo: the same ldbc_snb_bi repo as in the BI LOAD DATA step below
venv: the same as above

install dependencies
    scripts/install-dependencies.sh does not work,
    so install them manually: pip install duckdb==0.3.4 pytz

copy data to paramgen/factors and paramgen/temporal (run the cp commands from the paramgen directory)
    cp -r <<BASE_PATH>>/ldbc_snb_datagen_spark-main/out-sf1/factors/csv/raw/composite-merged-fk/* factors/
    cp -r <<BASE_PATH>>/ldbc_snb_datagen_spark-main/out-sf1/graphs/parquet/raw/composite-merged-fk/dynamic/{Person,Person_knows_Person,Person_studyAt_University,Person_workAt_Company} temporal/

run paramgen
    scripts/paramgen.sh

parameters generated to
    <<BASE_PATH>>/ldbc_snb_bi-main/parameters

=====================
--- BI LOAD DATA: ---
=====================

download repo
    wget https://github.com/ldbc/ldbc_snb_bi/archive/refs/heads/main.zip
        commit hash: 37e3a2ec30dd2e79fb9bbd9bb9a5e80c4ededf59
    unzip main.zip
    cd ldbc_snb_bi-main/tigergraph

configure
    export TG_DATA_DIR=<<BASE_PATH>>/ldbc_snb_datagen_spark-main/out-sf1/graphs/csv/bi/composite-projected-fk/
    export TG_HEADER=true
    export SF=1
    export TG_VERSION=latest
    export TG_DDL_DIR=<<BASE_PATH>>/ldbc_snb_bi-main/tigergraph/ddl
    export TG_DML_DIR=<<BASE_PATH>>/ldbc_snb_bi-main/tigergraph/dml

    sed "s;header=\"false\";header=\"$TG_HEADER\";" $TG_DDL_DIR/load_static.gsql > $TG_DDL_DIR/load.gsql
    sed "s;header=\"false\";header=\"$TG_HEADER\";" $TG_DDL_DIR/load_dynamic.gsql >> $TG_DDL_DIR/load.gsql
    sed "s;header=\"false\";header=\"$TG_HEADER\";" $TG_DML_DIR/ins_Vertex.gsql >> $TG_DDL_DIR/load.gsql
    sed "s;header=\"false\";header=\"$TG_HEADER\";" $TG_DML_DIR/ins_Edge.gsql >> $TG_DDL_DIR/load.gsql
    sed "s;header=\"false\";header=\"$TG_HEADER\";" $TG_DML_DIR/del_Edge.gsql >> $TG_DDL_DIR/load.gsql

load data
    su tigergraph
    (activate the .venv)
    export all variables from above (repeat all export commands; see the block below)
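
    (the exports from the configure step above, repeated in one block for convenience)
    export TG_DATA_DIR=<<BASE_PATH>>/ldbc_snb_datagen_spark-main/out-sf1/graphs/csv/bi/composite-projected-fk/
    export TG_HEADER=true
    export SF=1
    export TG_VERSION=latest
    export TG_DDL_DIR=<<BASE_PATH>>/ldbc_snb_bi-main/tigergraph/ddl
    export TG_DML_DIR=<<BASE_PATH>>/ldbc_snb_bi-main/tigergraph/dml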

    ddl/setup.sh \
        <<BASE_PATH>>/ldbc_snb_datagen_spark-main/out-sf1/graphs/csv/bi/composite-projected-fk \
        <<BASE_PATH>>/ldbc_snb_bi-main/tigergraph/queries \
        <<BASE_PATH>>/ldbc_snb_bi-main/tigergraph/dml

=========================
--- RUN BI BENCHMARK: ---
=========================

run from the tigergraph directory as the tigergraph user:
    su tigergraph
    (activate the .venv)
    export all variables from above (repeat all export commands)

    export TG_PARAMETER=<<BASE_PATH>>/ldbc_snb_bi-main/parameters
    export TG_ENDPOINT=http://<<BASE_NODE>>:9000

    scripts/benchmark.sh

    error:

Traceback (most recent call last):
  File "benchmark.py", line 35, in <module>
    duration = run_batch_update(batch_date, args)
  File "<<BASE_PATH>>/ldbc_snb_bi-main/tigergraph/batches.py", line 95, in run_batch_update
    result, duration = run_query(f'del_{vertex}', {'file':str(docker_path/fp.name), 'header':args.header}, args.endpoint)
  File "<<BASE_PATH>>/ldbc_snb_bi-main/tigergraph/batches.py", line 35, in run_query
    return response['results'][0]['result'], duration
KeyError: 'results'

After a little investigation:

  1. the scripts ignore some variables set in the environment and/or passed to scripts/benchmark.sh (the vars.sh script overwrites them)
  2. the Python scripts also lose the path when passing it on to methods
  3. the run_query method (batches.py) fails: the request to http://100.67.80.11:9000/query/ldbc_snb/del_Comment returns {'version': {'edition': 'enterprise', 'api': 'v2', 'schema': 0}, 'error': True, 'message': "Runtime Error: File '/data/deletes/dynamic/Comment/batch_id=2012-11-29/part-00000-763d926e-20e7-4961-b04c-f510e70e9e80.c000.csv' does not exist."}, so there is no 'results' key. As the message points out, the file does not exist on that node (probably connected to the pathing issue described above).
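
To confirm point 3, one can check for the file from the error message directly (a sketch; the path is copied verbatim from the error above, and it has to be readable on every node that may execute the query):

    # run on each TigerGraph node; the delete batch directory
    # must exist at the same path everywhere
    ls -l '/data/deletes/dynamic/Comment/batch_id=2012-11-29/'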
szarnyasg commented 2 years ago

Thanks! @yczhang1017 can you please take a look?

yuchenZhangTG commented 2 years ago

@pgrabusz The error suggests that the deletes data files do not exist on the server node 100.67.80.11. For cluster mode, where TigerGraph is installed outside of Docker, you need to add the --cluster option to batches.sh and benchmark.sh. This option lets all the nodes process the batch update files. Please refer to the ./k8s/benchmark.sh or ./k8s/batches.sh scripts to run the benchmark on clusters.
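
For example (a sketch based on the comment above; it assumes batches.sh and benchmark.sh are invoked from the tigergraph directory and forward the flag the same way the ./k8s/ scripts do):

    # cluster mode: let every node process the batch update files
    scripts/batches.sh --cluster
    scripts/benchmark.sh --cluster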