ldbc / ldbc_snb_interactive_v1_impls

Reference implementations for LDBC Social Network Benchmark's Interactive workload.
https://ldbcouncil.org/benchmarks/snb-interactive
Apache License 2.0

create-validation-parameters.sh: Error loading Workload class #390

Closed Ask-sola closed 1 year ago

Ask-sola commented 1 year ago

Hello, I am running the Postgres implementation and have performed the following steps:

Firstly, I built the project: postgres/scripts/build.sh

Next, I generated a dataset using Docker and obtained three folders: input, delete, and initial_snapshot. The dataset is stored in /home/hay010618/ldbc/test1/out-sf1/graphs/csv/bi/composite-merged-fk. The command I used was:

docker run --mount type=bind,source="$(pwd)/test1",target=/out ldbc/datagen-standalone:0.5.1-2.12_spark3.2 --parallelism 1 -- --format csv --scale-factor 1 --mode bi --format-options compression=gzip

[image]

Next, I loaded the dataset: I ran export POSTGRES_CSV_DIR=~/ldbc/test1/out-sf1/graphs/csv/bi/composite-merged-fk and then scripts/load-in-one-step.sh. Everything ran smoothly up to this point.

However, when I run driver/create-validation-parameters.sh, it fails with "Error loading Workload class": [image] I believe this happens because the folder it reads from is empty and does not contain the required Parquet files: [image]

I found a related issue (https://github.com/ldbc/ldbc_snb_interactive_impls/issues/322), but I still don't understand how to change ldbc.snb.interactive.updates_dir and ldbc.snb.interactive.parameters_dir. Will the benchmark fail to run if I use the default values? Are the required Parquet files produced by the earlier dataset generation step, or are they generated by driver/create-validation-parameters.sh? If the former, I generated files in CSV format and did not find a parameters folder with Parquet files. If the latter, did the error occur because the required Parquet files were never generated in the corresponding folder? I am very confused by this.

Ask-sola commented 1 year ago

I saw in the documentation that the parameters can be generated, and I tried the following method: [image] Afterwards I saw two new folders, parameters-sf1 and update-streams-sf1, but both are still empty: [image]

GLaDAP commented 1 year ago

Hi!

To run the Interactive v2 benchmark from the ldbc_snb_interactive_impls directory, first run scripts/install-dependencies.sh in the ldbc_snb_interactive_driver directory. This installs the dependencies required by the parameter generator used later (paramgen is part of the driver repository).

Next, to create the dataset, you can run the example as shown in your screenshot:

export SF=1 # The scale factor to generate
export LDBC_SNB_DATAGEN_DIR=~/repositories/ldbc_snb_datagen_spark # Path to the LDBC SNB datagen directory
export LDBC_SNB_DATAGEN_MAX_MEM=8G # Maximum memory the datagen may use, e.g. 16G
export LDBC_SNB_DRIVER_DIR=~/repositories/ldbc_snb_interactive_driver # Path to the LDBC SNB driver directory
export DATA_INPUT_TYPE=parquet
# If using the Docker Datagen version, set the env variable:
export USE_DATAGEN_DOCKER=true

scripts/generate-all.sh

Make sure to change LDBC_SNB_DATAGEN_DIR and LDBC_SNB_DRIVER_DIR to point to the directories on your machine (clone the repositories first if you have not already).

Then, run the scripts/generate-all.sh script in the ldbc_snb_interactive_impls directory. This creates the output folders in that directory. For example, with SF=1 you will get two directories in ldbc_snb_interactive_impls: update-streams-sf1, containing the update streams for the inserts and deletes, and parameters-sf1, containing the substitution parameters. After generation, you can update the benchmark.properties file for Postgres (assuming SF1 here):

ldbc.snb.interactive.scale_factor=1
ldbc.snb.interactive.updates_dir=~/repositories/ldbc_snb_interactive_impls/update-streams-sf1/ 
ldbc.snb.interactive.parameters_dir=~/repositories/ldbc_snb_interactive_impls/parameters-sf1/

Remember to update the paths so that they point to your copy of ldbc_snb_interactive_impls.
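
Before running driver/create-validation-parameters.sh, it may help to confirm that both directories actually contain Parquet files, since empty directories appear to be what triggered the error reported above. A minimal sketch, assuming the generated files use a .parquet extension and that the check is run from the ldbc_snb_interactive_impls directory with SF1; the helper name has_parquet is made up for illustration and is not part of any LDBC repository:

```shell
# Hypothetical helper (not part of the LDBC repositories): succeeds if the
# given directory contains at least one *.parquet file at any depth.
has_parquet() {
    [ -d "$1" ] || return 1
    # Non-empty output from find/head means at least one file matched.
    [ -n "$(find "$1" -type f -name '*.parquet' | head -n 1)" ]
}

# Assumed directory names for SF1, relative to the current directory.
for dir in update-streams-sf1 parameters-sf1; do
    if has_parquet "$dir"; then
        echo "$dir: OK"
    else
        echo "$dir: no parquet files found"
    fi
done
```

If either directory is reported as empty, re-running scripts/generate-all.sh is probably the first thing to check before touching benchmark.properties.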

Ask-sola commented 1 year ago

Thank you very much for your answer, but when I ran scripts/generate-all.sh, the following problem still occurred: [image]

Looking at scripts/generate-all.sh, I found the following steps: [image] I had assumed that the LDBC_SNB_DRIVER_DIR folder could be an empty folder I created myself and that scripts/generate-all.sh would populate it automatically, but that does not seem to be the case. How can I make sure this folder contains the scripts I need? I am stuck at this step, so the update streams and parameters cannot be generated.

GLaDAP commented 1 year ago

Hi,

First, you need to clone https://github.com/ldbc/ldbc_snb_interactive_driver, then point LDBC_SNB_DRIVER_DIR to that directory. Assuming you clone it into a directory called ldbc in your home folder, it would look like this:

export LDBC_SNB_DRIVER_DIR=~/ldbc/ldbc_snb_interactive_driver

When cloning that repository, you will find the required scripts.

Same for LDBC_SNB_DATAGEN_DIR; clone https://github.com/ldbc/ldbc_snb_datagen_spark and point to that directory:

export LDBC_SNB_DATAGEN_DIR=~/ldbc/ldbc_snb_datagen_spark

Then, using the previous commands, it should work. Note that ldbc_snb_interactive_driver contains a scripts/install-dependencies.sh script that you should run first. Afterwards, navigate to ldbc_snb_interactive_impls and execute scripts/generate-all.sh.
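
The setup described above can be sanity-checked before calling scripts/generate-all.sh. A rough sketch that relies only on what is mentioned in this thread (the driver checkout contains scripts/install-dependencies.sh); the function name check_snb_env is hypothetical and the check is deliberately shallow:

```shell
# Hypothetical pre-flight check (not part of the LDBC repositories).
# Returns 0 if both variables point at plausible checkouts, 1 otherwise.
check_snb_env() {
    status=0
    # The datagen variable should at least name an existing directory.
    if [ ! -d "${LDBC_SNB_DATAGEN_DIR:-}" ]; then
        echo "LDBC_SNB_DATAGEN_DIR is unset or not a directory" >&2
        status=1
    fi
    # The driver clone is expected to ship scripts/install-dependencies.sh.
    if [ ! -f "${LDBC_SNB_DRIVER_DIR:-}/scripts/install-dependencies.sh" ]; then
        echo "LDBC_SNB_DRIVER_DIR does not look like a clone of ldbc_snb_interactive_driver" >&2
        status=1
    fi
    return "$status"
}
```

With the function defined in the current shell, running check_snb_env && scripts/generate-all.sh from the ldbc_snb_interactive_impls directory would skip generation when a variable still points at an empty or missing folder.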

Ask-sola commented 1 year ago

Thank you very much for your reply. I have now successfully generated the parameters.