Hi @petere, please edit the `driver/create-validation-parameters.properties` file and set `ldbc.snb.interactive.updates_dir` to point to the directory with the SF1 update streams (`updateStream_0_0_forum.csv` and `updateStream_0_0_person.csv`) produced by the Datagen. Also ensure that `ldbc.snb.interactive.parameters_dir` points to the parameters used for SF1 – these are also produced by the Datagen and placed in the `substitution_parameters` directory.
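For reference, the two entries would end up looking something like this (the paths below are placeholders – point them at wherever your Datagen wrote its output):

```properties
# placeholder paths – adjust to your own Datagen output directories
ldbc.snb.interactive.updates_dir=/path/to/datagen-output/social_network/
ldbc.snb.interactive.parameters_dir=/path/to/datagen-output/substitution_parameters/
```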
I'll clarify this in the README(s).
Ok, after I made the corresponding changes in `postgres/driver/create-validation-parameters.properties`, `postgres/driver/validate.properties`, and `postgres/driver/benchmark.properties`, I can run all three steps against my generated data.
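Concretely, the three steps are creating the validation parameters, validating, and running the benchmark – roughly the sequence below, run from the `postgres` directory (only `create-validation-parameters.sh` comes up explicitly in this thread; the other two script names are assumed to match their `.properties` files):

```bash
driver/create-validation-parameters.sh   # uses create-validation-parameters.properties
driver/validate.sh                       # assumed name; uses validate.properties
driver/benchmark.sh                      # assumed name; uses benchmark.properties
```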
Excellent! For the larger data sets, the generator can be rather slow – consider using the pre-generated ones. They are linked at https://github.com/ldbc/data-sets-surf-repository#snb-interactive-v1-csvcompositemergeforeign-serializer-using-stringdateformatter
I'm trying this out with default settings.
I used the `ldbc_snb_datagen_hadoop` main branch to generate the data. I did `cp params-csv-merge-foreign.ini params.ini`, left everything else the same, then ran the docker command from the README to produce the data. That worked.
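In shell terms, the generation side was roughly the following (the expected output layout is my assumption based on the usual Hadoop Datagen output):

```bash
# in an ldbc_snb_datagen_hadoop checkout, main branch
cp params-csv-merge-foreign.ini params.ini   # select the CSV merge-foreign serializer, defaults otherwise
# ...then run the Docker command from that repository's README to produce the data.
# Expected output (assumed layout):
#   social_network/            the CSV tables plus updateStream_0_0_forum.csv / updateStream_0_0_person.csv
#   substitution_parameters/   the parameter files for the driver
```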
Then I'm in this repository. I am using tag `1.0.0`. I ran `scripts/build.sh` and installed psycopg2 as instructed. I set `POSTGRES_CSV_DIR` as appropriate. Then ran `scripts/load-in-one-step.sh` successfully. Then I ran `driver/create-validation-parameters.sh`, which failed very quickly.
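Spelled out as commands, the failing sequence was roughly this, run from the `postgres` directory (the exported path is just an example standing in for my generated data):

```bash
scripts/build.sh
# install psycopg2 as the README instructs (exact command omitted here)
export POSTGRES_CSV_DIR=/path/to/hadoop-datagen-output/social_network/   # example path
scripts/load-in-one-step.sh              # succeeds
driver/create-validation-parameters.sh   # fails almost immediately
```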
Now, if I use ``POSTGRES_CSV_DIR=`pwd`/test-data/`` instead, it all works. So is there some kind of incompatibility with how I'm using the Hadoop-based data generation?