ldbc / ldbc_snb_datagen_hadoop

The Hadoop-based variant of the SNB Datagen
https://ldbcouncil.org/benchmarks/snb
Apache License 2.0

Pre-generate LongDateFormatter variants #17

Closed szarnyasg closed 2 years ago

szarnyasg commented 2 years ago

Grab & extract Hadoop.

Set the HADOOP_HOME and HADOOP_CLIENT_OPTS environment variables in the ~/.bashrc or similar file.
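As a rough sketch, the ~/.bashrc entries could look like the following. The install path /opt/hadoop and the 8G heap size are placeholders chosen for illustration only, not values from this issue; the generation script below sets its own heap size.

```shell
# Assumption: Hadoop was extracted to /opt/hadoop; replace with the real path.
export HADOOP_HOME=/opt/hadoop
# Assumption: 8G is only a placeholder heap size; size it to the machine.
export HADOOP_CLIENT_OPTS="-Xmx8G"
```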

Set a temp directory with ample free space in $HADOOP_HOME/etc/hadoop/core-site.xml (inside the <configuration> element):

<property>
  <name>hadoop.tmp.dir</name>
  <value>/path/to/dir</value>
</property>
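
To check how much space is actually free in that directory before a run, something along these lines may help. TMP_DIR is a hypothetical variable introduced here and should point at the hadoop.tmp.dir value; /tmp is only a stand-in default.

```shell
# Assumption: TMP_DIR mirrors the hadoop.tmp.dir value; /tmp is only a stand-in.
TMP_DIR="${TMP_DIR:-/tmp}"
# -P forces POSIX single-line output, -k reports 1024-byte blocks;
# column 4 of the second line is the available space.
AVAIL_KB=$(df -Pk "$TMP_DIR" | awk 'NR==2 {print $4}')
echo "Free space in ${TMP_DIR}: $((AVAIL_KB / 1024 / 1024)) GiB"
```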

Generate data:

#!/bin/bash

set -eu

rm -rf social_network/
rm -f datagen.log

export HADOOP_CLIENT_OPTS="-Xmx900G"

# set serializer to be one of:
# - CsvBasic
# - CsvComposite
# - CsvMergeForeign
# - CsvCompositeMergeForeign
SERIALIZER=CsvMergeForeign

for SF in 0.1 0.3 1 3 10 30 100 300 1000; do
    echo "=> SF: ${SF}" | tee -a datagen.log

    rm -rf /tmp/hadoop*

    echo > params.ini
    echo ldbc.snb.datagen.generator.scaleFactor:snb.interactive.${SF} >> params.ini

    # dateformat
    echo ldbc.snb.datagen.serializer.dateFormatter:ldbc.snb.datagen.util.formatter.LongDateFormatter >> params.ini

    # no parameter generation, no update streams
    echo ldbc.snb.datagen.parametergenerator.parameters:false >> params.ini
    echo ldbc.snb.datagen.serializer.updateStreams:false >> params.ini

    # serializers
    echo ldbc.snb.datagen.serializer.dynamicActivitySerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.activity.${SERIALIZER}DynamicActivitySerializer >> params.ini
    echo ldbc.snb.datagen.serializer.dynamicPersonSerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.person.${SERIALIZER}DynamicPersonSerializer >> params.ini
    echo ldbc.snb.datagen.serializer.staticSerializer:ldbc.snb.datagen.serializer.snb.csv.staticserializer.${SERIALIZER}StaticSerializer >> params.ini

    ./run.sh
    cp params.ini social_network/

    mv social_network/ social_network-${SERIALIZER}-sf${SF}
done

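Make sure the filenames are correct, i.e. the output directories follow the social_network-${SERIALIZER}-sf${SF} pattern from the script above.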
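
One way to automate this check is a small helper that reports which expected directories are missing. check_outputs is a hypothetical function written for this note, not part of the Datagen; it assumes the social_network-${SERIALIZER}-sf${SF} naming used by the generation script.

```shell
# Hypothetical helper: report expected output directories that are missing.
# Usage: check_outputs BASE_DIR SERIALIZER SF...
# Returns 0 if every directory exists, non-zero otherwise.
check_outputs() {
    base=$1
    serializer=$2
    shift 2
    missing=0
    for sf in "$@"; do
        dir="${base}/social_network-${serializer}-sf${sf}"
        if [ ! -d "$dir" ]; then
            echo "missing: ${dir}"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

For the run above this would be called as: check_outputs . CsvMergeForeign 0.1 0.3 1 3 10 30 100 300 1000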

Compress with:

export ZSTD_NBTHREADS=$(nproc)
tar --zstd -cf social_network-<...>-sf${SF}.tar.zst social_network-<...>-sf${SF}/
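
To compress all scale factors in one go, the command can be wrapped in a loop. This is only a sketch: it assumes the <...> placeholder stands for the serializer name used in the generation script, and that tar supports --zstd as in the command above. Directories that do not exist are skipped.

```shell
# Assumption: tar supports --zstd and zstd honours ZSTD_NBTHREADS.
export ZSTD_NBTHREADS=$(nproc)

# Assumption: same serializer and scale factors as in the generation script.
SERIALIZER=CsvMergeForeign

for SF in 0.1 0.3 1 3 10 30 100 300 1000; do
    DIR="social_network-${SERIALIZER}-sf${SF}"
    # Skip scale factors that were not generated.
    if [ ! -d "$DIR" ]; then
        echo "skipping ${DIR}: not found"
        continue
    fi
    tar --zstd -cf "${DIR}.tar.zst" "${DIR}/"
done
```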

szarnyasg commented 2 years ago

Done & uploaded to the data repository's staging server.