Closed szarnyasg closed 2 years ago
Grab & extract Hadoop.
Set the HADOOP_HOME and HADOOP_CLIENT_OPTS environment variables in the ~/.bashrc or similar file.
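As a minimal sketch, the two variables could be set with lines like the following in ~/.bashrc. The install path is a hypothetical placeholder (adjust it to wherever Hadoop was extracted), and the heap size simply mirrors the -Xmx900G value used in the generation script; treat it as an assumption and size it to the machine's memory.

```shell
# Hypothetical install location: change to the directory Hadoop was extracted to.
export HADOOP_HOME="$HOME/hadoop"

# JVM options for the Hadoop client. -Xmx900G is the value used by the
# generation script in this thread; pick a heap size that fits your machine.
export HADOOP_CLIENT_OPTS="-Xmx900G"
```

After editing the file, run `source ~/.bashrc` or open a new shell so the variables take effect.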
Set a temp directory with ample free space:
$HADOOP_HOME/etc/hadoop/core-site.xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/path/to/dir</value>
</property>
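Before pointing hadoop.tmp.dir at a directory, it may help to confirm how much space is actually free there. The helper below is a sketch only: `check_free_space` is a hypothetical name, and the threshold passed to it is whatever you consider "ample" for the scale factors you plan to generate.

```shell
# check_free_space DIR MIN_GB
# Prints the free space of the filesystem holding DIR and succeeds only if
# at least MIN_GB gigabytes are available. Hypothetical helper, not part of
# Hadoop or the data generator.
check_free_space() {
    local dir="$1" min_gb="$2"
    local avail_kb avail_gb

    # -P forces the one-line-per-filesystem POSIX format, -k reports in KiB;
    # the fourth column of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    avail_gb=$(( avail_kb / 1024 / 1024 ))

    echo "${dir}: ${avail_gb} GB free"
    [ "$avail_gb" -ge "$min_gb" ]
}
```

For example, `check_free_space /path/to/dir 500` returns a non-zero status unless roughly 500 GB are free; the 500 is an arbitrary illustration, not a figure from this thread.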
Generate data:
#!/bin/bash

set -eu

rm -rf social_network/
rm -f datagen.log

export HADOOP_CLIENT_OPTS="-Xmx900G"

# set serializer to be one of:
# - CsvBasic
# - CsvComposite
# - CsvMergeForeign
# - CsvCompositeMergeForeign
SERIALIZER=CsvMergeForeign

for SF in 0.1 0.3 1 3 10 30 100 300 1000; do
    echo "=> SF: ${SF}" | tee -a datagen.log
    rm -rf /tmp/hadoop*

    echo > params.ini
    echo ldbc.snb.datagen.generator.scaleFactor:snb.interactive.${SF} >> params.ini
    # dateformat
    echo ldbc.snb.datagen.serializer.dateFormatter:ldbc.snb.datagen.util.formatter.LongDateFormatter >> params.ini
    # no update streams, no serializers
    echo ldbc.snb.datagen.parametergenerator.parameters:false >> params.ini
    echo ldbc.snb.datagen.serializer.updateStreams:false >> params.ini
    # serializers
    echo ldbc.snb.datagen.serializer.dynamicActivitySerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.activity.${SERIALIZER}DynamicActivitySerializer >> params.ini
    echo ldbc.snb.datagen.serializer.dynamicPersonSerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.person.${SERIALIZER}DynamicPersonSerializer >> params.ini
    echo ldbc.snb.datagen.serializer.staticSerializer:ldbc.snb.datagen.serializer.snb.csv.staticserializer.${SERIALIZER}StaticSerializer >> params.ini

    ./run.sh

    cp params.ini social_network/
    mv social_network/ social_network-${SERIALIZER}-sf${SF}
done
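The series of echo lines that builds params.ini could also be written as a single heredoc, which makes the resulting file easier to read at a glance. This is only a sketch of an alternative: `write_params` is a hypothetical helper name, the keys and values are copied from the script above, and the leading blank line produced by `echo > params.ini` is omitted here on the assumption that it is not significant.

```shell
# write_params SF SERIALIZER [FILE]
# Writes the datagen parameters for one scale factor and serializer.
# Hypothetical helper; defaults to params.ini in the current directory.
write_params() {
    local sf="$1" serializer="$2" out="${3:-params.ini}"
    local pkg=ldbc.snb.datagen.serializer

    cat > "$out" <<EOF
ldbc.snb.datagen.generator.scaleFactor:snb.interactive.${sf}
${pkg}.dateFormatter:ldbc.snb.datagen.util.formatter.LongDateFormatter
ldbc.snb.datagen.parametergenerator.parameters:false
${pkg}.updateStreams:false
${pkg}.dynamicActivitySerializer:${pkg}.snb.csv.dynamicserializer.activity.${serializer}DynamicActivitySerializer
${pkg}.dynamicPersonSerializer:${pkg}.snb.csv.dynamicserializer.person.${serializer}DynamicPersonSerializer
${pkg}.staticSerializer:${pkg}.snb.csv.staticserializer.${serializer}StaticSerializer
EOF
}
```

Inside the loop, `write_params "${SF}" "${SERIALIZER}"` would then stand in for the echo lines.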
Make sure the filenames are correct.
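One possible reading of this check is that every expected output directory exists under the name the script gives it and contains its copy of params.ini. The sketch below tests exactly that and nothing more; `check_outputs` is a hypothetical helper, and it does not inspect the CSV files inside each directory.

```shell
# check_outputs SERIALIZER SF...
# Reports every social_network-<SERIALIZER>-sf<SF> directory that is missing
# or lacks params.ini. Returns non-zero if any problem was found.
check_outputs() {
    local serializer="$1"; shift
    local sf dir status=0

    for sf in "$@"; do
        dir="social_network-${serializer}-sf${sf}"
        if [ ! -d "$dir" ]; then
            echo "missing directory: ${dir}" >&2
            status=1
        elif [ ! -f "${dir}/params.ini" ]; then
            echo "missing params.ini in: ${dir}" >&2
            status=1
        fi
    done
    return "$status"
}
```

Run from the directory that holds the outputs, for example `check_outputs CsvMergeForeign 0.1 0.3 1 3 10 30 100 300 1000`.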
Compress with:
export ZSTD_NBTHREADS=$(nproc)
tar --zstd -cf social_network-<...>-sf${SF}.tar.zst social_network-<...>-sf${SF}/
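To compress several data sets in one go, the two commands can be wrapped in a small function. This is a sketch under a few assumptions: `compress_dataset` is a hypothetical name, GNU tar with `--zstd` support and the `zstd` binary are installed, and the extra listing step is an added sanity check that is not part of the original procedure.

```shell
# compress_dataset DIR
# Creates DIR.tar.zst next to DIR using all available cores, then lists the
# archive to confirm it can be read back in full.
compress_dataset() {
    local dir="${1%/}"   # strip a trailing slash, if any

    # zstd reads ZSTD_NBTHREADS from the environment when tar invokes it.
    export ZSTD_NBTHREADS="$(nproc)"

    tar --zstd -cf "${dir}.tar.zst" "${dir}/" || return 1

    # Listing forces a full decompression pass, so a truncated or corrupt
    # archive makes this command fail.
    tar --zstd -tf "${dir}.tar.zst" > /dev/null
}
```

A loop such as `for d in social_network-*-sf*/; do compress_dataset "$d"; done` would then cover every generated directory.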
Done & uploaded to the data repository's staging server.