ldbc / ldbc_snb_datagen_hadoop

The Hadoop-based variant of the SNB Datagen
https://ldbcouncil.org/benchmarks/snb
Apache License 2.0

Error generating an SNB dataset with a custom scale factor #5

Closed: deslay1 closed this issue 2 years ago

deslay1 commented 3 years ago

Hi! I tried to follow the instructions to generate a dataset with a different scale factor (250) instead of 1. I copied params-csv-composite.ini into params.ini and followed https://github.com/ldbc/ldbc_snb_datagen_hadoop#pseudo-distributed-hadoop-node, changing the memory setting to HADOOP_CLIENT_OPTS="-Xmx100G".
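Concretely, the steps were roughly the following (run from the repository root; the params.ini edit is paraphrased):

cp params-csv-composite.ini params.ini
# edit params.ini so that it contains:
#   ldbc.snb.datagen.generator.scaleFactor:snb.interactive.250
export HADOOP_CLIENT_OPTS="-Xmx100G"
./run.sh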

Running the bash script gives me the following:

[INFO] ------------------------------------------------------------------------
[INFO] Total time:  33.642 s
[INFO] Finished at: 2021-09-28T19:08:17Z
[INFO] ------------------------------------------------------------------------
Reading scale factors..
Available scale factor configuration set snb.interactive.0.1
Available scale factor configuration set snb.interactive.0.3
Available scale factor configuration set snb.interactive.1
Available scale factor configuration set snb.interactive.3
Available scale factor configuration set snb.interactive.10
Available scale factor configuration set snb.interactive.30
Available scale factor configuration set snb.interactive.100
Available scale factor configuration set snb.interactive.300
Available scale factor configuration set snb.interactive.1000
Available scale factor configuration set graphalytics.1
Available scale factor configuration set graphalytics.3
Available scale factor configuration set graphalytics.10
Available scale factor configuration set graphalytics.30
Available scale factor configuration set graphalytics.100
Available scale factor configuration set graphalytics.300
Available scale factor configuration set graphalytics.1000
Available scale factor configuration set graphalytics.3000
Available scale factor configuration set graphalytics.10000
Available scale factor configuration set graphalytics.30000
Number of scale factors read 19
Applied configuration of scale factor snb.interactive.250
null
Error during execution
java.lang.NullPointerException
Exception in thread "main" java.lang.RuntimeException: java.lang.NullPointerException
        at ldbc.snb.datagen.util.ConfigParser.readConfig(ConfigParser.java:165)
        at ldbc.snb.datagen.util.ConfigParser.readConfig(ConfigParser.java:133)
        at ldbc.snb.datagen.LdbcDatagen.main(LdbcDatagen.java:341)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.lang.NullPointerException
        at ldbc.snb.datagen.util.ConfigParser.readConfig(ConfigParser.java:148)
        ... 8 more

Somehow it seems like there is a problem parsing the parameters file. Does anyone know anything about this that could help?

szarnyasg commented 3 years ago

Hi @deslay1, the built-in SFs are 0.1, 0.3, 1, 3, 10, etc. To generate a custom SF such as 250, you have to edit the scale factors file and figure out the numPersons: https://github.com/ldbc/ldbc_snb_datagen_hadoop/blob/main/src/main/resources/scale_factors.xml Gabor
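For illustration, a custom entry in scale_factors.xml could look roughly like the built-in ones. The element layout below is an assumption modelled on the existing entries (check the file for the actual structure), and the numPersons value is a placeholder you would have to determine yourself, e.g. by interpolating between the SF100 and SF300 entries:

<scale_factor name="snb.interactive.250">
  <property>
    <name>ldbc.snb.datagen.generator.numPersons</name>
    <value><!-- to be determined, between the SF100 and SF300 values --></value>
  </property>
  <!-- copy any remaining properties from a neighbouring entry -->
</scale_factor>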

deslay1 commented 3 years ago

Thanks! I will look at the file

deslay1 commented 3 years ago

To check whether the output was generated properly, I want to look at the generated CSV files. However, I cannot find the output directory. I am running with default environment variables where possible and using option 1: https://github.com/ldbc/ldbc_snb_datagen_hadoop#pseudo-distributed-hadoop-node.

I looked through the wiki but could not find any information on this. Any ideas? There is a target/ directory, but I think that is something else.

szarnyasg commented 3 years ago

Did you use any of the run.sh or docker_run.sh files?

deslay1 commented 3 years ago

Yes, exactly, I ran ./run.sh. But I think I found the problem: I don't have enough space on my root drive. I have an attached drive mounted at /extradata. I saw the troubleshooting section about this (https://github.com/ldbc/ldbc_snb_datagen_hadoop/wiki/Troubleshooting#javaioioexception-no-space-left-on-device), so I configured core-site.xml like this:

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/extradata</value>
  </property>
</configuration>

But now I'm getting the error:

************************************************
* Starting: Person generation *
************************************************
2021-09-29 16:07:03,768 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2021-09-29 16:07:04,057 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2021-09-29 16:07:04,057 INFO impl.MetricsSystemImpl: JobTracker metrics system started
2021-09-29 16:07:04,155 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2021-09-29 16:07:04,481 INFO input.FileInputFormat: Total input files to process : 1
2021-09-29 16:07:04,498 INFO mapreduce.JobSubmitter: number of splits:1
2021-09-29 16:07:04,651 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local481773069_0001
2021-09-29 16:07:04,652 INFO mapreduce.JobSubmitter: Executing with tokens: []
2021-09-29 16:07:04,689 WARN conf.Configuration: Could not make localRunner/ in local directories from mapreduce.cluster.local.dir
2021-09-29 16:07:04,689 WARN conf.Configuration: mapreduce.cluster.local.dir[0]=/extradata/mapred/local
2021-09-29 16:07:04,689 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/tmp/hadoop/mapred/staging/tuner481773069/.staging/job_local481773069_0001
Error during execution
No valid local directories in property: mapreduce.cluster.local.dir
Exception in thread "main" java.io.IOException: No valid local directories in property: mapreduce.cluster.local.dir
        at org.apache.hadoop.conf.Configuration.getLocalPath(Configuration.java:2745)
        at org.apache.hadoop.mapred.JobConf.getLocalPath(JobConf.java:585)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:166)
        at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:794)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:251)
        at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
        at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
        at java.base/java.security.AccessController.doPrivileged(Native Method)
        at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588)
        at ldbc.snb.datagen.hadoop.generator.HadoopPersonGenerator.run(HadoopPersonGenerator.java:174)
        at ldbc.snb.datagen.LdbcDatagen.runGenerateJob(LdbcDatagen.java:100)
        at ldbc.snb.datagen.LdbcDatagen.main(LdbcDatagen.java:347)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:236)

Any idea why this occurs? It complains about there being no valid local directories.

szarnyasg commented 3 years ago

According to the documentation (https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml), the default value of mapreduce.cluster.local.dir is ${hadoop.tmp.dir}/mapred/local. So maybe you don't have permission to write to /extradata? (Fixing this would need something like sudo chown -R $USER:$USER /extradata.)
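A quick way to check and fix the ownership (assuming /extradata is the mount point from the previous comment):

ls -ld /extradata                      # check the current owner and permissions
sudo chown -R $USER:$USER /extradata   # hand the directory over to the current user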

Also, may I ask what your goal is with Datagen? Using the Hadoop-based generator is only recommended if you're interested in running the SNB Interactive workload. If you're interested in running SNB BI or just need an SNB data set, I'd recommend using the Spark-based Datagen hosted under https://github.com/ldbc/ldbc_snb_datagen_spark.

deslay1 commented 3 years ago

Thanks, you were right that I didn't have permissions. I tried the predefined scale factor 100 and I think I was able to run the script fully afterwards. It produced CSV files in the social_network directory, but I only see a dynamic subdirectory and no static part. Is this because of how I configured the params.ini file? Here is how it looks:

ldbc.snb.datagen.generator.scaleFactor:snb.interactive.100

ldbc.snb.datagen.serializer.dynamicActivitySerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.activity.CsvCompositeDynamicActivitySerializer
ldbc.snb.datagen.serializer.dynamicPersonSerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.person.CsvCompositeDynamicPersonSerializer
ldbc.snb.datagen.serializer.staticSerializer:ldbc.snb.datagen.serializer.snb.csv.staticserializer.CsvCompositeStaticSerializer

ldbc.snb.datagen.generator.numThreads:8

But I want to create data that is similar to the test data in the SNB Interactive repository: https://github.com/ldbc/ldbc_snb_interactive/tree/main/cypher

Is this possible? Am I missing a serializer that would create the organisation, place, and other static entities in the data? How can I find out what they are? I have looked at the specification, especially section 3.4.2, which seems most relevant: http://ldbcouncil.org/ldbc_snb_docs/ldbc-snb-specification.pdf

szarnyasg commented 3 years ago

Your configuration seems correct. Are you sure the execution was successful and it didn't crash with an out-of-memory or out-of-disk-space error?

A remark about your configuration: unfortunately, numThreads won't do much unless you run on a Hadoop cluster, so there's no point setting it to anything other than the default value (1).

PS:

Is this possible? Am I missing a serializer that would create the organisation, place, and other static entities in the data? How can I find out what they are? I have looked at the specification, especially section 3.4.2, which seems most relevant: http://ldbcouncil.org/ldbc_snb_docs/ldbc-snb-specification.pdf

This is certainly possible - you're on the right path to generate the data set for SNB Interactive.
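A quick sanity check for whether a run completed (directory names taken from the discussion above; the exact layout depends on the serializers configured):

ls social_network/        # a complete run should contain both dynamic/ and static/
du -sh social_network/*   # rough size of each part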

deslay1 commented 3 years ago

I didn't get any error raised, but you're right, the execution could not have been successful... I ran the command from the ./run.sh file directly: $HADOOP_HOME/bin/hadoop jar $LDBC_SNB_DATAGEN_HOME/target/ldbc_snb_datagen-0.3.5-jar-with-dependencies.jar $LDBC_SNB_DATAGEN_HOME/params.ini and it raised an error. Is there a way to choose the location of the output social_network directory? I think it may have to do with not having enough space on the root disk.

Also, you're right: the numThreads property can only be utilized properly when running on a multi-node cluster, as you write on the configuration page.

szarnyasg commented 3 years ago

You can use

ldbc.snb.datagen.serializer.outputDir:/path/to/output/dir

for setting the path.
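For example, the params.ini above could be extended like this (the /extradata/social_network path is only an illustration):

ldbc.snb.datagen.generator.scaleFactor:snb.interactive.100
ldbc.snb.datagen.serializer.outputDir:/extradata/social_network

ldbc.snb.datagen.serializer.dynamicActivitySerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.activity.CsvCompositeDynamicActivitySerializer
ldbc.snb.datagen.serializer.dynamicPersonSerializer:ldbc.snb.datagen.serializer.snb.csv.dynamicserializer.person.CsvCompositeDynamicPersonSerializer
ldbc.snb.datagen.serializer.staticSerializer:ldbc.snb.datagen.serializer.snb.csv.staticserializer.CsvCompositeStaticSerializer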

deslay1 commented 3 years ago

Thanks, that works well. However, I get the same issue as before, where I only get a dynamic part. I don't get any error, but I do see that the command in the ./run.sh file gets Killed, and I don't know the cause. It looks like this:

...
starting generation of block: 0
2021-10-01 09:36:32,191 INFO mapred.LocalJobRunner: Generating activity of person 0 of block0 > reduce
2021-10-01 09:36:32,413 INFO mapreduce.Job:  map 100% reduce 67%
2021-10-01 09:36:38,192 INFO mapred.LocalJobRunner: Generating activity of person 0 of block0 > reduce
2021-10-01 09:36:56,193 INFO mapred.LocalJobRunner: Generating activity of person 1000 of block0 > reduce
2021-10-01 09:37:08,194 INFO mapred.LocalJobRunner: Generating activity of person 3000 of block0 > reduce
2021-10-01 09:37:14,195 INFO mapred.LocalJobRunner: Generating activity of person 4000 of block0 > reduce
2021-10-01 09:37:20,196 INFO mapred.LocalJobRunner: Generating activity of person 5000 of block0 > reduce
2021-10-01 09:37:26,196 INFO mapred.LocalJobRunner: Generating activity of person 5000 of block0 > reduce
./run.sh: line 42: 12374 Killed                  $HADOOP_HOME/bin/hadoop jar $LDBC_SNB_DATAGEN_HOME/target/ldbc_snb_datagen-0.3.5-jar-with-dependencies.jar $LDBC_SNB_DATAGEN_HOME/params.ini

szarnyasg commented 3 years ago

How much memory did you give Hadoop/the JVM? SF100 needs a fair bit of memory, so set e.g.:

export HADOOP_CLIENT_OPTS="-Xmx120G"

deslay1 commented 3 years ago

I dedicated exactly that much memory. This is what the run file looks like:

#!/bin/bash

if [ ! -f params.ini ]; then
  echo "Parameters file (params.ini) not found."
  exit 1
fi

DEFAULT_HADOOP_HOME=`pwd`/hadoop-3.2.1
DEFAULT_LDBC_SNB_DATAGEN_HOME=`pwd`
DEFAULT_HADOOP_CLIENT_OPTS="-Xmx120G"

# allow overriding configuration from outside via environment variables
# i.e. you can do
#     HADOOP_HOME=/foo/bar LDBC_SNB_DATAGEN_HOME=/baz/quux ./run.sh
# instead of changing the contents of this file
HADOOP_HOME=${HADOOP_HOME:-$DEFAULT_HADOOP_HOME}
HADOOP_CLIENT_OPTS=${HADOOP_CLIENT_OPTS:-$DEFAULT_HADOOP_CLIENT_OPTS}
LDBC_SNB_DATAGEN_HOME=${LDBC_SNB_DATAGEN_HOME:-$DEFAULT_LDBC_SNB_DATAGEN_HOME}
JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/

export HADOOP_HOME
export LDBC_SNB_DATAGEN_HOME
export HADOOP_CLIENT_OPTS
export JAVA_HOME

echo ===============================================================================
echo Running generator with the following parameters:
echo -------------------------------------------------------------------------------
echo LDBC_SNB_DATAGEN_HOME: $LDBC_SNB_DATAGEN_HOME
echo JAVA_HOME: $JAVA_HOME
echo HADOOP_HOME: $HADOOP_HOME
echo HADOOP_CLIENT_OPTS: $HADOOP_CLIENT_OPTS
echo ===============================================================================

mvn clean
mvn -DskipTests assembly:assembly

if [ "$(uname)" == "Darwin" ]; then
  zip -d $LDBC_SNB_DATAGEN_HOME/target/ldbc_snb_datagen-0.3.5-jar-with-dependencies.jar META-INF/LICENSE
fi

$HADOOP_HOME/bin/hadoop jar $LDBC_SNB_DATAGEN_HOME/target/ldbc_snb_datagen-0.3.5-jar-with-dependencies.jar $LDBC_SNB_DATAGEN_HOME/params.ini

rm -f m*personFactors*
rm -f .m*personFactors*
rm -f m*activityFactors*
rm -f .m*activityFactors*
rm -f m0friendList*
rm -f .m0friendList*

deslay1 commented 3 years ago

Is the parameters file used to generate the test data in the interactive benchmark repository available somewhere? It might help if I could try to replicate that first!

szarnyasg commented 3 years ago

@deslay1 the configuration is based on this: https://github.com/ldbc/ldbc_snb_interactive/tree/main#generating-small-test-data-tests

szarnyasg commented 3 years ago

By the way, getting a process killed could be a sign of the operating system terminating it due to memory exhaustion (e.g. if you have exactly 120GB RAM without swap) or of some out-of-memory tool (oom_reaper or earlyoom) preventively terminating it (e.g. if you have 128GB RAM and the tool is configured to keep at least 5% of memory free).
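If you want to confirm this, the kernel logs OOM kills; on a typical Linux host something like the following shows them:

sudo dmesg -T | grep -i -E 'killed process|out of memory'
journalctl -k | grep -i -E 'killed process|out of memory'   # alternative on systemd machines
free -h                                                     # check available RAM and swap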

deslay1 commented 3 years ago

Thank you, I think you were right about the RAM: I had much less RAM available than I had configured. I think I solved the problem. I was able to produce both the dynamic and static parts, and in the interactive repository I was able to import the data successfully using the load-in-one-step.sh script. I then tried to run the benchmark using the driver (driver/benchmark.sh), but I received the following error. I can't make out what exactly the problem is; the errors are hard to understand, other than, of course, that it wasn't able to execute the workload. Do you know anything about this? Here is the error message:

...
ExecuteWorkloadMode  
 --------------------
 --- Warmup Phase ---
 --------------------
ExecuteWorkloadMode  Scanning workload streams to calculate their limits...
WorkloadStreams  Scanned 0 of 0 - OFFSET
WorkloadStreams  Scanned 105 of 100 - RUN
ExecuteWorkloadMode  Loaded workload: com.ldbc.driver.workloads.ldbc.snb.interactive.LdbcSnbInteractiveWorkload
ExecuteWorkloadMode  Retrieving workload stream: LdbcSnbInteractiveWorkload
Oct 04, 2021 7:24:56 PM org.neo4j.driver.internal.logging.JULogger info
INFO: Direct driver instance 1623009085 created for server address localhost:7687
ExecuteWorkloadMode  Loaded DB: com.ldbc.impls.workloads.ldbc.snb.cypher.interactive.CypherInteractiveDb
ExecuteWorkloadMode  Instantiating WorkloadRunner
WorkloadStatusThread  2021/10/04 19:24:57 +0000 Runtime [00:00.000 (m:s.ms)], Operations [0], Last [00:00.000 (m:s.ms)], Throughput (Total) [0.00] (Last 0s) [0.00]
WorkloadStatusThread  2021/10/04 19:24:58 +0000 Runtime [00:01.100 (m:s.ms)], Operations [1], Last [00:00.023 (m:s.ms)], Throughput (Total) [0.91] (Last 1s) [0.91]
Shutting down status thread...
ExecuteWorkloadMode  Shutting down workload...
Client  Client terminated unexpectedly
com.ldbc.driver.ClientException: Error running workload
        at com.ldbc.driver.client.ExecuteWorkloadMode.doExecute(ExecuteWorkloadMode.java:387)
        at com.ldbc.driver.client.ExecuteWorkloadMode.startExecutionAndAwaitCompletion(ExecuteWorkloadMode.java:106)
        at com.ldbc.driver.Client.main(Client.java:53)
Caused by: com.ldbc.driver.ClientException: Error running workload

- Start Error Log -
        SOURCE: OperationHandlerRunnableContext [147] (Thread: ID=19, Name=ThreadPoolOperationExecutor-id(1633375497088)-thread(0), Priority=5)
        ERROR:  Operation result is null
Operation: LdbcQuery13{person1Id=8796093022357, person2Id=8796093022390}
        SOURCE: WorkloadRunnerThread [511] (Thread: ID=15, Name=Thread-0, Priority=5)
        ERROR:  Encountered error while waiting for asynchronous executor to shutdown
Handlers still running: 11
com.ldbc.driver.runtime.executor.OperationExecutorException: Error encountered while trying to shutdown
        at com.ldbc.driver.runtime.executor.ThreadPoolOperationExecutor.shutdown(ThreadPoolOperationExecutor.java:131)
        at com.ldbc.driver.runtime.WorkloadRunner$WorkloadRunnerThread.shutdownEverything(WorkloadRunner.java:507)
        at com.ldbc.driver.runtime.WorkloadRunner$WorkloadRunnerThread.run(WorkloadRunner.java:416)
Caused by: com.ldbc.driver.runtime.executor.OperationExecutorException: ThreadPoolOperationExecutor shutdown before all handlers could complete
10 handlers were queued for execution but not yet started
1 handlers were mid-execution
        at com.ldbc.driver.runtime.executor.ThreadPoolOperationExecutor.shutdown(ThreadPoolOperationExecutor.java:125)
        ... 2 more

        SOURCE: OperationHandlerRunnableContext [167] (Thread: ID=19, Name=ThreadPoolOperationExecutor-id(1633375497088)-thread(0), Priority=5)
        ERROR:  Error encountered
LdbcQuery1{personId=4398046511333, firstName='Jose', limit=20}
org.neo4j.driver.exceptions.ServiceUnavailableException: Connection to the database terminated. Thread interrupted while waiting for result to arrive
        at org.neo4j.driver.internal.util.Futures.blockingGet(Futures.java:143)
        at org.neo4j.driver.internal.InternalResult.blockingGet(InternalResult.java:128)
        at org.neo4j.driver.internal.InternalResult.hasNext(InternalResult.java:64)
        at com.ldbc.impls.workloads.ldbc.snb.cypher.operationhandlers.CypherListOperationHandler.executeOperation(CypherListOperationHandler.java:30)
        at com.ldbc.impls.workloads.ldbc.snb.cypher.operationhandlers.CypherListOperationHandler.executeOperation(CypherListOperationHandler.java:16)
        at com.ldbc.driver.OperationHandlerRunnableContext.run(OperationHandlerRunnableContext.java:142)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
        Suppressed: org.neo4j.driver.internal.util.ErrorUtil$InternalExceptionCause
                at org.neo4j.driver.internal.util.ErrorUtil.newConnectionTerminatedError(ErrorUtil.java:48)
                at org.neo4j.driver.internal.async.inbound.ChannelErrorHandler.channelInactive(ChannelErrorHandler.java:71)
                at org.neo4j.driver.internal.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
                at org.neo4j.driver.internal.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
                at org.neo4j.driver.internal.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
                at org.neo4j.driver.internal.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:389)
                at org.neo4j.driver.internal.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:354)
                at org.neo4j.driver.internal.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
                at org.neo4j.driver.internal.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
                at org.neo4j.driver.internal.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
                at org.neo4j.driver.internal.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:389)
                at org.neo4j.driver.internal.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:354)
                at org.neo4j.driver.internal.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
                at org.neo4j.driver.internal.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
                at org.neo4j.driver.internal.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
                at org.neo4j.driver.internal.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
                at org.neo4j.driver.internal.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
                at org.neo4j.driver.internal.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
                at org.neo4j.driver.internal.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
                at org.neo4j.driver.internal.shaded.io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:831)
                at org.neo4j.driver.internal.shaded.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
                at org.neo4j.driver.internal.shaded.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
                at org.neo4j.driver.internal.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
                at org.neo4j.driver.internal.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
                at org.neo4j.driver.internal.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
                at org.neo4j.driver.internal.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
                ... 1 more

        SOURCE: ThreadPoolExecutorWithAfterExecute [220] (Thread: ID=19, Name=ThreadPoolOperationExecutor-id(1633375497088)-thread(0), Priority=5)
        ERROR:  Error retrieving handler
java.lang.NullPointerException
        at com.ldbc.driver.workloads.ldbc.snb.interactive.LdbcSnbShortReadGenerator$ResultBufferReplenishFun.replenish(LdbcSnbShortReadGenerator.java:538)
        at com.ldbc.driver.workloads.ldbc.snb.interactive.LdbcSnbShortReadGenerator.nextOperation(LdbcSnbShortReadGenerator.java:484)
        at com.ldbc.driver.runtime.executor.ChildOperationExecutor.execute(ChildOperationExecutor.java:30)
        at com.ldbc.driver.runtime.executor.ThreadPoolOperationExecutor$ThreadPoolExecutorWithAfterExecute.afterExecute(ThreadPoolOperationExecutor.java:209)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1129)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

- End Error Log -

        at com.ldbc.driver.client.ExecuteWorkloadMode.doExecute(ExecuteWorkloadMode.java:382)
        ... 2 more

szarnyasg commented 3 years ago

The error occurs with SF100, right? At that scale, you need to tinker with Neo4j's settings as well. I have not yet tried using Neo4j for SFs larger than 10 -- it may take significant tuning / query optimization to get the workload running, especially the path queries (queries 13 and 14).
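On the Neo4j side, the usual starting point is memory sizing in neo4j.conf; purely as an illustration (these are the standard Neo4j 4.x setting names, and the values depend entirely on your machine, so they are not a recommendation):

dbms.memory.heap.initial_size=31g
dbms.memory.heap.max_size=31g
dbms.memory.pagecache.size=64g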

In fact, it would be best to rewrite queries 13 and 14 to use the Graph Data Science Library: https://github.com/ldbc/ldbc_snb_interactive/issues/171

deslay1 commented 3 years ago

I actually used SF30 instead of 100, just to save some time during initial tests. Perhaps you are right; I don't know much about the query optimizations you have developed.

I managed to execute the benchmark successfully. The problem was that I had not adjusted the benchmark.properties file, so it was using parameters and update streams from the test-data directory. All good now! What I will try next is a larger scale factor (perhaps a custom one).

With the current SF30, importing the data into the Docker container as part of the load-in-one-step.sh script takes about 50 minutes to an hour. This means that if I try to run a few benchmarks one after another (for example, to average out the performance or to change some Neo4j config variable), it takes a very long time.

I have updated the frequencies of each query according to Table 4.1 in the specification. By the way, the comments in the benchmark.properties file say to look at section B.1, which is wrong, right?

Anyway, do you think this long import time is just something you have to accept? Is there a quicker way to get the data into the container, for example a snapshot that could simply be reloaded every time? I'm not sure how that would work, but it would be pretty neat.

szarnyasg commented 3 years ago

@deslay1 Glad to see that resolved!

I'm a bit short on time today so I'll be brief with my answer:

There's a way to snapshot the data. Load it with scripts/load-in-one-step.sh, then stop the database with scripts/stop-neo4j.sh. Then make a backup of the ${NEO4J_DATA_DIR} directory (by default, scratch/data in the cypher directory). Before each run, replace the scratch/data directory with the backup and start Neo4j with scripts/start-neo4j.sh.
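A rough sketch of that workflow (paths are the defaults mentioned above; the data.backup name is just an example):

# one-time: load the data, stop Neo4j, and snapshot the data directory
scripts/load-in-one-step.sh
scripts/stop-neo4j.sh
cp -r scratch/data scratch/data.backup

# before each benchmark run: restore the snapshot and restart Neo4j
rm -rf scratch/data
cp -r scratch/data.backup scratch/data
scripts/start-neo4j.sh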

PS: I'll take a look at the comment about section B.1. PPS: you definitely need to reset the database between runs, as the update queries insert new nodes/edges into the database -- if you run them twice, the second run will fail due to uniqueness constraints.

deslay1 commented 3 years ago

Thank you, that sounds like a good idea, I will work with that and give an update!

deslay1 commented 3 years ago

I tried rsync -a (but removing the data directory beforehand) with a backup directory like you suggested, and this gave a large decrease in time spent: it took around 20 minutes instead of 1 hour (so about 3 times faster). I have been wondering whether it is possible to not remove the data directory entirely and only adjust for the differences between the data and the backup directories.

Technically, I thought rsync -a already does this, so I tried not removing the data directory, but now I get the error Unable to get a routing table for database 'neo4j' because this database is unavailable, which I think comes from the create-indices.sh script (so it did manage to start the container and connect to the Cypher shell).

szarnyasg commented 3 years ago

I have been wondering whether it is possible to not remove the data directory entirely and only adjust for the differences between the data and the backup directories.

I'm not aware of a simple way to do so. You may try a copy-on-write file system (like btrfs), or you can run the benchmark within Docker and bake the data/ directory into the container image -- I did this when grading homework that used large data sets (which were mutated by the students' solutions) and it worked fine.

Technically, I thought rsync -a already does this, so I tried not removing the data directory, but now I get the error Unable to get a routing table for database 'neo4j' because this database is unavailable, which I think comes from the create-indices.sh script (so it did manage to start the container and connect to the Cypher shell).

The database being unavailable may be the result of deleting 'too many' of the directories under data/ (data/databases/, data/transactions/). I don't remember exactly which ones you should delete or keep -- I think replacing data/databases/ with a backup should be fine, but I may be wrong. You may want to consult the documentation on the various Neo4j directories: https://neo4j.com/docs/operations-manual/current/configuration/file-locations/

deslay1 commented 3 years ago

Thanks, I just went with using cp to create the backup directory and then using rsync before I execute a benchmark. I actually copied the entire data directory. From the link you referred to, I found that Neo4j has a few commands to copy and restore data using neo4j-admin, so that may be something I should look at as well.
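For reference, the neo4j-admin route would look roughly like this (Neo4j 4.x syntax; the dump path is only an example, and the database has to be stopped first):

neo4j-admin dump --database=neo4j --to=/backups/snb-sf30.dump
# ...and before each benchmark run:
neo4j-admin load --from=/backups/snb-sf30.dump --database=neo4j --force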

The results I have gotten from the benchmark executions so far make me think I did something odd along the way. I have posted an issue about this in the benchmark repository: https://github.com/ldbc/ldbc_snb_interactive/issues/195#issue-1026025141