Due to open items with dev such as ldbc/ldbc_snb_datagen#206, I am currently running the stable branch (commit hash d6620b96555af9a1151b158251ae0de4a3fb6447) on an 8-node Hadoop cluster (Hadoop 3.3.0, CentOS 7.5, Java 8). There is a problem with the output folders: initially, the data generation appeared to succeed, but the dynamic folder was not produced. After trying a few things, I set ldbc.snb.datagen.serializer.outputDir to /ldbc_dataset/sf1. With that setting, the generation consistently fails at the Person Serializer job with the stack trace below. I have checked Hadoop permissions, ownership and groups, but nothing helped.
Do you see anything here that I can do?
2020-10-15 19:02:00,714 INFO mapreduce.Job: Task Id : attempt_1602777928494_0089_r_000000_1, Status : FAILED
Error: java.lang.RuntimeException: java.io.IOException: Mkdirs failed to create /ldbc_dataset/sf1/social_network/dynamic (exists=false, cwd=file:/mydata4/hadoop/yarn/local/usercache/centos/appcache/application_1602777928494_0089/container_1602777928494_0089_01_000004)
at ldbc.snb.datagen.hadoop.serializer.HadoopPersonSortAndSerializer$HadoopDynamicPersonSerializerReducer.setup(HadoopPersonSortAndSerializer.java:97)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:628)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
Caused by: java.io.IOException: Mkdirs failed to create /ldbc_dataset/sf1/social_network/dynamic (exists=false, cwd=file:/mydata4/hadoop/yarn/local/usercache/centos/appcache/application_1602777928494_0089/container_1602777928494_0089_01_000004)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:473)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:458)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1164)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1144)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1103)
at ldbc.snb.datagen.hadoop.writer.HdfsWriter.<init>(HdfsWriter.java:66)
at ldbc.snb.datagen.hadoop.writer.HdfsCsvWriter.<init>(HdfsCsvWriter.java:49)
at ldbc.snb.datagen.serializer.snb.csv.CsvSerializer.initialize(CsvSerializer.java:23)
at ldbc.snb.datagen.serializer.LdbcSerializer.initialize(LdbcSerializer.java:20)
at ldbc.snb.datagen.hadoop.serializer.HadoopPersonSortAndSerializer$HadoopDynamicPersonSerializerReducer.setup(HadoopPersonSortAndSerializer.java:91)
... 8 more
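One detail in the trace that may be relevant: the failing frames are in org.apache.hadoop.fs.ChecksumFileSystem and the reported cwd uses the file: scheme, which suggests the reducer is trying to create the directory on the node's local filesystem rather than on HDFS. The sketch below only illustrates the difference between a scheme-less path like the one I configured and a fully qualified one. The helper name describe_output_dir and the hdfs://namenode:8020 authority are hypothetical placeholders, and I have not verified whether ldbc.snb.datagen.serializer.outputDir accepts a qualified URI.

```python
from urllib.parse import urlparse


def describe_output_dir(path: str) -> str:
    """Classify an output path by its URI scheme (illustrative helper, not part of datagen)."""
    scheme = urlparse(path).scheme
    if scheme == "":
        # No scheme: Hadoop resolves such paths against the default filesystem
        # of the configuration the task happens to load.
        return "unqualified: resolved against the job's default filesystem"
    if scheme == "file":
        # Matches the cwd=file:/... reported in the stack trace.
        return "local filesystem of whichever node runs the task"
    return f"explicit '{scheme}' filesystem"


if __name__ == "__main__":
    candidates = [
        "/ldbc_dataset/sf1",                      # the value I set
        "file:/mydata4/hadoop/yarn/local",        # prefix of the cwd from the trace
        "hdfs://namenode:8020/ldbc_dataset/sf1",  # hypothetical qualified form
    ]
    for candidate in candidates:
        print(candidate, "->", describe_output_dir(candidate))
```

If the unqualified path is indeed the cause, the dynamic folder from the earlier "successful" run may have ended up on the local disks of the worker nodes, which I have not yet checked.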