ldbc-dev / ldbc_snb_datagen_deprecated2015

LDBC-SNB Data Generator

Limitations on choice of generator's parameters? #5

Closed: wileeam closed this issue 10 years ago

wileeam commented 10 years ago

Hi!

We are having trouble generating a dataset with 'small' values for the parameters. See below (Hadoop configured for a single-node run, one thread set in the run.sh script, and using the latest commit as of this posting, da42eb54a215de86474d142346864057dc6a5624):

numPersons:1000
startYear:2014
numYears:10
serializerType:csv
enableCompression:false

The same problem happens when choosing 10000 persons, and basically we get no data in the generated files. We see this error a few times:

14/05/22 14:08:57 INFO mapred.JobClient: Task Id : attempt_201405211824_0061_r_000000_2, Status : FAILED
java.lang.ArrayIndexOutOfBoundsException: 91
        at ldbc.socialnet.dbgen.generator.ScalableGenerator.generatePosts(ScalableGenerator.java:1026)
        at ldbc.socialnet.dbgen.generator.ScalableGenerator.generateUserActivity(ScalableGenerator.java:816)
        at ldbc.socialnet.dbgen.generator.MRGenerateUsers$UserActivityReducer.reduce(MRGenerateUsers.java:277)
        at ldbc.socialnet.dbgen.generator.MRGenerateUsers$UserActivityReducer.reduce(MRGenerateUsers.java:247)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
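
For context, a minimal, hypothetical Java sketch of how this class of failure can arise: an index computed from the configured date range outgrows a fixed-size table. The class name, array, and bucket sizing below are illustrative assumptions, not the actual ScalableGenerator code.

    // Hypothetical illustration only; not the actual ScalableGenerator code.
    public class DateBucketExample {
        // Table sized for an assumed, narrower date range (90 buckets).
        private static final int[] POSTS_PER_BUCKET = new int[90];

        // The index grows with the distance from the configured start date,
        // so a long numYears span can push it past the table length.
        static int bucketFor(long postDateMillis, long startMillis) {
            long days = (postDateMillis - startMillis) / (24L * 60 * 60 * 1000);
            return (int) (days / 4); // assumed 4-day buckets
        }

        public static void main(String[] args) {
            long start = 0L;                        // stand-in for the start date
            long post = 366L * 24 * 60 * 60 * 1000; // a post just past the sized range
            int idx = bucketFor(post, start);       // idx == 91
            // Throws ArrayIndexOutOfBoundsException: index 91, length 90.
            System.out.println(POSTS_PER_BUCKET[idx]);
        }
    }

The point is only that an index derived from (startYear, numYears) can exceed an internally sized array; the actual cause inside ScalableGenerator.generatePosts may differ.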

Is this a limitation of the generator, or are we doing something wrong? We have no trouble generating a dataset when using the scale factor 3 settings, for example.

Thanks!

ArnauPrat commented 10 years ago

Hi again Guillermo,

I'll look into this issue asap.

Arnau.


ArnauPrat commented 10 years ago

Hey Guillermo,

The issue should be fixed. Thanks Andrey for the fix.

Regards,
Arnau

wileeam commented 10 years ago

Hello,

An update on this: we just successfully generated a dataset with 1K people over a period of 10 years, without errors. I'll close this issue for now :)

Thanks a lot!

/Guillermo