The wiki of the previous (mix of Hadoop/Spark) repository had partial documentation of the user-facing parameters:
## User-facing Parameters
Users of the LDBC data generator specify configuration by means of the `params.ini` file.
The `params.ini` file contains the following options:
* `generator.mode`
+ default: `interactive`
+ options: `interactive` `bi` `graphalytics` `rawData`
+ description: the mode the datagen executes in.
* `generator.scaleFactor`
+ default: `1`
+ options: `0.003` `0.1` `0.3` `1` `3` `10` `30` `100`
+ description: determines the generated data size. Note that `0.003`, `0.1`, and `0.3` are intended for testing. The Graphalytics scale factor is set with a different parameter; see below.
* `serializer.format`
+ default: `CsvBasic`
+ options: `CsvBasic` `CsvMergeForeign` `CsvComposite` `CsvCompositeMergeForeign`
+ description: determines the data serialization format
* `generator.numThreads`
+ default: `1`
+ description: determines the number of threads used by Hadoop. TODO: is this different now that Spark is used?
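A minimal `params.ini` combining the options above might look like the following. This is an illustrative sketch assuming the colon-separated `key:value` syntax of the original datagen's property files; the exact key prefixes may differ in your version.

```ini
# Run the BI workload at scale factor 10,
# serializing with the composite CSV layout on 4 threads.
generator.mode:bi
generator.scaleFactor:10
serializer.format:CsvComposite
generator.numThreads:4
```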
### Interactive
`interactive` mode only parameters:
* `generator.mode.interactive.numUpdateStreams`
+ default: `1`
+ description: determines the number of update streams (consisting of inserts and deletes) for the Interactive workload.
### BI
`bi` mode only parameters:
* `generator.mode.bi.batches`
+ default: `month`
+ options: `day` `month` `quarter`
+ description: determines batch time granularity
* `generator.mode.bi.deleteType` TODO: this feature has been decided against
+ default: `simple`
+ options: `simple` `complex`
+ description: determines delete operation type included in the batches.
### Graphalytics
`graphalytics` mode only parameters:
* `generator.scaleFactor`
+ default: `graphalytics.1`
+ options: `graphalytics.1` `graphalytics.3` `graphalytics.10` `graphalytics.30` `graphalytics.100` `graphalytics.300` `graphalytics.1000` `graphalytics.3000` `graphalytics.10000` `graphalytics.30000`
+ description: determines the generated data size.
* `generator.degreeDistribution`
+ default: `Facebook`
+ options: `Facebook` `Altmann` `Weibull` `Empirical` `Geo` `MoeZipf` `Zipf`
+ description:
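For a Graphalytics run, the scale factor is encoded in the `generator.scaleFactor` value itself rather than set through a separate option. A hedged example, again assuming `key:value` syntax:

```ini
# Graphalytics run at the graphalytics.30 scale,
# with a Zipfian degree distribution.
generator.mode:graphalytics
generator.scaleFactor:graphalytics.30
generator.degreeDistribution:Zipf
```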
## Internal Parameters
Internal parameters fall into two categories: `generator`-related and `hadoop`-related.
### Generator
These values are determined by `generator.scaleFactor`:
* `generator.numPersons`
+ default: `10000`
+ description: the number of persons to generate
* `generator.startYear`
+ default: `2010`
+ description: the start year of the simulation
* `generator.numYears`
+ default: `3`
+ description: the number of years to simulate
* `generator.delta`
+ default: `10000`
+ description: the minimum time between two operations
* `generator.dateFormatter`
+ default: `StringDate`
+ options: `StringDate` `LongDate`
* `generator.StringDate.dateTimeFormat`
+ default: `yyyy-MM-dd'T'HH:mm:ss.SSS+00:00`
* `generator.StringDate.dateFormat`
+ default: `yyyy-MM-dd`
* `generator.knowsGenerator`
+ default: `Distance`
+ options: `Distance` `Bter` `Clustering` `Random`
+ description:
* `generator.person.similarity`
+ default: `GeoDistance`
+ options: `GeoDistance` `Interests`
+ description:
### Hadoop
* `hadoop.serializer.compressed`
+ default: `false`
+ description:
* `hadoop.serializer.endlineSeparator`
+ default: `false`
+ description:
* `hadoop.serializer.socialNetworkDir`
+ default: `./social_network`
+ description: TODO this might be a duplicate with `outputDir`
* `hadoop.serializer.hadoopDir`
+ default: `./hadoop`
+ description:
* `hadoop.serializer.outputDir`
+ default: `./social_network/`
+ description:
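Although these parameters are internal, they can presumably be overridden in `params.ini` like the user-facing ones. A hypothetical override of the serializer directories and compression setting (paths are illustrative):

```ini
# Write compressed output and redirect the working/output directories.
hadoop.serializer.compressed:true
hadoop.serializer.hadoopDir:/tmp/datagen/hadoop
hadoop.serializer.socialNetworkDir:/tmp/datagen/social_network
```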