shtyler / datagenerator

Generator of the output stage and sample data from the GoodData model

MOCKUP DATA GENERATOR

Generator of the output stage and sample data from the GoodData model.

About The Project

Problem

Solution

Getting Started

  1. Clone this git repo
  2. Update datagen/param_definitions/LDM.json with the LDM you created in the web modeler (/admin/modeler/#/projects/{pid}). WARNING: LDM.json must be ASCII-encoded.
  3. Load the datagen folder to your S3 bucket
  4. Set up your output stage (OS) prefix (default is out_generated_vw_)
  5. (Optional) Change the default OS prefix in all 3 liquid templates: {%-assign out_prefix = "out_generated_vw_" -%}
  6. (Optional) Load custom input to the table generated_custom_values (dataset VARCHAR, field VARCHAR, values VARCHAR). See Example 3.
  7. Deploy & Run SQL Executor to trigger 00_create_functions.sql and 01_run_datagen.sql
  8. Schedule & Run ADD to upload data to WS
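
Step 2 requires LDM.json to be ASCII-encoded. A minimal Ruby sketch for checking that before uploading is shown here; it is not part of this repo, the helper name non_ascii_positions is hypothetical, and the default path simply follows step 2.

```ruby
# Report every non-ASCII character in a text so it can be fixed before upload.
# Returns an array of [line_number, column, character] triples.
def non_ascii_positions(text)
  positions = []
  text.each_line.with_index(1) do |line, line_no|
    line.each_char.with_index(1) do |char, col|
      positions << [line_no, col, char] unless char.ascii_only?
    end
  end
  positions
end

# Assumed location of the model file (from step 2); pass another path as the
# first argument if your checkout differs.
path = ARGV[0] || "datagen/param_definitions/LDM.json"
if File.file?(path)
  offenders = non_ascii_positions(File.read(path, encoding: "UTF-8"))
  if offenders.empty?
    puts "#{path}: ASCII only"
  else
    offenders.each do |line_no, col, char|
      puts "#{path}:#{line_no}:#{col}: non-ASCII character #{char.inspect}"
    end
  end
end
```

If the check reports any characters, replace or escape them in LDM.json before loading the datagen folder to S3.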

Usage

Example 1 - Creating the output stage.

Sometimes all you need is an output stage that matches the model. The OS consists of out_ tables and outvw views on top of the out_ tables. In that case, you only need to run 1_generate_os.liquid. Update 01_run_datagen.sql so that it runs only this template (leave only the first row) and you are good to go. Running the script then generates the OS.

Example 2 - Creating the output stage and populating it with mockup data.

Mockup data is generated by simple rules that preserve referential integrity. By default, the number of records in a dimension table, the number of records in a fact table and the number of random values per attribute are set to fixed values. These can be changed in the template. Date values are always generated within the previous 300 days. To generate mockup data, run both 1_generate_os.liquid and 2_populate_os.liquid. Adjust 01_run_datagen.sql accordingly.
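
The two rules described above (fact rows only reference existing dimension keys, and dates fall within the previous 300 days) can be illustrated with a small Ruby sketch. This is not the logic of 2_populate_os.liquid, which produces SQL; the column names, row counts and helper names are hypothetical, and "previous 300 days" is assumed to mean 1 to 300 days before the reference date.

```ruby
require "date"

# Build dimension rows with sequential keys (illustrative layout only).
def build_dimension(count)
  (1..count).map { |id| { id: id, name: "value_#{id}" } }
end

# Build fact rows whose foreign keys are drawn only from existing dimension
# keys, and whose dates fall 1..300 days before the reference date.
def build_facts(count, dimension, today: Date.today, rng: Random.new)
  keys = dimension.map { |row| row[:id] }
  Array.new(count) do
    {
      dim_id: keys.sample(random: rng), # always an existing key
      amount: rng.rand(1..1000),        # arbitrary fact value
      date: today - rng.rand(1..300)    # within the previous 300 days
    }
  end
end

dimension = build_dimension(5)
facts = build_facts(20, dimension, rng: Random.new(42))
facts.first(3).each { |row| puts row.inspect }
```

Drawing foreign keys from the list of generated dimension keys, rather than generating them independently, is what keeps every reference resolvable.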

Example 3 - Creating the output stage, populating it with mockup data and updating it with custom values.

If you need custom values in attributes or facts, load them into the generated_custom_values table, which has the structure (dataset, field, values). The table is created automatically during the first run of SQL Executor, or you can create it beforehand. You can then either update the table manually or load the values from a CSV file with the COPY command.
dataset and field must match the values in the LDM (the last part of the identifier). In values, use either a list of values separated by commas (,), e.g. "Plzeň, Matuška, Konrád", or a range of numeric values separated by a dash (-), e.g. "1 - 50". The actual values for the given field are picked at random from the list or range. Custom values for dates, connection points and references are not supported yet. To apply custom values, run 1_generate_os.liquid, 2_populate_os.liquid and 3_update_os.liquid. Adjust 01_run_datagen.sql accordingly.
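
To make the two values formats concrete, here is a hedged Ruby sketch of how such a cell could be interpreted. It is only an illustration: the real processing happens in the SQL generated by 3_update_os.liquid, the helper names parse_values and random_value are hypothetical, and the range form is assumed to contain non-negative integers.

```ruby
# Interpret a `values` cell: "1 - 50" becomes an integer Range, anything else
# is treated as a comma-separated list and becomes an Array of strings.
def parse_values(spec)
  match = /\A\s*(\d+)\s*-\s*(\d+)\s*\z/.match(spec)
  if match
    low, high = [match[1].to_i, match[2].to_i].minmax
    (low..high)
  else
    spec.split(",").map(&:strip).reject(&:empty?)
  end
end

# Pick one random value from the parsed list or range.
def random_value(spec, rng: Random.new)
  parsed = parse_values(spec)
  parsed.is_a?(Range) ? rng.rand(parsed) : parsed.sample(random: rng)
end

puts parse_values("Plzeň, Matuška, Konrád").inspect
puts parse_values("1 - 50").inspect
puts random_value("1 - 50")
```

Under these assumptions a cell holding a single number with no dash is treated as a one-item list, not as a range.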

Run on Local

If you don't want to set up SQL Executor to generate the code, you can render the templates on your local machine using Ruby. Update datagen/param_definitions/LDM.json with your LDM, edit the script run_local.rb to render the required template, and then run jruby run_local.rb. The script generates SQL that you can then run in ADS.

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Twitter: @pbobkov, Email: pavel.bobkov@gooddata.com

Project Link: https://github.com/shtyler/datagenerator

Acknowledgements