kite-sdk / kite-examples

Kite SDK Examples
Apache License 2.0

CDK-928: Utility to generate events to existing table. #23

Open DennisDawson opened 9 years ago

DennisDawson commented 9 years ago

With an eye toward modularization, I've repurposed CreateEvents.java from the Spark example and placed it in org/kitesdk/examples/data. This lets the customer create the events dataset using the CLI, then populate it with a substantial number of records using the Java utility. The same dataset can then be used for both the Flume and Spark examples, without having to delete it after running each example's job.

In GenerateEvents, I essentially replaced the create() call from CreateEvents with load(), so the utility writes to the existing dataset instead of creating a new one. I added the Avro plug-in to pom.xml, copied the avro folder containing standard_event.avsc into the main directory, and copied BaseEventsTool.java to org/kitesdk/examples/data.

In my environment, it compiles, runs, and populates the events table as expected.

**Update**

The random records were a little too random: if the user_id, session_id, and ip are different for every record, there are no sessions for the Crunch utility to aggregate when it runs. I revised the run method to first generate the user_id, session_id, and ip, then use a for loop to generate 1 to 25 random events that share those values. I also modified the randomTimestamp method to increase the base length of time and add random padding, which produces more realistic session durations.
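The session-grouping idea above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the code in this PR: the `Event` class stands in for the Avro-generated record, and `generateSession`, `nextTimestamp`, `BASE_GAP_MS`, `MAX_PADDING_MS`, the user ID range, and the IP prefix are all hypothetical. The key point is that the identifiers are chosen once per session, outside the loop, so every event in the session shares them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.UUID;

/** Hypothetical sketch of session-grouped event generation (not the PR's code). */
public class SessionEventSketch {

    /** Hypothetical stand-in for the Avro-generated event record. */
    public static final class Event {
        final long userId;
        final String sessionId;
        final String ip;
        final long timestamp;

        Event(long userId, String sessionId, String ip, long timestamp) {
            this.userId = userId;
            this.sessionId = sessionId;
            this.ip = ip;
            this.timestamp = timestamp;
        }
    }

    // Assumed timing constants: a fixed base gap plus up to a minute of random padding.
    static final long BASE_GAP_MS = 5_000L;
    static final long MAX_PADDING_MS = 60_000L;

    /** Builds one session: identifiers are picked once, then 1 to 25 events reuse them. */
    static List<Event> generateSession(Random random, long sessionStart) {
        // Chosen once per session so the events can later be aggregated together.
        long userId = random.nextInt(1000);
        String sessionId = new UUID(random.nextLong(), random.nextLong()).toString();
        String ip = "192.168.1." + random.nextInt(256);

        int eventCount = 1 + random.nextInt(25); // 1..25 inclusive
        List<Event> events = new ArrayList<>(eventCount);
        long timestamp = sessionStart;
        for (int i = 0; i < eventCount; i++) {
            events.add(new Event(userId, sessionId, ip, timestamp));
            timestamp = nextTimestamp(random, timestamp);
        }
        return events;
    }

    /** Advances the clock by the base gap plus random padding. */
    static long nextTimestamp(Random random, long previous) {
        long padding = (long) (random.nextDouble() * MAX_PADDING_MS); // 0..MAX_PADDING_MS-1
        return previous + BASE_GAP_MS + padding;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        List<Event> session = generateSession(random, System.currentTimeMillis());
        System.out.println("Generated " + session.size() + " events for session "
            + session.get(0).sessionId);
    }
}
```

Seeding a single `Random` and deriving the session ID from it (rather than calling `UUID.randomUUID()`) keeps a generated data set reproducible between runs, which can help when comparing aggregation output.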

I'm happy to incorporate any changes that make the code more elegant; my changes just make it work.