linkedin / dynamometer

A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
BSD 2-Clause "Simplified" License

Add support for a reducer step that aggregates per-user metrics #76

csgregorian closed 5 years ago

csgregorian commented 5 years ago

The goal of this PR is to let Dynamometer collect and emit per-user metrics, rather than only the whole-workload measurements that it currently collects using Counters. To do this, AuditReplayMapper now emits key-value pairs of the form (username_type, latency), where type is either READ or WRITE. A new reducer step, AuditReplayReducer, then sums the latencies per key. More generally, this change allows Dynamometer to support arbitrary reducers for stats aggregation alongside the existing mappers.
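To make the intended data flow concrete, here is a minimal plain-Java sketch of the aggregation. It does not use the Hadoop MapReduce API and is not the code in this PR: the class `PerUserLatencySketch`, the `ReplayedOp` type, and the method names are all hypothetical, and the latencies are made-up sample values. It only illustrates keying by username_type and summing latencies per key.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not the actual AuditReplayMapper/AuditReplayReducer.
public class PerUserLatencySketch {

    /** Hypothetical stand-in for one replayed audit log command. */
    static final class ReplayedOp {
        final String user;
        final boolean isWrite;
        final long latencyMs;

        ReplayedOp(String user, boolean isWrite, long latencyMs) {
            this.user = user;
            this.isWrite = isWrite;
            this.latencyMs = latencyMs;
        }
    }

    /** "Map" step: emit a (username_type, latency) pair, where type is READ or WRITE. */
    static Map.Entry<String, Long> map(ReplayedOp op) {
        String type = op.isWrite ? "WRITE" : "READ";
        return new SimpleImmutableEntry<String, Long>(op.user + "_" + type, op.latencyMs);
    }

    /** "Reduce" step: sum the latencies that share a key. */
    static Map<String, Long> reduce(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> totals = new TreeMap<String, Long>(); // sorted keys for stable output
        for (Map.Entry<String, Long> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Long::sum);
        }
        return totals;
    }

    /** Runs the map step over every operation, then the reduce step over all emitted pairs. */
    static Map<String, Long> run(List<ReplayedOp> ops) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<Map.Entry<String, Long>>();
        for (ReplayedOp op : ops) {
            pairs.add(map(op));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        List<ReplayedOp> ops = Arrays.asList(
            new ReplayedOp("alice", false, 5L),
            new ReplayedOp("alice", false, 7L),
            new ReplayedOp("alice", true, 30L),
            new ReplayedOp("bob", false, 5L));
        // prints {alice_READ=12, alice_WRITE=30, bob_READ=5}
        System.out.println(run(ops));
    }
}
```

In a real MapReduce job the framework groups the emitted pairs by key before the reducer sees them, so the reducer only has to sum one key's values at a time. The sketch folds grouping and summing into a single pass for brevity.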

I modified TestWorkloadGenerator to run both with and without a reducer and to confirm that the output file is created. It still needs a test that checks the actual contents of the output file. It's also possible that I changed some things that didn't need changing while getting the tests to pass and the output to appear.

Mostly trying to see if CI will pass, but ready for some preliminary reviews.

EDIT: a couple other things that I'm not sure are the right approach and need to be looked at a bit closer:

csgregorian commented 5 years ago

Simplified a ton of stuff here.

Ready for more reviews!