Initial commit for the spark job used to generate the aggregated hw j…

Dexterp37 commented 8 years ago

…son.

@almossawi @megavaughn please do not merge this PR just yet. Let's use this to start a conversation/review.

SamPenrose commented 8 years ago

Here is my review of the notebook (r+ with one small fix):

Bug: In run_survey() if you set only one of start and end date, it will be ignored.

Request: Per https://bugzilla.mozilla.org/show_bug.cgi?id=1262609#c32, I do believe we should count and report failures in get_newest_per_client() and get_valid_client_record(). In what detail we break them out is of less importance.

Comments:

This work is oriented by submission date. Should that choice (vs activity date) be highlighted?
These functions have logic that seems like a natural fit for a unit test: get_newest_per_client() aggregate_pings() collapse_to_other_bucket()

Nits: fetch_previous_state() and store_new_state() have the same docstring. I think it means "Load previously computed results from S3, if they exist." run_survey(): If both are start and end are required (see Bug), why not pass as a single parameter? get_valid_client_record(): whitespace in second if block vendor_name_from_id(): if this code will live long, load from network?

Dexterp37 commented 8 years ago

Thanks Sam, sorry for the delayed reply. I've addressed all your comments locally, and added some tests within the file. I'll update the PR soon.

Dexterp37 commented 8 years ago

@SamPenrose , @rjweiss I've updated the notebook. Given the r+, should I go on and merge it?

Changes:

Added more comments.
Removed functions that were not used.
Addressed the bug Sam found and his request (records discarded due to broken data are no reported in the "discarded" bucket).
Fixed the nits

I left unit testing out for now. I'm considering moving some functions out to a library for easier testing.

SamPenrose commented 8 years ago

Works for me!

mozilla / firefox-hardware-report

Initial commit for the spark job used to generate the aggregated hw j… #1