squarewave / bhr.html


Add Tracking for various DevTools stacks #25

Open ochameau opened 6 years ago

ochameau commented 6 years ago

I would like to start tracking some specific stacks related to DevTools, but I'm not sure they will all be worth tracking, and I may have many to track. What is the machinery behind the "tracked hangs" section? Is this something I can set up locally on my machine?

squarewave commented 6 years ago

You have to run it through https://analysis.telemetry.mozilla.org, unfortunately. The repo is here, and you configure it and run it from a Python notebook. If you'd like to go through with that, I can write up detailed instructions, but it's not a remarkably user-friendly process. However, the code you're interested in is here. It's relatively simple - just Python that, given the data for a particular hang, returns True or False. Note that the "stack" argument is unsymbolicated, so it will only be useful for filtering JS stacks. Symbolicating tends to explode the time taken by the job; I can get it to work, but I was planning to avoid it until someone actually requires it.
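
For the curious, an entry in tracked.py boils down to a predicate over a hang. The following is a minimal sketch, not one of the repo's actual entries: the function name, the exact shape of the hang data, and the `devtools/` substring test are all illustrative assumptions.

```python
# Hypothetical tracked.py-style predicate (name and hang-data shape are assumptions).
# The job calls a predicate like this per hang; returning True counts the hang
# toward the tracked stat, False ignores it.
def is_devtools_hang(stack):
    # The stack is unsymbolicated, so only JS frames carry readable script paths;
    # native frames are raw addresses and can't be matched by name here.
    return any(isinstance(frame, str) and "devtools/" in frame
               for frame in stack)
```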

ochameau commented 6 years ago

For now I only care about JS stacks; most of DevTools is written in JavaScript. If you think I can run this locally and get results, then yes, I'm ready to invest some time in setting it up. The tracked.py logic is trivial; I'll need more help with the overall setup of the program.

squarewave commented 6 years ago

Kk - I'll try to give a step-by-step:

  1. Visit atmo, and log in.
  2. Launch a Spark cluster.
  3. Once it's set up, follow the instructions to SSH to it, etc.
     a. While it's setting up, go ahead and clone https://github.com/squarewave/background-hang-reporter-job, and follow the instructions in the README to get it up and running.
  4. Once you're at http://localhost:8888/tree, select New > python [default].
  5. Once you're in that new notebook, paste the following:
```python
repo_dir = "tmp"
repo_https_url = "https://github.com/ochameau/background-hang-reporter-job"
sc.defaultParallelism  # check how many cores the cluster exposes

# Build the job as an egg and ship it to the Spark workers
!rm -rf $repo_dir
!git clone $repo_https_url $repo_dir && cd $repo_dir && python setup.py bdist_egg

import os
distpath = repo_dir + '/dist'
sc.addPyFile(os.path.join(distpath, os.listdir(distpath)[0]))

import background_hang_reporter_job
from datetime import datetime, timedelta

background_hang_reporter_job.etl_job_tracked_stats(sc, sqlContext, {
    'start_date': datetime.strptime("20170901", "%Y%m%d"),
    'end_date': datetime.today(),
    'sample_size': 0.01,  # change this to 1.0 when you're done; just using a low sample for dev
    'hang_profile_out_filename': 'historical_data_TEST',
    'exclude_modules': True,
})
```
  6. If you run that, it should produce roughly the data that you see on https://arewesmoothyet.com. The data will be located at https://analysis-output.telemetry.mozilla.org/bhr/data/hang_aggregates/historical_data_TEST.json (see the sketch after this list for a quick way to sanity-check it).
  7. Go ahead and add another entry to tracked.py in your clone. You'll need to commit, push, and close your notebook and restart it in order to pick up the changes, unfortunately.
  8. If you want to see it on a local copy of your dashboard, go ahead and clone https://github.com/squarewave/bhr.html.
     a. You'll need to edit this line temporarily to point to historical_data_TEST instead of historical_data.
     b. You'll need to add a link here to your new tracked stat.

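As an aside, once the job finishes you can sanity-check the uploaded output without wiring up the dashboard. A minimal sketch, assuming only that the file at the URL from step 6 is plain JSON:

```python
# Fetch the job output and confirm the run produced something (assumes plain JSON).
import json
from urllib.request import urlopen

url = ("https://analysis-output.telemetry.mozilla.org/"
       "bhr/data/hang_aggregates/historical_data_TEST.json")
with urlopen(url) as response:
    data = json.load(response)

print(type(data))
if isinstance(data, dict):
    print(list(data.keys())[:10])  # peek at the top-level keys
```
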
ochameau commented 6 years ago

Thanks for the detailed steps. I'll give that a try tomorrow.

ochameau commented 6 years ago

I finally found some time to test this and it seems to work fine. Thanks a ton for your detailed steps!! I just ran with a 0.01 sample and pushed another run with 1.0. The 0.01 run took about 3 hours to complete, so I have no idea how long a full scan will take...

squarewave commented 6 years ago

Yeah, it definitely takes some time. The scheduled job uses a cluster of 16 nodes for this reason. If you're running a 1.0 sample on a single node, it probably won't finish within the lifetime of your cluster. I would say a 0.01 sample should be sufficient for testing, though. If you'd like, submit the PR and I can get it into the scheduled job. (If you want to do more testing, you could play with spinning up a larger cluster, or just set a more recent value for "start_date" - see the sketch below.)
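
For example (an illustrative tweak, not part of the repo's instructions), the test run can be limited to recent data by computing "start_date" relative to today before passing it into etl_job_tracked_stats:

```python
from datetime import datetime, timedelta

# Only process the last two weeks of data instead of everything since 2017-09-01;
# a shorter date range means far less data for a single-node test cluster to churn.
start_date = datetime.today() - timedelta(days=14)
end_date = datetime.today()
```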