sitespeedio / sitespeed.io

sitespeed.io is an open-source tool for comprehensive web performance analysis, enabling you to test, monitor, and optimize your website’s speed using real browsers in various environments.
https://www.sitespeed.io/
MIT License

Is data produced by sitespeed 4 entirely different from sitespeed 3 #1311

Closed abhagupta closed 8 years ago

abhagupta commented 8 years ago

Hi, I just wanted to confirm, before I start the task of transforming the sitespeed 4 JSON output into our database, whether the sitespeed 4 output is entirely changed from sitespeed 3. In sitespeed 3, we used to have a data structure like this:

{
    "browsertime": {
        "browsertime": [{}],
        "har": [{}]
    }

}

In sitespeed 4, I am seeing something like:

{
    "browser": {},
    "pageinfo": {},
    "timings": {},
    "coach": {},
    "har": {},
    ...
}

I do see har but no browsertime metrics.

I checked the data produced by browsertime as well, using https://www.sitespeed.io/documentation/browsertime/#a-simple-example, and that data structure has also changed a bit, although it looks similar. Something along these lines:

{
    "info": {
        "browsertime": {
            "version": "1.0.0-beta.9"
        },
        "url": "https://www.sitespeed.io",
        "timestamp": "2016-11-03T10:36:17-07:00"
    },
    "browserScripts": [{
        "browser": {

        },
        "pageinfo": {

        }
....
}

My question is whether there is a recommended way to get the browsertime data from sitespeed so that I have minimal work in transformation. Also, does it make more sense to use coach metrics now instead of browsertime?

abhagupta commented 8 years ago

Also, is there a way I can get the test info and page that was tested on from sitespeed results. I see that browsertime produces this data in their info section

"info": {
    "browsertime": {
      "version": "1.0.0-beta.9"
    },
    "url": "https://www.sitespeed.io",
    "timestamp": "2016-11-03T10:36:17-07:00"
  },

but this data is not collected by sitespeed. Is there any way I can get hold of the url and timestamp?

soulgalore commented 8 years ago

Hey @abhagupta been home with a sick kid today and need to do some work but I'll get back tonight before I go to bed, sorry for the delay.

abhagupta commented 8 years ago

No problem.. take it easy!! I might be able to look through sitespeed's code.

soulgalore commented 8 years ago

First I've updated the plugin documentation yesterday, it still misses things but it explains parts much better than before: https://www.sitespeed.io/documentation/sitespeed.io/plugins/#create-your-own-plugin ping @tobli @beenanner would love to have your input there when you have time!

Yes, the structure has changed. I know that is not optimal and means more work. The reason is that the structure in 3.x wasn't sustainable, and the new one looks (at least for now) like it will hold for years.

"browser": {},
"pageinfo": {},
"timings": {},
"coach":{}

In 4.0 all these metrics are collected using browsertime. We collect the default Javascripts in Browsertime https://github.com/sitespeedio/browsertime/tree/master/browserscripts and then add the Coach javascripts on top (and you can add your own). So in this case, timings are the important ones from Browsertime (like in 3.x).

My question is whether there is a recommended way to get the browsertime data from sitespeed so that I have minimal work in transformation

Does your plugin act on different messages? Then you can collect it from browsertime. I'm not 100% sure how you want to do this, so let's try to sync.

Also, does it make more sense to use coach metrics now instead of browsertime?

The coach metrics are the ones that are more like YSlow. We still run some timing metrics inside of the coach, but you should use the ones in Browsertime (or rather "timings") because they will be the original ones, will hold stats like median (if you want to use that), and also have SpeedIndex (still experimental).

Also, is there a way I can get the test info and page that was tested on from sitespeed results. I see that browsertime produces this data in their info section

Do you collect the info from a message? Then you have the URL in the message (message.url). If you use the dataCollection you should look at the HTML plugin to see how you can use that data. I can help you more there later.
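As an illustration of reading the test info from a message, here is a minimal sketch. The helper name `extractTestInfo` and the sample message are hypothetical; the `url` field is the one mentioned above, and `timestamp` and `type` are assumed to be present on the message as in the plugin example later in this thread.

```javascript
'use strict';

// Hypothetical helper: pull the test info out of a sitespeed.io message.
// Assumes the message carries `url`, `timestamp` and `type` fields.
function extractTestInfo(message) {
  return {
    url: message.url,
    timestamp: message.timestamp,
    type: message.type
  };
}

// Hand-written sample message, not real sitespeed.io output.
const exampleMessage = {
  type: 'browsertime.pageSummary',
  url: 'https://www.sitespeed.io',
  timestamp: '2016-11-03T10:36:17-07:00',
  data: {}
};

const info = extractTestInfo(exampleMessage);
console.log(info.url, info.timestamp);
```

Inside a plugin, the same function could be called from `processMessage(message)` for the message types you care about.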

abhagupta commented 8 years ago

Thanks @soulgalore for the detailed answer. Makes sense. Last question: the timings have one set of data even if I run multiple tests (of one url) in one execution of sitespeed. So are the values inside timings the median of all runs? Or do I get a choice to get the 90th percentile as well?

soulgalore commented 8 years ago

@abhagupta you can get whatever you want. If you collect data directly from browsertime messages, browsertime sends a browsertime.run message that contains all the data for each run, and then a browsertime.pageSummary message that holds the median/p90 values etc. for the runs for that URL. This is from memory, so I need to verify :)
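To make the relationship between per-run values and summary values concrete, here is a small sketch that computes a median and a 90th percentile over a list of run timings. The numbers are made up, and the nearest-rank percentile method is an assumption; Browsertime's own statistics may be calculated differently.

```javascript
'use strict';

// Sort a copy so the caller's array is left untouched.
function sortedCopy(values) {
  return values.slice().sort((a, b) => a - b);
}

// Median: the middle value, or the mean of the two middle values.
function median(values) {
  const sorted = sortedCopy(values);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Nearest-rank percentile for p in (0, 100].
// Assumption: this is one common definition, not necessarily Browsertime's.
function percentile(values, p) {
  const sorted = sortedCopy(values);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Made-up timings (ms) from five runs of the same URL.
const runTimings = [812, 790, 1043, 801, 799];

console.log('median:', median(runTimings));
console.log('p90:', percentile(runTimings, 90));
```

A plugin that listens to the per-run messages could feed its own collected values through functions like these if it needs a statistic that the page summary does not provide.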

tobli commented 8 years ago

Data from Browsertime has median, mean, and percentiles. One way of checking is just to run Browsertime standalone on a given url (use the version that corresponds to the Sitespeed.io version you use). It's hard to make a general recommendation for how to process data from Sitespeed to put in a database, it all depends on your use case. For some cases it might be most convenient to write a simple plugin to extract just what you need and process that (e.g. writing to a database without intermediate json files). To see contents of messages you can pick up in plugins, run sitespeed.io with the --debug flag and -v or -vv.
On the other hand, if you just want Browsertime data, you can run Browsertime standalone without sitespeed.
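Besides the debug flags, a tiny plugin can show which message types pass through. The sketch below is hypothetical: the plugin name `typecounter` and the sample messages are invented, and it relies only on the `name()` and `processMessage(message)` functions used by the plugin example later in this thread.

```javascript
'use strict';

// Count of messages seen so far, keyed by message type.
const seenTypes = new Map();

const typeCounter = {
  name() {
    return 'typecounter';
  },
  processMessage(message) {
    // Log each type the first time it shows up, then keep counting.
    if (!seenTypes.has(message.type)) {
      console.log('new message type:', message.type);
    }
    seenTypes.set(message.type, (seenTypes.get(message.type) || 0) + 1);
  }
};

module.exports = typeCounter;

// Simulated messages, only to show the behaviour outside sitespeed.io.
[
  { type: 'browsertime.run' },
  { type: 'browsertime.run' },
  { type: 'browsertime.pageSummary' }
].forEach((message) => typeCounter.processMessage(message));
```

Once the interesting types are known, the plugin can be narrowed down to handle just those.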

I'm closing this issue now, please open a new issue if you find things that don't seem to work, or hit any limitations. Thanks!

abhagupta commented 7 years ago

Thanks @tobli, you brought up a very good point: I can just send the data directly to the database without generating intermediate json files. The only issue with Kairos DB (which we are using) is that it does an HTTP POST call for adding a metric (unlike Graphite), and adding so many metrics produced by sitespeed.io is going to crash the communication between the plugin and the database. So here are a few things I am trying, and I might need your suggestions on this:

./node_modules/.bin/sitespeed.io http://www.example.com -b chrome -n 2 --metrics.filter *- coach.pageSummary.advice.performance.adviceList.*.score  *- coach.pageSummary.advice.timings.*  *- aggregateassets.*.* *-coach.* *- domains.*  --plugins.load plugin.js 

but I am still seeing a lot of aggregateassets metrics. I also do not want coach right now. But *- coach.* did not remove the coach metrics.. Any suggestions on what I am doing wrong?

In fact, I just need browsertime and pagexray at this time. If you have an option handy for just these two, could you let me know?

If I can reduce the data to some 100 metrics, doing a POST call wouldn't be bad. Please let me know if you have suggestions/alternatives.

soulgalore commented 7 years ago

Hey @abhagupta the documentation is hard to understand, we need to work on that, sorry! The *- removes all configuration, so you only need to do that once (first) and then add the rest as you want. I think we can also make the filters easier to understand.

To get only timings from Browsertime and content types (size and request) from pagexray you can run like this:

bin/sitespeed.js --metrics.filter *- browsertime.pageSummary.statistics.timings.* pagexray.pageSummary.contentTypes.* -n 1 https://www.sitespeed.io 

Best, Peter

tobli commented 7 years ago

A custom plugin can look at as many or as few metric types as it wants (--metrics.filter only applies to the built-in graphite and influxdb plugins).
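Since a custom plugin has to do its own selection, one possible approach is to flatten the nested data into dotted metric names and keep only the names under chosen prefixes. This is a sketch, not sitespeed.io's filter implementation; the function names and the sample data (including its field names) are hypothetical.

```javascript
'use strict';

// Flatten nested objects into { 'a.b.c': value } pairs, keeping numeric leaves only.
function flatten(obj, prefix = '', out = {}) {
  for (const key of Object.keys(obj)) {
    const name = prefix ? `${prefix}.${key}` : key;
    const value = obj[key];
    if (value !== null && typeof value === 'object') {
      flatten(value, name, out);
    } else if (typeof value === 'number') {
      out[name] = value;
    }
  }
  return out;
}

// Keep entries whose name equals a prefix or sits below it in the dotted path.
function keepByPrefix(flat, prefixes) {
  const kept = {};
  for (const name of Object.keys(flat)) {
    if (prefixes.some((p) => name === p || name.startsWith(`${p}.`))) {
      kept[name] = flat[name];
    }
  }
  return kept;
}

// Made-up statistics object; real message data will have other fields.
const sampleStatistics = {
  timings: { firstPaint: { median: 512, p90: 640 } },
  other: { requests: { median: 42 }, label: 'ignored' }
};

const selected = keepByPrefix(flatten(sampleStatistics), ['timings']);
console.log(selected);
```

Prefix matching is a deliberately simple choice here; wildcard patterns would need extra matching logic.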

A simple plugin for posting browsertime and pagexray data can look like this:

'use strict';

const Promise = require('bluebird');
const http = require('http');

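function postJson(json) {
  const options = {
    hostname: 'httpbin.org',
    path: '/post',
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(json)
    }
  };

  return new Promise((resolve, reject) => {
    const req = http.request(options, (res) => {
      // Drain the response; 'end' only fires once the body has been consumed.
      res.resume();
      res.once('end', () => resolve());
    });
    req.once('error', (e) => reject(e));

    req.write(json);
    req.end();
  });
}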

module.exports = {
  name() {
    return 'poster';
  },
  processMessage(message) {
    switch (message.type) {
      case 'browsertime.pageSummary': {
        const data = {
          url: message.url,
          type: message.type,
          timestamp: message.timestamp,
          data: message.data.statistics
        };
        return postJson(JSON.stringify(data));
      }

      case 'pagexray.pageSummary':
        return postJson(JSON.stringify(message));

      default:
      // Ignore everything else
    }
  }
};

Run like this (note that --plugins.load needs to be last, for now at least):

sitespeed.io -n1  http://www.sitespeed.io --plugins.load ./poster.js

The code to pick out metrics from the messages can be as simple or complex as you need it to be.
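For the KairosDB concern raised earlier in the thread, one option is to collect the selected metrics and send them as one batch instead of one POST per metric. The sketch below only builds the payload. It assumes KairosDB's REST API accepts an array of metric objects with `name`, `datapoints` and `tags` on a single POST (commonly `/api/v1/datapoints`); check that against the KairosDB documentation for your version. The function name, metric names and tag values are hypothetical.

```javascript
'use strict';

// Turn { metricName: number } pairs into an array of KairosDB-style metric objects.
// Assumption: each metric is { name, datapoints: [[timestampMs, value]], tags }.
function toKairosBatch(flatMetrics, timestampMs, tags) {
  return Object.keys(flatMetrics).map((name) => ({
    name,
    datapoints: [[timestampMs, flatMetrics[name]]],
    tags
  }));
}

// Made-up metrics and tags, for illustration only.
const metrics = {
  'timings.firstPaint.median': 512,
  'timings.firstPaint.p90': 640
};
const batch = toKairosBatch(
  metrics,
  Date.parse('2016-11-03T10:36:17-07:00'),
  { url: 'www.sitespeed.io', browser: 'chrome' }
);

// One JSON body for one POST, instead of one request per metric.
const body = JSON.stringify(batch);
console.log(`${batch.length} metrics in ${body.length} characters`);
```

The resulting `body` string could then be handed to a function like `postJson` above, pointed at your own database host instead of httpbin.org.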

abhagupta commented 7 years ago

Thanks guys!! this will help a lot.