tounano / hyperquext

Like `Hyperquest` with extensions.
MIT License

hyperquext

Like Hyperquest with extensions. Make streaming HTTP requests.

rant

You can read my rant in decorquest.

To sum stuff up, here are the main motives behind both hyperquext and decorquest:

  1. I needed something that can handle scale.
  2. I needed something that I can extend without changing its code (the open-closed principle).

And here comes hyperquext - hyperquest with extensions.

What about decorquest?

decorquest is another module I authored for HTTP requests. hyperquext is a layer on top of decorquest and depends on it.

Usage

var hyperquext = require("hyperquext");

var req = hyperquext(uri, opts, cb)

Make an outgoing request.

Args:

Return value: This method returns a RequestProxy object. Its methods are described below.

Overloading:

Options:

Basically, it depends on the extensions you use. Each extension might require additional options. However, here are the basic options of hyperquext:

Additional options:

additional methods

hyperquext has some more methods:

RequestProxy

An instance of this class is returned by hyperquext on every request. This object is a duplex stream. You can stream the response to a downstream.

Or stream additional data (like POST data) towards that stream.

The actual request is performed on the next tick, so you can set some data outside of opts using its methods.

RequestProxy is a wrapper for ClientRequest objects, which are the default return values of Node's http.request().
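
The next-tick behaviour can be illustrated with a small, self-contained sketch. It does not use hyperquext itself: `createDeferredRequest`, `setHeader`, `started` and the `send` callback are hypothetical names chosen for illustration, and the real RequestProxy may differ.

```javascript
var EventEmitter = require("events").EventEmitter;

// Hypothetical stand-in for a request proxy; not hyperquext's actual class.
function createDeferredRequest(uri, send) {
  var req = new EventEmitter();
  req.uri = uri;
  req.headers = {};
  req.started = false;

  // Headers can be set until the request actually starts.
  req.setHeader = function (name, value) {
    if (req.started) throw new Error("request already started");
    req.headers[name.toLowerCase()] = value;
    return req;
  };

  // Defer the real work so the caller can finish configuring the request.
  process.nextTick(function () {
    req.started = true;
    send(req);
  });

  return req;
}

var sent = [];
var req = createDeferredRequest("http://localhost:5000/", function (r) {
  sent.push(r.headers);
});
req.setHeader("X-Trace", "abc"); // still allowed: nothing has been sent yet
```

Because the work is scheduled with `process.nextTick`, everything the caller does synchronously after creating the request is visible by the time it is sent.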

methods

events

RequestProxy is a stream. It will have all the events that a stream would have. Such as data, end and close.

Additional events:

In addition, each decorator might introduce new events. For example, hyperquextDirect emits a redirect event on every redirect.

Response object

The response object that will be passed on a response event is the standard response object that is provided by Node, with one addition.

It has a new object bound to it at res.request. res.request is a simple DTO that holds all the opts the request was performed with, plus any additions made by decorators.

For example, the follow-redirects decorator binds a redirects object to it that summarizes all the redirects that were followed.

The response object is added to the RequestProxy object once the response is ready, and is accessible through req.res.

Examples

Perform many requests

This example was borrowed from hyperquest.

/*
  This example was borrowed from `hyperquest` module.
 */

var http = require('http');
var hyperquext = require('hyperquext');

var server = http.createServer(function (req, res) {
  res.write(req.url.slice(1) + '\n');
  setTimeout(res.end.bind(res), 3000);
});

server.listen(5000, function () {
  var pending = 20;
  for (var i = 0; i < 20; i++) {
    var r = hyperquext('http://localhost:5000/' + i);
    r.pipe(process.stdout, { end: false });
    r.on('end', function () {
      if (--pending === 0) server.close();
    });
  }
});

process.stdout.setMaxListeners(0); // turn off annoying warnings

Extending hyperquext using decorquest decorators

In this example, we'll extend hyperquext with decorquest decorators. We'll add a simple http proxy support.

var hyperquext = require("hyperquext");
var dq = require("decorquest");

// Create a basequest object using `decorquest`
// Mind the `proxyquest` decoration.
var basequest = dq.proxyquest(dq.attachAuthorizationHeader(dq.disableGlobalAgent(dq.request)));

// We're injecting basequest into hyperquext.
// And setting a proxy as we would do in `decorquest`. In this case I use my Fiddler proxy.
var r = hyperquext("http://www.google.com", {maxRedirects: 5, proxy: "http://127.0.0.1:8888", basequest: basequest});

r.pipe(process.stdout);

r.on("response", function (res) {
  // Delay it for 3 seconds, so that r.pipe(process.stdout); will complete. Not necessary.
  // So that stuff won't mess up in console.
  setTimeout( function () {
    console.log(res.request);
  }, 3000);
});

Decorators

hyperquext supports two types of decorators: Low-level decorators, which are provided by decorquest, and High-level decorators, which decorate hyperquext directly and return a RequestProxy object.

Usage

var hyperquext = require("hyperquext");
var hyperquextDecorator = require("some-hyperquext-decorator");
var request = hyperquextDecorator(hyperquext);

var req = request("http://www.google.com");
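
To make the shape of a High-level decorator concrete, here is a minimal sketch that runs without hyperquext or network access. `fakeRequest` is a stub standing in for hyperquext and `withUserAgent` is a hypothetical decorator; neither is part of the module.

```javascript
// Stub standing in for hyperquext so the sketch runs without network access.
function fakeRequest(uri, opts) {
  return { uri: uri, opts: opts || {} };
}

// A hypothetical high-level decorator: takes a request function and
// returns a new function with the same call shape.
function withUserAgent(request, agent) {
  return function (uri, opts) {
    var merged = Object.assign({}, opts);
    // Caller-supplied headers win over the default user agent.
    merged.headers = Object.assign({ "user-agent": agent }, merged.headers);
    return request(uri, merged);
  };
}

// Decorators compose because each one returns the same kind of function.
var request = withUserAgent(fakeRequest, "sketch/1.0");
var result = request("http://www.google.com", { headers: { accept: "text/html" } });
console.log(result.opts.headers);
```

Since the decorated function has the same signature as the one it wraps, it can itself be passed to another decorator.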

hyperquextDirect

As of today, the only decorator that comes out of the box is hyperquextDirect. This decorator will follow 3XX redirects on GET requests.

This decorator only takes effect if you pass a maxRedirects option. If this option is not present, it will send the request as is.

Events

hyperquextDirect will emit the redirect event on every redirect. The argument would be a response object of the redirect.

Response

hyperquextDirect will add an array of redirects to res.request.redirects.
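
The bookkeeping described above can be sketched synchronously with a lookup table in place of real HTTP responses. This is not hyperquextDirect's code: the `responses` table and `follow` function are hypothetical, and throwing once the limit is reached is an assumption, since the behaviour at the limit is not stated here.

```javascript
// Hypothetical redirect table standing in for real HTTP responses.
var responses = {
  "http://google.com/": { statusCode: 301, headers: { location: "http://www.google.com/" } },
  "http://www.google.com/": { statusCode: 200, headers: {} }
};

function follow(uri, maxRedirects) {
  var redirects = [];
  var current = uri;
  while (true) {
    var res = responses[current];
    var isRedirect = res.statusCode >= 300 && res.statusCode < 400;
    // Without maxRedirects the response is returned as is.
    if (!isRedirect || !maxRedirects) {
      return { uri: current, res: res, redirects: redirects };
    }
    // Assumption: exceeding the limit is treated as an error.
    if (redirects.length >= maxRedirects) throw new Error("max redirects reached");
    redirects.push({ statusCode: res.statusCode, redirectUri: res.headers.location });
    current = res.headers.location;
  }
}

var followed = follow("http://google.com/", 5);
var untouched = follow("http://google.com/");
console.log(followed.uri, followed.redirects.length, untouched.res.statusCode);
```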

Usage

var hyperquext = require("hyperquext");
var hyperquextDirect = hyperquext.decorators.hyperquextDirect;

// Let's decorate hyperquext
var request = hyperquextDirect(hyperquext);

// http://google.com should redirect to http://www.google.com
var r = request("http://google.com", {maxRedirects: 5});

Examples

Simple redirects example

var hyperquext = require("hyperquext");
var hyperquextDirect = hyperquext.decorators.hyperquextDirect;

// Let's decorate hyperquext
var request = hyperquextDirect(hyperquext);

// http://google.com should redirect to http://www.google.com
var r = request("http://google.com", {maxRedirects: 5});

r.pipe(process.stdout);

// Redirect events
r.on("redirect", function (res) {
  // Delay it for 2 seconds, so that r.pipe(process.stdout); will complete. Not necessary.
  // So that stuff won't mess up in console
  setTimeout( function () {
    console.log("\n\nredirected\n\n");
  }, 2000)
})

r.on("response", function (res) {
  // Delay it for 3 seconds, so that r.pipe(process.stdout); will complete. Not necessary.
  // So that stuff won't mess up in console.
  setTimeout( function () {
    console.log(res.request);
  }, 3000);
});

Using both High-level and Low-level decorators

Let's take the example from before where we use a proxyquest decorator from decorquest module and combine it with hyperquextDirect.

var hyperquext = require("hyperquext");
var hyperquextDirect = hyperquext.decorators.hyperquextDirect;
var dq = require("decorquest");

// Create a basequest object using `decorquest`
// Mind the `proxyquest` decoration.
var basequest = dq.proxyquest(dq.attachAuthorizationHeader(dq.disableGlobalAgent(dq.request)));

// Decorate hyperquext
var request = hyperquextDirect(hyperquext);

// http://google.com should redirect to http://www.google.com
// We're injecting basequest into hyperquext.
// And setting a proxy as we would do in `decorquest`. In this case I use my Fiddler proxy.
var r = request("http://google.com", {maxRedirects: 5, proxy: "http://127.0.0.1:8888", basequest: basequest});

r.pipe(process.stdout);

// Redirect events
r.on("redirect", function (res) {
  // Delay it for 2 seconds, so that r.pipe(process.stdout); will complete. Not necessary.
  // So that stuff won't mess up in console
  setTimeout( function () {
    console.log("\n\nredirected\n\n");
  }, 2000)
})

r.on("response", function (res) {
  // Delay it for 3 seconds, so that r.pipe(process.stdout); will complete. Not necessary.
  // So that stuff won't mess up in console.
  setTimeout( function () {
    console.log(res.request);
  }, 3000);
});

Developing decorators

In order to develop your own extensions for hyperquext first you need to decide if you're going to develop Low-level or High-level decorators.

As a rule of thumb, Low-level (decorquest) decorators should be preferred. Please look at the documentation of decorquest to see how it should be done.

The only case when you should prefer developing a High-level decorator is when the success of the request depends on the response, such as the case of handling 3XX redirects or introducing a feature where you retry the request if it fails.

Helpers API

hyperquext.createRequestProxy()

This method would create a RequestProxy object that you can return immediately to the user.

Events you must emit

hyperquext.helpers.getFinalRequestFromHyperquext(req, cb)

finalRequest can be retrieved by listening to a finalRequest event or by accessing req.finalRequest in case finalRequest was already emitted.

cb is a callback that accepts 2 args: (err, finalRequest). err will always be null; this signature exists only to follow Node's conventions.

hyperquext.helpers.getResponseFromClientRequest(clientRequest, cb)

response can be retrieved by listening to a response event or by accessing clientRequest.res in case response was already emitted.

cb is a callback that accepts 2 args: (err, res). err will always be null; this signature exists only to follow Node's conventions.
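
Both helpers follow the same "cached property or wait for the event" pattern. The sketch below shows that pattern in generic form with plain EventEmitters; `getOnceOrCached` is a hypothetical name, not the module's implementation.

```javascript
var EventEmitter = require("events").EventEmitter;

// Generic form of the pattern: use the cached property if the event has
// already fired, otherwise wait for the event once.
function getOnceOrCached(emitter, eventName, propName, cb) {
  if (emitter[propName]) return cb(null, emitter[propName]);
  emitter.once(eventName, function (value) {
    cb(null, value);
  });
}

var early = new EventEmitter();          // response has not arrived yet
var late = new EventEmitter();
late.res = { statusCode: 200 };          // response already arrived

var seen = [];
getOnceOrCached(early, "response", "res", function (err, res) {
  seen.push("early:" + res.statusCode);
});
getOnceOrCached(late, "response", "res", function (err, res) {
  seen.push("late:" + res.statusCode);
});
early.emit("response", { statusCode: 301 });
console.log(seen);
```

The callback fires in both cases, so a decorator does not need to know whether it attached before or after the event.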

hyperquext.helpers.bindMethod(method, hyperquext)

A helper method that helps creating get, put, post, delete methods to the decorators.

Example:

var hyperquext = require("hyperquext");
var bindMethod = hyperquext.helpers.bindMethod;

function passthroughDecorator(hyperquext) {
  function decorator (uri, opts, cb) {
    // Create the proxy that is returned to the caller right away
    var proxy = hyperquext.createRequestProxy();
    // Just call hyperquext
    var hq = hyperquext(uri, opts, cb);

    // Forward the request events to the proxy
    hq.on('request', function (clientRequest) {proxy.emit('request', clientRequest);});
    hq.on('finalRequest', function (clientRequest) {proxy.emit('finalRequest', clientRequest);});

    return proxy;
  }

  // Expose the HTTP verb shortcuts on the decorated function
  decorator["get"] = bindMethod("GET", decorator);
  decorator["put"] = bindMethod("PUT", decorator);
  decorator["post"] = bindMethod("POST", decorator);
  decorator["delete"] = bindMethod("DELETE", decorator);

  return decorator;
}
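
For intuition, a method-binding helper of this kind could look roughly like the sketch below. `bindMethodSketch` is a hypothetical re-implementation and `fakeRequest` is a stub; the real bindMethod may handle arguments differently.

```javascript
// Stub request function; reports what it was called with.
function fakeRequest(uri, opts, cb) {
  return { uri: uri, method: opts.method, hasCallback: typeof cb === "function" };
}

// Hypothetical re-implementation of a method-binding helper.
function bindMethodSketch(method, request) {
  return function (uri, opts, cb) {
    // Allow the (uri, cb) form by shifting arguments.
    if (typeof opts === "function") {
      cb = opts;
      opts = {};
    }
    // Copy the options so the caller's object is not mutated.
    var withMethod = Object.assign({}, opts, { method: method });
    return request(uri, withMethod, cb);
  };
}

fakeRequest.post = bindMethodSketch("POST", fakeRequest);
var out = fakeRequest.post("http://localhost:5000/items", function () {});
console.log(out);
```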

Devcorators API

In version 0.2.0 devcorators were introduced. The idea behind devcorators is simply DRY. Devcorators are helpers that take care of common tasks such as parsing the arguments.

hyperquext.devcorators.parseArgs(hyperquext)

If you're going to develop a decorator, you don't need to parse args anymore. Simply use this devcorator.

Example:

var hyperquext = require("hyperquext");
var parseArgs = hyperquext.devcorators.parseArgs;

// A passthrough decorator
function passthroughDecorator(hyperquext) {
  // The state of the decorator comes here.
  // Note that I'm wrapping my decorator using a devcorator.
  return parseArgs(function (uri, opts, cb){
    return hyperquext(uri, opts, cb);
  });
}
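
What "parsing the arguments" might involve can be sketched as follows. The accepted call forms here, (uri, opts, cb), (uri, cb) and (opts, cb), are assumptions for illustration, and `normalizeArgs` and `parseArgsSketch` are hypothetical names; the real parseArgs may accept other forms.

```javascript
// Hypothetical normalizer; the real parseArgs may accept other forms.
function normalizeArgs(uri, opts, cb) {
  if (typeof uri === "object" && uri !== null) {
    // (opts, cb) form: the URI lives inside the options object.
    cb = opts;
    opts = uri;
    uri = opts.uri;
  } else if (typeof opts === "function") {
    // (uri, cb) form.
    cb = opts;
    opts = {};
  }
  // Keep the URI available on opts as well.
  opts = Object.assign({}, opts, { uri: uri });
  return { uri: uri, opts: opts, cb: cb };
}

// Wrap a decorator body so it always sees (uri, opts, cb).
function parseArgsSketch(fn) {
  return function (uri, opts, cb) {
    var a = normalizeArgs(uri, opts, cb);
    return fn(a.uri, a.opts, a.cb);
  };
}

var summarize = parseArgsSketch(function (uri, opts, cb) {
  return [uri, opts.method || "GET", typeof cb];
});
console.log(summarize({ uri: "http://a.test/", method: "POST" }, function () {}));
```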

hyperquext.devcorators.attachBodyToResponse(hyperquext)

This decorator streams the response into a string that would be located at res.body. res and the RequestProxy would remain streamable as usual.

This operation is "expensive", however sometimes it's mandatory. Use it only when you really need it.

The option it listens to is {body: true}

Example:

attachBodyToResponse(hyperquext)('http://www.google.com',{body: true},function (err, res) {
  console.log(res.body);
});
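
The idea of buffering a body while keeping the stream usable can be sketched with Node's stream module alone. `attachBodySketch` is a hypothetical helper, and a PassThrough stands in for the response object; this is not the module's implementation.

```javascript
var PassThrough = require("stream").PassThrough;

// Buffers a readable stream into res.body while leaving it pipeable.
function attachBodySketch(res, cb) {
  var chunks = [];
  res.on("data", function (chunk) {
    chunks.push(Buffer.from(chunk));
  });
  res.on("end", function () {
    // The whole body is held in memory here, which is why it is "expensive".
    res.body = Buffer.concat(chunks).toString("utf8");
    cb(null, res);
  });
  return res;
}

var res = new PassThrough();   // stand-in for an http.IncomingMessage
var piped = new PassThrough(); // a downstream consumer
res.pipe(piped);
piped.resume();                // drain the downstream consumer

attachBodySketch(res, function (err, r) {
  console.log(r.body);
});
res.end("hello world");
```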

hyperquext.devcorators.consumeForcedOption(hyperquext, option)

Let's say you're developing a decorator that must use attachBodyToResponse. To do that, you have to specify {body: true} in the options. The user, however, didn't specify it.

In that case, we could manually change the option in the decorator. The problem is that we don't know how the consumer of our decorator uses the response object, so it's not a good idea to load stuff the user didn't ask for. It's a sure path to memory-leak hell.

This devcorator takes care of that. It adds the option and deletes it after consumption. If the option was already set by the user or by another decorator, it acts as a passthrough.

Example:

function someDecorator(hyperquext){
  return parseArgs(function (uri, opts, cb) {
    // Some logic here...

    var req = consumeForcedOption(attachBodyToResponse(hyperquext), 'body')(uri, opts, cb);

    getFinalRequestFromHyperquext(req, function (err, finalRequest) {
      getResponseFromClientRequest(finalRequest, function (err, res) {
        // Some logic related to res.body here
      });
    });

    return req;
  });
}

hyperquext.devcorators.redirector(hyperquext)

There are several use cases for redirection. It can be a common scenario such as 3XX status codes, or a less common one such as a Meta Refresh redirect.

redirector provides a framework for following redirects. It is triggered by opts.maxRedirects and response['$redirect'].

In order to instruct redirector to redirect to some other url, you have to attach $redirect property to the response object. The $redirect property must consist of:

Example: hyperquextDirect

var url = require("url");

function hyperquextDirect(hyperquext) {
  return redirector(parseArgs(function (uri, opts, cb) {
    var req = hyperquext(uri, opts, cb);
    // Only GET requests with a maxRedirects option are followed
    if (req.reqopts.method !== 'GET' || !(opts.maxRedirects)) return req;

    getFinalRequestFromHyperquext(req, function (err, finalRequest) {
      getResponseFromClientRequest(finalRequest, function (err, res) {
        var statusCode = parseInt(res.statusCode, 10);
        if (statusCode >= 300 && statusCode < 400) {
          // Tell redirector where to go next
          finalRequest.res['$redirect'] = {
            statusCode: res.statusCode,
            redirectUri: url.resolve(opts.uri, res.headers.location)
          };
        }
      });
    });

    return req;
  }));
}
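
The decision of whether to attach $redirect can also be isolated as a pure function, which makes it easy to test without a network. `redirectInfo` is a hypothetical helper, and it uses the WHATWG `URL` constructor instead of `url.resolve`; it is a sketch of the same idea, not the module's code.

```javascript
// Returns a $redirect-style object for 3XX responses that carry a
// Location header, or null when the response should be passed through.
function redirectInfo(requestUri, res) {
  var status = Number(res.statusCode);
  var location = res.headers && res.headers.location;
  if (status < 300 || status >= 400 || !location) return null;
  return {
    statusCode: status,
    // Location may be relative, so resolve it against the request URI.
    redirectUri: new URL(location, requestUri).href
  };
}

var relative = redirectInfo("http://google.com/a/b", {
  statusCode: 301,
  headers: { location: "/c" }
});
var ok = redirectInfo("http://www.google.com/", { statusCode: 200, headers: {} });
console.log(relative, ok);
```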

Final words

This module is under heavy development, and my hope is that other devs would be able to join this project and together we'll create the best web scraping platform out there.

Special thanks to substack for the big inspiration from his hyperquest module. If you don't need those fancy decorations it would be a better idea to use hyperquest.

Important Note

Please follow hyperquext on github to get notified on API changes.

In any case, make sure to pin a version in package.json, so that an API change won't break your app.

The following practice is highly recommended:

...
  "dependencies": {
    "hyperquext": "0.2.*"
  }
...

Changelog

0.2.0

0.1.0

install

With npm do:

npm install hyperquext@0.2.*

license

MIT