matthewmueller / x-ray

The next web scraper. See through the <html> noise.
MIT License



var Xray = require('x-ray')
var x = Xray()

x('', '.post', [
  {
    title: 'h1 a',
    link: '.article-title@href'
  }
])
  .paginate('.nav-previous a@href')


Installation

npm install x-ray


Selector API

xray(url, selector)(fn)

Scrape the url for the following selector, returning an object in the callback fn. The selector takes an enhanced jQuery-like string that is also able to select on attributes. The syntax for selecting on attributes is selector@attribute. If you do not supply an attribute, the default is selecting the innerText.

Here are a few examples:

xray('', 'title')(function(err, title) {
  console.log(title) // Google
})
xray('', '.content')(fn)
xray('', 'img.logo@src')(fn)
xray('', 'body@html')(fn)
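
To make the selector@attribute syntax concrete, here is a minimal sketch of how such a string could be split into its two parts. The helper name parseSelector is hypothetical and is not part of x-ray, and the sketch assumes the last @ in the string always introduces the attribute, which a real parser would need to handle more carefully.

```javascript
// Hypothetical helper, not part of x-ray: split "selector@attribute".
// Assumption: the last "@" in the string separates selector and attribute.
function parseSelector(str) {
  var at = str.lastIndexOf('@')
  if (at === -1) {
    // No attribute supplied: default to the element's text.
    return { selector: str.trim(), attribute: 'innerText' }
  }
  return {
    selector: str.slice(0, at).trim(),
    attribute: str.slice(at + 1).trim()
  }
}

console.log(parseSelector('img.logo@src'))
console.log(parseSelector('title'))
```

With this sketch, 'img.logo@src' yields the selector 'img.logo' and the attribute 'src', while 'title' falls back to 'innerText'.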

xray(url, scope, selector)

You can also supply a scope to each selector. In jQuery, this would look something like this: $(scope).find(selector).

xray(html, scope, selector)

Instead of a url, you can also supply raw HTML and all the same semantics apply.

var html = '<body><h2>Pear</h2></body>'
x(html, 'body', 'h2')(function(err, header) {
  header // => Pear
})



xray.driver(driver)

Specify a driver to make requests through. Available drivers include:

xray.stream()

Returns a Readable Stream of the data. This makes it easy to build APIs around x-ray. Here's an example with Express:

var app = require('express')()
var x = require('x-ray')()

app.get('/', function(req, res) {
  var stream = x('', 'title').stream()
  stream.pipe(res) // send the scraped data to the client
})


xray.write([path])

Stream the results to a path.

If no path is provided, then the behavior is the same as .stream().


xray.then(cb)

Constructs a Promise object and invokes its then function with the callback cb. Be sure to invoke then() as the last step of the xray method chain, since the other methods are not promisified.

x('', '', [
  {
    title: '.dribbble-img strong',
    image: '.dribbble-img [data-src]@data-src'
  }
])
  .then(function(res) {
    console.log(res[0]) // prints first result
  })
  .catch(function(err) {
    console.log(err) // handle error in promise
  })


xray.paginate(selector)

Select a url from a selector and visit that page.


xray.limit(n)

Limit the amount of pagination to n requests.
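
The interaction between following a "next page" link and capping the number of requests can be sketched without any network access. The snippet below is only an illustration: the pages object and the crawl function are hypothetical stand-ins, not x-ray internals, and it assumes the first page counts as one request toward the limit.

```javascript
// Hypothetical in-memory "site": each URL maps to its items and next-page link.
var pages = {
  '/page/1': { items: ['a', 'b'], next: '/page/2' },
  '/page/2': { items: ['c'], next: '/page/3' },
  '/page/3': { items: ['d'], next: null }
}

// Follow `next` links until there is no link left or `limit` requests were made.
// Assumption: the starting page counts as the first request.
function crawl(site, startUrl, limit) {
  var results = []
  var url = startUrl
  var requests = 0
  while (url && requests < limit) {
    var page = site[url]
    if (!page) break
    requests++
    results = results.concat(page.items)
    url = page.next
  }
  return results
}

console.log(crawl(pages, '/page/1', 2)) // [ 'a', 'b', 'c' ]
console.log(crawl(pages, '/page/1', Infinity)) // [ 'a', 'b', 'c', 'd' ]
```

With a limit of 2 the third page is never visited, even though the second page links to it.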


xray.abort(validator)

Abort pagination if the validator function returns true. The validator function receives two arguments:

xray.delay(from, [to])

Delay the next request between from and to milliseconds. If only from is specified, delay exactly from milliseconds.

var x = Xray().delay('1s', '10s')
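
As a rough illustration of the from/to semantics, the following sketch picks a delay in plain milliseconds. The pickDelay helper is hypothetical, it works on numbers rather than strings such as '1s', and it assumes the delay is drawn uniformly from the range, which may differ from what x-ray does internally.

```javascript
// Hypothetical helper, not part of x-ray: choose a delay in milliseconds.
// With only `from`, the delay is exactly `from`; otherwise it is a random
// integer between `from` and `to`, inclusive (uniform choice is an assumption).
function pickDelay(from, to) {
  if (to === undefined) return from
  return from + Math.floor(Math.random() * (to - from + 1))
}

console.log(pickDelay(1000)) // 1000
console.log(pickDelay(1000, 10000)) // some value between 1000 and 10000
```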


xray.concurrency(n)

Set the request concurrency to n. Defaults to Infinity.

var x = Xray().concurrency(2)
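
What a concurrency cap means in practice can be shown with a small queue that never runs more than n tasks at once. This is a sketch only: createLimiter is a hypothetical helper and not how x-ray implements the option, and it assumes every task calls its done callback exactly once.

```javascript
// Hypothetical limiter, not x-ray's implementation: run at most n tasks at once.
function createLimiter(n) {
  var active = 0
  var queue = []

  // Start queued tasks while there is spare capacity.
  function next() {
    while (active < n && queue.length > 0) {
      var task = queue.shift()
      active++
      task(done) // each task must call done() exactly once when it finishes
    }
  }

  function done() {
    active--
    next()
  }

  return {
    push: function(task) {
      queue.push(task)
      next()
    },
    active: function() { return active },
    pending: function() { return queue.length }
  }
}

// Usage: three fake "requests" that only record their completion callbacks.
var finish = []
var limiter = createLimiter(2)
for (var i = 0; i < 3; i++) {
  limiter.push(function(done) { finish.push(done) })
}
console.log(limiter.active(), limiter.pending()) // 2 1
finish[0]() // the first request completes, so the queued one starts
console.log(limiter.active(), limiter.pending()) // 2 0
```

Passing Infinity as n lets every task start immediately, which mirrors the documented default.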

xray.throttle(n, ms)

Throttle the requests to n requests per ms milliseconds.

var x = Xray().throttle(2, '1s')
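
One way to picture "n requests per ms milliseconds" is to compute the earliest start time of each request under a simple fixed-window scheme. The throttleSchedule function below is hypothetical and uses numeric milliseconds; x-ray's actual scheduling may space requests differently.

```javascript
// Hypothetical helper, not part of x-ray: earliest start time (in ms) of each
// request when at most n requests may start per window of `ms` milliseconds.
// Assumption: a fixed-window scheme where each window opens every `ms`.
function throttleSchedule(count, n, ms) {
  var times = []
  for (var i = 0; i < count; i++) {
    times.push(Math.floor(i / n) * ms)
  }
  return times
}

console.log(throttleSchedule(5, 2, 1000)) // [ 0, 0, 1000, 1000, 2000 ]
```

Five requests throttled to two per second therefore need three windows.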

xray.timeout(ms)

Specify a timeout of ms milliseconds for each request.

var x = Xray().timeout(30)


Collections

X-ray also supports selecting collections of tags. While x('ul', 'li') will only select the first list item in an unordered list, x('ul', ['li']) will select all of them.

Additionally, X-ray supports "collections of collections" allowing you to smartly select all list items in all lists with a command like this: x(['ul'], ['li']).


Composition

X-ray becomes more powerful when you start composing instances together. Here are a few possibilities:

Crawling to another site

var Xray = require('x-ray')
var x = Xray()

x('', {
  main: 'title',
  image: x('#gbar a@href', 'title') // follow link to google images
})(function(err, obj) {
  // obj => {
  //   main: 'Google',
  //   image: 'Google Images'
  // }
})

Scoping a selection

var Xray = require('x-ray')
var x = Xray()

x('', {
  title: 'title',
  items: x('.item', [
    {
      title: '.item-content h2',
      description: '.item-content section'
    }
  ])
})(function(err, obj) {
  // obj => {
  //   title: '',
  //   items: [
  //     {
  //       title: 'The 100 Best Children\'s Books of All Time',
  //       description: 'Relive your childhood with TIME\'s list...'
  //     }
  //   ]
  // }
})


Filters

Filters can be specified when creating a new Xray instance. To apply filters to a value, append them to the selector using |.

var Xray = require('x-ray')
var x = Xray({
  filters: {
    trim: function(value) {
      return typeof value === 'string' ? value.trim() : value
    },
    reverse: function(value) {
      return typeof value === 'string'
        ? value.split('').reverse().join('')
        : value
    },
    slice: function(value, start, end) {
      return typeof value === 'string' ? value.slice(start, end) : value
    }
  }
})

x('', {
  title: 'title | trim | reverse | slice:2,3'
})(function(err, obj) {
  // obj => {
  //   title: 'oi'
  // }
})
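
To show how a filter chain such as 'title | trim | reverse | slice:0,5' can be read, here is a small standalone sketch that parses the part after the selector and applies each filter in order. The applyFilters function and the myFilters object are hypothetical, and the sketch assumes filter arguments are comma-separated numbers; x-ray's own parser may treat arguments differently.

```javascript
// Example filter functions, mirroring the shape used when creating an Xray instance.
var myFilters = {
  trim: function(value) {
    return typeof value === 'string' ? value.trim() : value
  },
  reverse: function(value) {
    return typeof value === 'string' ? value.split('').reverse().join('') : value
  },
  slice: function(value, start, end) {
    return typeof value === 'string' ? value.slice(start, end) : value
  }
}

// Hypothetical helper, not part of x-ray: apply "selector | f1 | f2:a,b" to a value.
// Assumption: arguments after ":" are comma-separated numbers.
function applyFilters(value, spec, filterMap) {
  // Everything before the first "|" is the selector; the rest are filters.
  var parts = spec.split('|').slice(1)
  return parts.reduce(function(current, part) {
    var pieces = part.trim().split(':')
    var name = pieces[0].trim()
    var args = pieces[1] ? pieces[1].split(',').map(Number) : []
    if (typeof filterMap[name] !== 'function') {
      throw new Error('Unknown filter: ' + name)
    }
    return filterMap[name].apply(null, [current].concat(args))
  }, value)
}

console.log(applyFilters('  hello world  ', 'title | trim | reverse | slice:0,5', myFilters)) // dlrow
```

The filters run left to right, so the value is trimmed first, then reversed, then sliced.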


In the Wild



Backers

Support us with a monthly donation and help us continue our activities. [Become a backer]


Sponsors

Become a sponsor and get your logo on our website and on our README on GitHub with a link to your site. [Become a sponsor]
