matthewmueller / x-ray

The next web scraper. See through the <html> noise.
MIT License

Crawling to another site #130

Closed (umpirsky closed 8 years ago)

umpirsky commented 8 years ago

Here is my snippet:

var Xray = require('x-ray');
var x = Xray();

x('https://en.wikipedia.org/wiki/Main_Page', {
  title: 'title',
  subtitle: x('li#n-featuredcontent a@href', 'title')
})(function(err, obj) {
  console.log(obj);
});

Expected:

{ title: 'Wikipedia, the free encyclopedia', subtitle: 'Portal:Featured content - Wikipedia, the free encyclopedia' }

Actual:

{ title: 'Wikipedia, the free encyclopedia' }

The example from the README returns:

{ main: 'Google' }

as well, while:

{
    main: 'Google',
    image: 'Google Images'
}

is expected.

ibeerepoot commented 8 years ago

I have exactly the same problem: crawling to another site returns nothing. I'm thinking of saving the URLs I get and then crawling all of those pages, like so:

// loop through all new urls (urls is the array collected in the first pass)
x(urls[i], 'title');

But I'm still hoping to get a prettier solution from x-ray.
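The two-pass workaround described above (collect the URLs first, then crawl each one) could be sketched roughly like this. It is only an illustration, not a fix for the nested `x(...)` behaviour reported in this issue. The helper name `crawlLinks` and its parameters are hypothetical. It assumes the `x(url, selector)(callback)` call form used in the snippet above, that an array selector such as `['a@href']` returns every match, and that the hrefs come back as absolute URLs.

```javascript
// Hypothetical helper: two-pass crawl using an x-ray instance passed in as `x`.
// Pass 1 collects the link URLs from the start page; pass 2 scrapes each of them.
function crawlLinks(x, startUrl, linkSelector, targetSelector, done) {
  // Pass 1: wrap the selector in an array to ask for every match (assumed x-ray behaviour).
  x(startUrl, [linkSelector])(function (err, urls) {
    if (err) return done(err);
    urls = urls || [];
    if (urls.length === 0) return done(null, []);

    var results = new Array(urls.length);
    var pending = urls.length;
    var failed = false;

    // Pass 2: scrape each collected URL and keep the results in link order.
    urls.forEach(function (url, i) {
      x(url, targetSelector)(function (err, value) {
        if (failed) return;            // an earlier request already reported an error
        if (err) {
          failed = true;
          return done(err);
        }
        results[i] = { url: url, value: value };
        pending -= 1;
        if (pending === 0) done(null, results);
      });
    });
  });
}

// Usage with the real library (requires network access):
// var x = require('x-ray')();
// crawlLinks(x, 'https://en.wikipedia.org/wiki/Main_Page',
//   'li#n-featuredcontent a@href', 'title',
//   function (err, pages) { console.log(err || pages); });
```

Passing `x` in as an argument keeps the helper independent of how the x-ray instance is configured, and makes it easy to try out with a stub in place of the real scraper.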

OllieJennings commented 8 years ago

mentioned in #111