matthewmueller / x-ray

The next web scraper. See through the <html> noise.
MIT License

Not finding all selectors #100

Closed ivansabik closed 9 years ago

ivansabik commented 9 years ago

Could you help me understand why the following code only finds one result? I'm trying:

var Xray = require('x-ray');
var x = Xray();

var URL = 'https://tools.usps.com/go/TrackConfirmAction.action?tRef=fullpage&tLc=1&tLabels=9449309699938393515880';

x(URL, 'body', [{
  label: 'span.label',
  value: 'span.value',
}])(function(err, scraped) {
  console.log(scraped)
})

The above returns:

[ { label: 'Tracking Number: ', value: ' 9449309699938393515880 ' } ]

However in the HTML there's also this:

<span class="label">Updated Delivery Day:</span>
<span class="value">Tuesday, April 28, 2015</span>

and this:

<span class="label">Signed for By: XYZ</span>

Maybe I'm not understanding how selectors work in x-ray. Cheers.

reedo808 commented 9 years ago

I'm very new to using x-ray, but I think you would want to use the following:

var Xray = require('x-ray');
var x = Xray();
var URL = 'https://tools.usps.com/go/TrackConfirmAction.action?tRef=fullpage&tLc=1&tLabels=9449309699938393515880';
x(URL, 'div.tracking-summary', ['span'])(function(err, scraped) {
  console.log(scraped)
})

The results would be:

[ 'Tracking Number: ',
  ' 9449309699938393515880 ',
  'Updated Delivery Day:',
  '\r\n\t\t\t\t\t\t\t\t\t\t\tTuesday, April 28, 2015\r\n\t\t\t\t\t\t\t\t\t\t\t',
  '',
  'Signed for By:\r\n\t\t\t\t\t\t\t\t\t\t\t\t\tS MARSH   //  ANKENY, \r\n\t\t\t\t\t\t\t\t\t\t\t\t\tIA \r\n\t\t\t\t\t\t\t\t\t\t\t\t\t50023 // \r\n\t\t\t\t\t\t\t\t\t\t\t\t\t1:09 pm ' ]
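
The strings in the result above carry a lot of leading and trailing whitespace, plus one empty entry. A minimal sketch of tidying them up in plain JavaScript after scraping might look like the following. `cleanScraped` is a hypothetical helper name, not part of x-ray, and the sample array is a shortened copy of the output above.

```javascript
// Hypothetical helper (not part of x-ray): collapse runs of whitespace
// (including \r, \n and \t) into single spaces, trim each string,
// and drop entries that end up empty.
function cleanScraped(items) {
  return items
    .map(function (s) { return s.replace(/\s+/g, ' ').trim(); })
    .filter(function (s) { return s.length > 0; });
}

// Sample input: a shortened copy of the scraped output above.
var scraped = [
  'Tracking Number: ',
  ' 9449309699938393515880 ',
  'Updated Delivery Day:',
  '\r\n\t\t\t\tTuesday, April 28, 2015\r\n\t\t\t\t',
  ''
];

console.log(cleanScraped(scraped));
```

This could be called inside the x-ray callback, for example `console.log(cleanScraped(scraped))`, assuming `scraped` is the flat array of strings shown above.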

matthewmueller commented 9 years ago

You might need to use x-ray-phantom if the content you're trying to scrape is loaded using JavaScript.

ivansabik commented 9 years ago

Good, thanks a lot. I'm still wondering why the example I gave doesn't find all the labels.