matthewmueller / x-ray

The next web scraper. See through the <html> noise.
MIT License

Problem crawling a <table> that is probably loaded with an AJAX request #184

Closed SamyTahar closed 6 years ago

SamyTahar commented 8 years ago

Subject of the issue

I cannot parse the price in the table, whereas I can parse other data outside it. I am not sure that this is a bug. I have tried using delay() the way wait() is used with Nightmare.

Your environment

var Xray = require('x-ray');
var x = Xray().delay(500);

var prasline_search_URL = "http://www.booking.com/searchresults.fr.html?aid=304142&dcid=1&label=gen173nr-1FCAEoggJCAlhYSDNiBW5vcmVmaE2IAQGYAQ24AQfIAQzYAQHoAQH4AQuoAgM&sid=202caf8e7420db9bb8ad4cf882554986&src=country&error_url=http%3A%2F%2Fwww.booking.com%2Fcountry%2Fsc.fr.html%3Faid%3D304142%3Blabel%3Dgen173nr-1FCAEoggJCAlhYSDNiBW5vcmVmaE2IAQGYAQ24AQfIAQzYAQHoAQH4AQuoAgM%3Bsid%3D202caf8e7420db9bb8ad4cf882554986%3Bdcid%3D1%3Binac%3D0%26%3B&ss=%C3%8Ele+de+Praslin%2C+Seychelles&checkin_monthday=13&checkin_year_month=2016-6&checkout_monthday=14&checkout_year_month=2016-6&room1=A%2CA&no_rooms=1&group_adults=2&group_children=0&ss_raw=pra&ac_position=0&ac_langcode=fr&dest_id=7089&dest_type=region&ac_pageview_id=6283659fbbea0143&ac_suggestion_list_length=5&ac_suggestion_theme_list_length=0";
var scope = ".sr_item";

x(prasline_search_URL, scope, [{
  propertyName: ".sr-hotel__name",
  // Nested x() call: select the final price inside the .sr-prc block
  propertyPrice: x('.sr-prc', 'div.sr-prc--num.sr-prc--final'),
  propertyLocation: "div.address>a",
  propertyRating: ".rating>span",
  propertyStarsRating: ".stars > span",
  propertyRoomName: "td.roomName > div.roomNameInner > span.room_link"
}])
  .paginate('ul > li.sr_pagination_item > a@href')
  .limit(1)
  .write('test.json');

Expected behaviour

The price should be extracted

Actual behaviour

The price is not extracted

hemedani commented 8 years ago

That is actually because delay() happens before the URL is parsed. I have the same issue when scraping sites that use AngularJS to retrieve their data. What can we do to add a short delay after the request?

CasperJS has several solutions for this issue, such as the load.finished and wait.done events.
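To illustrate the idea of waiting after the request rather than before it, here is a minimal sketch of a polling helper. It is not part of x-ray or CasperJS: `waitFor`, its `timeout` and `interval` options, and the simulated `page` object are all hypothetical names made up for this example. The assumption is that something like it would run inside a browser-based driver, which can see the DOM after the AJAX response arrives.

```javascript
// Hypothetical helper (not an x-ray or CasperJS API): poll `check` until it
// returns a truthy value, or reject once `timeout` milliseconds have elapsed.
function waitFor(check, options) {
  var timeout = (options && options.timeout) || 5000;   // default: 5 s
  var interval = (options && options.interval) || 100;  // default: 100 ms
  var started = Date.now();

  return new Promise(function (resolve, reject) {
    function poll() {
      var result;
      try {
        result = check();
      } catch (err) {
        return reject(err);
      }
      if (result) return resolve(result);
      if (Date.now() - started >= timeout) {
        return reject(new Error('waitFor timed out after ' + timeout + ' ms'));
      }
      setTimeout(poll, interval);
    }
    poll();
  });
}

// Simulated page state: the price "arrives" 300 ms after load,
// the way an AJAX response would fill in the table.
var page = { price: null };
setTimeout(function () { page.price = 'EUR 120'; }, 300);

waitFor(function () { return page.price; }, { timeout: 2000, interval: 50 })
  .then(function (price) { console.log('price loaded:', price); })
  .catch(function (err) { console.error(err.message); });
```

Polling for a condition tends to be more robust than a fixed delay, because it finishes as soon as the data is there and fails explicitly when it never shows up.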

SamyTahar commented 8 years ago

Thanks, Hemedani. I will check this out.