scrapinghub / splash

Lightweight, scriptable browser as a service with an HTTP API
BSD 3-Clause "New" or "Revised" License

Incomplete rendering and unresponsive page elements #583

Open schugh opened 7 years ago

schugh commented 7 years ago

OS: macOS Sierra 10.12.3, running Docker for Mac
Docker container: official scrapinghub/splash:latest

I'm trying to scrape booking information starting at a hotel page on agoda.com.

What I'm trying to achieve

  1. Navigate to the link: https://www.agoda.com/beach-club-koh-tao/hotel/koh-tao-th.html?checkIn=2017-6-26&los=2&Rooms=1&Adults=2&Childs=0&currency=USD&origin=US&cid=1752739&gclid=-1&_ga=1.171749286.702978769.1466441509http://google.com

  2. Select the 'Rooms' drop down for the first room and set it to '2'.

  3. Click the Book Now button for that room.

Right off the bat, when you render using localhost:8050/render.html?url=<url in step 1>, you'll notice that a handful of elements are missing. Still, this isn't a showstopper because the elements I care about do render, so I went on to write the Lua script to execute this sequence of actions.

Here's the Lua script I use to attempt the flow above:
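One thing worth ruling out when calling render.html by hand: the target URL carries its own query string, so it needs to be percent-encoded before it is passed as the `url` argument. Otherwise parameters such as `checkIn` and `los` are read as arguments to Splash rather than to agoda.com. A minimal sketch in Python, assuming Splash listens on localhost:8050 as above; `build_render_url` is a hypothetical helper name, and the target URL is shortened for illustration:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: the container's port 8050 is mapped to localhost.
SPLASH = "http://localhost:8050"

def build_render_url(target_url, wait=1.0):
    """Return a render.html URL with the target URL percent-encoded."""
    # urlencode escapes '?', '&' and '=' inside target_url, so Splash
    # receives the whole target as a single 'url' argument.
    query = urlencode({"url": target_url, "wait": wait})
    return f"{SPLASH}/render.html?{query}"

# Shortened version of the URL from step 1 (illustration only).
target = ("https://www.agoda.com/beach-club-koh-tao/hotel/koh-tao-th.html"
          "?checkIn=2017-6-26&los=2&Rooms=1&Adults=2")
render_url = build_render_url(target)

# Round trip: decoding the Splash query string gives back the original target.
assert parse_qs(urlsplit(render_url).query)["url"] == [target]
print(render_url)
```

If the page renders more completely with the encoded URL, the missing elements may come from dropped query parameters rather than from the rendering engine.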

    function main(splash)
      local url = splash.args.url
      splash:set_user_agent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36")
      assert(splash:go(url))
      assert(splash:wait(1))
      splash:set_viewport_full()
      splash:wait(0.5)

      local room_condition_id = "room-38D66B84A005C9A00ABA58FC59F3C1112"
      local select = assert(splash:select('#' .. room_condition_id .. ' .select-room-quantity'))
      select:send_keys("<2> <Return>")
      splash:wait(0.5)

      local btn = assert(splash:select('#' .. room_condition_id .. ' .book-button'))
      btn:mouse_click(10, 10)
      splash:wait(5)

      return {
        png = splash:png(),
        html = splash:html()
      }
    end

The Result

The result of this should be navigation to a different page with the booking information that I need to further scrape. Instead, the click doesn't perform any action. The manipulation of the select drop down works just fine.

Resulting rendered image (cropped to show the portion of interest): [image]

You'll notice that the 'Book Now' button for the first record, where the 'Rooms' drop down shows the correct value of '2', is light blue. This seems to be the button's hover style, so I'm pretty sure the mouse pointer is getting to the right spot. It just isn't triggering the JavaScript redirect.
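To separate a mouse-event problem from a rendering problem, one possible check is to fire the click from inside the page and compare the URL before and after. The sketch below builds a POST request that sends a Lua script to Splash's `execute` endpoint from Python. The helper name `build_execute_request` and the `selector` argument are hypothetical, the selector is copied from the script above, and whether agoda.com reacts to a synthetic click is an open question.

```python
import json
from urllib.request import Request

# Lua script run by Splash; 'selector' is a custom argument (assumption)
# that Splash exposes to the script through splash.args.
LUA_SOURCE = """
function main(splash)
  -- jsfunc wraps a JavaScript function so it can be called from Lua
  local click = splash:jsfunc([[
    function (sel) {
      var el = document.querySelector(sel);
      if (!el) { return false; }
      el.click();
      return true;
    }
  ]])
  assert(splash:go(splash.args.url))
  assert(splash:wait(1))
  local before = splash:url()
  local found = click(splash.args.selector)
  splash:wait(5)
  return {found = found, before = before, after = splash:url()}
end
"""

def build_execute_request(target_url, selector, splash="http://localhost:8050"):
    """Build (but do not send) a JSON POST request for the execute endpoint."""
    payload = {"lua_source": LUA_SOURCE, "url": target_url, "selector": selector}
    return Request(
        f"{splash}/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Query string omitted for brevity; use the full URL from step 1 in practice.
req = build_execute_request(
    "https://www.agoda.com/beach-club-koh-tao/hotel/koh-tao-th.html",
    "#room-38D66B84A005C9A00ABA58FC59F3C1112 .book-button",
)
# Send with urllib.request.urlopen(req) once a Splash instance is running.
```

If `found` is true and `after` differs from `before`, the handler works and the issue lies with how the mouse click is delivered; if the URL stays the same, the page's script probably did not load or run correctly.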

Things I've tried

To wrap up

I've also attempted to do something similar with hotels.com and the symptoms are similar: clicks aren't executing. In the case of hotels.com, I'm able to advance because the book button is in a form, so sending the Enter key triggers a submit.

When I try to perform this action using PhantomJS, it just works. I did notice that when I take a PNG snapshot in PhantomJS, the entire page renders correctly before and after I proceed through my actions. Based on this, I expect that this is a rendering engine issue.

Help?

Gallaecio commented 4 years ago

Did you manage to solve your issue?