scrapy-plugins / scrapy-splash

Scrapy+Splash for JavaScript integration
BSD 3-Clause "New" or "Revised" License

Use splash:mouse_click #106

Closed: crisfan closed this issue 4 years ago

crisfan commented 7 years ago

I am using splash:mouse_click to load the next page and get its HTML.

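function process_one(splash)
   -- Locate the "next page" link; "\u4e0b\u4e00\u9875" is the escaped
   -- link text "下一页" ("next page"). Returns the top-left corner of
   -- the link, or nothing if no such link exists.
   local get_dimensions = splash:jsfunc([[
   function () {
       var allA = document.getElementsByTagName('a');
       for (var i = 0; i < allA.length; i++) {
           if (allA[i].innerHTML == "\u4e0b\u4e00\u9875") {
               var rect = allA[i].getClientRects()[0];
               return {"x": rect.left, "y": rect.top};
           }
       }
   }
   ]])
   splash:set_viewport_full()
   splash:wait(0.1)
   local dimensions = get_dimensions()
   -- Fail with a clear message instead of indexing nil on the last page.
   assert(dimensions, "next-page link not found")
   splash:mouse_click(dimensions.x, dimensions.y)
   splash:wait(5)  -- fixed delay so the next page can finish loading
   local content = splash:html()
   return content
end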

function process_mul(splash)
   local res={}
   for i=1,3,1 do
       res[i]=process_one(splash)
   end
   return res
end

function main(splash)
   assert(splash:go("http://was.mot.gov.cn:8080/govsearch/gov_list.jsp"))
   return {res=process_mul(splash)}
end

The code above works, but it is inefficient: I have to call splash:wait(5) to make sure the page has finished loading, otherwise I get a lot of duplicate pages. I have searched the documentation for a long time but have not found an efficient way to deal with this.

Is there anything in Splash like Selenium's implicitlyWait, or is there an easier way to fix my problem?

kmike commented 7 years ago

Hey @ForkEyes,

There is no implicitlyWait in Splash (yet? it sounds like an interesting idea), but you can do it explicitly, e.g.

function main(splash)
  splash:set_user_agent(splash.args.ua)
  assert(splash:go(splash.args.url))

  -- requires Splash 2.3
  -- todo: use splash:with_timeout here,
  -- to limit total wait time
  while not splash:select('.my-element') do
    splash:wait(0.1)
  end
  splash:select('.my-element'):mouse_click()
  splash:wait(0.5)  -- todo: wait for another element
  return {html=splash:html()}
end
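
For a pagination case like the one in the question, the "next page" link presumably exists on every page, so waiting for an element to appear may return immediately and still yield a duplicate page. One possible variation is to poll until some page marker differs from the value it had before the click. The following is only a sketch under assumptions: `wait_for_change`, `read_marker`, and the `.current-page` and `.next` selectors are hypothetical names, not part of Splash or of the target site, and elapsed time is approximated by summing the requested wait steps.

```lua
-- Hypothetical helper, not part of Splash: call `read_marker()` repeatedly
-- until it returns a value different from `previous`, or give up after
-- roughly `maxwait` seconds of requested waiting.
function wait_for_change(splash, read_marker, previous, maxwait, step)
  maxwait = maxwait or 10   -- assumed default, in seconds
  step = step or 0.1        -- assumed polling interval, in seconds
  local waited = 0
  while waited < maxwait do
    local current = read_marker()
    if current ~= nil and current ~= previous then
      return current
    end
    splash:wait(step)
    waited = waited + step
  end
  return nil, "timeout waiting for page change"
end

function main(splash)
  assert(splash:go(splash.args.url))
  -- Assumption: the page exposes its page number in a `.current-page` element.
  local function read_marker()
    return splash:evaljs(
      "(document.querySelector('.current-page') || {}).textContent")
  end
  local before = read_marker()
  splash:select('.next'):mouse_click()  -- '.next' is an assumed selector
  assert(wait_for_change(splash, read_marker, before, 5))
  return {html = splash:html()}
end
```

Any value that reliably changes between pages (a page number, the first result's title) could serve as the marker; comparing the full HTML is likely too noisy.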

I think adding a helper function like wait_for_element to Splash itself is a good idea (just opened https://github.com/scrapinghub/splash/issues/569 for it).
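
A helper along those lines could look roughly like this. It is a sketch, not an existing Splash API: the name `wait_for_element`, its `maxwait` and `step` parameters, and their defaults are assumptions made here. It relies only on `splash:select` and `splash:wait`, and it bounds the loop by summing the requested wait steps rather than measuring wall-clock time.

```lua
-- Hypothetical helper, not part of Splash: poll for a CSS selector until
-- it matches, or until roughly `maxwait` seconds of waiting were requested.
function wait_for_element(splash, css, maxwait, step)
  maxwait = maxwait or 10   -- assumed default, in seconds
  step = step or 0.1        -- assumed polling interval, in seconds
  local waited = 0
  while true do
    local element = splash:select(css)
    if element then
      return element
    end
    if waited >= maxwait then
      return nil, "timeout waiting for " .. css
    end
    splash:wait(step)
    waited = waited + step
  end
end

function main(splash)
  assert(splash:go(splash.args.url))
  -- assert() raises the timeout message if the element never appears.
  local element = assert(wait_for_element(splash, '.my-element', 5))
  element:mouse_click()
  splash:wait(0.5)
  return {html = splash:html()}
end
```

Returning `nil` plus a message, instead of raising inside the helper, leaves it to the caller to decide whether a missing element is fatal.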

Gallaecio commented 5 years ago

https://github.com/scrapinghub/splash/issues/569 covers the feature, and https://github.com/scrapinghub/splash/pull/829 documents the best current solution. @crisfan Can we close this issue?