scrapinghub / splash

Lightweight, scriptable browser as a service with an HTTP API
BSD 3-Clause "New" or "Revised" License

Very slow serialization of js function result #949

Open lopuhin opened 5 years ago

lopuhin commented 5 years ago

Consider the following script:

function main(splash, args)
  f = splash:jsfunc([[
  function () {
    var result = []
    for (var i = 0; i < 5000; i++) {
      result.push({
         i_value: [i],
         const_value: 'foo',
      })
    }
    return result; // JSON.stringify(result);
  }  
  ]])
  s0 = splash:get_perf_stats()
  result = f()
  s1 = splash:get_perf_stats() 
  return {
    time=s1.walltime - s0.walltime,
    length=#result,
  }
end

This takes around 0.20 s. If we change the value to const_value: ['foo'], the time goes up to 0.25 to 0.30 s. If we instead return the result as a JSON string (the commented-out JSON.stringify(result)), it takes 0.05 to 0.07 s in both cases. So it looks like passing a big nested value from JS to the Lua script is quite slow.
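For reference, here is a minimal sketch of the faster variant, with the string decoded back into a table on the Lua side. It assumes the `json` module that Splash exposes to scripts through `require("json")`; the decode step is not part of the timings quoted above, so the measured time here would include it.

```lua
-- Sketch of the workaround: serialize in JS, decode in Lua.
-- Assumption: the Splash sandbox allows require("json").
local json = require("json")

function main(splash, args)
  local f = splash:jsfunc([[
  function () {
    var result = [];
    for (var i = 0; i < 5000; i++) {
      result.push({i_value: [i], const_value: 'foo'});
    }
    // Hand back one flat string instead of a nested value.
    return JSON.stringify(result);
  }
  ]])
  local s0 = splash:get_perf_stats()
  -- f() returns a plain string; json.decode rebuilds the Lua table.
  local result = json.decode(f())
  local s1 = splash:get_perf_stats()
  return {
    time = s1.walltime - s0.walltime,
    length = #result,
  }
end
```

The structure of the returned data is the same as in the original script; only the way it crosses the JS to Lua boundary changes.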

kmike commented 5 years ago

Yeah, that's slow. The reason is probably this code: https://github.com/scrapinghub/splash/blob/a1f44885affca5d5460e6c773914a5f4dc2d0e13/splash/jsutils.py#L9

lopuhin commented 5 years ago

Oh I see, makes sense

kmike commented 5 years ago

I'm not sure what to do about it. Options:

  1. Maybe it should be a documentation issue, e.g. a FAQ entry, with links from Lua API docs. FAQ entry would explain the problem and suggest a fix.
  2. Alternatively, there can be a Splash startup option which disables protection.
  3. Third option is to try to improve the speed.

lopuhin commented 5 years ago

I think option 1 is good; another variant would be adding this to the jsfunc docs.

kmike commented 5 years ago

This affects jsfunc, evaljs, wait_for_resume, and maybe some other functions, so I think a FAQ entry plus links from the individual functions in the Lua API docs is good.