rubycdp / ferrum

Headless Chrome Ruby API
https://ferrum.rubycdp.com
MIT License
Way to resolve promise in page javascript? #234

Closed hkmaly closed 2 years ago

hkmaly commented 2 years ago

Could you add some way to resolve a promise? evaluate('await something_returning_promise()') obviously doesn't work. evaluate_async('something_returning_promise()') doesn't work either (less obviously; it returns a timeout error). I tried adding this method:

def evaluate_await(expression, *args)
    expression = "function() { return %s }" % expression
    call(expression: expression, arguments: args, awaitPromise: true)
end

but, while it seems to do something, it returns {}. Also, monkey patching it in is quite complicated.

hkmaly commented 2 years ago

Wait, that was my mistake INSIDE the javascript. The evaluate_await works.

route commented 2 years ago

https://github.com/rubycdp/ferrum/blob/master/lib/ferrum/frame/runtime.rb#L53

hkmaly commented 2 years ago

@route I explicitly said that evaluate_async is not working; it always times out. Or is there some trick to how it's supposed to be used?

ttilberg commented 2 years ago

@hkmaly What is your example? You haven’t provided much information about your original issue for route to consider.

hkmaly commented 2 years ago

@ttilberg @route Sorry, I thought it was obvious. I have the following download method:

def download_fallback(browser, url, name, mode)
    return(false) if File.exist?(name)
    # https://stackoverflow.com/questions/44698967/requesting-blob-images-and-transforming-to-base64-with-fetch-api/50463054#50463054
    js = <<~ZATREOT
fetch('#{url}').then(response => response.blob()).then(blob => new Promise(callback => {
    let reader = new FileReader();
    reader.onload = function() {callback(this.result)};
    reader.readAsDataURL(blob);}
))
ZATREOT
    if mode
        data = browser.evaluate_await(js);
    else
        data = browser.evaluate_async(js, 5);
    end
    dp = data.split(',')
    return(false) unless dp.count == 2
    data = Base64.decode64(dp[1])
    File.open(name, 'wb') {|fd|
        fd.write data
    }
    puts "\tSaved #{name} from #{url}"
    return(true)
rescue => ex
    puts "\tError download #{url}: #{ex.class}: #{ex.message}\n\t#{ex.backtrace.join("\n\t")}"
    return(false)
end

In "true" mode, using my own evaluate_await, it works and saves the image from the provided url. In "false" mode, using evaluate_async, it throws:

    Error download (url redacted): Ferrum::ScriptTimeoutError: Timed out waiting for evaluated script to return a value
    /var/lib/gems/2.5.0/gems/ferrum-0.11/lib/ferrum/frame/runtime.rb:155:in `handle_error'
    /var/lib/gems/2.5.0/gems/ferrum-0.11/lib/ferrum/frame/runtime.rb:141:in `block in call'
    /var/lib/gems/2.5.0/gems/ferrum-0.11/lib/ferrum.rb:145:in `with_attempts'
    /var/lib/gems/2.5.0/gems/ferrum-0.11/lib/ferrum/frame/runtime.rb:124:in `call'
    /var/lib/gems/2.5.0/gems/ferrum-0.11/lib/ferrum/frame/runtime.rb:70:in `evaluate_async'
    ... :in `download_fallback'

And note that it DEFINITELY doesn't take 5 seconds when it works.
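The decoding step in download_fallback can be tried without a browser. The following is a minimal sketch of that step only, assuming the script resolves to a string of the form "data:<mime>;base64,<payload>" as FileReader.readAsDataURL produces. The helper name decode_data_url is hypothetical and not part of Ferrum.

```ruby
# Hypothetical helper: splits a "data:<mime>;base64,<payload>" string and
# returns the decoded bytes, or nil if the input does not have that shape.
def decode_data_url(data_url)
  # Limit the split to 2 parts so only the first comma separates the header.
  header, payload = data_url.split(",", 2)
  return nil unless payload && header.start_with?("data:") && header.end_with?(";base64")

  # unpack1("m") is the same lenient decoding that Base64.decode64 performs.
  payload.unpack1("m")
end

# Build a sample data URL; pack("m0") is Base64 encoding without line breaks.
sample = "data:text/plain;base64," + ["hello"].pack("m0")
decoded = decode_data_url(sample)
puts decoded.inspect
```

When the decoded bytes are an image, writing them with File.binwrite(name, decoded) or a 'wb' mode keeps Ruby from translating line endings on platforms that do so in text mode.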

hkmaly commented 2 years ago

@ttilberg @route Note that I suspect my evaluate_await is technically wrong (for a start, it doesn't have the timeout), but your evaluate_async is based on some deep undocumented magic around the arguments. Providing a method which would SIMPLY expect the expression to return (end with) a promise object would make more sense.

ttilberg commented 2 years ago

I think this is good feedback. If nothing else, the example in the readme should clarify a little bit about what's going on, and ideally include an example with a promise. The arguments array receives the Ruby arguments passed after the timeout.

The first move in the evaluate_async wrapper is to add the resolve callback to the end of the arguments array, so with no extra Ruby arguments it becomes arguments[0].

You end up with a function body that looks like this:

(function() {
  return new Promise((__f, __r) => {
    try {
      arguments[arguments.length] = r => __f(r);
      arguments.length = arguments.length + 1;
      setTimeout(() => __r(new Error("timed out promise")), 5000);
      console.dir(arguments)
      arguments[0](fetch('https://httpbin.org/json').then(response => response.json()))

    } catch(error) {
      __r(error);
    }
  });
}
)
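To see how the expression and the timeout end up inside that body, here is a rough Ruby sketch that rebuilds the same text by string templating. It only mirrors the wrapper quoted above; the helper name build_async_wrapper is hypothetical, and Ferrum's actual template may differ between versions.

```ruby
# Hypothetical helper: reproduces the wrapper quoted in this thread from an
# expression and a timeout in seconds. This is not Ferrum's actual source.
def build_async_wrapper(expression, wait_seconds)
  timeout_ms = (wait_seconds * 1000).to_i
  <<~JS
    (function() {
      return new Promise((__f, __r) => {
        try {
          arguments[arguments.length] = r => __f(r);
          arguments.length = arguments.length + 1;
          setTimeout(() => __r(new Error("timed out promise")), #{timeout_ms});
          #{expression.strip}
        } catch(error) {
          __r(error);
        }
      });
    })
  JS
end

# The user's script is pasted inside the promise executor, so __f and __r
# are in scope, and the appended callback is the last entry of arguments.
wrapper = build_async_wrapper("arguments[0](Promise.resolve(42))", 5)
puts wrapper
```

Printing the result makes it easier to see why a script that never calls the appended callback (or __f) can only end through the setTimeout rejection.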

In order to receive the promise's final value on the Ruby side, we need to hit that wrapping promise callback. The way it's being done in the example on the readme is through arguments[0](my_value):

b = Ferrum::Browser.new(headless: false)
my_json = b.evaluate_async <<~JS, 5
  console.dir(arguments)
  arguments[0](fetch('https://httpbin.org/json').then(response => response.json()))
JS

You can also successfully receive the value by using the templated resolve function name:

my_json = b.evaluate_async <<~JS, 5
  console.dir(arguments)
  __f(fetch('https://httpbin.org/json').then(response => response.json()))
JS

Of course, you can call that callback any time:

my_json = b.evaluate_async <<~JS, 5
  console.dir(arguments)
  fetch('https://httpbin.org/json').then(response => response.json()).then(result => __f(result))
JS

I'd be curious to learn more background on why the success callback is being added to the arguments list, why that version is used in the example, and why the template doesn't use a more friendly name than __f which makes me think of __f(ail) (appreciating the notion that we're trying to not step on dev's toes by prefixing with __).

hkmaly commented 2 years ago

Thanks, will try. Frankly, I suspect the use of arguments and __f is a deliberate attempt to make it more confusing; it definitely works that way.

Yes, maybe it could be solved just by better documentation, but I still wonder whether adding an explicit function for this purpose wouldn't be clearer. I mean, I would assume that wanting to wait specifically on a promise is quite common. Even if that function were just

def evaluate_await(expression, wait, *args)
    evaluate_async(expression + '.then(result => __f(result))', wait, *args)
end

(hmmm ... or evaluate_promise?)
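A variant of that idea which also forwards rejections could look like the sketch below. It is not part of Ferrum: evaluate_promise is written here as a standalone hypothetical helper that takes any object responding to evaluate_async(expression, wait, *args), and it assumes the __f and __r names from the wrapper template quoted earlier in this thread are in scope, which may not hold for every Ferrum version. The expression must be a single JavaScript expression without a trailing semicolon. RecordingTarget is a stand-in so the sketch runs without Chrome.

```ruby
# Hypothetical helper, not part of Ferrum. Wraps the expression so that a
# resolved promise calls __f and a rejected one calls __r.
def evaluate_promise(target, expression, wait, *args)
  script = "Promise.resolve(#{expression.strip}).then(__f, __r)"
  target.evaluate_async(script, wait, *args)
end

# Stand-in for a browser or page: records the call instead of running it.
class RecordingTarget
  attr_reader :calls

  def initialize
    @calls = []
  end

  def evaluate_async(expression, wait, *args)
    @calls << [expression, wait, args]
    :stubbed_result
  end
end

target = RecordingTarget.new
result = evaluate_promise(target, "fetch('https://httpbin.org/json').then(r => r.json())\n", 5)
puts target.calls.first[0]
```

With a real browser, target would be the Ferrum::Browser instance, and a rejected fetch would surface as an error right away instead of waiting for the timeout.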

decaffeinatedio commented 2 years ago

@ttilberg Sorry to ping on a closed issue, but I was having a super hard time figuring out how to get evaluate_async working until I came across your helpful example/explanation. I'm happy to put in a PR on the readme to outline how to use __f() if that would be helpful!

route commented 2 years ago

@decaffeinatedio any doc improvement is appreciated