Change the way errors are handled in Lua scripts

kmike commented 9 years ago

In http://splash.readthedocs.org/en/1.3.1/scripting-tutorial.html#error-handling it is described how are errors handled in Splash Lua scripts:

for developer errors (e.g. incorrect function arguments) exception is raised;

for errors outside developer control (e.g. a non-responding remote website) status flag is returned: functions that can fail return ok, reason pairs which developer can either handle or ignore.

This means that by default script won't stop on "external" errors; to make code robust almost every statement needs to be wrapped in assert():

assert(splash:go(splash.args.url))
assert(splash:wait(0.5))

Even in the documentation I was sometimes sloppy and haven't written these asserts. One example is wait_for from http://splash.readthedocs.org/en/latest/scripting-tutorial.html#writing-modules:

function Splash:wait_for(condition)
    while not condition() do   -- condition function can't follow our conventions here
        self:wait(0.05)        -- assert is missing
    end
end

It is too easy to forget about error checking, and code that checks for errors looks uglier than code which doesn't check.

I propose to change the error handling convention and raise an error both for developer errors and for error conditions outside developer control (e.g. 404 pages).

Pros:

errors are not silently discarded;
less code to write and read in common cases (?);
it is possible to wrap several statements in pcall and handle errors once.

Cons:

not ideomatic Lua;
handling errors with pcall requires more work;
pcall may catch developer errors (e.g. invalid argument names)

TODO: figure out how code using pcall will look like:

-- option 1, no-brainer - does it work?
local ok, res = pcall(splash:evaljs, "document.title")

-- option 2, ugly
local ok, res = pcall(splash.evaljs, self, "document.title")

-- option 3, verbose
local ok, res = pcall(function() 
    return splash:evaljs("document.title")
end)

Any thoughts?

pawelmhm commented 9 years ago

+1

This is very important IMO. I just realized in some situations we take screenshots of 500 and 404 pages. Script like this

function main(splash)
            splash:go("http://httpbin.org/status/500")
            return splash:png()
end

returns perfectly nice 200 response from splash, and client gets screenshot of 500 without anyone noticing this.

I propose to change the error handling convention and raise an error both for developer errors and for error conditions outside developer control (e.g. 404 pages).

This sounds like a good idea for me, but just to clarify, which response codes are we talking about here? If splash makes 60 requests while rendering page, and 1-2 return 404 will error be raised? Or are we talking only about response codes for first initial request?

One thing about:

for errors outside developer control (e.g. a non-responding remote website) status flag is returned: functions that can fail return ok, reason pairs which developer can either handle or ignore.

currently ok, reason pair is very generic. If remote website responds with 400 clients gets nil, 'error' pair, and it gets same thing when remote website responds with 500.

Is it possible to get more specific error messages? For example status code of response and message? Does it make sense? I know there are many different requests involved when rendering, so probably it would be difficult to speak about one response status code, but perhaps we have some data available that could help the client to know why splash request failed?

Another thing that seems important to me. What will be response code of splash in case remote resource responds with 500? IMO splash response code should allow to distinguish this failure from failure of splash operation, perhaps 503 with error message propagated from remote resource would be good?

kmike commented 9 years ago

Hey @pawelmhm,

... returns perfectly nice 200 response from splash, and client gets screenshot of 500 without anyone noticing this.

yes, this is the main motivation behind this proposal. You need to be careful and write assert(splash:go(...)) each time, or handle it explicitly by using ok, reason = splash:go(...).

This sounds like a good idea for me, but just to clarify, which response codes are we talking about here? If splash makes 60 requests while rendering page, and 1-2 return 404 will error be raised? Or are we talking only about response codes for first initial request?

If you're talking about splash:go, then error is raised in case of 4xx or 5xx response for the request which contents is displayed in the browser window. In most cases it is the first request, but it could be not the first (e.g. a page can do a redirect to a 4xx or 5xx resource after some time). It may happen that there is no response at all, e.g. if a remote resource is not availbale because of DNS error or network error or because a remote server dropped connection without returning anything - this is also an error.

Other functions can return an error in other cases; they all should be documented in Splash docs.

currently ok, reason pair is very generic. If remote website responds with 400 clients gets nil, 'error' pair, and it gets same thing when remote website responds with 500.

This is not how Splash is designed to work; if you see this behaviour it is a bug. Error is not always 'error', it should be possible to distinguish different error types: for 400 response it should be 'http400', for 503 response it should be 'http503', etc. This is documented and tested.

Another thing that seems important to me. What will be response code of splash in case remote resource responds with 500? IMO splash response code should allow to distinguish this failure from failure of splash operation, perhaps 503 with error message propagated from remote resource would be good?

Splash allows to distinguish between various error types. Currently it doesn't allow to set a resulting HTTP status code error code, but you can return {'status': ..., result: ...} instead of just returning the result.

To clarify: by changing how errors are handling we'll change the style in which Splash scripts are written; it doesn't change anything else. Handling errors properly should be already possible; it is just easy to forget to check for errors, and Splash silently ignores them if you don't handle them.

By changing errors from ok, reason style to exceptions style it'll be harder to miss error checks. But https://github.com/scrapinghub/splash/issues/165 is a blocker here.

pawelmhm commented 9 years ago

This is not how Splash is designed to work; if you see this behaviour it is a bug.

aha, so I think I found a bug here, I tested this in splash jupyter with recent splash version:

In [1]: splash:go("http://httpbin.org/status/400")
Out[1]:
nil, "error"
In [2]: splash:go("http://httpbin.org/status/500")
Out[2]:
nil, "error"

Currently it doesn't allow to set a resulting HTTP status code error code, but you can return {'status': ..., result: ...} instead of just returning the result.

hmm this sounds interesting, but let's say I'm returning data in .png by default, so I can't return json easily without having to update clients, spiders would have to check content-type which would be a bit confusing.

To clarify: by changing how errors are handling we'll change the style in which Splash scripts are written; it doesn't change anything else. Handling errors properly should be already possible; it is just easy to forget to check for errors, and Splash silently ignores them if you don't handle them.

ok I understand, but is it okay that splash will return same status code for errors in lua scripts (let's say syntax errors) and errors from remote website (correct script but target website responds with 500)? These are different situations. In case of errors in lua script (incorrect, arguments, syntax etc) this is evidently client fault - and 400 bad request is appropriate. If script is totally fine but website responds with 500 it means that target is down and client should retry, so it would be nice to get 5\ status code from splash.

kmike commented 9 years ago

aha, so I think I found a bug here

Thanks, I'll check it.

hmm this sounds interesting, but let's say I'm returning data in .png by default, so I can't return json easily without having to update clients, spiders would have to check content-type which would be a bit confusing.

A simpler way to write a client would be to always consume JSON. But I'm fine with allowing to set HTTP status code from the Splash script, similar to how it allows to set Content-Type (http://splash.readthedocs.org/en/latest/scripting-ref.html#splash-set-result-content-type). Returning JSON is more general and doesn't require using non-200 HTTP status codes, but in some specific cases (like returning PNG images directly, without encoding them to base64/json) it may be helpful.

ok I understand, but is it okay that splash will return same status code for errors in lua scripts (let's say syntax errors) and errors from remote website (correct script but target website responds with 500)?

Yes, I think it is OK. Splash will return an error not because the remote website returns an error, but because user didn't handle this case, so it should use the same 400 status code. Error message indicates which kind of error happened (see https://github.com/scrapinghub/splash/issues/226 though). When users handle the error (by using pcall) they are free to return anything or set any appropriate http error code, if your suggestion about splash:set_result_status_code gets implemented.

Possible errors are not limited to splash:go method (e.g. splash:wait or splash:evaljs can also raise an error), not all of them map to HTTP status codes cleanly. For example, should we return 504 response if splash:wait_for_resume timeouts? If we do, it will become harder to distinguish these errors from 504 errors in remote responses. We may return 400 in all cases the error is not caused by a remote resource, but what if remote website returned 400 response? For me it looks like auto-mapping exceptions to HTTP status codes is a wrong approach - there are tricky edge cases, and it is not easier to work with than alternatives.

See how Python does it: in case of unhandled exception status code is always set to 1, regardless of the exception. If you want to return anything else you need to catch the exception and use sys.exit(..) with an exit code you want.

kmike commented 9 years ago

@pawelmhm an error with HTTP errors detection should be fixed by #228 - thanks for catching it!

pawelmhm commented 9 years ago

when working on https://github.com/scrapinghub/splash/issues/231 I noticed that errors resulting from calls to undefined Lua methods/attributes in Splash callbacks are not propagated to the client, eg Lua like this with typo in set_url will return 200 response:

function main(splash)
    splash:on_request(function(request)
        if request.url == splash.args.url then
            request:set_urlx(splash.args.new_url)
        end
    end)
    splash:go(splash.args.url)
    return splash:html()
end

should it raise 400?

kmike commented 9 years ago

@pawelmhm see https://github.com/scrapinghub/splash/issues/208. I think that the error should be propogated indeed.

scrapinghub / splash

Change the way errors are handled in Lua scripts #161