Open rkennedy-mode opened 1 year ago
It may be worth considering a more reactive model using Future/CompletableFuture and a dedicated event processing thread in Playwright to help avoid generating stacks of such depth.
We considered that and decided to proceed with the current approach as all the alternative seemed way less ergonomical. It is true that with the current message dispatch model you can end up with stack overflow when you perform API calls that may lead to interaction with the driver. In all of the cases we've seen so far there were workarounds. In general we recommend using waitFor methods instead of on listeners as they are not susceptible to this problem. E.g. prefer waitForRequest to onRequest.
For what it's worth, this would also make it much clearer to callers what data they're fetching that can be returned locally and what requires a full round trip back to the Playwright/browser process. That's often unclear, potentially leading to people making calls that are far more expensive and less reliable than they realize.
I'd say each of the action methods in Playwright API can may require interaction with the driver, so it'd be save to assume that any such method always talks to the driver. Most of the time (except for the cases with recursive event dispatch) you should not worry about it.
In general we recommend using waitFor methods instead of on listeners as they are not susceptible to this problem. E.g. prefer waitForRequest to onRequest.
This very naively assumes that:
In our application, we add onRequestFailed/Finished callbacks so we can instrument outbound calls. We don't actually know if any are going to be made, let alone how many will be made. We're loading a URL provided to us by an outsider. We also have onConsole callbacks, onPageError, onPageCrash, and so on. That's just for what we want to instrument so we can go back and see what the page was doing in the event we need to debug some issues later on.
That then frees us up to use the newly added waitForCondition method for the page to reach a "ready" state, which is when we begin taking screenshots and generating PDFs. We utilize waitForCondition because we manage our own timeout external to Playwright. That timeout incorporates signals from bindings we've created in the page so that the page can signal "progress" towards readiness.
In short, what you suggest sounds nice and simple. But our use case does not fit into that pattern. As I mentioned before, if it's helpful I'm more than happy to jump on a video call to explain and demonstrate what we're doing. It's not web testing, as you may have gathered. So maybe this isn't necessarily something Playwright wants to invest time in helping out with, which I would understand. But if it's valuable to more folks than just us, I'd be more than happy to get into greater detail.
In short, what you suggest sounds nice and simple. But our use case does not fit into that pattern. As I mentioned before, if it's helpful I'm more than happy to jump on a video call to explain and demonstrate what we're doing. It's not web testing, as you may have gathered. So maybe this isn't necessarily something Playwright wants to invest time in helping out with, which I would understand. But if it's valuable to more folks than just us, I'd be more than happy to get into greater detail.
This all makes sense, thanks for providing more details! I understand that in scraping and other use cases this may be more of an issue as the web apps behavior is less predictable, whereas in testing scenarios one usually knows what requests the app may/is expected to send. Our primary focus has been web testing so far.
I am having similar issues when running tests.
Basically I have a test that has some setup it has to do, it kicks off some actions on the page, and then wants to continue only after a request has been made to a specific url.
What I have ended up with is setting up an onRequest listener to record when a request is made to that url, and that listener delivers to a clojure promise (a countdown latch, a read from it blocks until it has been deliver to, similar to some uses of CompletableFuture). Then I expect to be able to wait on the promise until the callback fires, but that is not the case, I have to loop alternating between calling waitForTimeout (to process events) and checking the promise (to see if the callback fired).
This took me a bit to figure out because I assumed if I registered callbacks they would just get called when events come in. I can see how this might make things somewhat more deterministic for tests, which is great, but the ux on a platform with good native multithreading like the jvm is weird.
I should say I am not using playwright's builtin assertion stuff, because this is part of an existing suite of tests, and maybe if I was using those they would do the event loop pumping required.
unless I am completely off base advice like "prefer waitForRequest to onRequest" actually introduces race conditions:
if you consider something like:
if you do 1, and then wait for 2, the browser may have done 2 before you called the waitForRequest method.
will a callback approach you can register the callback before doing 1 so that it will be in place whenever 2 happens.
if you do 1, and then wait for 2, the browser may have done 2 before you called the waitForRequest method.
This is why you click inside the callback passed to waitForRequest, see its usage examples in the docs.
ah, sorry, my mistake, I was looking at some wait* method in Frame.java and assumed waitForRequest was the same.
looking at Frame.java it looks like the example usage for waitForUrl is the above race condition.
https://playwright.dev/java/docs/api/class-frame#frame-wait-for-url
This is somewhere between a feature request and a bug. My production system recently threw a StackOverflowException.
Upon further inspection I've determined that the following is happening:
The issue here is that the stack can't begin to unwind until the most recent Request.response() is able to return, even if every prior Request.response() is available.
It may be worth considering a more reactive model using Future/CompletableFuture and a dedicated event processing thread in Playwright to help avoid generating stacks of such depth.
For what it's worth, this would also make it much clearer to callers what data they're fetching that can be returned locally and what requires a full round trip back to the Playwright/browser process. That's often unclear, potentially leading to people making calls that are far more expensive and less reliable than they realize.