Closed escritorio-gustavo closed 1 year ago
Notes:
FulfillRequestParams. This didn't work either. The request still gets sent and still gets stuck.
page.goto, I opened a new page directly on the URL; this does not change the outcome either.
This issue seems to be specifically caused by iframe elements. I have tested it on this codepen and obtained the same result. Loading of the page stops, the maps don't show up, and crashing the Rust process gets everything back to normal.
Please, if anyone can help, I really need this to work
Not sure yet, but it seems this issue doesn't happen in headless mode
@escritorio-gustavo If I understand your issue correctly, the problem is a small bug in Chromium itself that chromiumoxide isn't handling. You could try forking the repository yourself and, in the file src/handler/target.rs, on line 521, change "auto_attach" to false. Let me know if that solves the problem.
@chirok11 This does indeed work, the pages open successfully even in non-headless mode, thank you so much.
Do you have any idea what this chromium bug is and why this auto attach causes it to happen?
Also, does disabling auto attach cause any other side effects (i.e. can I submit a pull request to fix this and close this issue without breaking the whole library?)
@escritorio-gustavo
That could be fixed, since other libraries like puppeteer or playwright work correctly with these flags. If you wish to fix it, take a look at how other libraries handle target auto attach. I saw links to crbug.com yesterday, but the bug has not been solved for a while. I am trying to track down the issue, and my current guess is that Chrome is not auto attaching to iframes or service workers, which is why they get stuck and why disconnecting resolves it. The client should either attach manually or disconnect from these targets.
@chirok11
Thank you so much, I've been dealing with this issue for a couple of months now (I've been switching between this crate and headless_chrome, which is FAR slower)
I think I'll submit a pull request that adds this fix as a feature using the cfg macro. Again, thanks a lot
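A minimal sketch of what gating the workaround behind a Cargo feature with the cfg macro could look like. The feature name `disable-auto-attach` and the function name are hypothetical, chosen only for illustration; they are not taken from the actual pull request.

```rust
/// Whether the handler should ask Chromium to auto attach to child targets.
///
/// `disable-auto-attach` is a hypothetical Cargo feature name. When the crate
/// is built without that feature, `cfg!` evaluates to `false` at compile time
/// and auto attach stays enabled, so default behaviour is unchanged.
#[allow(unexpected_cfgs)]
fn auto_attach_enabled() -> bool {
    !cfg!(feature = "disable-auto-attach")
}
```

With this shape, users who hit the stuck iframe would opt in to the workaround from their own Cargo.toml, and everyone else keeps the existing behaviour.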
Hey @chirok11, did you close #173 because of #170? If so, let me know and I'll close #170 instead, as it is just a workaround using the method you mentioned with setting auto_attach to false.
I have absolutely zero knowledge on the subject and I'm willing to bet your PR provides a way better solution
I am still trying to fix it the correct way. I don't have much knowledge either, but I'm trying to do my best. I forked the repository and have a branch, cdp-fx. Could you try to reproduce the issue on my branch? I kept auto attaching, but I detach from service workers and added runIfWaitingForDebugger. Most broken sites load now, but I found an issue with some sites where the goto future always returns a timeout even though the website loaded successfully. Anyway, it looks like iframes should be handled manually. I am not quite sure that it will not break work with iframes (but I really don't know whether this library can currently work with iframes).
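The approach described here (keep auto attach, resume paused targets, detach from service workers) can be sketched roughly as below. This is not the code from the cdp-fx branch: the `AttachedTarget` struct and `follow_up_commands` function are hypothetical, and only the quoted Chrome DevTools Protocol method names are real.

```rust
/// Hypothetical, simplified view of a target reported by the
/// `Target.attachedToTarget` event.
struct AttachedTarget {
    /// CDP `targetInfo.type`, e.g. "page", "iframe" or "service_worker".
    kind: String,
    /// True when the target is paused until the client resumes it.
    waiting_for_debugger: bool,
}

/// CDP methods to send for a newly auto-attached target, in order.
fn follow_up_commands(target: &AttachedTarget) -> Vec<&'static str> {
    let mut commands = Vec::new();
    // A paused target never makes progress unless it is resumed explicitly.
    if target.waiting_for_debugger {
        commands.push("Runtime.runIfWaitingForDebugger");
    }
    // Service workers are not driven by the client in this sketch, so detach.
    if target.kind == "service_worker" {
        commands.push("Target.detachFromTarget");
    }
    commands
}
```

An iframe that is waiting for the debugger would only be resumed, while a service worker would be resumed and then detached.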
I forked the repository and have a branch, cdp-fx. Could you try to reproduce the issue on my branch?
Sure thing, I'll start testing right now.
In the meantime I'll leave #170 open. Once you figure out how to fix the problem you can simply delete the feature flag I added.
https://github.com/chirok11/chromiumoxide/tree/cdp-fx This is the tree. Note that I also changed Browser::connect and added a HandlerConfig argument (to provide a custom request timeout).
I've tested a couple of pages that used to give me trouble and things worked fine. Haven't bumped into the goto problem yet.
Do you have an example URL?
Also it seems like not all iframes will trigger the problem, for instance, in this version of the crate this W3Schools page loads fine, but this CodePen doesn't.
Both pages load with your fork though
They load, and your future finishes? I mean, goto doesn't time out?
Found the goto problem on the MDN docs for iframe
This url causes a timeout with both your version and mine, and the unmodified crate has the original problem with the iframe getting stuck and the page never loading
Thanks! Will try to work on it. financialexpress.com also won't load and causes a timeout.
I noticed the references to puppeteer you mentioned in your fork mention service workers being a problem. The MDN page doesn't seem to have any, but financialexpress does. Do you think this could be related?
I think there is something that links them, but it's definitely not Service Worker, as it is indeed not on MDN. There is something common between these two websites, but so far I only see an iframe as a similarity. I have a feeling that with certain website behavior (the presence of an iframe), the logic in CommandFuture breaks, and it simply does not receive a response that the page has loaded (even though it should, as the page is actually loaded correctly).
OK, the problem is in the frame lifecycle: some websites do not emit the "load" event (a bug in chromiumoxide event handling? I don't know yet). If we change expected_lifecycle in src/handler/frame.rs#L617 to "networkIdle" instead of "load", pages load successfully without failing on a timeout.
Alright, I've figured it out. It turns out that the issue was, first of all, that check_lifecycle was waiting for a load event from the main frame and all child frames. However, on problematic sites, I encountered child frames with url: None, which will never emit a "load" event. Interestingly, if you increase the wait time to several seconds after creating the page and then navigate to the page, there's a chance that everything will go smoothly; so, the problem is intermittent. But if you remove the pause between creating the page and navigating, almost a hundred percent of the time there will be a timeout error. I've slightly modified the check_lifecycle function in src/handler/frame.rs; I've added the following conditions: lifecycle_events contains "load", or if frame.url is None, at least check for the presence of DOMContentLoaded.
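The modified condition can be sketched as follows. This is a simplified illustration, not the actual check_lifecycle from src/handler/frame.rs: the `Frame` struct and both function names are hypothetical stand-ins, and the sketch assumes each frame records the names of the lifecycle events it has received.

```rust
/// Hypothetical, simplified stand-in for the crate's per-frame state.
struct Frame {
    /// `None` for the problematic child frames described above.
    url: Option<String>,
    /// Names of the lifecycle events received so far for this frame.
    lifecycle_events: Vec<String>,
}

/// True when this frame can be treated as loaded.
fn frame_is_loaded(frame: &Frame) -> bool {
    let seen = |name: &str| frame.lifecycle_events.iter().any(|e| e == name);
    if seen("load") {
        return true;
    }
    // A frame without a URL may never emit "load", so fall back to
    // DOMContentLoaded for that case only.
    frame.url.is_none() && seen("DOMContentLoaded")
}

/// Navigation is complete once the main frame and all child frames are loaded.
fn navigation_finished(frames: &[Frame]) -> bool {
    frames.iter().all(frame_is_loaded)
}
```

A child frame that has a URL but has only reached DOMContentLoaded still blocks completion, so the relaxation applies only to URL-less frames.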
I think there should be an option to choose whether to wait for "load," "networkIdle," or "networkAlmostIdle," similar to other libraries working with the Chrome DevTools Protocol.
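One possible shape for such an option is sketched below. The `WaitUntil` enum and its methods are hypothetical and not part of chromiumoxide; the strings are the lifecycle event names the Chrome DevTools Protocol reports.

```rust
/// Hypothetical option naming the lifecycle event a navigation waits for.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WaitUntil {
    Load,
    NetworkAlmostIdle,
    NetworkIdle,
}

impl WaitUntil {
    /// Name of the event as reported by the DevTools Protocol.
    fn event_name(self) -> &'static str {
        match self {
            WaitUntil::Load => "load",
            WaitUntil::NetworkAlmostIdle => "networkAlmostIdle",
            WaitUntil::NetworkIdle => "networkIdle",
        }
    }

    /// True once the chosen event appears among the events seen so far.
    fn is_satisfied(self, seen_events: &[&str]) -> bool {
        seen_events.contains(&self.event_name())
    }
}
```

A caller could then pass `WaitUntil::NetworkIdle` for pages that never report "load", instead of editing the expected lifecycle in the crate source.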
@escritorio-gustavo Could you check the MDN link and the other pages, and confirm that they no longer fail with a timeout and that other pages don't get broken?
@chirok11
This worked. The newest version of your fork properly loads the MDN page, financialexpress and the pages where I originally encountered the problem. I will close #170 in favor of this implementation.
Hey @chirok11, can you add "this will fix #163" to your PR's description to link it to this issue? This way when it's accepted the issue should be automatically closed
I tried to do it as a comment but Github will only create the link if it's on the PR description or on a commit message
@escritorio-gustavo done.
Awesome! You did an awesome job! Thanks again for all the help with this issue
Btw, issue #171 wasn't linked to the PR, as GitHub requires the closes keyword before each issue to create the links, so you need to write "Closes #163 and closes #171"
Thank you, modified comment.
I am trying to visit a page that uses Google's RecaptchaV3. The issue I'm having is the following: for some reason, the request sent by the src attribute of the <iframe> element gets stuck on pending, in the content download phase, causing page.goto to fail.
If, for any reason, the Rust code panics and the browser survives as a zombie process, everything immediately goes back to normal, i.e. the request resolves as soon as Rust crashes.
Below are screenshots of the problem and a minimal reproduction of the code I'm using:
The weird thing about it is that it does have a 200 HTTP status code, even though it's still pending