Closed chenyixin-2 closed 5 years ago
Hello,
To my understanding there's no easy way to know if a page is fully loaded or not. That's why I chose the lazy rendering method, which gives good results.
So try to play with the -t timeout option.
Otherwise:
phantomjs renderer: play with the webscreenshot.js file
chrome renderer: unfortunately there's no way to change how chrome behaves without using a driver like selenium (or another method), which is currently not implemented in webscreenshot
Cheers.
The -t timeout option has no effect on this, the default value is 30 seconds already.
I've been experimenting with increasing ajaxTimeout and maxTimeout in webscreenshot.js.
Here is an example with the default values of 400 and 800:
Here is the screenshot after adding 1000 to both values (1400 and 1800):
Can we add an option (-W, --wait) to pass these values to the python script?
@milangfx, thanks.
When you increased these values, did you encounter more screenshot failures due to the -t timeout value conflicting with the phantomjs ones (i.e. more screenshots failing to finish because they wait longer)?
Do you think 1400 and 1800 could be safely used as default values?
Good questions. I didn't have any failures. I was not using the -t timeout option, I only changed the values in webscreenshot.js.
I think the 1.4 s and 1.8 s wouldn't conflict with the default 30 s timeout.
I'm not even sure how they relate to each other. I assume the main timeout option (default 30 s) is only relevant to the Chrome and Firefox renderers since PhantomJS has its own settings in webscreenshot.js.
If I remember correctly, at one point a page didn't fully load even with 1400 and 1800, so slightly higher values might be needed for consistent results, something like 2400 - 2800 (?)
I've only tested this with individual URLs so far. I will check the increased timeouts with a huge list of URLs and compare the results to the default values.
My only concern is that this could potentially increase the run time a lot if multiple URLs don't load immediately (or before the default 400 - 800). So I'm not sure yet about using 1400 and 1800 as default values.
Good questions. I didn't have any failures, I was not using the -t timeout option, only changed the values in webscreenshot.js
I think the 1.4 s and 1.8 s wouldn't conflict with the default 30 sec timeout. I'm not even sure how they relate to each other. I assume the main timeout option (default 30 s) is only relevant to the Chrome and Firefox renderers since PhantomJS has its own settings in webscreenshot.js.
No, the -t option applies to any renderer: if the renderer reaches that timeout, a SIGKILL is sent to the process.
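To illustrate the behaviour described above (a renderer that exceeds the -t timeout gets killed), here is a minimal Python sketch. It is not webscreenshot's actual code: run_with_timeout is a hypothetical helper, and the sleeping child process merely stands in for a renderer that hangs.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (exit_code, timed_out).

    Hypothetical helper: the child is killed if it runs longer than timeout_s.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        proc.wait(timeout=timeout_s)
        return proc.returncode, False
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX systems
        proc.wait()  # reap the child so it does not linger as a zombie
        return proc.returncode, True


if __name__ == "__main__":
    # Stand-in for a renderer that never finishes on its own.
    hanging = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(hanging, timeout_s=0.5))
```

With this kind of outer timeout, any renderer-internal wait (such as the values in webscreenshot.js) only matters as long as it stays well below the outer limit.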
If I remember correctly, at one point a page didn't fully load even with 1400 and 1800, so slightly higher values might be needed for consistent results, something like 2400 - 2800 (?)
I've only tested this with individual URLs so far. I will check the increased timeouts with a huge list of URLs and compare the results to the default values.
Yes, that would be appreciated. Run $ time webscreenshot [options] and don't hesitate to post the execution results.
My only concern is that this could potentially increase the run time a lot if multiple URLs don't load immediately (or before the default 400 - 800).
I think I already ran this kind of test a long time ago. I don't really remember the results, but that overall increase in duration does ring a bell.
So I'm not sure yet about using 1400 and 1800 as default values.
If the tests show that the global duration is increased, I'll keep the current values but implement an option to handle these parameters and document somewhere that they should be specified in case of partial screenshots.
No, the -t option applies to any renderer: if the renderer reaches that timeout, a SIGKILL is sent to the process.
What I meant is that if PhantomJS already stops at the 800 ms maxTimeout specified in webscreenshot.js, then the main -t timeout won't be relevant.
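As a rough illustration of how the two values could interact, here is a small Python model. It assumes (this is not taken from webscreenshot.js itself) that the screenshot is taken once no request has been in flight for ajaxTimeout ms, or at maxTimeout ms at the latest, both counted from the same starting point. render_time and the request intervals are made up for the example.

```python
def render_time(requests, ajax_timeout, max_timeout):
    """Return the time (ms) at which the screenshot would be taken.

    requests: list of (start_ms, end_ms) intervals for network requests.
    Assumed model: render after ajax_timeout ms without any request in
    flight, but never later than max_timeout ms.
    """
    # Merge overlapping intervals into periods with at least one request in flight.
    busy = []
    for start, end in sorted(requests):
        if busy and start <= busy[-1][1]:
            busy[-1][1] = max(busy[-1][1], end)
        else:
            busy.append([start, end])

    for i, (_, end) in enumerate(busy):
        next_start = busy[i + 1][0] if i + 1 < len(busy) else float("inf")
        quiet_deadline = end + ajax_timeout
        if quiet_deadline >= max_timeout:
            return max_timeout  # the hard limit fires first
        if quiet_deadline <= next_start:
            return quiet_deadline  # the page stayed quiet long enough
    return max_timeout  # no quiet period was ever observed


if __name__ == "__main__":
    # Initial load finishes at 300 ms, a late AJAX call runs from 900 to 1200 ms.
    page = [(0, 300), (900, 1200)]
    print(render_time(page, ajax_timeout=400, max_timeout=800))
    print(render_time(page, ajax_timeout=1400, max_timeout=1800))
```

In this model the late request is missed with 400/800 (the screenshot is taken at 700 ms, before the request even starts) and captured with 1400/1800 (the screenshot is taken at 1800 ms, after the request ends at 1200 ms).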
I ran three tests on 100 URLs: one with the default timeout values, one with 1000 ms added and one with 1500 ms added.
ajaxTimeout: 400, maxTimeout: 800
python webscreenshot.py -v -i 100URLs 102,07s user 15,79s system 167% cpu 1:10,30 total
40 pages loaded, 60 didn't load
ajaxTimeout: 1400, maxTimeout: 1800
python webscreenshot.py -v -i 100URLs 105,77s user 15,94s system 124% cpu 1:37,95 total
97 pages loaded, 3 didn't load
ajaxTimeout: 1900, maxTimeout: 2300
python webscreenshot.py -v -i 100URLs 105,79s user 16,72s system 117% cpu 1:44 /2m-15,5s
100 pages loaded
So there's a trade-off between run time and pages actually loading. Having a higher max timeout doesn't affect the pages that would load quickly anyway, but obviously having to wait more for individual pages does add up and results in an overall duration increase.
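The trade-off in the three runs above can be put into numbers with a few lines of Python. The wall-clock times and loaded-page counts are copied from the results above (the third run is taken as 1:44, i.e. 104 s); relative_change is just a helper name chosen for this sketch.

```python
def relative_change(baseline, value):
    """Return the change from baseline to value, in percent."""
    return (value / baseline - 1) * 100


# (ajaxTimeout/maxTimeout, total wall-clock seconds, pages loaded out of 100)
runs = [
    ("400/800", 70.30, 40),
    ("1400/1800", 97.95, 97),
    ("1900/2300", 104.0, 100),
]

if __name__ == "__main__":
    _, base_seconds, base_loaded = runs[0]
    for label, seconds, loaded in runs:
        print(
            f"{label}: "
            f"{relative_change(base_seconds, seconds):+.1f}% run time, "
            f"{relative_change(base_loaded, loaded):+.1f}% pages loaded, "
            f"{seconds / loaded:.2f} s per loaded page"
        )
```

Looking at seconds per loaded page rather than total run time makes the comparison less sensitive to how many pages failed in each run.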
I'm not sure I'm reading the figures correctly: the total time is 1m10s (70s) for the first case, 1m37s (97s) for the second and 1m44s (104s) for the third one? That's only a +50% duration increase for more than +100% successful screenshots.
It is worth it: the primary goal of such a tool is to perform the maximum number of successful screenshots.
The execution duration is already addressed through multiprocessing, and it shouldn't be optimized further at the cost of fewer successful results.
So I might use the 1900/2300 values and offer a user option to specify them.
Cheers.
the total time is 1m10s (70s) for the first case, 1m37s (97s) for the second and 1m44s (104s) for the third one?
Correct.
It's only +50% duration increase for more than +100% successful screenshots.
Yeah, depends on how you define successful. In my example above, the blank page technically loaded successfully, but there was important content missing since I also wanted the mailing lists to show up so I had to wait a bit longer.
This is just a test with Google Groups, really. Other pages might behave differently.
For example, it might be that you have everything important already loaded with the default 400 - 800 timeouts and increasing that would only load more ads on the page. I don't know.
What's important content will always depend on the user. Maybe the user wants the ads to load and see how they are displayed.
If you want to set a higher default, I would go for around ajaxTimeout: 1400, maxTimeout: 1800. Then either let users know in the README how to change it manually in webscreenshot.js if they don't see the results they want, or wire the timeout values to a command line option.
A default max timeout that is too high can hang the process unnecessarily, e.g. if there's an ad server not responding.
Got it, that's clear.
--ajax-max-timeouts option added and default values changed in v2.8
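For readers wondering what wiring such an option could look like, here is a minimal argparse sketch. Only the option name --ajax-max-timeouts comes from this thread. The comma-separated "ajax,max" value format, the parse_timeouts helper, the validation rule and the fallback values are assumptions made for illustration, not a description of what v2.8 actually does.

```python
import argparse


def parse_timeouts(value):
    """Parse 'ajax,max' (milliseconds) into a tuple of two ints.

    Hypothetical format: the real option may accept something different.
    """
    parts = value.split(",")
    if len(parts) != 2:
        raise argparse.ArgumentTypeError(
            "expected two comma-separated values, e.g. 1400,1800"
        )
    try:
        ajax_timeout, max_timeout = (int(p) for p in parts)
    except ValueError:
        raise argparse.ArgumentTypeError("timeouts must be integers (ms)")
    # Assumed sanity rule: the per-request wait cannot exceed the hard limit.
    if not 0 < ajax_timeout <= max_timeout:
        raise argparse.ArgumentTypeError("need 0 < ajax timeout <= max timeout")
    return ajax_timeout, max_timeout


parser = argparse.ArgumentParser()
parser.add_argument(
    "--ajax-max-timeouts",
    type=parse_timeouts,
    default=(1400, 1800),  # illustrative fallback, not necessarily the real default
    help="AJAX and max timeouts in ms, as 'ajax,max'",
)

if __name__ == "__main__":
    args = parser.parse_args(["--ajax-max-timeouts", "1900,2300"])
    ajax_timeout, max_timeout = args.ajax_max_timeouts
    print(ajax_timeout, max_timeout)
```

Validating the pair at parse time means a typo fails immediately instead of silently producing partial screenshots later.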
Thanks for the quick implementation! Works like a charm.
Thank you for your feedback @milangfx
Hi, I am not very familiar with the phantomjs and chrome APIs. How should I change the source code to take the screenshot after the webpage is fully loaded?