Nice, could you just add these two little changes? (One fixes a warning for me, the other is a show-stopper which prevents generating the docs here.) After amending, could you post your suggestion to the bug-wget@gnu.org mailing list? A short explanation plus a link to this page should be enough. Most people there don't mess with the Savannah bug tracker.
diff --git a/doc/wget.texi b/doc/wget.texi
index 67f74ba..a981fd2 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -1916,7 +1916,7 @@ case.
 Turn on recursive retrieving. @xref{Recursive Download}, for more
 details. The default maximum depth is 5.
 
-@itemx --queue-type=@var{queuetype}
+@item --queue-type=@var{queuetype}
 Specify the queue type (@pxref{Recursive Download}). Accepted values
 are @samp{fifo} (the default) and @samp{lifo}.
diff --git a/src/init.c b/src/init.c
index cd17f98..71b1203 100644
--- a/src/init.c
+++ b/src/init.c
@@ -1448,7 +1448,7 @@ cmd_spec_recursive (const char *com, const char *val, void *place_ignored _GL_UNUSED)
 /* Validate --queue-type and set the choice. */
 
 static bool
-cmd_spec_queue_type (const char *com, const char *val, void *place_ignored)
+cmd_spec_queue_type (const char *com, const char *val, void *place_ignored _GL_UNUSED)
 {
   static const struct decode_item choices[] = {
     { "fifo", queue_type_fifo },
the other is a show-stopper which prevents generating the docs here
How do I generate the docs to detect that error? This command doesn't show any error about it:
(cd doc; make)
Normally, this will be done automatically by 'make'. Maybe something is missing on your installation (e.g. pod2man, textinfo, makeinfo) so the creation is skipped? The error was:
wget.texi:1919: @itemx must follow @item
Makefile:1346: recipe for target 'wget.info' failed
make[2]: *** [wget.info] Error 1
Normally, this will be done automatically by 'make'.
Is there a make target for that, like make docs, that does something different than (cd doc; make)?
Maybe something is missing on your installation (e.g. pod2man, textinfo, makeinfo) so the creation is skipped?
There's nothing about texinfo in config.log. This is the makeinfo and pod2man output:
configure:38656: checking for makeinfo
configure:38683: result: ${SHELL} /d/repo/wget/build-aux/missing --run makeinfo
configure:38745: checking for pod2man
configure:38763: found /usr/bin/pod2man
configure:38776: result: /usr/bin/pod2man
Am I supposed to run that command to get more info?
$ /d/repo/wget/build-aux/missing --run makeinfo
makeinfo: missing file argument.
Try `makeinfo --help' for more information.
makeinfo is 4.13
$ makeinfo --version
makeinfo (GNU texinfo) 4.13
'textinfo' is a typo, should be texinfo ;-)
'cd doc; make clean; make' should output:
test -z "wget.dvi wget.pdf wget.ps wget.html" \ || rm -rf wget.dvi wget.pdf wget.ps wget.html test -z "~ .bak .cat .pod" || rm -f ~ .bak .cat .pod rm -rf wget.t2d wget.t2p rm -f vti.tmp oms@blitz-lx:~/src/wget/doc$ make ./texi2pod.pl -D VERSION="1.16.1.36-8238-dirty" ./wget.texi wget.pod /usr/bin/pod2man --center="GNU Wget" --release="GNU Wget 1.16.1.36-8238-dirty" wget.pod > wget.1
So maybe it is ./texi2pod.pl working differently here (or for you)?
I get the error now. Not sure why I didn't get it before, maybe because I didn't do make clean.
(cd doc; make clean; make)
../../doc/wget.texi:1919: @itemx must follow @item
Makefile:1346: recipe for target `../../doc/wget.info' failed
make: *** [../../doc/wget.info] Error 1
After amending, could you post your suggestion to the bug-wget@gnu.org mailing list? A short explanation plus a link to this page should be enough.
OK, email sent.
feedback wanted for this patch https://github.com/mirror/wget/pull/1
As I understand your aim, you want Wget to behave a bit more like a browser with respect to downloading. This means after downloading the first HTML page, first download the non-HTML links (mainly images), and only then the HTML pages.
yes
I don't see a reason why the 'depth' of those HTML pages should matter when queuing, since a user doesn't know how deep the link is that he clicks on.
Yup, depth doesn't matter.
This leads to queuing without sorting: put the HTML links at the bottom and the non-HTML links at the top. This would lead to the download order that you documented under 'lifo downloads links directly after their parent page'.
Keeping FIFO and enqueuing html links last (with sort) isn't enough because all depth n links are still downloaded before any depth n+1 links.
FIFO enqueue html last ≠ LIFO enqueue html first
This is not what I said. I said: enqueue html last + enqueue non-html first
This is basically the same as having two queues: one for HTML and one for non-HTML. The non-HTML queue works as LIFO and is always picked from before the HTML one; if it is empty, pick from the HTML queue (FIFO).
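A sketch of what that two-queue arrangement might look like (my own illustration, with hypothetical type and helper names, not wget's actual queue API):

/* Two-queue idea: non-HTML links go on a stack (LIFO) that is always
   drained before the HTML queue (FIFO) is touched.  Hypothetical names,
   not code from the patch. */
#include <stdbool.h>
#include <stddef.h>

struct link { const char *url; bool is_html; struct link *next; };

static struct link *html_head, *html_tail;   /* FIFO of HTML pages  */
static struct link *nonhtml_top;             /* LIFO of other links */

static void
enqueue_link (struct link *l)
{
  if (l->is_html)
    {                                  /* append at the FIFO tail */
      l->next = NULL;
      if (html_tail)
        html_tail->next = l;
      else
        html_head = l;
      html_tail = l;
    }
  else
    {                                  /* push on top of the LIFO */
      l->next = nonhtml_top;
      nonhtml_top = l;
    }
}

static struct link *
dequeue_link (void)
{
  struct link *l;

  if (nonhtml_top)                     /* non-HTML always picked first */
    {
      l = nonhtml_top;
      nonhtml_top = l->next;
      return l;
    }
  l = html_head;                       /* otherwise the oldest HTML page */
  if (l)
    {
      html_head = l->next;
      if (!html_head)
        html_tail = NULL;
    }
  return l;
}

In both framings the effect is the same: pending non-HTML links are always fetched before the crawler moves on to another HTML page.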
Show it with code because I don't understand.
the current FIFO code is:
while (1)
  // FIFO
  url_dequeue
  if (descend)
    for (; child; child = child->next)
      url_enqueue
the LIFO solution is:
while (1)
  // LIFO
  url_dequeue
  if (descend)
    // place html pages on top
    ll_bubblesort(&child);
    for (; child; child = child->next)
      url_enqueue
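If I read the patch right, ll_bubblesort(&child) is what puts the html links first in the child list. A hedged stand-in for it, with a simplified node type instead of wget's struct urlpos, might look like this:

/* Stable-partition the child list so that HTML links come first ("on
   top" of the list) and everything else after.  Because the children
   are then enqueued front to back, the non-HTML links end up enqueued
   last and, with LIFO, dequeued first, so a page's images are fetched
   before wget descends into its HTML children.  Not the patch's exact
   code. */
#include <stdbool.h>
#include <stddef.h>

struct child { const char *url; bool expect_html; struct child *next; };

static void
sort_html_first (struct child **head)
{
  struct child *html = NULL, *html_tail = NULL;
  struct child *rest = NULL, *rest_tail = NULL;
  struct child *c = *head, *next;

  for (; c; c = next)
    {
      next = c->next;
      c->next = NULL;
      if (c->expect_html)
        {
          if (html_tail)
            html_tail->next = c;
          else
            html = c;
          html_tail = c;
        }
      else
        {
          if (rest_tail)
            rest_tail->next = c;
          else
            rest = c;
          rest_tail = c;
        }
    }

  /* HTML links first, the rest after, original relative order kept. */
  if (html_tail)
    {
      html_tail->next = rest;
      *head = html;
    }
  else
    *head = rest;
}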
closed in favor of #2
basic problem
The basic problem is that the FIFO queue can create a long delay between downloading a page and downloading its links. This is different from the browser behavior that the page is designed for, resulting in wget failures that a browser user doesn't experience.
savannah link
this patch is also posted at https://savannah.gnu.org/bugs/?37581
making it optional
OK, the patch has been changed here: https://github.com/mirror/wget/pull/1
The patch file is https://github.com/mirror/wget/pull/1.patch
reason to place html pages at the top of the queue
If ll_bubblesort isn't used, only the deepest-level links are downloaded directly after their parent page, despite using LIFO.
alternative solution
Enqueuing children directly after their parent seems difficult.
Another solution is to enqueue the depth n+1 links directly after enqueuing their parent depth n link, instead of continuing to enqueue depth n links.
This requires interrupting the depth n enqueue at html links, dequeuing everything (including the html link), enqueuing the depth n+1 links, and then continuing the depth n enqueue. This requires a big reorganization or doesn't make sense.
A way to do this could be to store the not-yet-enqueued links in a temporary queue and enqueue them after everything else.
The LIFO solution is better than this solution because it avoids that reorganization.
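For illustration, one way to read that alternative (a sketch of the idea only, not code from the patch) is a recursive depth-first retrieval, where hitting an html link interrupts the walk over the current page's links until that subtree is done; the "temporary queue" would be the bookkeeping needed to express this inside wget's single queue-driven loop:

#include <stdbool.h>
#include <stdio.h>

struct plink { const char *url; bool is_html; struct plink *next; };

/* Stubs so the sketch is self-contained; a real version would fetch the
   URL and return the links parsed out of the page. */
static void
fetch (const char *url)
{
  printf ("GET %s\n", url);
}

static struct plink *
fetch_and_parse (const char *url)
{
  fetch (url);
  return NULL;
}

/* Walk a page's links in order, but whenever an html link is hit, finish
   that subtree first and only then continue with the remaining links of
   the current page. */
static void
retrieve_subtree (const char *url, int depth, int max_depth)
{
  struct plink *child = fetch_and_parse (url);

  for (; child; child = child->next)
    {
      if (child->is_html && depth + 1 <= max_depth)
        retrieve_subtree (child->url, depth + 1, max_depth);
      else
        fetch (child->url);
    }
}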
enqueue html last doesn't work
Keeping FIFO and enqueuing html links last (with sort) doesn't solve the problem because all depth n links are still downloaded before any depth n+1 links.
test case description
I don't mean that all resources can be downloaded fast, just that they are downloaded directly after the page that contains them.
The example is an image hosting site (imagevenue.com) where every image has its own html page (imagevenue.com/img.php) with a generated image link that expires a while after the html page is generated, to prevent direct links to the image files.
All links can be downloaded with lifo because each branch page has only 1 link in this example, and there's more than enough time to download that 1 link if the download begins directly after the link is generated.
If a branch page (e.g. imagevenue.com/img.php) had many images (links) there could still be a problem, but the problem would be the same for regular users (browsers) that download the resources directly after the page is loaded, so the fault would be the site's rather than wget's.
test
imagevenue fail
This fails with the default FIFO queue: all the img.php pages are downloaded before the temporary image links in them, and by the time wget gets to those links they're expired.
With lifo, the images are downloaded directly after their img.php page, so they don't have time to expire.
invalid input
invalid input is prevented
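For reference, the validation in the cmd_spec_queue_type hunk above boils down to a lookup in a small decode table; a simplified, self-contained sketch of that check (not wget's actual helpers):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

enum queue_type { queue_type_fifo, queue_type_lifo };

struct decode_item { const char *name; int code; };

static const struct decode_item choices[] = {
  { "fifo", queue_type_fifo },
  { "lifo", queue_type_lifo },
};

/* Accept only "fifo" and "lifo"; anything else (e.g. --queue-type=foo)
   is rejected. */
static bool
set_queue_type (const char *val, int *place)
{
  size_t i;

  for (i = 0; i < sizeof choices / sizeof choices[0]; i++)
    if (strcmp (val, choices[i].name) == 0)
      {
        *place = choices[i].code;
        return true;
      }
  fprintf (stderr, "--queue-type: invalid value '%s'\n", val);
  return false;
}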
download order
This test shows the FIFO and LIFO download order.
I created this local site:
i.html
  a.html
    a-a.html
    a-b.html
  b.html
    b-a.html
    b-b.html
fifo downloads links long after their parent page, especially the deepest-level links
lifo downloads links directly after their parent page
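To make the two orderings concrete, here is a small self-contained simulation of a crawl over this tree (my own sketch, not wget code; every page here is html, so the html-on-top sort changes nothing):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return the i-th (0 or 1) link on a page of the test site, or NULL. */
static const char *
kids (const char *page, int i)
{
  static const char *tree[][3] = {
    { "i.html", "a.html",   "b.html"   },
    { "a.html", "a-a.html", "a-b.html" },
    { "b.html", "b-a.html", "b-b.html" },
  };
  size_t r;

  for (r = 0; r < sizeof tree / sizeof tree[0]; r++)
    if (strcmp (tree[r][0], page) == 0)
      return tree[r][1 + i];
  return NULL;
}

/* Crawl the site starting from i.html and print the download order. */
static void
crawl (bool lifo)
{
  const char *queue[16] = { "i.html" };
  int head = 0, tail = 1;
  int i;

  while (head < tail)
    {
      /* FIFO takes the oldest queued page, LIFO the newest one. */
      const char *page = lifo ? queue[--tail] : queue[head++];

      printf (" %s", page);
      for (i = 0; i < 2 && kids (page, i) != NULL; i++)
        queue[tail++] = kids (page, i);   /* children in page order */
    }
  printf ("\n");
}

int
main (void)
{
  printf ("fifo:");
  crawl (false);   /* i a b a-a a-b b-a b-b */
  printf ("lifo:");
  crawl (true);    /* i b b-b b-a a a-b a-a */
  return 0;
}

With FIFO, a.html's children only arrive after b.html, and the gap grows with deeper trees; with LIFO each page's links follow it immediately.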