jnthnclrk / warrick

Automatically exported from code.google.com/p/warrick

zero length content "No Content in ..." #29

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. ./warrick.pl -dr 2013-08-05 -d -a ia -D ../ftp/ http://www.atlantischild.hu/

What is the expected output? What do you see instead?

http://wayback.archive.org/web/20111031230326/http://www.atlantischild.hu/index.php?option=com_content&task=view&id=21&Itemid=9
has non-zero length, but I get zero-length files:
"index.php?option=com_content&task=view&id=21&Itemid=9"

What version of the product are you using? On what operating system?
warrickv2-2-5

Please provide any additional information below.

I've got a non-zero length file which has GET parameters in its name,
but all files containing & (ampersand) in their names are empty.

The log is below.
As you can see, there is nothing after "?" in "To stats ... Location:".

-------
At Frontier location 79 of 769
-------

My frontier at 79: 
http://atlantischild.hu:80/index.php?option=com_content&task=blogcategory&id=21&Itemid=28
My memento to get: 
|http://atlantischild.hu:80/index.php?option=com_content&task=blogcategory&id=21&Itemid=28|

targetpath: index.php

appending query string option=com_content&task=blogcategory&id=21&Itemid=28

 mcurling: /home/davidprog/dev/design-check/atlantis/warrick//mcurl.pl -D "/home/davidprog/dev/design-check/atlantis/warrick/../ftp//logfile.o"  -dt "Sun, 04 Aug 2013 22:00:00 GMT"  -tg "http://web.archive.org/web" -L -o "/home/davidprog/dev/design-check/atlantis/warrick/../ftp//index.php?option=com_content&task=blogcategory&id=21&Itemid=28" "http://atlantischild.hu:80/index.php?option=com_content&task=blogcategory&id=21&Itemid=28"

Reading logfile: 
/home/davidprog/dev/design-check/atlantis/warrick/../ftp//logfile.o

To stats 
http://atlantischild.hu:80/index.php?option=com_content&task=blogcategory&id=21&Itemid=28 => Location: 
http://web.archive.org/web/20120903050228/http://www.atlantischild.hu/index.php?
 => 
/home/davidprog/dev/design-check/atlantis/warrick/../ftp//index.php?option=com_content&task=blogcategory&id=21&Itemid=28 --> stat IA

returning 
/home/davidprog/dev/design-check/atlantis/warrick/../ftp//index.php?option=com_content&task=blogcategory&id=21&Itemid=28
Search HTML resource 
/home/davidprog/dev/design-check/atlantis/warrick/../ftp//index.php?option=com_content&task=blogcategory&id=21&Itemid=28 for links to other missing resources...
No Content in 
/home/davidprog/dev/design-check/atlantis/warrick/../ftp//index.php?option=com_content&task=blogcategory&id=21&Itemid=28!!

Original issue reported on code.google.com by szepe.vi...@gmail.com on 31 Aug 2013 at 4:42

GoogleCodeExporter commented 8 years ago
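This is caused by a simple escaping bug in mcurl.pl and MementoThread.pm that 
can be fixed with the patch below. Judging by the file timestamps, the diff 
runs from the patched files (2014) to the originals (2012), so the lines 
marked "-" appear to be the fixed versions: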

~/t2/warrick2$ diff -u ../../warrick2/mcurl.pl mcurl.pl
--- ../../warrick2/mcurl.pl     2014-02-05 16:35:37.362518862 -0800
+++ mcurl.pl    2012-03-27 13:02:41.000000000 -0700
@@ -95,10 +95,7 @@

 for (my $i = 0; $i <= $#ARGV; ++$i)    #
 {
-    if ( ( index($ARGV[$i] , ' ') > -1 )
-       or ( index($ARGV[$i] , '?') > -1 )
-       or ( index($ARGV[$i] , '*') > -1 )
-       ) {
+    if ( index($ARGV[$i] , ' ') > -1 ){
 $ARGV[$i] = '"' .$ARGV[$i] . '"';
     }
 }
~/t2/warrick2$ diff -u ../../warrick2/MementoThread.pm MementoThread.pm
--- ../../warrick2/MementoThread.pm     2014-02-05 16:38:19.914518843 -0800
+++ MementoThread.pm    2012-03-27 13:02:42.000000000 -0700
@@ -97,7 +97,7 @@
         $acceptDateTimeHeader = " -H \"Accept-Datetime: ".$self->{DateTime}." \" ";
     }

-    my $command = "curl -I $acceptDateTimeHeader  \"$self->{URI}\" ";
+    my $command = "curl -I $acceptDateTimeHeader  $self->{URI} ";
     if($self->{Debug} == 1){
         print "DEBUG: " .$command ."\n";
     }
@@ -351,7 +351,7 @@

     } else {

-       $command = "curl @params $acceptDateTimeHeader \"". $self->{TimeGate} ."/" . $self->{URI} . "\"";
+       $command = "curl @params $acceptDateTimeHeader ". $self->{TimeGate} ."/" . $self->{URI};

     }

@@ -390,7 +390,7 @@

                 $command = "curl -I -L $acceptDateTimeHeader ". $self->{Info}->{TimeGate} ;
             } else {
-                $command = "curl -I -L $acceptDateTimeHeader \"". $self->{TimeGate} ."/" . $self->{URI} . "\"";
+                $command = "curl -I -L $acceptDateTimeHeader ". $self->{TimeGate} ."/" . $self->{URI};

             }

@@ -667,4 +667,4 @@
     return $result;
 }
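
For anyone wondering why the quotes matter: the curl commands are built as 
strings and handed to a shell, and an unquoted & is the shell's background 
operator. The sketch below is not Warrick code. It is only a minimal 
illustration, assuming a POSIX sh, that uses printf as a stand-in for curl and 
the URL from the report above; the variable names are made up for the example.

```shell
#!/bin/sh
# URL taken from the log in the original report.
url='http://atlantischild.hu:80/index.php?option=com_content&task=blogcategory&id=21&Itemid=28'

# Unquoted: the inner shell splits the command line at every '&'.
# printf only receives the part before the first '&'; the remaining pieces
# (task=..., id=..., Itemid=...) are parsed as separate variable assignments.
unquoted=$(sh -c "printf '%s\n' $url" 2>/dev/null)

# Quoted: the whole URL reaches printf as a single argument.
quoted=$(sh -c "printf '%s\n' \"$url\"")

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

With the unquoted form the command only sees the URL up to the first &, which 
would explain why every file with an ampersand in its name came back empty. 
The unquoted ? and * are also glob characters, so quoting arguments that 
contain them (as the mcurl.pl hunk does) avoids accidental filename expansion.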

Original comment by b4r...@gmail.com on 6 Feb 2014 at 12:46

GoogleCodeExporter commented 8 years ago
Thank you!!

Original comment by szepe.vi...@gmail.com on 6 Feb 2014 at 10:47