Closed: njtaylor closed this issue 1 year ago.
Patched in HTTP-Message 6.06 by removing this change:
https://github.com/libwww-perl/http-message/commit/5aaa8144afba437653c3856e1877af9aa7ff900a
Disallow empty field names or field names containing ':'
We still have the documented hack of prefixing field names with ':' to suppress any case normalizations that would occur.
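The ':'-prefix hack mentioned above can be sketched in plain Perl. This is a simplified illustration of the normalization behavior, condensed for clarity; `normalize_field` is a hypothetical helper, not the actual `HTTP::Headers` implementation:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simplified sketch of HTTP::Headers' field-name normalization (assumption:
# condensed from the real _header logic for illustration only).
sub normalize_field {
    my ($field) = @_;
    # A leading ':' is the documented escape hatch: strip it and keep the
    # rest of the name exactly as given, with no case normalization.
    return $field if $field =~ s/^://;
    $field =~ tr/_/-/;                     # underscores become dashes
    return lc($field) =~ s/\b(\w)/\u$1/gr; # Canonical-Header-Case
}

print normalize_field('content_type'), "\n";   # Content-Type
print normalize_field(':x_weird_NAME'), "\n";  # x_weird_NAME
```

Because the ':' sits at index 0, such names also pass the `rindex($field, ':') > 1` validity check.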
Working again: twitter:card now only appears in the content. It should never have ended up in the headers, so the true fault is elsewhere.
Found the problem: HTTP::Headers is used by HTML::HeadParser, so X-Meta-twitter:card is being placed in the list of headers.
The following patch works around the problem by excluding fields beginning with X-Meta; it has been created for the OpenBSD ports package.
$OpenBSD$
--- lib/HTTP/Headers.pm.orig Sat Oct 20 10:11:21 2012
+++ lib/HTTP/Headers.pm Tue Nov 6 16:21:11 2012
@@ -150,7 +150,7 @@ sub _header
     my($self, $field, $val, $op) = @_;
     Carp::croak("Illegal field name '$field'")
-	if rindex($field, ':') > 1 || !length($field);
+	if ($field !~ /^X-Meta/ && rindex($field, ':') > 1) || !length($field);
     unless ($field =~ /^:/) {
 	$field =~ tr/_/-/ if $TRANSLATE_UNDERSCORE;
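Extracted as a standalone condition, the patched check behaves like this (`illegal_field` is a hypothetical helper name; the real code performs this check inline and croaks):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standalone sketch of the patched validity check from the diff above.
sub illegal_field {
    my ($field) = @_;
    return ($field !~ /^X-Meta/ && rindex($field, ':') > 1)
        || !length($field);
}

print illegal_field('X-Meta-Twitter:card') ? "illegal\n" : "ok\n";  # ok
print illegal_field('Twitter:card')        ? "illegal\n" : "ok\n";  # illegal
print illegal_field('')                    ? "illegal\n" : "ok\n";  # illegal
```

Note that `rindex(...) > 1` still permits a colon at index 0 or 1, which is what keeps the documented ':'-prefix hack working.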
BBC News articles now include meta name="twitter:card" content="summary". When parsing DCTERMS.modified meta headers out of BBC news reports using HTML::HeadParser, this bug (Illegal field name 'X-Meta-Twitter:card') is observed.
The fix above (to Headers.pm) is confirmed to solve the issue.
Stumbled upon this problem too. What's the point of croaking on invalid field names anyway? The web server can produce arbitrary headers, valid or not.
This is a serious problem; that twitter junk is on half the pages I scrape. Why is this still broken after 6 months, especially since somebody provided a patch?
The X- prefix means a custom HTTP header; we need to ignore those, don't we?
Well, the script shouldn't die on an invalid HTTP header -- that's for sure. Just as an HTML parser shouldn't die on invalid HTML markup.
Maybe use carp instead of croak?
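For the difference in practice: Carp is a core module, `croak` raises an exception attributed to the caller, while `carp` only warns and lets execution continue. The two checker subs below are hypothetical illustrations, not code from this module:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Carp;

# croak() throws an exception: unless the caller wraps the call in eval,
# the whole script dies (the current HTTP::Headers behavior).
sub strict_check {
    my ($field) = @_;
    croak("Illegal field name '$field'") if rindex($field, ':') > 1;
    return 1;
}

# carp() only emits a warning and lets execution continue (the suggested
# more lenient behavior).
sub lenient_check {
    my ($field) = @_;
    carp("Illegal field name '$field'") if rindex($field, ':') > 1;
    return 1;
}

my $ok = eval { strict_check('twitter:card') };   # dies; $ok stays undef
print defined $ok ? "survived\n" : "croaked: $@";
print lenient_check('twitter:card'), "\n";        # warns, then returns 1
```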
https://github.com/libwww-perl/http-message/issues/3#issuecomment-10118644 This solution indeed works. I don't get why the authors have still left it broken after someone provided a patch 6 months ago...
Without this issue report I would have been stuck. Thanks to everyone.
I hit the problem when I do this:

my $browser = LWP::UserAgent->new();
my $response = $browser->get("http://oreilly.com");
print $response->as_string;
Thank you, everyone, especially njtaylor for the code.
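A workaround that avoids the crash without patching Headers.pm: LWP::UserAgent's `parse_head` attribute controls whether the response's `<head>` section is parsed into headers at all. Turning it off sidesteps HTML::HeadParser entirely, assuming you don't need the synthesized X-Meta-* headers (sketch, assuming LWP::UserAgent is installed; the network call is left commented out):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $browser = LWP::UserAgent->new();
# Disable <head> parsing so HTML::HeadParser never sees twitter:card and
# HTTP::Headers never croaks on the illegal field name. The trade-off is
# that X-Meta-* headers are no longer synthesized from meta tags.
$browser->parse_head(0);

# my $response = $browser->get("http://oreilly.com");
# print $response->as_string;
```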
This bug is also seen using the Chef API: a required header 'x-ops-sign:version' => '1.0' prevents usage. A slight adjustment of the regex resolved my issue. Fix below:
if ( $field !~ /^X-/ && rindex($field, ':') > 1 ) || !length($field);
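To see why that header trips the stock check: `rindex` reports the position of the last ':' in the field name, and anything past index 1 is rejected. A plain-Perl illustration (the field names are just examples):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Position of the last ':' in each field name; the stock check croaks
# whenever this is > 1, so only the leading-':' hack positions survive.
for my $field ('x-ops-sign:version', 'X-Meta-Twitter:card', ':no_norm', 'Content-Type') {
    printf "%-22s rindex = %d\n", $field, rindex($field, ':');
}
# x-ops-sign:version     rindex = 10  -> rejected
# X-Meta-Twitter:card    rindex = 14  -> rejected
# :no_norm               rindex = 0   -> allowed (documented hack)
# Content-Type           rindex = -1  -> allowed
```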
I think this can be closed: HTTP header field names shouldn't ever include a colon, and all the examples above (from 9 or so years ago) work fine now. It seems people have stopped using broken field names.
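For reference, RFC 7230 defines a header field name as a token, and ':' is not a token character. A quick check (the character class below is my transcription of the RFC 7230 tchar set, not code from this module):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# tchar set from RFC 7230 section 3.2.6 (my transcription, for
# illustration): field-name = 1*tchar, and ':' is excluded.
my $token = qr/^[!#\$%&'*+.^_`|~0-9A-Za-z-]+$/;

for my $field ('Content-Type', 'twitter:card', 'x-ops-sign:version') {
    print "$field: ", ($field =~ $token ? "valid" : "invalid"), " field name\n";
}
```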
Thanks, @neilb!
Found with WWW::YouTube::Download: youtube-download only reads the first chunk of 4096 bytes. With HTTP::Message 6.04 the same download would work. I then tried lwp-download and the same thing happened.
With HTTP::Message 6.06
$ lwp-download 'http://www.youtube.com/watch?v=WLu1fi5FeOY&feature=g-all-xit' kk
5.34 KB received
Illegal field name 'X-Meta-Twitter:card' at /usr/local/libdata/perl5/site_perl/amd64-openbsd/HTML/HeadParser.pm line 207
Transfer aborted. Delete kk? [n]
Use of uninitialized value $length in numeric gt (>) at /usr/local/bin/lwp-download line 255, line 1.
$ sudo pkg_add -r p5-HTTP-Message
p5-HTTP-Message-6.06->6.04: ok
Read shared items: ok
$ rm kk
$ lwp-download 'http://www.youtube.com/watch?v=WLu1fi5FeOY&feature=g-all-xit' kk
132 KB received
$
$ curl -i -o kkk 'http://www.youtube.com/watch?v=WLu1fi5FeOY&feature=g-all-xit'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  132k    0  132k    0     0   178k      0 --:--:-- --:--:-- --:--:--  193k
$ grep -i twitter:card kkk
$
This is using libwww-perl 6.04.
$ pkg_info | grep libwww
p5-LWP-Protocol-https-6.03  adds support for https to p5-libwww
p5-libwww-6.04              library for WWW access in Perl
$