xaicron / p5-www-youtube-download

YouTube video download interface.
http://blog.livedoor.jp/xaicron/

garbage after JSON object #27

Closed java4fun closed 9 years ago

java4fun commented 10 years ago

When I run eg/cb.pl I get the following error. I'm using Cygwin with Perl 5.14.

garbage after JSON object, at character offset 15409 (before ";(function() {var en...") at F:/p5-www-youtube-download/p5-www-youtube-download/lib/WWW/YouTube/Download.pm line 229.

profplump commented 10 years ago

Fixed with a semicolon in the regex at line 225: elsif ($line =~ /^.+ytplayer.config\s=\s({.*});/) {

profplump commented 10 years ago

After doing that it couldn't parse the signature, but I'm not sure that's necessary anymore -- the URL seems to contain the signature already, so I just disabled that code at line 302:

    my $sig = $query->{sig} || _getsig($query->{s});
    my $url = $query->{url};
    #$fmt_url_map->{$query->{itag}} = $url.'&signature='.$sig;
    $fmt_url_map->{$query->{itag}} = $url;

grtodd commented 10 years ago

These "fixes" or "workarounds" work for me as well. Good catch! Not all missing ;'s are the fault of Perl :-)

I'm not sure where to ask this, but do you have any general tips or tricks on how to debug these Perl/JS/regex scraping sorts of issues?

Cheers,

profplump commented 10 years ago

The JSON error message more or less tells you the problem -- it doesn't like the input string starting at the semi-colon. So I start by printing out that section of the string (or the whole string if it's not too long) with something like:

print STDERR substr($str, 15409 - 25, 50);

After that it's often obvious why the parsing failed, and you can adjust the regex accordingly.
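
A slightly more general version of that trick, as a rough sketch rather than anything from the module: it pulls the offset out of the parser's error message and prints the surrounding text. The helper name context_around and the sample input are made up for illustration, and it uses the core JSON::PP module so it runs without extra installs.

```perl
use strict;
use warnings;
use JSON::PP;

# Return up to $radius characters on either side of $offset, clamping the
# start to 0 so a small offset never produces a negative position.
sub context_around {
    my ($str, $offset, $radius) = @_;
    my $start = $offset - $radius;
    $start = 0 if $start < 0;
    return substr($str, $start, $offset - $start + $radius);
}

# Hypothetical input: a valid JSON object followed by trailing JavaScript.
my $str = '{"args":{"title":"demo"}};(function() {var x = 1;})();';

my $data = eval { JSON::PP->new->decode($str) };
if (my $err = $@) {
    # The error text includes "at character offset N"; extract N from it.
    my ($offset) = $err =~ /character offset (\d+)/;
    if (defined $offset) {
        print STDERR "parse failed near: ",
            context_around($str, $offset, 10), "\n";
    }
    else {
        print STDERR "parse failed: $err";
    }
}
```

Seeing the text on both sides of the offset usually shows whether the regex captured too much or too little.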

profplump commented 10 years ago

I haven't tested it extensively yet, but to deal with changes at YT this last week I have again modified the regex to: elsif ($line =~ /^.+ytplayer.config\s=\s({.*});ytplayer./) { which at least so far has worked for me. If you find a video it doesn't work for, post the YT ID and I'll see if I can make it go.
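
To show why the extra anchor matters, here is a small sketch run against a made-up line (real YouTube markup differs and changes often). The patterns are the two from this thread, except that the dot and braces are escaped here; that escaping is my own adjustment, not something tested against the module.

```perl
use strict;
use warnings;

# Hypothetical page line, invented for illustration only.
my $line = '<script>var ytplayer = ytplayer || {};'
         . 'ytplayer.config = {"args":{"a":1}};'
         . 'ytplayer.load = function() {x();};</script>';

# Anchored only on "};": the greedy .* backtracks to the LAST "};" on the
# line, so the capture swallows the JavaScript after the config object.
my ($loose) = $line =~ /^.+ytplayer\.config\s=\s(\{.*\});/;

# Anchored on "};ytplayer.": the capture ends where the next ytplayer
# statement begins, leaving only the JSON object.
my ($tight) = $line =~ /^.+ytplayer\.config\s=\s(\{.*\});ytplayer\./;

print "loose: $loose\n";
print "tight: $tight\n";
```

The tighter pattern still depends on what follows the config object, so it can break again whenever the page layout changes.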

tsibley commented 10 years ago

The easiest way to fix the "garbage after JSON" is to let the JSON parser just stop parsing at that point:

diff --git a/lib/WWW/YouTube/Download.pm b/lib/WWW/YouTube/Download.pm
index 36c9a9c..5549507 100644
--- a/lib/WWW/YouTube/Download.pm
+++ b/lib/WWW/YouTube/Download.pm
@@ -226,13 +226,15 @@ sub _get_args {
             croak 'Video not available in your country';
         }
         elsif ($line =~ /^.+ytplayer\.config\s*=\s*({.*})/) {
-            $data = JSON->new->utf8(1)->decode($1);
+            ($data, undef) = JSON->new->utf8(1)->decode_prefix($1);
             last;
         }
     }

profplump commented 10 years ago

Thanks. I was not aware such a call existed. That's a much more future-compatible plan.
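
For anyone else who hasn't used it, a minimal sketch of what decode_prefix does. It uses the core JSON::PP module, which provides the same method as the JSON wrapper the module loads, and the input string is made up for illustration.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical captured text: a JSON object with JavaScript trailing it,
# as a greedy regex capture might return.
my $captured = '{"args":{"title":"demo"}};ytplayer.load = function() {}';

# decode_prefix parses the first JSON value and returns it along with the
# number of characters it consumed; whatever follows is left alone.
my ($data, $consumed) = JSON::PP->new->utf8(1)->decode_prefix($captured);

print "title: $data->{args}{title}\n";
print "ignored tail: ", substr($captured, $consumed), "\n";
```

Because the parser decides where the object ends, the regex only has to find where it starts.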

firemyst13 commented 10 years ago

What did the signature do?

I've added the two changes here to a local copy of Download.pm and it's now working for me as well.

--- /usr/local/share/perl5/WWW/YouTube/Download.pm  2014-08-26 11:16:36.000000000 -0700
+++ Download.pm 2014-08-26 12:10:06.638000863 -0700
@@ -223,7 +223,7 @@
             croak 'Video not available in your country';
         }
         elsif ($line =~ /^.+ytplayer\.config\s*=\s*({.*})/) {
-            $data = JSON->new->utf8(1)->decode($1);
+            ($data, undef) = JSON->new->utf8(1)->decode_prefix($1);
             last;
         }
     }
@@ -299,11 +299,8 @@
         my $uri = URI->new;
         $uri->query($stuff);
         my $query = +{ $uri->query_form };
-        my $sig = $query->{sig} || _getsig($query->{s});
-        my $url = $query->{url};
-        $fmt_url_map->{$query->{itag}} = $url.'&signature='.$sig;
+        $fmt_url_map->{$query->{itag}} = $query->{url};
     }
-
     return $fmt_url_map;
 }

profplump commented 10 years ago

Signature is one of the parameters YouTube passes around for its own use; it has no specific meaning in this module. It's still present in the requests; it's just no longer necessary to parse it separately, because this module now captures it as part of $query->{url}.

firemyst13 commented 10 years ago

Ah, then I should have removed the sub as well. I'll leave that for the owner.

mnlagrasta commented 9 years ago

I was about to create a pull request in the hopes of getting the fix from @firemyst13 merged in and on to CPAN. However, I found that several people have already done this. @xaicron is this module being maintained at all? Could you please find some time to get at least this basic fix in? It would be greatly appreciated!

oalders commented 9 years ago

@xaicron I'm sure you're quite busy. I'd be happy to merge the fix for this and upload a new release if you want to add me to the repo and give me co-maint: OALDERS.

oalders commented 9 years ago

@tsibley's patch works really well. The quick way to install it is: cpanm git://github.com/tsibley/WWW-YouTube-Download.git@json-and-signature-fixes

profplump commented 9 years ago

FYI: I've moved to this: http://rg3.github.io/youtube-dl/

It's not a handy perl module, but it's easy to use to download videos and/or extract information from YT, and it's being maintained.

oalders commented 9 years ago

This was closed by #31 and a new release is on its way to CPAN.