webrecorder / pywb

Core Python Web Archiving Toolkit for replay and recording of web archives
https://pypi.python.org/pypi/pywb
GNU General Public License v3.0

FLV-Scrubber #169

Open despens opened 8 years ago

despens commented 8 years ago

Among the many Flash-based video players that exist, a current project introduced me to FLV Scrubber, a CC-licensed player that its author no longer supports.

Currently, pywb does not fully rewrite the HTML this player requires: the URL of the video file handed to the player is left pointing at the live web.

The "variables" for the player can be handed to the Flash movie in two ways (in both cases, the DOM excerpts below are shown as pywb currently rewrites them):

1. URL Param Style

<object width="720" height="480">
        <param name="movie" value="/images/FLVScrubber.swf?file=http://mw.smokinggun.com/shots/98/assets/montage_low.flv&amp;bufferTime=3&amp;startAt=0&amp;autoStart">
        <param name="allowScriptAccess" value="sameDomain">
        <param name="allowFullScreen" value="false">
        <param name="bgcolor" value="000000">
        <embed src="/man-with-a-movie-camera-the-global-remake/20160217142125oe_/http://dziga.perrybard.net/images/FLVScrubber.swf?file=http://mw.smokinggun.com/shots/98/assets/montage_low.flv&amp;bufferTime=3&amp;startAt=0&amp;autoStart" type="application/x-shockwave-flash" allowscriptaccess="sameDomain" allowfullscreen="false" bgcolor="#000000" width="720" height="480">
        </object>

Here, the FLV video file the player should show is passed as part of the URL of the Flash player, both in the "movie" <param> of the <object> and in the "src" attribute of the <embed>.
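As a rough sketch of how the video URL could be located in this style, the query string of the player URL can be parsed for its "file" variable. The helper name `extractVideoURL` is made up for illustration and is not part of pywb; it assumes the attribute value has been read from the DOM, so `&amp;` is already decoded to `&`.

```javascript
// Hypothetical helper (not part of pywb): find the video URL inside a
// "URL Param Style" player URL.
function extractVideoURL(movieValue) {
    var queryStart = movieValue.indexOf('?');
    if (queryStart === -1) return null; // no variables in the player URL

    // URLSearchParams parses "a=1&b=2" pairs; a bare key such as
    // "autoStart" is mapped to an empty string.
    var params = new URLSearchParams(movieValue.slice(queryStart + 1));
    return params.get('file'); // null if there is no "file" variable
}

// Value of the "movie" <param> from the excerpt above, entities decoded.
var movie = '/images/FLVScrubber.swf?file=http://mw.smokinggun.com/shots/98/assets/montage_low.flv&bufferTime=3&startAt=0&autoStart';
console.log(extractVideoURL(movie));
// http://mw.smokinggun.com/shots/98/assets/montage_low.flv
```

Note that `URLSearchParams` also percent-decodes values and turns `+` into a space, which may or may not match how the player itself reads its variables.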

2. Separated URL

<object id="movie" width="380" height="285" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://localhost:8080/man-with-a-movie-camera-the-global-remake/20160217142125oe_/http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=9,0,0,0">
          <param name="movie" value="/images/FLVScrubber.swf">
          <param name="bgcolor" value="#000000">
          <param name="allowScriptAccess" value="sameDomain">
          <param name="allowFullScreen" value="true">
          <param name="flashVars" value="file=http://mw.smokinggun.com/full.flv&amp;bufferTime=3&amp;startAt=10000">

          <embed src="/man-with-a-movie-camera-the-global-remake/20160217142125oe_/http://dziga.perrybard.net/images/FLVScrubber.swf" bgcolor="#000000" width="380" height="285" name="movie" allowscriptaccess="sameDomain" allowfullscreen="true" flashvars="file=http://mw.smokinggun.com/full.flv&amp;bufferTime=3&amp;startAt=10000" type="application/x-shockwave-flash" pluginspage="http://www.adobe.com/go/getflashplayer"></object>

Here, the parameters handed over to the Flash video player are kept apart from the player URL: they appear both as a "flashVars" <param> tag and as a "flashvars" attribute of the <embed> tag.
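A minimal sketch of what rewriting such a variable string could look like, assuming only the "file" variable carries a URL. The function `rewriteFlashVars` and the prefix `/coll/20160217142125/` are placeholders for illustration; in a pywb page the prefix would presumably be built from `wbinfo.prefix` and `wbinfo.request_ts`. The full original URL is kept after the timestamp, as in the rewritten `src` values shown in the excerpts.

```javascript
// Hypothetical helper (not part of pywb): point the "file" variable of a
// flashvars string at the archive and leave all other variables untouched.
function rewriteFlashVars(flashVars, archivePrefix) {
    return flashVars.split('&').map(function (pair) {
        var eq = pair.indexOf('=');
        if (eq === -1) return pair; // bare flag such as "autoStart"

        var name = pair.slice(0, eq);
        var value = pair.slice(eq + 1);

        // Only absolute http(s) URLs in "file" are prefixed (assumption).
        if (name === 'file' && /^https?:\/\//.test(value)) {
            return name + '=' + archivePrefix + value;
        }
        return pair;
    }).join('&');
}

var vars = 'file=http://mw.smokinggun.com/full.flv&bufferTime=3&startAt=10000';
console.log(rewriteFlashVars(vars, '/coll/20160217142125/'));
// file=/coll/20160217142125/http://mw.smokinggun.com/full.flv&bufferTime=3&startAt=10000
```

Splitting by hand rather than re-serializing through `URLSearchParams` avoids percent-encoding the `:` and `/` characters of the video URL.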

Client-Side rewrite

By injecting this script into the collection's banner, I was able to rewrite the HTML so that the videos play in Chrome and Firefox (with the Flash plugin installed):

document.addEventListener('readystatechange', function(event) {

    if (document.readyState !== 'interactive') return true; // only run once

    // replace the offending absolute URL in any string
    // (with a string pattern, replace() only touches the first occurrence)
    var replaceURL = function(str) {
        return str.replace('http://mw.smokinggun.com/', wbinfo.prefix + wbinfo.request_ts + '/mw.smokinggun.com/');
    };

    // find all <object> nodes.
    // The typical use of <object> is to embed plugins into a web page.
    // As far as I know, only legacy versions of Internet Explorer use this to
    // embed the ActiveX variation of the Flash plugin.
    // Both Firefox and Chrome apparently use the <embed> nested within <object>.
    var objectNodes = document.querySelectorAll('object');

    // Since the Flash plugin seems to be getting information from the DOM
    // before the readystatechange triggers, the existing <object> nodes and
    // their child <embed> nodes are cloned. The clones' attributes are rewritten,
    // then the original nodes are replaced by the rewritten clones.
    // This forces the Flash plugin to re-read the information.
    //   Because only legacy versions of Internet Explorer care about the
    // <object> tag in this case, I did not manipulate the information in the
    // nested <param> tags.
    var objectNodesCloned = [];
    for (var i=0; i<objectNodes.length; i++) {
        objectNodesCloned.push( {source: objectNodes[i], clone: objectNodes[i].cloneNode(true) } );
    }

    objectNodesCloned.forEach(function(obj, index){

        var embedNodes = obj.clone.querySelectorAll('embed');
        for (var i=0; i<embedNodes.length; i++) {
            ['flashvars', 'src'].forEach(function(attrName){
                if(embedNodes[i].hasAttribute(attrName)) {
                    // declare locally instead of leaking an implicit global
                    var str = replaceURL( embedNodes[i].getAttribute(attrName) );
                    embedNodes[i].setAttribute(attrName, str);

                    // The video URL might be encoded inside the "src" string as well.
                    // In this case, move the video's URL into "flashvars" and
                    // keep the original "player movie" at the same URL.
                    //   This is mostly due to how the FLV videos were archived,
                    // that is, by scraping them from their server. So their URLs need
                    // to be noted independently from the URL of the movie player.
                    if (attrName == 'src') {
                        if (str.indexOf('?') > -1) {
                            var URLparts = str.split('?');
                            embedNodes[i].setAttribute('src', URLparts[0]);
                            embedNodes[i].setAttribute('flashvars', URLparts[1]);
                        }
                    }
                }
            }); 
        }

        obj.source.parentNode.replaceChild(obj.clone, obj.source);
    });

});
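
The script above splits `src` with `str.split('?')` and keeps only the first two pieces, so a video URL that itself contains a `?` would be cut short. A possible variation, sketched here with a made-up helper name `splitPlayerSrc` and a hypothetical example.com video URL, splits at the first `?` only:

```javascript
// Hypothetical helper: separate the player URL from its variables,
// splitting at the first "?" so later "?" characters survive.
function splitPlayerSrc(src) {
    var queryStart = src.indexOf('?');
    if (queryStart === -1) {
        return { src: src, flashvars: null }; // nothing to move
    }
    return {
        src: src.slice(0, queryStart),
        flashvars: src.slice(queryStart + 1)
    };
}

// example.com stands in for a video URL with its own query string.
var parts = splitPlayerSrc('/images/FLVScrubber.swf?file=http://example.com/video.flv?id=1&bufferTime=3');
console.log(parts.src);       // /images/FLVScrubber.swf
console.log(parts.flashvars); // file=http://example.com/video.flv?id=1&bufferTime=3
```

The two returned values would then be written back with `setAttribute('src', ...)` and `setAttribute('flashvars', ...)`, as in the script.
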
ikreymer commented 6 years ago

@despens is this still relevant, or obsolete now, with most video being HTML5 based?