Byblos stele HTML rewriting

GoogleCodeExporter commented 8 years ago

After the page has loaded, we give the content script a chance to rewrite 
the HTML. The simplest method might be to call it with 
document.documentElement.innerHTML and let it return whatever it wants, which 
the browser control script could then document.write() or some such.
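
A minimal sketch of that simplest method, assuming the content script is exposed as a plain function that takes the page's HTML and returns a replacement string. The names rewritePage and contentScript are hypothetical and only for illustration; nothing like them exists in the tree yet.

```javascript
// Hypothetical helper: hand the loaded page's HTML to a content script and,
// if it returns a string, write that string out as the new document.
function rewritePage(doc, contentScript) {
    var original = doc.documentElement.innerHTML;
    var rewritten = contentScript(original);

    // Assumption: anything other than a string means "leave the page alone".
    if (typeof rewritten !== "string") {
        return false;
    }

    doc.open();
    doc.write(rewritten);
    doc.close();
    return true;
}
```

Passing the document in as a parameter keeps the helper independent of any particular window, so it can be exercised with a stand-in object.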

Original issue reported on code.google.com by classi...@floodgap.com on 12 Oct 2011 at 5:34

GoogleCodeExporter commented 8 years ago

Or, better still, let it DO whatever it wants, which can be nothing (default), 
or something like

<body>
<script type="text/javascript">
var parser = new DOMParser();
var xmlDoc = parser.parseFromString(
  "<div xmlns=\"http://www.w3.org/1999/xhtml\" " +
  "style=\"background-color:blue;\">blah<span " +
  "style=\"background-color:red;\">blah</span></div>", "text/xml");
// Import the parsed node into this document before appending it.
document.body.appendChild(document.importNode(xmlDoc.documentElement, true));
</script>
</body>

or just document.write() out a new document, and so on.

Original comment by classi...@floodgap.com on 12 Oct 2011 at 5:36

GoogleCodeExporter commented 8 years ago

Move down

Original comment by classi...@floodgap.com on 24 Oct 2011 at 7:30

Added labels: Milestone-Release9.3.2
Removed labels: Milestone-Release9.3.1

GoogleCodeExporter commented 8 years ago

So here's a more fleshed out idea. I might try to silently implement this in 
9.3.0.

- Scripts are to be stored in %appdir%/Fossils/ so they can be referenced as 
res://programdir/Fossils/ or some such. Scripts should be named by domain 
(www.google.com).
- We should implement M430155 (nsITraceableChannel). This will make this a lot 
easier by giving us a hook into the HttpChannel that JavaScript can directly 
access. The patch will mostly apply to us; it's actually not very complex. We 
need to implement the idl also.
- Implement a JavaScript XPCOM singleton module along the lines of

http://www.softwareishard.com/blog/firebug/nsitraceablechannel-intercept-http-traffic/

The httpRequestObserver tries to load a JS script by that domain name. If it 
fails, nothing happens and the load continues with no stream observer. This 
result is not cached so that people can add and remove scripts at will.

observe : function(aSubject, aTopic, aData) {
  if ("http-on-examine-response" == aTopic) {
    var url;

    aSubject.QueryInterface(Components.interfaces.nsIHttpChannel);
    url = aSubject.URI.spec;
    // ... try to load the script named for this URL's domain here ...
  }
}
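
As a rough sketch of the lookup step, the observer could derive the script location from the URL's host. The base path below is taken from the "res://programdir/Fossils/ or some such" wording above and is an assumption, as is the helper name fossilURLFor; the regular expression is a simplification that does not handle IPv6 literals.

```javascript
// Assumed base location; the real URL scheme and directory may differ.
var FOSSIL_BASE = "res://programdir/Fossils/";

// Hypothetical helper: map a channel's URI spec to the script named after
// its domain, or return null if no host can be extracted.
function fossilURLFor(spec) {
    // Matches scheme://[userinfo@]host and captures the host only.
    var m = /^[a-z][a-z0-9+.\-]*:\/\/(?:[^\/?#@]*@)?([^\/?#:]+)/i.exec(spec);
    if (!m) {
        return null;
    }
    return FOSSIL_BASE + m[1].toLowerCase();
}
```

Because the result is recomputed for every response and nothing is cached, adding or removing a script takes effect on the next load.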

Original comment by classi...@floodgap.com on 12 Jan 2012 at 11:53

GoogleCodeExporter commented 8 years ago

var Ci = Components.interfaces; // shorthand used below

function TracingListener() {
    this.originalListener = null;
}

TracingListener.prototype =
{
    onDataAvailable: function(request, context, inputStream, offset, count) {
        this.originalListener.onDataAvailable(request, context, inputStream, offset, count);
    },

    onStartRequest: function(request, context) {
        this.originalListener.onStartRequest(request, context);
    },

    onStopRequest: function(request, context, statusCode) {
        this.originalListener.onStopRequest(request, context, statusCode);
    },

    QueryInterface: function (aIID) {
        if (aIID.equals(Ci.nsIStreamListener) ||
            aIID.equals(Ci.nsISupports)) {
            return this;
        }
        throw Components.results.NS_NOINTERFACE;
    }
};

// Method of the httpRequestObserver object:
observe: function(aSubject, aTopic, aData)
{
    if (aTopic == "http-on-examine-response") {
        var newListener = new TracingListener();
        aSubject.QueryInterface(Ci.nsITraceableChannel);
        newListener.originalListener = aSubject.setNewListener(newListener);
    }
}

Original comment by classi...@floodgap.com on 13 Jan 2012 at 2:21

GoogleCodeExporter commented 8 years ago

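// CCIN is not defined in this snippet; a helper along these lines is assumed.
function CCIN(cName, ifaceName) {
    return Components.classes[cName]
                     .createInstance(Components.interfaces[ifaceName]);
}

TracingListener.prototype =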
{
    originalListener: null,
    receivedData: null,
    buffer: null,

    onStartRequest: function(request, context) {
        this.receivedData = []; //initialize the array
        this.buffer = []; //initialize the array

        //Pass on the onStartRequest call to the next listener in the chain -- VERY IMPORTANT
        this.originalListener.onStartRequest(request, context);
    },
    onDataAvailable: function(request, context, inputStream, offset, count)
    {
        var binaryInputStream = CCIN("@mozilla.org/binaryinputstream;1",
                                     "nsIBinaryInputStream");
        binaryInputStream.setInputStream(inputStream);

        var storageStream = CCIN("@mozilla.org/storagestream;1",
                                 "nsIStorageStream");
        //8192 is the segment size in bytes, count is the maximum size of the stream in bytes
        storageStream.init(8192, count, null);

        var binaryOutputStream = CCIN("@mozilla.org/binaryoutputstream;1",
                                      "nsIBinaryOutputStream");
        binaryOutputStream.setOutputStream(storageStream.getOutputStream(0));

        // Copy received data as they come.
        var data = binaryInputStream.readBytes(count);

        this.receivedData.push(data);

        binaryOutputStream.writeBytes(data, count);
        this.buffer.push({request:request, context:context, inputStream:storageStream.newInputStream(0), offset:offset, count:count});
    },

    onStopRequest: function(request, context, statusCode)
    {
        try
        {
            //QueryInterface into HttpChannel to access originalURI and requestMethod properties
            request.QueryInterface(Ci.nsIHttpChannel);

            //this is specific to the PirateQuesting Add-on, but is left here as an example of how to modify behaviour based on the requested URL
            if (request.originalURI
                && piratequesting.baseURL == request.originalURI.prePath
                && request.originalURI.path.indexOf("/index.php?ajax=") == 0)
            {
                var data = null;
                if (request.requestMethod.toLowerCase() == "post")
                {
                    var postText = this.readPostTextFromRequest(request, context);
                    if (postText)
                        data = String(postText).parseQuery();
                }

                //Combine the response into a single string
                var responseSource = this.receivedData.join('');

                //fix leading spaces bug
                //(FM occasionally adds spaces to the beginning of their ajax responses...
                //which breaks the XML)
                responseSource = responseSource.replace(/^\s+(\S[\s\S]+)/, "$1");

                //gets the date from the response headers on the request.
                //For PirateQuesting this was preferred over the date on the user's machine
                var date = Date.parse(request.getResponseHeader("Date"));

                //Again a PQ specific function call, but left as an example.
                //This just passes a string URL, the text of the response,
                //the date, and the data in the POST request (if applicable)
                piratequesting.ProcessRawResponse(request.originalURI.spec,
                                                  responseSource,
                                                  date,
                                                  data);
            }

            //Now that we're done with the data for our part, we can pass it on.
            var buffItem;
            for (var i = 0, len = this.buffer.length; i < len; i++) {
                buffItem = this.buffer[i];
                this.originalListener.onDataAvailable(buffItem.request,
                                                      buffItem.context,
                                                      buffItem.inputStream,
                                                      buffItem.offset,
                                                      buffItem.count);
            }
        }
        catch (e)
        {
            //standard function to dump a formatted version of the error to console
            dumpError(e);
        }
        //Pass it on down the chain
        this.originalListener.onStopRequest(request,
                                            context,
                                            statusCode);
    },

Original comment by classi...@floodgap.com on 13 Jan 2012 at 3:37

GoogleCodeExporter commented 8 years ago

What we should do is break this apart into 4K chunks upon blasting back to the 
channel and call onDataAvailable on all of those.
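
The splitting step could look something like the sketch below. It only covers cutting the rewritten body into pieces; wrapping each piece in an input stream and calling originalListener.onDataAvailable with it is left out. The names CHUNK_SIZE and splitIntoChunks are hypothetical, and the sketch assumes the body is a binary string with one character per byte.

```javascript
// "4K" taken as 4096; treating characters as bytes is an assumption.
var CHUNK_SIZE = 4096;

// Hypothetical helper: cut a body string into consecutive chunks, recording
// the offset of each so it can be passed along to onDataAvailable.
function splitIntoChunks(body, size) {
    size = size || CHUNK_SIZE;
    var chunks = [];
    var offset;
    for (offset = 0; offset < body.length; offset += size) {
        chunks.push({
            offset: offset,
            data: body.substring(offset, offset + size)
        });
    }
    return chunks;
}
```

Each entry's data.length would serve as the count argument, and the offsets run contiguously from zero, which is what a downstream listener expects.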

This code could easily go in the browser.js.

The fossil should return an object with two slots. wantURL : function(x) is 
called with the URI.spec and should return true if the fossil wants to do 
something with this URL (even if that something is to redirect to another 
object in a future version). If false, nothing happens. parseHTML : function(x) 
is called with the body in onStopRequest() and should return another two slot 
object, one "response" (only supported reply right now is "ok") and "body" (the 
new HTML).
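
To make the two-slot contract concrete, here is a sketch of a fossil object and a driver that consults it. The names applyFossil and exampleFossil are hypothetical, and falling back to the original body on any reply other than "ok" is an assumption about how unsupported replies would be handled.

```javascript
// Hypothetical driver: ask the fossil whether it wants the URL, and if so
// let it rewrite the body. Returns the body to send on down the channel.
function applyFossil(fossil, spec, body) {
    if (!fossil || !fossil.wantURL(spec)) {
        return body; // not interested: nothing happens
    }
    var result = fossil.parseHTML(body);
    // Assumption: any reply other than "ok" leaves the body untouched.
    if (!result || result.response !== "ok") {
        return body;
    }
    return String(result.body);
}

// Illustrative fossil with the two slots described above.
var exampleFossil = {
    wantURL: function (spec) {
        return spec.indexOf("http://example.com/") === 0;
    },
    parseHTML: function (html) {
        return { response: "ok", body: html.replace(/foo/g, "bar") };
    }
};
```

Keeping the "response" slot separate from "body" leaves room for other replies, such as a redirect, without changing the shape of the object.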

We already have all the other IDLs that this needs in xpcom/io.

Original comment by classi...@floodgap.com on 13 Jan 2012 at 4:04

GoogleCodeExporter commented 8 years ago

Load the script synchronously this way from a res:// URL:

function getContents(aURL){
  var ioService=Components.classes["@mozilla.org/network/io-service;1"]
    .getService(Components.interfaces.nsIIOService);
  // createInstance rather than getService: each read gets its own stream.
  var scriptableStream=Components
    .classes["@mozilla.org/scriptableinputstream;1"]
    .createInstance(Components.interfaces.nsIScriptableInputStream);

  var channel=ioService.newChannel(aURL,null,null);
  var input=channel.open();
  scriptableStream.init(input);
  var str=scriptableStream.read(input.available());
  scriptableStream.close();
  input.close();
  return str;
}

try{
  alert(getContents("chrome://browser/content/browser.css"));
}catch(e){alert(e)}

Original comment by classi...@floodgap.com on 13 Jan 2012 at 4:12

GoogleCodeExporter commented 8 years ago

So far today:

- Landed nsITraceableChannel. Had to modify it for nsCOMPtr stuff that isn't in 
1.3.1.
- Added plumbing to HTTP protocol handler to fire events on 
http-on-examine-response.
- Added basic observer to browser.js. This much works and gets a channel object 
with URI and MIME type, but the channel object doesn't like being QI'ed to 
nsITraceableChannel, so I haven't gotten the tracer working yet. I wonder if 
we're getting an nsIChannel instead of an nsIHttpChannel. We should try QI'ing 
it to nsIHttpChannel and see if that works. If it doesn't, we're passing the 
wrong object in the protocol handler.

Original comment by classi...@floodgap.com on 15 Jan 2012 at 4:44

GoogleCodeExporter commented 8 years ago

We could also try to stringify the object to see if it is [object 
nsIHttpChannel] or [object nsIChannel].

Original comment by classi...@floodgap.com on 15 Jan 2012 at 4:44

GoogleCodeExporter commented 8 years ago

Did some dirty hacks and we are now tracing the channel.

- nsHttpHandler tries to QI the nsIHttpChannel object to, in fact, 
nsIHttpChannel. For some reason this comes over XPConnect as an nsISupports, 
but we can then immediately QI that to nsIHttpChannel on the other end, and 
from there to nsITraceableChannel.
- nsHttpChannel needed some hacking to properly return the mListener value.

This works to trace requests. Next is to completely replace them.

Original comment by classi...@floodgap.com on 16 Jan 2012 at 6:34

GoogleCodeExporter commented 8 years ago

Today's work:

- nsIStorageStream was for some inexplicable reason not scriptable, so I added 
it to the XPCOM XPT (M235744).
- On XHR requests, QIing the Byblos tracer (our new name) fails in an 
unrecoverable way, so I modified nsITraceableChannel to give us the old 
listener. If this fails, we just abort and the request goes through.
- Successfully blasted 4K chunks of google -> g00G13 using a dummy script.
- Our URIs will be based off resource://programdir/Byblos/

This has an excellent chance of success, so we will officially put it in 9.3.0.

Original comment by classi...@floodgap.com on 17 Jan 2012 at 1:22

Changed title: Byblos stele HTML rewriting
Added labels: Milestone-Release9.3.0
Removed labels: Milestone-Release9.3.2

GoogleCodeExporter commented 8 years ago

We're in BUSINESS!

Sample stele runs 100% as www.google.com.js:

/* This is a demonstration of how to use Byblos to translate HTML. This silly
   example turns every occurrence of google -- even in URLs! -- to g00G13. */

function() {
    return {
        wantURI : function(uri) {
            // This is a JavaScript uri object. We can get the ASCII version
            // with uri.asciiSpec. But for this demo, we accept all URLs.
            return true;
        },
        parseHTML : function(text) {
            var munge = text.replace(/google/ig, "g00G13");
            return {
                // In future versions you can reply with other strings, but right
                // now it just recognizes ok and != ok.
                response : "ok",
                body     : munge
            };
        }
    };
};
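
For illustration, one way a loader might evaluate a stele like this and call into it. This is a sketch under assumptions, not how Byblos necessarily does it: the name loadStele is hypothetical, and wrapping the source in parentheses so the anonymous function is treated as an expression is an assumption about the loading step. The stele source is abbreviated here.

```javascript
// Abbreviated copy of the sample stele, as it might be read from disk.
var steleSource =
    "function() {" +
    "    return {" +
    "        wantURI : function(uri) { return true; }," +
    "        parseHTML : function(text) {" +
    "            return { response : \"ok\"," +
    "                     body : text.replace(/google/ig, \"g00G13\") };" +
    "        }" +
    "    };" +
    "};";

// Hypothetical loader: strip the trailing semicolon, wrap the source in
// parentheses so it evaluates as a function expression, then call it to
// obtain the stele object.
function loadStele(source) {
    var factory = eval("(" + source.replace(/;\s*$/, "") + ")");
    return factory();
}

var stele = loadStele(steleSource);
var reply = stele.parseHTML("<a href=\"http://www.google.com/\">Google</a>");
// reply.body is now "<a href=\"http://www.g00G13.com/\">g00G13</a>"
```

Note that the replacement is case-insensitive and global, so both the link text and the host name in the href are rewritten.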

Original comment by classi...@floodgap.com on 17 Jan 2012 at 2:20

GoogleCodeExporter commented 8 years ago

We'll ship a working sample for developer.mozilla.org. I might work on a couple 
others.

CSS rewriting would be a logical next step, but we'll start with this. It would 
be in the %20CSS folder of Byblos (we can get that with a resource:// URL).

Original comment by classi...@floodgap.com on 17 Jan 2012 at 3:05

Changed state: Started

GoogleCodeExporter commented 8 years ago

Wrote one for yfrog on the 1400 last night. I'll throw that in too.

Docs up under the wiki.

Original comment by classi...@floodgap.com on 19 Jan 2012 at 10:44

GoogleCodeExporter commented 8 years ago

Original comment by classi...@floodgap.com on 20 Jan 2012 at 10:05

Changed state: Verified

luangruo / classilla

Byblos stele HTML rewriting #170