wolverine2k / crunchy

Automatically exported from code.google.com/p/crunchy

Re-implement vlam.py in javascript #126

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
I know this sounds ridiculous, but bear with me...

Currently, the code in vlam.py takes an HTML DOM and works on it in two
passes: first it converts things based on the rules in security.py, and
then it looks for VLAM and does the appropriate thing.

It is this second pass, processing VLAM, that I think could be implemented
in javascript. Using jQuery (which really does take the pain out of
javascript development) the following code would apply a conversion to all
interpreter elements:

$("pre[title^=interpreter]").each(function(i){
      // this code will be called for each interpreter element
      // get a jQuery object for the current elem:
    var elem = $(this);
      // extract the code; this automatically gets rid of any extra markup
      // that might interfere
    var code = elem.text();
      // generate a uid for the element
    var uid = uidgen();
      // turn it into a div, with the given uid
    elem.replaceWith('<div id="'+uid+'" class="crunchy"></div>');
      // get the new elem to work with
    elem = $("#"+uid);
      // insert the code again, inside a pre element; text() keeps
      // characters such as < and & escaped
    elem.append($("<pre></pre>").text(code));
      // call a function to append an output widget
    append_output_widget(elem);
      // and finally fire off execution
    initialise_interpreter(uid);
});

I think this has a good chance of enhancing performance significantly: even
though javascript is considerably slower than Python, it is hugely
optimised for playing with the DOM. I realise that the above code is hugely
simplified, but the principle should be clear.

Original issue reported on code.google.com by johannes...@gmail.com on 3 Jun 2008 at 9:05

GoogleCodeExporter commented 8 years ago
Sorry, I forgot to mention the main advantage: this way the browser has to deal
with all the buggy HTML. All we have to do is scan for security problems, for
which we really don't need a proper DOM tree. BeautifulSoup is very good at what
it does, but modern browsers are considerably better.

Original comment by johannes...@gmail.com on 3 Jun 2008 at 9:12

GoogleCodeExporter commented 8 years ago
It might be worth exploring ... but I can see some potential significant
problems. Furthermore, the way I read the original entry and the first comment,
two different approaches are suggested.

1. As per comment 1: "All we have to do is scan for security problems..."  This
is extremely tricky.  Many security holes are caused by malformed
javascript/html code.  See http://ha.ckers.org/xss.html for some samples.  The
beauty of the approach we use is that BeautifulSoup only produces valid html
code even when fed "garbage"; we can then use a white list with ElementTree to
remove unwanted stuff.

2. After the security process, we have a well-formed tree.  If I understand your
"second pass" comment (in the original entry for the issue), we would need to
write it out as a string and use jquery to process it in the browser.  No matter
how highly optimized jquery is for playing with the DOM, I would put my money on
effbot's elementtree to process a tree faster.
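
To make the white-list step in point 1 concrete, here is a minimal sketch using the standard library's ElementTree. It is an illustration only, not the code in Crunchy's security.py: the `ALLOWED` table and the helper names `apply_whitelist` and `remove_child` are hypothetical, and the sketch assumes the input has already been turned into a well-formed tree (for example by BeautifulSoup). It filters tags and attribute names only; a real filter would also have to check attribute values.

```python
import xml.etree.ElementTree as ET

# Hypothetical white list: tag name -> attribute names allowed on that tag.
ALLOWED = {
    "html": set(),
    "body": set(),
    "div": {"id", "class"},
    "p": set(),
    "span": {"class"},
    "pre": {"title"},
}


def remove_child(parent, child):
    """Remove child and its subtree, keeping the text that followed it."""
    index = list(parent).index(child)
    tail = child.tail or ""
    if index > 0:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + tail
    else:
        parent.text = (parent.text or "") + tail
    parent.remove(child)


def apply_whitelist(elem):
    """Strip, in place, every attribute and descendant not in ALLOWED."""
    allowed_attrs = ALLOWED.get(elem.tag, set())
    for name in list(elem.attrib):
        if name not in allowed_attrs:
            del elem.attrib[name]
    # Iterate over a snapshot because children may be removed in the loop.
    for child in list(elem):
        if child.tag in ALLOWED:
            apply_whitelist(child)
        else:
            remove_child(elem, child)
```

Because anything not listed is dropped, a new or unexpected tag is rejected by default, which is the property the white-list approach relies on.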

So, which way is it:  bypass ElementSoup and scan for security problems using
javascript (good luck!), or use ElementSoup, scan for security problems using a
white list (and an ElementTree object), output a string ... and do the vlam
processing using jquery?...
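
For comparison with the jQuery snippet in the original entry, the same interpreter conversion can be sketched on the server side with ElementTree. This is a rough illustration under stated assumptions, not the actual vlam.py: `convert_interpreters` is a hypothetical name, `uidgen` here is a stand-in for the helper of the same name in the jQuery code, and the output widget and interpreter start-up steps are left out.

```python
import itertools
import xml.etree.ElementTree as ET

_counter = itertools.count()


def uidgen():
    """Hypothetical stand-in for the uidgen() helper named in the issue."""
    return "crunchy_%d" % next(_counter)


def convert_interpreters(root):
    """Replace each <pre title="interpreter..."> below root with a div.

    The div gets a fresh uid and class "crunchy" and holds a clean <pre>
    with the original code. Returns the list of generated uids.
    """
    uids = []
    # Collect (parent, child) pairs first, since the tree is modified below.
    pairs = [(parent, child) for parent in root.iter() for child in list(parent)]
    for parent, pre in pairs:
        if pre.tag != "pre" or not pre.get("title", "").startswith("interpreter"):
            continue
        uid = uidgen()
        div = ET.Element("div", {"id": uid, "class": "crunchy"})
        div.tail = pre.tail  # keep the text that followed the original pre
        code = ET.SubElement(div, "pre")
        # itertext() drops nested markup, much like jQuery's text().
        code.text = "".join(pre.itertext())
        index = list(parent).index(pre)
        parent.remove(pre)
        parent.insert(index, div)
        uids.append(uid)
    return uids
```

The returned uids are what a later step would use to attach output widgets and start the interpreters.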

Additional comments:
3. I plan to have unit tests for all the functions/methods in vlam.py  (and
refactor it a bit).  I hate the thought of writing unit tests for javascript...
4. I want to see as little javascript as possible. ;-)

Original comment by andre.ro...@gmail.com on 3 Jun 2008 at 11:51

GoogleCodeExporter commented 8 years ago

Original comment by andre.ro...@gmail.com on 19 Aug 2009 at 11:31