Open GoogleCodeExporter opened 8 years ago
Sorry, I forgot to mention the main advantage: this way the browser has to deal
with all the buggy HTML. All we have to do is scan for security problems, for
which we really don't need a proper DOM tree. BeautifulSoup is very good at what
it does, but modern browsers are considerably better.
Original comment by johannes...@gmail.com
on 3 Jun 2008 at 9:12
It might be worth exploring ... but I can see some potentially significant
problems. Furthermore, the way I read the original entry and the first comment,
two different approaches are suggested.
1. As per comment 1: "All we have to do is scan for security problems..." This
is extremely tricky. Many security holes are caused by malformed JavaScript/HTML
code. See http://ha.ckers.org/xss.html for some samples. The beauty of the
approach we use is that BeautifulSoup only produces valid HTML code even when
fed "garbage"; we can then use a white list with ElementTree to remove unwanted
stuff.
2. After the security process, we have a well-formed tree. If I understand your
"second pass" comment (in the original entry for the issue), we would need to
write it out as a string and use jQuery to process it in the browser. No matter
how highly optimized jQuery is for playing with the DOM, I would put my money on
effbot's ElementTree to process a tree faster.
So, which way is it: bypass ElementSoup and scan for security problems using
JavaScript (good luck!), or use ElementSoup, scan for security problems using a
white list (and an ElementTree object), output a string ... and do the vlam
processing using jQuery?...
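
For illustration, the white-list pass mentioned in point 1 could look roughly like the sketch below. This is not the project's actual code: the names `ALLOWED_TAGS`, `ALLOWED_ATTRS`, `SAFE_URL_PREFIXES` and `sanitize` are made up for the example, the lists themselves are arbitrary, and the input is assumed to be already well-formed (standing in for what ElementSoup would hand over).

```python
import xml.etree.ElementTree as ET

# Hypothetical white lists, chosen only for illustration.
ALLOWED_TAGS = {"html", "body", "div", "p", "pre", "span", "a", "em", "strong"}
ALLOWED_ATTRS = {"href", "title", "class"}
SAFE_URL_PREFIXES = ("http://", "https://", "/", "#")


def _remove_keeping_tail(parent, child):
    """Remove child from parent, but keep the text that followed it."""
    tail = child.tail or ""
    index = list(parent).index(child)
    if index == 0:
        parent.text = (parent.text or "") + tail
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + tail
    parent.remove(child)


def sanitize(root):
    """Strip, in place, descendants and attributes not on the white lists.

    The root element's own tag is not checked in this sketch.
    """
    # Snapshot the elements first, since the tree is modified while walking.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag not in ALLOWED_TAGS:
                _remove_keeping_tail(parent, child)
    for elem in root.iter():
        for name in list(elem.attrib):
            value = elem.attrib[name].strip().lower()
            if name not in ALLOWED_ATTRS:
                del elem.attrib[name]
            elif name == "href" and not value.startswith(SAFE_URL_PREFIXES):
                del elem.attrib[name]
    return root


if __name__ == "__main__":
    sample = (
        '<div onclick="evil()"><script>alert(1)</script>text'
        '<a href="javascript:alert(1)">link</a></div>'
    )
    print(ET.tostring(sanitize(ET.fromstring(sample)), encoding="unicode"))
```

Anything not explicitly allowed is dropped, so the check never has to recognise each malformed attack variant; that is the reason to run it on a parsed tree rather than on the raw string.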
Additional comments:
3. I plan to have unit tests for all the functions/methods in vlam.py (and
refactor it a bit). I hate the thought of writing unit tests for JavaScript...
4. I want to see as little JavaScript as possible. ;-)
Original comment by andre.ro...@gmail.com
on 3 Jun 2008 at 11:51
Original comment by andre.ro...@gmail.com
on 19 Aug 2009 at 11:31
Original issue reported on code.google.com by
johannes...@gmail.com
on 3 Jun 2008 at 9:05