rjatkins / owaspantisamy

Automatically exported from code.google.com/p/owaspantisamy
0 stars 0 forks source link

Malformed HTML cause parsing exception #12

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
Attempt to sanitize tag like:
<a - href="/">link</a>
<a . href="/">link</a>

Closing tags parsed Ok

What is the expected output? What do you see instead?
Expected:
<a href="/">link</a>
<a href="/">link</a>

Exception:
INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified.

What version of the product are you using? On what operating system?
v1.1.1, Mac OS

Please provide any additional information below.
This is probably a nekoHTML bug

Original issue reported on code.google.com by designbi...@gmail.com on 4 Jun 2008 at 6:12

GoogleCodeExporter commented 9 years ago
I will upstream this to the NekoHTML team.

Thanks,
Arshan 

Original comment by arshan.d...@gmail.com on 16 Jun 2008 at 2:56

GoogleCodeExporter commented 9 years ago
Added a unit test for this scenario in r62.

Original comment by diazepam@gmail.com on 18 Jun 2008 at 8:41

GoogleCodeExporter commented 9 years ago
I submitted a patch to NekoHTML that addresses this issue. Even if they don't 
accept 
the patch (which I'm sure they will), we will ship AntiSamy with a version we 
patched and built.

Original comment by arshan.d...@gmail.com on 11 Jul 2008 at 12:36

GoogleCodeExporter commented 9 years ago
I don't think it's reasonable to require users to trust our patched version of
NekoHTML. So, I think we should keep this issue open until NekoHTML has a fix.

Arshan, since I'd like to push for this fix in NekoHTML (maven builds against a 
copy
of NekoHTML from the repository, not our local one), feel free to reassign this 
to me
to follow up.

Original comment by diazepam@gmail.com on 23 Dec 2008 at 5:05

GoogleCodeExporter commented 9 years ago
here is the link to the bug opened by arshan against neko:

https://sourceforge.net/tracker/index.php?func=detail&aid=1995218&group_id=19512
2&atid=952178

But the dude closed the bug as "can't reproduce", he says that test case works 
in
neko.  So there might be difference between our use and their test case..

Original comment by fern...@gmail.com on 23 Dec 2008 at 5:16

GoogleCodeExporter commented 9 years ago
also, where is this "patch against neko" that you referred to.. so we can 
re-submit
it to neko and get it included or something.

Original comment by fern...@gmail.com on 23 Dec 2008 at 5:17

GoogleCodeExporter commented 9 years ago
To make things crystal clear here - there were two bugs I registered with 
NekoHTML.
At the time, the latest release could not be reproduced from an SVN HEAD, 
meaning
they had fixed it independently. The other he would not apply my patch for. 
Here is
the patch, which adds a simple check to make sure the any attribute encountered 
has a
name that doesn't consist of special characters.

After a few conversations with Marc, the lead developer, in which he initially 
agreed
with me, he sent me this:

>Hi,
>
>I can't apply this patch even if it fix the problem for you due to the
>reasons explained yesterday: it is correct in your case to ignore
>attributes with illegal names but it is not correct when the document
>handler is interested in such attributes like it is the case for
>instance for HtmlUnit. This is the reason why I think that a new flag is
>needed.
>
>Cheers,
>Marc

He is now interested in making it a "flag", which would be good enough for our
purposes, but is looking for someone to create a full patch, which would require
considerably more digging in the source. I will test NekoHTML 1.9.10 to see if 
it
passes the unit tests.

Original comment by arshan.d...@gmail.com on 23 Dec 2008 at 3:14

GoogleCodeExporter commented 9 years ago
These malformed code snippets are not reproducible with any of the recent 
versions of
NekoHTML. The latest versions, 1.9.10 and 1.9.11, I can confirm don't have 
problem
with this:

<a - href="/">link</a>
<a . href="/">link</a>

However, this snippet:

<SCRIPT =">" SRC=""></SCRIPT>

... still causes an OOM error that must be addressed by NekoHTML. Hopefully 
diazepam
can get this fixed!

Original comment by arshan.d...@gmail.com on 5 Jan 2009 at 6:41

GoogleCodeExporter commented 9 years ago
Hrm. Nevermind - my environment setup was still pointing to my patched version 
of
Neko - the <a> vectors still fail!

Original comment by arshan.d...@gmail.com on 5 Jan 2009 at 6:44

GoogleCodeExporter commented 9 years ago
do you think those test cases are really blockers?  Just knowing and 
documenting the
limitations of NekoHTML is good right?  Unless you want to properly fork Neko, 
and
distribute your own source and build..

just wondering.

Original comment by fern...@gmail.com on 6 Jan 2009 at 12:51

GoogleCodeExporter commented 9 years ago
I agree with fernman, these are not blockers; this is a "known issues". I would 
vote for not letting this block 
any release.

Until this is fixed in NekoHTML, I think the modified NekoHTML jar should 
remain in the distribution with a 
release note (or some annotation) indicating why it is patched. I agree that 
going forward, distributing 
patched dependency jars is not a good solution, but this has been in the 
distribution for a while now, and 
removing it would create a regression.

Re the general problem of what to do when we have a dependency bug and a patch 
which "fixes" (that is, fixes 
it as far as we are concerned), I have mixed feelings. If this were a common 
problem, and this project were a 
bigger, there might be an argument for having some means to build with patches. 
Like BSD ports or Gentoo's 
Portage, the build system might apply patches to dependencies, with the ability 
to switch off patching as 
desired. (For all I know Maven can do this already.) However, for a small 
project like this, it seems like overkill.

I think the patch should be in the repo since we're distributing it though. 
Folks would then have three 
options: build from sources and patch for themselves (the paranoid approach), 
take the distributed binaries 
including the patched NekoHTML, or use standard bits (via maven or otherwise) 
and live with the NekoHTML 
bug.

My 0.02.

Re-assigning this to me to follow up with NekoHTML dev people.

Original comment by diazepam@gmail.com on 7 Jan 2009 at 3:43

GoogleCodeExporter commented 9 years ago
agree with diazepam

Original comment by fern...@gmail.com on 7 Jan 2009 at 7:32

GoogleCodeExporter commented 9 years ago
Diazepam is going to submit a patch, and if that can happen quickly we'll 
release 
with it, otherwise we'll do an incremental release in a week or two.

Original comment by arshan.d...@gmail.com on 8 Jan 2009 at 1:57

GoogleCodeExporter commented 9 years ago
I submitted a patch, test case, wrote the mailing list, and waited 2 months - I 
will
no longer delay the release for them. 

The alternative is to write our own fully featured parser and it doesn't make 
sense -
the juice is not worth the squeeze and it realistically cause more problems 
than it
will solve.

It is just going to have to be a known bug until they fix it. I would encourage
anyone motivated enough to ping the NekoHTML list and mention that this bug is a
major annoyance!

Original comment by arshan.d...@gmail.com on 17 Mar 2009 at 2:26

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
I recently encountered a problem while using Neko for parsing the HTML file.
i have a <SCRIPT> tag in the HTML file and have a scenario like this:-

<SCRIPT SRC = "firstJavaScript.js">

<SCRIPT SRC = "secondJavaScript.js"></SCRIPT>

The problem is that, the Neko parser is not able to detect the 
secondJavaScript.js src when the first script tag is not closed.

Is there any way which can solve this problem so that the secondJavaScript.js 
src can also be detected?

Original comment by tejas.r...@gmail.com on 2 Aug 2010 at 6:19

GoogleCodeExporter commented 9 years ago
Tejas, Arshan, I afraid that it's not pure NECKO bug...

I just can't reproduce it with an older A/S version 
http://blog.supremedesign.ru/xss

Things like:

Thing like bold?
bold!normal?

Output: bold? bold!normal

And all scripts removed of course.

i.e. Necko closes open tags automatically.

Original comment by designbi...@gmail.com on 2 Aug 2010 at 6:44

GoogleCodeExporter commented 9 years ago
I'm not sure I follow what's happening. 

Tejas: you want AntiSamy to register two events for your input there? I don't 
know whether or not Neko is reducing that to one DOM element - it shouldn't, 
but maybe it is.

designbistro, you're saying that similar  tags are treated as two elements? 
It's possible that the behavior of nekohtml changed between your two 
environments.

Original comment by arshan.d...@gmail.com on 2 Aug 2010 at 11:59