Closed GoogleCodeExporter closed 9 years ago
I will upstream this to the NekoHTML team.
Thanks,
Arshan
Original comment by arshan.d...@gmail.com
on 16 Jun 2008 at 2:56
Added a unit test for this scenario in r62.
Original comment by diazepam@gmail.com
on 18 Jun 2008 at 8:41
I submitted a patch to NekoHTML that addresses this issue. Even if they don't
accept
the patch (which I'm sure they will), we will ship AntiSamy with a version we
patched and built.
Original comment by arshan.d...@gmail.com
on 11 Jul 2008 at 12:36
I don't think it's reasonable to require users to trust our patched version of
NekoHTML. So, I think we should keep this issue open until NekoHTML has a fix.
Arshan, since I'd like to push for this fix in NekoHTML (maven builds against a
copy
of NekoHTML from the repository, not our local one), feel free to reassign this
to me
to follow up.
Original comment by diazepam@gmail.com
on 23 Dec 2008 at 5:05
here is the link to the bug opened by arshan against neko:
https://sourceforge.net/tracker/index.php?func=detail&aid=1995218&group_id=19512
2&atid=952178
But the dude closed the bug as "can't reproduce", he says that test case works
in
neko. So there might be difference between our use and their test case..
Original comment by fern...@gmail.com
on 23 Dec 2008 at 5:16
also, where is this "patch against neko" that you referred to.. so we can
re-submit
it to neko and get it included or something.
Original comment by fern...@gmail.com
on 23 Dec 2008 at 5:17
To make things crystal clear here - there were two bugs I registered with
NekoHTML.
At the time, the latest release could not be reproduced from an SVN HEAD,
meaning
they had fixed it independently. The other he would not apply my patch for.
Here is
the patch, which adds a simple check to make sure the any attribute encountered
has a
name that doesn't consist of special characters.
After a few conversations with Marc, the lead developer, in which he initially
agreed
with me, he sent me this:
>Hi,
>
>I can't apply this patch even if it fix the problem for you due to the
>reasons explained yesterday: it is correct in your case to ignore
>attributes with illegal names but it is not correct when the document
>handler is interested in such attributes like it is the case for
>instance for HtmlUnit. This is the reason why I think that a new flag is
>needed.
>
>Cheers,
>Marc
He is now interested in making it a "flag", which would be good enough for our
purposes, but is looking for someone to create a full patch, which would require
considerably more digging in the source. I will test NekoHTML 1.9.10 to see if
it
passes the unit tests.
Original comment by arshan.d...@gmail.com
on 23 Dec 2008 at 3:14
These malformed code snippets are not reproducible with any of the recent
versions of
NekoHTML. The latest versions, 1.9.10 and 1.9.11, I can confirm don't have
problem
with this:
<a - href="/">link</a>
<a . href="/">link</a>
However, this snippet:
<SCRIPT =">" SRC=""></SCRIPT>
... still causes an OOM error that must be addressed by NekoHTML. Hopefully
diazepam
can get this fixed!
Original comment by arshan.d...@gmail.com
on 5 Jan 2009 at 6:41
Hrm. Nevermind - my environment setup was still pointing to my patched version
of
Neko - the <a> vectors still fail!
Original comment by arshan.d...@gmail.com
on 5 Jan 2009 at 6:44
do you think those test cases are really blockers? Just knowing and
documenting the
limitations of NekoHTML is good right? Unless you want to properly fork Neko,
and
distribute your own source and build..
just wondering.
Original comment by fern...@gmail.com
on 6 Jan 2009 at 12:51
I agree with fernman, these are not blockers; this is a "known issues". I would
vote for not letting this block
any release.
Until this is fixed in NekoHTML, I think the modified NekoHTML jar should
remain in the distribution with a
release note (or some annotation) indicating why it is patched. I agree that
going forward, distributing
patched dependency jars is not a good solution, but this has been in the
distribution for a while now, and
removing it would create a regression.
Re the general problem of what to do when we have a dependency bug and a patch
which "fixes" (that is, fixes
it as far as we are concerned), I have mixed feelings. If this were a common
problem, and this project were a
bigger, there might be an argument for having some means to build with patches.
Like BSD ports or Gentoo's
Portage, the build system might apply patches to dependencies, with the ability
to switch off patching as
desired. (For all I know Maven can do this already.) However, for a small
project like this, it seems like overkill.
I think the patch should be in the repo since we're distributing it though.
Folks would then have three
options: build from sources and patch for themselves (the paranoid approach),
take the distributed binaries
including the patched NekoHTML, or use standard bits (via maven or otherwise)
and live with the NekoHTML
bug.
My 0.02.
Re-assigning this to me to follow up with NekoHTML dev people.
Original comment by diazepam@gmail.com
on 7 Jan 2009 at 3:43
agree with diazepam
Original comment by fern...@gmail.com
on 7 Jan 2009 at 7:32
Diazepam is going to submit a patch, and if that can happen quickly we'll
release
with it, otherwise we'll do an incremental release in a week or two.
Original comment by arshan.d...@gmail.com
on 8 Jan 2009 at 1:57
I submitted a patch, test case, wrote the mailing list, and waited 2 months - I
will
no longer delay the release for them.
The alternative is to write our own fully featured parser and it doesn't make
sense -
the juice is not worth the squeeze and it realistically cause more problems
than it
will solve.
It is just going to have to be a known bug until they fix it. I would encourage
anyone motivated enough to ping the NekoHTML list and mention that this bug is a
major annoyance!
Original comment by arshan.d...@gmail.com
on 17 Mar 2009 at 2:26
[deleted comment]
I recently encountered a problem while using Neko for parsing the HTML file.
i have a <SCRIPT> tag in the HTML file and have a scenario like this:-
<SCRIPT SRC = "firstJavaScript.js">
<SCRIPT SRC = "secondJavaScript.js"></SCRIPT>
The problem is that, the Neko parser is not able to detect the
secondJavaScript.js src when the first script tag is not closed.
Is there any way which can solve this problem so that the secondJavaScript.js
src can also be detected?
Original comment by tejas.r...@gmail.com
on 2 Aug 2010 at 6:19
Tejas, Arshan, I afraid that it's not pure NECKO bug...
I just can't reproduce it with an older A/S version
http://blog.supremedesign.ru/xss
Things like:
Thing like bold?
bold!normal?
Output: bold? bold!normal
And all scripts removed of course.
i.e. Necko closes open tags automatically.
Original comment by designbi...@gmail.com
on 2 Aug 2010 at 6:44
I'm not sure I follow what's happening.
Tejas: you want AntiSamy to register two events for your input there? I don't
know whether or not Neko is reducing that to one DOM element - it shouldn't,
but maybe it is.
designbistro, you're saying that similar tags are treated as two elements?
It's possible that the behavior of nekohtml changed between your two
environments.
Original comment by arshan.d...@gmail.com
on 2 Aug 2010 at 11:59
Original issue reported on code.google.com by
designbi...@gmail.com
on 4 Jun 2008 at 6:12