Open GoogleCodeExporter opened 8 years ago
It appears that HTML parsing exceptions are not being added to the error list
either, for the same reason (no error handler provided). Would you like me to
update this issue, or create a new one?
Original comment by tad...@gmail.com
on 4 Apr 2011 at 4:01
We're not really interested in parsing exceptions - only content that is not
allowed. Unparsed content errors don't really fall into that, although I can
see why you might want them.
People have been using the number of error messages as an indication of whether
the data is safe. This is not what I intended with the API but parser errors
will complicate that (unfortunate) common usage.
Original comment by arshan.d...@gmail.com
on 7 Jun 2011 at 5:25
Arshan,
I can see where you are coming from. I understand that AntiSamy is a scrubber,
and not a validator. And I know that we had been using the number of errors as
an indication whether or not the data is safe (and thus needed the addition of
the parsing exceptions). I know this is wrong, and we are changing.
However, if we take the clean, and some of the input was removed due to parsing
exceptions, I don't see a good way to notify our users that part of their input
was removed. I know that we cannot be the first users to run into this issue,
so any thoughts on the matter would be appreciated.
An example so people will know what I am talking about. A user inputs "<a
href="http://www.google.com>Google</a>", forgetting the end quote on the href
(an honest mistake). The "cleaned" value is an empty string, but no errors are
reported. If this were among a block of HTML, it is possible that the user
would overlook it when comparing the output with their input. We have no way
of indicating to the user that something was removed. Other than just putting
a generic "Please make sure the cleaned output is as expected" type message,
I'm not sure how to make the users aware that they might have lost something.
I've considered validating the HTML prior, but without using the same parser
that AntiSamy uses, the results could be different. Using the same parser that
AntiSamy uses would run into the same issues that we are seeing now with our
custom code (I can go into details for anyone interested).
Anyhow, I welcome any thoughts or ideas on how to handle this without upsetting
our users too much.
Thanks
-Troy
Original comment by tad...@gmail.com
on 15 Jul 2011 at 8:01
We could separate the two; on top of today's error messages (which are related
to security) we could have "formatting" messages, which could be used to help
users tweak their input. If there's a good spec (or better yet, a patch), this
could make into the release after the forthcoming one.
Original comment by arshan.d...@gmail.com
on 15 Sep 2011 at 8:19
Original issue reported on code.google.com by
tad...@gmail.com
on 29 Mar 2011 at 3:59