Closed kalessin closed 11 years ago
A brief think before answering the observations, and let me know what do you think.
I think the purpose of this method is to produce safe code (that is, something that does not break the layout when rendered), not "correct" html, and we must carefully think how to exactly transform the input, because there are several ways to produce safe code from an unsafe one, and limit this kind of transformations as much as possible. For several reasons:
So we should consider the scope of this method and carefully weight which cases do we really want to fix and how, because the most complex, the less efficient and less maintainable. Consider that in AS we are surely coping with lots of cases of bad html code at output, but this is the first time that we see a problem manifested in the rendering on tables. That is because the browser makes most of the job.
Said that, the first case you show is clearly wrong, but safe. It is the kind of things that the browser can safely render because it adds by itself the lacking tags and the problematic source is enclosed among < table>< /table>. However it is an ugly output and seems to imply a very easy fix of the algorithm, so i anyway think that is worth to correctly handle that case.
In the second case, i even doubt that it is really a wrong html code, so i think that we should not worry about this one while safehtml produces a safe code. And i even think that the current output is a possible correct one. However, most probably output will be different with the fix of the first case. The problem is: should we allow to create the entire table structure for this case? if so, no problem. If not, the code will be a bit more complex (and less efficient). Here appears clearly a compromise between code efficiency loss by doing things that seems not really necessary to do,
About lists, i purposedly avoided to handle this case and other possible cases of nested structures, for the same reasons above and because we are not sure that all nested structures must be handled in the same way. This depends on user side needs. Also, the additional problem with < li> elements is that they can belong to < ul> or to < ol> lists, so, how should really be handled the case when we encounter a sole < li> without its parent? And if one of the < ul> or < ol> appears, it is clear that the algorithm will not be the same as the table one, because extra decisions has to be taken. The table is a more simple case than lists and is treated in a different way.
It doesn't handle nesting of tables correctly and allows many cases where incorrect html is still produced from table tags.
It should either remove the invalid nesting, or insert the missing tags, but never produce incorrect HTML.
Is there any reason why you don't make this more generic and also use it for lists (
ul
,li
), which have similar nesting problems.I have some comments on the source code too, but I think it's important to get the algorithm correct first. One way to help testing would be to generate random HTML, feed it to
safe_html
, and check the output is always valid.