gbishop opened 14 years ago
Another possibility occurs to me: we could combine 1 and 3, using the schema to say how each field should be escaped. Every field would be escaped unless the schema gives it format: html, and even those fields might be filtered against a whitelist of allowable markup.
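A rough sketch of what schema-driven escaping might look like. The schema shape here (a dict mapping field names to options, with `"format": "html"` as the only marker that matters) is a hypothetical stand-in, not torongo's actual schema format, and the optional `sanitize_html` hook is where a whitelist filter would plug in. Fields the schema doesn't mention are escaped by default, which seems like the safer failure mode.

```python
import html

# Hypothetical schema: field name -> options. Only "format": "html" is
# consulted; torongo's real schema format may look quite different.
SCHEMA = {
    "title": {"type": "string"},
    "instructions": {"type": "string", "format": "html"},
}

def escape_by_schema(doc, schema, sanitize_html=None):
    """Return a copy of a flat document with string fields HTML-escaped,
    except fields the schema marks as format html. Those are passed
    through sanitize_html (e.g. a whitelist filter) if one is given."""
    out = {}
    for name, value in doc.items():
        spec = schema.get(name, {})  # unknown fields get the default: escape
        if not isinstance(value, str):
            out[name] = value  # numbers, booleans, etc. pass through
        elif spec.get("format") == "html":
            out[name] = sanitize_html(value) if sanitize_html else value
        else:
            out[name] = html.escape(value)  # escapes & < > " '
    return out
```

For example, `escape_by_schema({"title": "<b>Hi</b>", "instructions": "<b>Hi</b>"}, SCHEMA)` leaves `instructions` alone but turns `title` into `&lt;b&gt;Hi&lt;/b&gt;`.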
Duncan says "what about arithmetic expressions?"
I read through that XSS page Pete suggested: http://code.google.com/p/doctype/wiki/ArticleIntroductionToXSS
That got me thinking about our vulnerability. Seems to me that the db is the source of most danger for us and I've got a suggestion. It isn't perfect by any means but I think it reduces the attack surface by a lot.
The basic idea is to make torongo do most of the work. We could do that a few ways. This is all 1/2 baked at best.
Possibilities I can think of include:
Options 1 and 2 have the advantage of being automatic, except when your data happens to look like HTML or contains certain special characters. I think those cases are fairly rare. So there is still work for the developer, but it isn't work you have to do everywhere; it is isolated to the few fields where you really need markup or special characters, and there you are already thinking about the danger. If the server simply replaces those characters with the appropriate entities, text that only needs to "look" the same will be fine, and I think that covers most cases. The protection is continuous and pervasive.
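Here is a minimal sketch of the "server replaces the characters with entities" idea, assuming documents are JSON/BSON-like nested dicts and lists. It escapes every string value (keys are left as-is) using Python's standard `html.escape`; how torongo would actually hook this in is not shown.

```python
import html

def escape_all(value):
    """Recursively HTML-escape every string in a JSON/BSON-like value.
    Dict keys are not escaped; non-string scalars pass through."""
    if isinstance(value, str):
        return html.escape(value)
    if isinstance(value, dict):
        return {k: escape_all(v) for k, v in value.items()}
    if isinstance(value, list):
        return [escape_all(v) for v in value]
    return value  # numbers, booleans, None, etc.
```

With `{"name": "Tom & Jerry", "tags": ["<script>"]}` the name becomes `Tom &amp; Jerry`, which a browser displays as `Tom & Jerry`, so it "looks" the same. The catch is code that uses the raw string (lengths, comparisons, non-HTML output), since the stored value really has changed.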
Options 1 and 2 require some encoding for HTML data. For example, in BigWords the game instructions and examples have format: HTML. I did that because, at the time, I thought giving them a rich-text editor was cool. They would have been just as happy with simple markup, maybe happier. I already have to bash the data coming out of the dojo Editor control, because if you paste into it from MSWord the content is HUGE and BROKEN. Also, I bet there is some attack where script tags are inserted into the editor. If HTML were not allowed by the DB (cases 1 and 2), I could just encode that stuff before putting it in the DB and decode it on the way out, and while decoding I'd be thinking about the dangers. Maybe the answer is simply to remove the format HTML feature from DojoFormGenerator.
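If some rich text has to survive, the "think about the dangers while decoding" step could be a whitelist filter. Below is a rough sketch built on the standard library's `html.parser.HTMLParser`; the allowed tag set is an arbitrary assumption. It keeps only a few formatting tags, drops every attribute (so no `onclick` or `href="javascript:..."`), discards `<script>`/`<style>` contents, and re-escapes text. It does not repair unbalanced markup, and a maintained sanitizer library would be a safer bet in production.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist: simple formatting only, no attributes at all.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "br", "ul", "ol", "li"}
DROP_CONTENT = {"script", "style"}  # remove these tags and their contents

class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities arrive decoded
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append(f"<{tag}>")  # attrs deliberately discarded

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
        elif tag in ALLOWED_TAGS and not self.skip_depth and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))  # re-escape all text

def sanitize(fragment):
    parser = WhitelistSanitizer()
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

For instance, `sanitize('<p onclick="x()">Hi <b>there</b><script>alert(1)</script></p>')` yields `<p>Hi <b>there</b></p>`. Comments (including the MSWord conditional-comment clutter) are dropped because `handle_comment` is not overridden.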
Option 3 has the advantage of fitting right into the current system: you know immediately that you've broken something. But now the programmer has to handle these error cases on user input. If the user is a real bad guy, we might not care. But if the user just accidentally wrote something that triggered the failure, it seems unlikely that most of our programmers (including me) would handle that case well, and the user is unlikely to know what went wrong. The report we'd get would be "It won't save," and we wouldn't be able to tell why.
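One way to soften the "It won't save" problem would be for the rejection itself to say exactly what went wrong. This sketch assumes option 3 means refusing any string containing HTML-significant characters; the character set and the `RejectedFieldError` name are hypothetical. The error carries the field, the character, and its position, so a generic handler could show the user something meaningful.

```python
import re

# Hypothetical reject-on-store rule: these characters are refused outright.
SUSPICIOUS = re.compile(r"[<>&\"']")

class RejectedFieldError(ValueError):
    """Raised when a field contains a character the store refuses."""
    def __init__(self, field, char, position):
        self.field = field
        self.char = char
        self.position = position
        super().__init__(
            f"field {field!r} contains {char!r} at position {position}; "
            "this character is not allowed here"
        )

def check_document(doc):
    """Raise RejectedFieldError for the first string field containing a
    suspicious character; otherwise return doc unchanged."""
    for field, value in doc.items():
        if isinstance(value, str):
            match = SUSPICIOUS.search(value)
            if match:
                raise RejectedFieldError(field, match.group(), match.start())
    return doc
```

Note that a perfectly innocent `{"name": "Ben & Jerry's"}` is rejected (at the `&`, position 4), which is exactly the accidental-trigger case described above.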
Choice 1 seems more efficient than choice 2 because the work is done only once, on store, instead of on every access. Choice 1 does leave a back door, since anything written to the DB without going through the server is never escaped, but if someone can open our DB directly we've got much worse problems.
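The store-versus-access trade-off can be made concrete with two toy in-memory stores. Treating choice 1 as "escape on write" and choice 2 as "escape on read" is an assumption based on the comparison above, and both classes are hypothetical illustrations, not torongo code. The counters show the efficiency difference; writing straight into `rows` stands in for the back door of opening the DB directly.

```python
import html

class EscapeOnStore:
    """Hypothetical choice-1 store: escape once, when writing."""
    def __init__(self):
        self.rows = {}
        self.escape_calls = 0

    def put(self, key, text):
        self.escape_calls += 1
        self.rows[key] = html.escape(text)

    def get(self, key):
        return self.rows[key]  # trusts whatever is already stored

class EscapeOnRead:
    """Hypothetical choice-2 store: keep raw text, escape on every read."""
    def __init__(self):
        self.rows = {}
        self.escape_calls = 0

    def put(self, key, text):
        self.rows[key] = text

    def get(self, key):
        self.escape_calls += 1
        return html.escape(self.rows[key])

on_store, on_read = EscapeOnStore(), EscapeOnRead()
for store in (on_store, on_read):
    store.put("note", "<i>hi</i>")
    for _ in range(100):
        store.get("note")
# on_store.escape_calls is 1; on_read.escape_calls is 100.

# Back door: a row written straight into the underlying storage.
on_store.rows["raw"] = "<script>"
on_read.rows["raw"] = "<script>"
# on_store.get("raw") returns "<script>" unescaped;
# on_read.get("raw") returns "&lt;script&gt;".
```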
I'm just beginning to think about this. I'd appreciate input from anyone reading this.