uncopenweb / torongo

Utilities for using Mongo with Tornado.

XSS and markup in the db #10

Open gbishop opened 14 years ago

gbishop commented 14 years ago

I read through that XSS page Pete suggested: http://code.google.com/p/doctype/wiki/ArticleIntroductionToXSS

That got me thinking about our vulnerability. Seems to me that the db is the source of most danger for us and I've got a suggestion. It isn't perfect by any means but I think it reduces the attack surface by a lot.

The basic idea is to make torongo do most of the work. We could do that a few ways. This is all half-baked at best.

Possibilities I can think of include:

  1. The server escapes strings before they are stored. If your non-HTML data happens to have HTML-looking stuff in it, you're required to protect it somehow, because the DB is going to escape it.
  2. The server escapes strings as they are served. Again, as above, if you want HTML or certain characters literally in your data, you have to encode them.
  3. Use the schema. We already have the "format" schema field. We use it in DojoFormGenerator to determine the type of control you'll get. So, if the format of a string field is HTML, then markup is allowed (maybe only a subset is allowed). If the format is text or not defined, then no HTML is allowed. Perhaps we add a "raw" format that allows anything. So your data would get rejected at the schema check level if values contain invalid markup. Your POST or PUT would fail if you tried to insert a string with potentially dangerous content.

1 and 2 have the advantage of being automatic, except for the case where your data happens to look like HTML or contain certain characters. I think those cases are fairly rare. So there is still work for the developer, but it isn't work you have to do everywhere; it is isolated to the few fields where you really need to allow anything, and so you're already thinking about the danger. If the server simply replaces the characters with the appropriate entities, data that only needs to "look" the same when displayed will be fine. I think that covers most cases. The protection is continuous and pervasive.
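To make the entity-replacement idea in 1 and 2 concrete, here is a minimal sketch in modern Python using the standard library's `html.escape`. The function name `escape_strings` is hypothetical and not part of torongo; the same helper could run either before storing (choice 1) or before serving (choice 2). It escapes values only and leaves dictionary keys alone, which is an assumption about what we'd want.

```python
import html

def escape_strings(value):
    """Recursively HTML-escape every string value in a JSON-like structure.

    Hypothetical helper, not torongo's actual code. Dictionary keys are
    left untouched; only string values are escaped.
    """
    if isinstance(value, str):
        return html.escape(value, quote=True)
    if isinstance(value, dict):
        return {key: escape_strings(item) for key, item in value.items()}
    if isinstance(value, list):
        return [escape_strings(item) for item in value]
    return value  # numbers, booleans, and None pass through unchanged

doc = {"title": "<script>alert(1)</script>", "tags": ["a & b"], "score": 3}
safe = escape_strings(doc)
# safe["title"] is now "&lt;script&gt;alert(1)&lt;/script&gt;"
```

A string like `3 < 5` becomes `3 &lt; 5`, which still displays as `3 < 5` in a browser, so data that only has to look the same is unaffected.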

1 and 2 require some encoding for HTML data. For example, in BigWords the game instructions and examples have format: HTML. I did that because, at the time, I thought it was cool to give them a rich-text editor. They would have been just as happy with simple markup, maybe happier. I already have to bash the data coming out of the dojo Editor control, because if you paste into it from MS Word the content is HUGE and BROKEN. Also, I bet there is some attack where script tags are inserted into the editor. If HTML were not allowed by the DB (cases 1 and 2), I could just encode that stuff before putting it in the db and decode it on the way out. In that decoding I'd be thinking about the dangers. Maybe the answer is simply to remove the format HTML feature from DojoFormGenerator.

3 has the advantage of fitting right into the current system. You immediately know you've broken something. But now the programmer has to handle these error cases on user input. If the user is a real bad guy, we might not care. But if the user just accidentally wrote something that triggered the failure, it seems unlikely that most of our programmers (including me) would handle that case well. And the user is unlikely to know what went wrong. The report we'd get would be "It won't save," and we wouldn't be able to tell why.
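A rough sketch of what the check in 3 could look like, with an error message specific enough to avoid the bare "It won't save." report. Everything here is an assumption for illustration: the `check_markup` name, the JSON-Schema-like `properties` layout, and the simple rule that any `<` or `>` counts as markup. Only the "format" field itself comes from our existing schemas.

```python
import re

# Assumed rule for this sketch: any angle bracket counts as markup.
MARKUP = re.compile(r"[<>]")

def check_markup(doc, schema):
    """Return a list of human-readable problems; an empty list means the doc passes.

    Hypothetical helper. Fields whose format is "html" or "raw" are skipped
    here; every other string field is treated as plain text.
    """
    problems = []
    for name, spec in schema.get("properties", {}).items():
        value = doc.get(name)
        if not isinstance(value, str):
            continue
        fmt = spec.get("format", "text")
        if fmt in ("html", "raw"):
            continue
        if MARKUP.search(value):
            problems.append(
                f"field '{name}' (format {fmt}) may not contain '<' or '>'"
            )
    return problems

schema = {
    "properties": {
        "name": {"type": "string"},
        "instructions": {"type": "string", "format": "html"},
    }
}
errors = check_markup({"name": "a <b> c", "instructions": "<p>hi</p>"}, schema)
```

Returning the list of problems in the failed POST or PUT response would at least give the client something to show the user.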

Choice 1 seems more efficient than choice 2 because the work is done only on store instead of on every access. 1 allows some back-door attack, but if someone can open our DB directly we've got much worse problems.

I'm just beginning to think about this. I'd appreciate input from anyone reading this.

gbishop commented 14 years ago

Another possibility occurs to me. We could combine 1 and 3, using the schema to tell how fields should be escaped. Each field would get escaped unless it had format html in the schema. Even then, it might be filtered against a whitelist of allowable markup.
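Here is a minimal sketch of that combination, assuming the same JSON-Schema-like `properties` layout. The names `sanitize`, `filter_html`, `WhitelistFilter`, and the contents of `ALLOWED_TAGS` are all hypothetical. The filter is built on the standard library's `html.parser.HTMLParser` and is deliberately crude: it drops every attribute and every tag not on the list, but keeps the text inside dropped tags as escaped plain text. A real deployment would likely want a vetted sanitizer rather than this.

```python
import html
from html.parser import HTMLParser

# Hypothetical whitelist; the real set would need to be agreed on.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li"}

class WhitelistFilter(HTMLParser):
    """Rebuild markup keeping only allowed tags, with all attributes dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"<{tag}>")  # attributes are discarded

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always re-escaped, including text inside dropped tags.
        self.parts.append(html.escape(data))

def filter_html(text):
    parser = WhitelistFilter()
    parser.feed(text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)

def sanitize(doc, schema):
    """Escape every string field unless its schema format is "html"."""
    properties = schema.get("properties", {})
    clean = {}
    for name, value in doc.items():
        fmt = properties.get(name, {}).get("format")
        if not isinstance(value, str):
            clean[name] = value
        elif fmt == "html":
            clean[name] = filter_html(value)
        else:
            clean[name] = html.escape(value)
    return clean

schema = {"properties": {"instructions": {"format": "html"}}}
doc = {"word": "<i>cat</i>", "instructions": '<b onclick="x()">bold</b>', "level": 2}
clean = sanitize(doc, schema)
```

With this, a plain field like `word` is stored fully escaped, while `instructions` keeps its `<b>` tag but loses the `onclick` attribute.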

gbishop commented 14 years ago

Duncan says "what about arithmetic expressions?"