tj / node-querystring

querystring parser for node and the browser - supporting nesting (used by Express, Connect, etc)
MIT License
457 stars 66 forks source link

Numeric array key handling #37

Open benagricola opened 12 years ago

benagricola commented 12 years ago

Noticed an interesting issue with trying to parse a query string from Facebook - when an app is added to a Facebook tab, Facebook redirects the user to a url you give, and appends something like:

?tabs_added[209631315815975]=1&tabs_added[2096313158545932]=0

And... well this simply doesn't parse - the output is logged as { tabs_added: [ ] }

Have looked into why, and of course it's to do with the way integer keys are handled by querystring.

A querystring of ?foo[10]=1&foo[20]=0 will produce { foo: [ , , , , , , , , , , '1', , , , , , , , , , '0' ] }.

Fine - but no use with the numbers returned by Facebook (which I guess are too big to be represented), and if you specify smaller, but still valid integers, you can basically cause node to crash by allocating too much memory, for example: ?foo[429496729]=1 will just cause it to chew up more and more memory until it fatals (on my test machine w/ default max-stack-size, I don't know how much memory that will actually use allocating an array of that length).

If a non-numeric array key is specified alongside the numeric array keys, then an object is used to represent the query string which, while not ideal in every situation, will not cause behaviour that can cause node to crash.

I don't really know what the right option is with regards to fixing this, but needs to be mentioned anyway due to the fact that you can basically pass that query string to any express app and have it either fall over or take ages just to parse the query string from the request.

edit: If the isint regexp is changed to var isint = /^[0-9]{1,5}$/;, then you can essentially represent 99,999 items in an array, but longer keys (which may be liable to cause issues with memory usage / speed) will be treated as string keys and represented as an object. It sucks, but at least it means the query string is parsed without causing node to explode.

tj commented 12 years ago

yeah that's no good, sparse arrays ftw. The support for foo[n] should definitely be capped, I cant imagine any reasonable query using more than 10 (maybe cap around 100 to be safe?), others we could consider string keys

benagricola commented 12 years ago

Another option would be to assume that given ?foo[]=1&foo[]=2&foo[]=10 then an array is used to represent, but if any keys are specified, regardless of type (integer or string) then an object representation will be used instead? (I don't think it makes much sense to have a 200 item array just to represent 1 value from a query string because its' key is 200).

Then again, if you then went on to mix keyed and non-keyed (?foo[]=1&foo[]=2&foo[a]=3), I have no idea how that would even be represented (or if it should even be allowed / valid).

Edit: the second case with keyed and non-keyed might simply be represented as { 0: 1, 1:2, a: 3}, which is better than losing the non-numeric key.

tj commented 12 years ago

that's what we originally had but quite a few libs/devs use numerics expecting them to be arrays

benagricola commented 12 years ago

How about if both modes of operation were possible (and a limit were put in place for array keys), the current one by default, with a switch available which would make everything work in string-key mode unless that specific var had no keys?

I guess it's a pretty app / lib / dev-centric problem in that different approaches work in different situations, so exposing the two ways to do it makes sense when thinking just about node-querystring on its own (its' use in connect / express is another matter since connect.query is implicit in the express application, i.e. how would one go about choosing the query parsing mode?).

tj commented 12 years ago

yeah in express it's implied. hmm.. pretty tough call, if others were not already using it I would say the object representation is fine

benagricola commented 12 years ago

Implemented a fix for this in my fork, obviously doesn't solve the express issue but has an extra benefit of allowing the array-with-keys functionality to be turned completely off if required (by setting the max allowed key to 0). When off it will represent everything except unkeyed arrays as an object.