1.19x performance improvements.

I've been using Intel's VTune to look at the code and found some really dumb performance improvements that I should have spotted long ago:

The compiler didn't know that the json_parse_state_s struct, which is passed by pointer to every function, is not modified outside the current call tree, so it was constantly reloading the state->offset and state->size members. In the functions that loop through state->src using the offset and size, I now cache both in local variables, which means they stay in registers for the entire run of those functions.
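A minimal sketch of that caching pattern, not the actual json.h code: `struct parse_state`, `is_space` and `skip_whitespace` are hypothetical stand-ins I made up for illustration. The members are copied into locals before the loop and the offset is written back once at the end.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for json_parse_state_s. */
struct parse_state {
  const char *src;
  size_t size;
  size_t offset;
};

static int is_space(char c) {
  return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Advances state->offset past leading whitespace and returns how many
   characters were skipped. */
static size_t skip_whitespace(struct parse_state *state) {
  /* Cache the members in locals so the loop works on plain values
     instead of going through the state pointer on every iteration. */
  const char *const src = state->src;
  const size_t size = state->size;
  size_t offset = state->offset;
  const size_t start = offset;

  while (offset < size && is_space(src[offset])) {
    offset++;
  }

  state->offset = offset; /* single write-back after the loop */
  return offset - start;
}
```

Whether the compiler was already able to do this on its own depends on what else happens inside the loop, so it is worth confirming with a profiler rather than assuming.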
Optimized some of the hot branches so that their misprediction rate dropped significantly.
In the string parsing functions I support both ' and " as string quotes, but I was re-checking which quote was in use multiple times, which caused branch mispredicts. Instead I now store the quote character once and just compare against that.
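Roughly what the stored-quote approach looks like, as a sketch under my own assumptions: `quoted_length` is a hypothetical helper, not a json.h function, and the escape handling is deliberately simplistic.

```c
#include <stddef.h>

/* Returns the number of characters between the quotes of the string that
   starts at src[0], or (size_t)-1 if src does not start with a quote or
   the string is unterminated. */
static size_t quoted_length(const char *src, size_t size) {
  size_t i;
  char quote;

  if (size == 0 || (src[0] != '"' && src[0] != '\'')) {
    return (size_t)-1;
  }

  /* Decide which quote is in use exactly once... */
  quote = src[0];

  for (i = 1; i < size; i++) {
    if (src[i] == '\\') {
      i++; /* skip the escaped character */
      continue;
    }
    /* ...so the loop body needs only a single comparison. */
    if (src[i] == quote) {
      return i - 1;
    }
  }

  return (size_t)-1;
}
```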
In json_skip_all_skippables, instead of checking flags_bitset on every iteration of the loop to see whether C style comments are supported (the compiler didn't realise flags_bitset wouldn't change!), I check it once and branch into one of two separate loops (one that does C style comment handling, one without).
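The shape of that change, sketched with hypothetical names: `ALLOW_C_COMMENTS`, `is_ws` and `skip_skippables` are illustrative only and do not mirror the real flag values or the real function body. The flag is tested once, outside both loops.

```c
#include <stddef.h>

enum { ALLOW_C_COMMENTS = 1 << 0 }; /* hypothetical flag bit */

static int is_ws(char c) {
  return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Returns the offset of the first character that is neither whitespace
   nor (when enabled) part of a C style comment. */
static size_t skip_skippables(const char *src, size_t size, size_t flags) {
  size_t offset = 0;

  if (flags & ALLOW_C_COMMENTS) {
    /* Loop 1: whitespace and comments, no flag check inside. */
    for (;;) {
      const size_t before = offset;
      while (offset < size && is_ws(src[offset])) {
        offset++;
      }
      if (offset + 1 < size && src[offset] == '/' && src[offset + 1] == '*') {
        offset += 2;
        while (offset + 1 < size &&
               !(src[offset] == '*' && src[offset + 1] == '/')) {
          offset++;
        }
        /* Step over the closing marker, or stop at the end of the input
           if the comment is unterminated. */
        offset = (offset + 1 < size) ? offset + 2 : size;
      }
      if (offset == before) {
        break; /* nothing skipped this pass, so we are done */
      }
    }
  } else {
    /* Loop 2: whitespace only. */
    while (offset < size && is_ws(src[offset])) {
      offset++;
    }
  }

  return offset;
}
```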
Reordered the members of json_parse_state_s so that variables that are used together are grouped together and appear in the same cache line.
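To illustrate the idea only: the struct below is a hypothetical layout, not the real json_parse_state_s, and the hot/cold split is my assumption. It also assumes a 64-byte cache line and that the struct itself does not straddle a line boundary.

```c
#include <stddef.h>

/* Hypothetical parser state, NOT the real json_parse_state_s layout. */
struct example_state {
  /* Hot members: assumed to be touched by every parsing loop. */
  const char *src;
  size_t size;
  size_t offset;
  size_t flags_bitset;

  /* Cold members: assumed to be touched only occasionally. */
  size_t line_no;
  size_t line_offset;
  size_t error;
  char *data;
};

/* Bytes from the start of the struct to the end of the last hot member. */
#define EXAMPLE_HOT_BYTES \
  (offsetof(struct example_state, flags_bitset) + sizeof(size_t))
```

Checking `EXAMPLE_HOT_BYTES` against the target's cache line size is a cheap way to notice when a later change pushes a hot member out of the first line.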
Changed some loops that contain switch statements where the default case was meant to break out of both the switch AND the loop. They now set a loop-local variable in the default case, and the loop checks it and breaks after the switch statement. This helped with branch mispredicts and also made the layout of the branches more sane.
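A small sketch of the resulting loop shape. `number_length` is a hypothetical function I wrote for illustration and the set of accepted characters is arbitrary; only the flag-then-break structure is the point.

```c
#include <stddef.h>

/* Returns how many leading characters of src look like part of a number. */
static size_t number_length(const char *src, size_t size) {
  size_t offset = 0;

  while (offset < size) {
    int end = 0; /* loop-local flag set by the default case */

    switch (src[offset]) {
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
    case '.': case 'e': case 'E': case '+': case '-':
      offset++;
      break;
    default:
      end = 1; /* a break here only leaves the switch, not the loop */
      break;
    }

    if (end) {
      break; /* single, explicit exit from the loop */
    }
  }

  return offset;
}
```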

sheredom/json.h #54