tarantool / nginx_upstream_module

Tarantool NginX upstream module (REST, JSON API, websockets, load balancing)
Other
174 stars 18 forks source link

Add skipping method for garbage fields when parsing json #138

Closed romanhabibov closed 3 years ago

romanhabibov commented 3 years ago

@ChangeLog

romanhabibov commented 3 years ago

Hi! Thank you for the patch! According to https://github.com/tarantool/tarantool/wiki/Code-review-procedure#commit-message the commit title might look like: "parsing:..." Add a commit message describing how it reproduced (curl ... ) and what did you do to fix the bug.

See reproducer in the new test file.

In addition, I could not run the tests, can you describe how to run them.

See my new pull request. Start NginX and Tarantool instances with https://github.com/tarantool/nginx_upstream_module/tree/test#test-run-and-notes-for-contributors, and then you can use this pattern for testing: curl -s -d '{"id":0,"method":"skipping_test","params":[]}' http://127.0.0.1:8081/tnt | jq.

romanhabibov commented 3 years ago

Hi! Thank you for the patch. Add @ChangeLog.

Done.

AFAIU "skipping method" don't exist before the patch. So in context "Add better skipping method for garbage fields when parsing json." better than nothing? Add more detailed description with explanation what is not "garbage".

Done.

romanhabibov commented 3 years ago

LGTM except #138 (comment) .

Done.

Totktonada commented 3 years ago

We don't need to introduce the second primitive for tracking the nesting level: the code already have so called stack. What is incorrect now: it tracks only those arrays and maps that are inside "params" value. The code also have the implementation of skipping unknown fields: everything that is not inside "params" (+few other cases) is skipped. The only thing we should fix is to move nesting tracking outside stage == PARAMS checks.

Please, consider the patch below. It seems, it passes all tests, however I'm not sure it is quite accurate.

Patch for reading ```diff diff --git a/src/tp_transcode.c b/src/tp_transcode.c index b945620..a8de43a 100644 --- a/src/tp_transcode.c +++ b/src/tp_transcode.c @@ -467,23 +467,17 @@ yajl_start_map(void *ctx) { yajl_ctx_t *s_ctx = (yajl_ctx_t *)ctx; - if (unlikely(s_ctx->stage != PARAMS)) - return 1; - stack_grow_array(s_ctx); - bool r = false; - if (unlikely(s_ctx->size == 0)) - /**/ - r = stack_push(s_ctx, s_ctx->tp.p, TYPE_MAP | PARAMS); - else - r = stack_push(s_ctx, s_ctx->tp.p, TYPE_MAP); - + bool r = stack_push(s_ctx, s_ctx->tp.p, TYPE_MAP); if (unlikely(!r)) { say_error(s_ctx, -32603, "[BUG?] 'stack' overflow"); return 0; } + if (unlikely(s_ctx->stage != PARAMS)) + return 1; + if (unlikely(s_ctx->tp.e < s_ctx->tp.p + 1 + sizeof(uint32_t))) say_overflow_r_2(s_ctx); @@ -498,8 +492,8 @@ yajl_end_map(void *ctx) { yajl_ctx_t *s_ctx = (yajl_ctx_t *)ctx; - if (s_ctx->size == 0) { - + stack_item_t *item = stack_pop(s_ctx); + if (item != NULL && s_ctx->size == 0) { if (!(s_ctx->been_stages & PARAMS)) { dd("ADDING EMPTY PARAMS\n"); @@ -523,19 +517,21 @@ yajl_end_map(void *ctx) if (unlikely(s_ctx->stage != PARAMS)) return 1; - stack_item_t *item = stack_pop(s_ctx); if (likely(item != NULL)) { - dd("map close, count %d '}'\n", (int)item->count); - if (unlikely(item->type & PARAMS)) { - say_wrong_params(s_ctx); - return 0; - } + /* + * A map instead of an array as "params" value. + * + * {<...>, "params": {<...>}} + */ + if (unlikely(s_ctx->size == 1)) { + say_wrong_params(s_ctx); + return 0; + } *(item->ptr++) = 0xdf; *(uint32_t *) item->ptr = mp_bswap_u32(item->count); - } else { say_wrong_params(s_ctx); return 0; @@ -550,30 +546,36 @@ yajl_start_array(void *ctx) { yajl_ctx_t *s_ctx = (yajl_ctx_t *)ctx; + /* + * Don't store a stack item for 'batching' array. This way we + * unify processing of both cases: when this array is present + * and when it does not. + * + * 1. {"method": <...>, "params": <...>, "id": <...>, <...>} + * 2. [ + * {"method": <...>, "params": <...>, "id": <...>, <...>}, + * {<...>} + * ] + * + * All other arrays and maps are tracked in the stack. + */ if (s_ctx->stage == INIT) { s_ctx->batch_mode_on = true; return 1; } - if (unlikely(s_ctx->stage != PARAMS)) - return 1; - dd("array open '['\n"); - stack_grow_array(s_ctx); - bool push_ok = false, bind_first_argument = false; - if (unlikely(s_ctx->size == 0)) { - push_ok = stack_push(s_ctx, s_ctx->tp.p, TYPE_ARRAY | PARAMS); - bind_first_argument = true; - } else - push_ok = stack_push(s_ctx, s_ctx->tp.p, TYPE_ARRAY); - + bool push_ok = stack_push(s_ctx, s_ctx->tp.p, TYPE_ARRAY); if (unlikely(!push_ok)) { say_error(s_ctx, -32603, "[BUG?] 'stack' overflow"); return 0; } + if (unlikely(s_ctx->stage != PARAMS)) + return 1; + if (unlikely(s_ctx->tp.e < (s_ctx->tp.p + 1 + sizeof(uint32_t)))) say_overflow_r_2(s_ctx); @@ -582,7 +584,7 @@ yajl_start_array(void *ctx) // Here is bind data // e.g. http request // [ - if (bind_first_argument) { + if (s_ctx->size == 2) { if (unlikely(!bind_data(s_ctx))) say_overflow_r_2(s_ctx); } @@ -596,16 +598,22 @@ yajl_end_array(void *ctx) { yajl_ctx_t *s_ctx = (yajl_ctx_t *)ctx; + stack_item_t *item = stack_pop(s_ctx); + if (unlikely(s_ctx->stage != PARAMS)) return 1; - stack_item_t *item = stack_pop(s_ctx); if (likely(item != NULL)) { - dd("array close, count %d ']'\n", item->count); size_t item_count = item->count; - if (unlikely(item->type & PARAMS)) { + + /* + * An end of "params" array. + * + * {<...>, "params": [<...>]}. + */ + if (unlikely(s_ctx->size == 1)) { dd("PARAMS END\n"); s_ctx->stage = WAIT_NEXT; s_ctx->been_stages |= PARAMS; ```

The same patch as a file (for applying convenience): gh-132-skip-unknown-fields.patch.zip

Aside of stack calculations fix I simplified the logic around TYPE_MAP | PARAMS and TYPE_ARRAY | PARAMS. We don't need to store the PARAMS flag when an array or a map is started. This flag depends on the nesting level and whether we're parsing "params" value. This information is available when we meet the array / map end, so we can use it directly.

romanhabibov commented 3 years ago

We don't need to introduce the second primitive for tracking the nesting level: the code already have so called stack. What is incorrect now: it tracks only those arrays and maps that are inside "params" value. The code also have the implementation of skipping unknown fields: everything that is not inside "params" (+few other cases) is skipped. The only thing we should fix is to move nesting tracking outside stage == PARAMS checks.

Please, consider the patch below. It seems, it passes all tests, however I'm not sure it is quite accurate.

Patch for reading The same patch as a file (for applying convenience): gh-132-skip-unknown-fields.patch.zip

Aside of stack calculations fix I simplified the logic around TYPE_MAP | PARAMS and TYPE_ARRAY | PARAMS. We don't need to store the PARAMS flag when an array or a map is started. This flag depends on the nesting level and whether we're parsing "params" value. This information is available when we meet the array / map end, so we can use it directly.

I checked your patch and found no problems. I'm OK.

Totktonada commented 3 years ago

I don't see downsides in the code. You too. So, it looks as the way to go.

Please, fix remaining questions regarding the test.

Then we'll able to push this PR (but after #141).