zserge / jsmn

Jsmn is the world's fastest JSON parser/tokenizer. This is the official repo, replacing the old one at Bitbucket.
MIT License

Incremental parsing doesn't work with unquoted stuff when JSMN_STRICT is not defined #179

Open mulle-nat opened 4 years ago

mulle-nat commented 4 years ago

Here is a small program that shows the problem:

//#define JSMN_STRICT
#include "jsmn.h"
#include <stdio.h>
#include <string.h>

#define INCREMENTAL

static char  *json = "[\n"
"   1848,\n"
"   {\n"
"      \"key\": true\n"
"   }\n"
"]";

int   main( void)
{
   jsmn_parser   p;
   jsmntok_t     t[128];
   int           r;
   int           i;
   size_t        len;

   jsmn_init( &p);

   len = strlen( json);

#ifdef INCREMENTAL
   /* feed the parser a growing prefix of the input, one byte at a time */
   for( i = 1; i <= (int) len; i++)
   {
      r = jsmn_parse(&p, json, i, t, 128);
      if( r < 0)
      {
         if( r == JSMN_ERROR_PART)
            continue;
         fprintf( stderr, "Failed to parse JSON: %d\n", r);
         return 1;
      }
      break;
   }
#else
   r = jsmn_parse( &p, json, len, t, 128);
   if( r < 0)
   {
      fprintf( stderr, "Failed to parse JSON: %d\n", r);
      return 1;
   }
#endif
   for( i = 0; i < r; i++)
      printf( "%.*s\n", t[ i].end - t[ i].start, &json[ t[ i].start]);
   return( 0);
}

With INCREMENTAL defined and JSMN_STRICT undefined, every character of an unquoted primitive is turned into its own token:

[
   1848,
   {
      "key": true
   }
]
1
8
4
8
{
      "key": true
   }
key
t
r
u
e

The output for the non-incremental build, or with JSMN_STRICT defined, is correct:

[
   1848,
   {
      "key": true
   }
]
1848
{
      "key": true
   }
key
true

It is hard for the parser to detect where an unquoted primitive such as a number ends when the input may stop in the middle of it. A possible solution: append to the previous token if it is a primitive and no whitespace has intervened.
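To make the idea concrete, here is a minimal sketch of the "append to the previous primitive" rule, written as a post-processing pass over a token array rather than as a change to jsmn itself. The names `tok_t`, `tok_type` and `merge_adjacent_primitives` are hypothetical stand-ins and not part of jsmn; the struct only mirrors the `type`, `start` and `end` fields of `jsmntok_t`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for jsmntok_t: only the fields this sketch needs. */
typedef enum { TOK_PRIMITIVE, TOK_OBJECT, TOK_ARRAY, TOK_STRING } tok_type;

typedef struct
{
   tok_type   type;
   int        start;   /* offset of first character */
   int        end;     /* offset one past the last character */
} tok_t;

/*
 * Collapse runs of primitive tokens that touch each other, i.e. where the
 * next token starts exactly where the previous one ended, so no whitespace
 * or delimiter lies between them. Works in place and returns the new
 * token count.
 */
static int   merge_adjacent_primitives( tok_t *toks, int count)
{
   int   out = 0;
   int   i;

   assert( toks != NULL || count == 0);

   for( i = 0; i < count; i++)
   {
      if( out > 0 &&
          toks[ i].type == TOK_PRIMITIVE &&
          toks[ out - 1].type == TOK_PRIMITIVE &&
          toks[ out - 1].end == toks[ i].start)
      {
         /* extend the previous primitive instead of keeping a new token */
         toks[ out - 1].end = toks[ i].end;
         continue;
      }
      toks[ out++] = toks[ i];
   }
   return( out);
}
```

Applied to the eleven tokens of the broken output above, this pass would leave five tokens, with `1848` and `true` each restored to a single primitive. It only repairs the token spans: an actual fix inside jsmn would presumably also have to correct the `size` count of the enclosing array or object, which this sketch does not model.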

pt300 commented 4 years ago

The problem here is mostly that non-strict mode has no definition of how it should work, so it is difficult to say whether this behaviour is expected. Because of that, non-strict mode will most likely be dropped in a future release. If it is still needed, a proper definition is required. You are welcome to give your input at #159.