Potential security and performance problems in DecodeJSONString

wmjordan commented 6 years ago

1) Security and performance problems

There're some problems in the source code of the DecodeJSONString method, in the Unicode decoding section for \u????.

const int offset = 0x10000;

// problem 1: (security) there may not be 4 more characters after the \u sequence
// problem 2: (performance) if possible, read the encoded character sequence one by one
//          and calculate the int number for "uu", so the temporary string "str" can be avoided
var str = new string(ptr, index + 1, 4);
var uu = Int32.Parse(str, NumberStyles.HexNumber);

// problem 3: (performance) append the (char)uu directly to sb, don't create the string
var u = uu < offset ? new string((char)uu, 1) :

// problem 4: (performance) append the two char instances one by one to sb,
//                 it is quite OK to do so, don't create the char array or the string
    new string(
        new char[]{
            (char)(((uu - offset) >> 10) + 0xD800),
            (char)((uu - offset) % 0x0400 + 0xDC00)
        }
    );
sb.Append(u);
index += 4;
break;

2) Security problem about single quoted string:

NetJSON.NetJSON.Deserialize<string>("'\u56f3\u9762'")

The above statement will produce unexpected result like the following (executed in the C# Intermediate window of VS):

"\0ʳ\0ā\0\0\0\u0002\0\u0004\0\b\0\n\0離߮\u1afe\0\n\0\u0001\0\0\0\u0004\0\u0006\0뇠߮<\0\0\0\0\0\u0004\0\b\0눜߮\0\0\0\0\0\0\u0004\0눜߮\0\0\u0001\0Ā\0\0\0\u0004\0\u0006\0\b\0\n\0\f\0\u0010\0\u0012\0\u0016\0\u001a\0눜߮\u001a\0\0\0\0\0\u0004\0눶߮\0\0\0\0\0\0\u0004\0\b\0\f\0눶߮\0\0\u0003\0\0\0Ā\0\0\0\u0002\0\u0004\0\u0006\0\b\0\f\0\u000e\0\u0012\0\u0016\0\u0018\0눶߮H\0\0\0\u0001\0\0\0\u0004\0\u0006\0뉾߮\0\0\0\0\u0001\0\0\0\u0004\0\b\0\f\0\u000e\0뉾߮\0\0\0\0Ā\0\0\0\u0004\0\b\0\n\0뉾߮\0\0\0\0\u0001\0\0\0\u0004\0\b\0\f\0\u0010\0\u0012\0뉾߮\0\0\u0004\0\u0001\0\0\0\u0004\0\b\0\f\0\u000e\0뉾߮8\0G\0\u0001\0\0\0\u0002\0\u0004\0늶߮Ĝ\0\0\0\u0001\0\0\0\u0002\0\u0004\0\u0006\0\n\0돒߮\0\0\u0003\0ā\0\0\0\u0002\0\u0004\0돒߮\f\0\0\0ā\0\0\0\u0002\0\u0004\0돞߮\0\0\0\0ā\0\u0002\0\u0004\0\u0006\0\b\0돞߮\0\0\0\0ā\0\u0002\0\u0004\0돞߮\0\0\0\0āā\u0002\0\u0004\0\u0006\0\b\0\f\0\u0010\0돞߮\0\0\0\0\0\0\0\0\u0002\0\u0004\0\b\0돞߮\0\0\0\0Ā\0\u0004\0\u0006\0돞߮\0\0\0\0ā\0\u0002\0\u0004\0돞߮\0\0\0\0\u0001\0\u0002\0\u0004\0돞߮\0\0\0\0Ā\u0001\u0004\0\u0006\0\b\0돞߮\0\0\u...

wmjordan commented 6 years ago

The screenshot of problem 2.

shot

rpgmaker commented 6 years ago

Thanks for the submission

rpgmaker commented 6 years ago

I would try to work on this this week/weekend. But if you have bandwidth to help me with it, that would be nice too.

Thanks,

wmjordan commented 6 years ago

It was very very strange that the security issue 2 could not be reproduced on my computer. I checked the source code and could not find any clue why the pointer could go wrong. For security issue 1 is quite easy to fix, but performance issues need a little more time.

rpgmaker commented 6 years ago

Maybe the issue is only in the latest nuget package and not the source code?

wmjordan commented 6 years ago

I was forking the latest source code when I met with the above issues.

wmjordan commented 6 years ago

I found fixing the security issue 1 would need passing one more parameter length to the DecodeJSONString method, the problem spread further beyond my first thought.

rpgmaker commented 6 years ago

Length of the string or the hexidecimal \u value?

wmjordan commented 6 years ago

The whole JSON string.

The following code reproduced the issue either:

NetJSON.NetJSON.Deserialize<string>("\"abc")

Since the JSON code was not correctly terminated, the DecodeJSONString method could read beyond the end of the string. NetJSON would report "Input is not a valid JSON.", or read too much until it happen to meet with another code somewhere in the memory.

wmjordan commented 6 years ago

My second thought was that passing a pointer to the end of the string might be more efficient and easier to implement.

wmjordan commented 6 years ago

shot

rpgmaker commented 6 years ago

I could put a check for \0 null pointer value to break out of the infinite loop and only throw that error for non primitive type

Thanks,

rpgmaker commented 6 years ago

Putting in the checks slows than decoding of string. I would like to find a way to know quickly without reading all the string similar to string skipping for property and keys in the dictionary.

rpgmaker commented 6 years ago

using the following

var value = NetJSON.Deserialize<string[]>("[\"\\"ab\"]"); with escape for the "\" will give you the expected array of "ab . Without it, it will throw invalid json due to my current logic since it is depending on the escape of \ because it cannot detect end of the string been evaluate since it is character by character and also because it assume a string starts with " and ends with ".

rpgmaker commented 6 years ago

if you add the logic to check for the "\0" then you will have to magically detect when you are deserializing a regular string and a complex type else the first \" in the string will get swallowed up

rpgmaker commented 6 years ago

I believe i resolved both the current issues. I figured out a quick way to detect plain strings and it context by checking simply for index of zero for the decode method.

Please test with the latest pointing to this commit and let me know if all the issues are resolved: https://github.com/rpgmaker/NetJSON/commit/afdeb40b48b826b689795f75362f59f1a8cafdb4

If it is resolved, please close this ticket.

Thanks,

rpgmaker commented 6 years ago

Problem 1 was solved by adding the check for '\0' in the last commit.

I also added the performance suggestion for number 3-4 here: https://github.com/rpgmaker/NetJSON/commit/e7df7adb7fe0d0c3de089691546a8b95b853fe90

I did not do perf 2 because i will still have to convert the string from hex to number, and it seem easier to use int.parse and collect the string once. Plus, internally, the new string uses the fast alloc and also navigate the pointer smarter to create the string. So it is a little different from the typically "".

rpgmaker commented 6 years ago

if you could implement the perf #2 solution for me. Then i can add it in as a pull request.

Thanks,

wmjordan commented 6 years ago

Great, I've also just learned that .NET strings are both length-prefixed and zero-terminated. Checking the trailing zero char can quickly determine the end of the string, unless that somehow the string contains zero char--it is possible in .NET strings. The alternative is solution is to move the pointer backward by a int* where you can get the length of the string.

var s = "string";
fixed(char* ptr = s) {
   int* l = (char*)ptr;
   --l;
   Assert.IsTrue(*l == s.Length);
}

I will look for some time to see whether I can solve the problem 1 and problem 2.

rpgmaker commented 6 years ago

Thanks for the update.

rpgmaker commented 5 years ago

I am closing this issue since there is a PR for it.

rpgmaker / NetJSON

Potential security and performance problems in DecodeJSONString #184