rpgmaker / NetJSON

Faster than Any Binary? Benchmark: http://theburningmonk.com/2014/08/json-serializers-benchmarks-updated-2/
MIT License
225 stars 29 forks source link

Invalid JSON causes full CPU usage (infinite loop) #226

Closed Micke90s closed 4 years ago

Micke90s commented 4 years ago

Hallo,

could you please check the following behavior?

When calling “Deserialize” for an invalid JSON-String (missing double quote) the current CPU core will run at 100%. The only way to exit is closing the complete program. Would it be possible to get an exception in this case?

public class Dummy  {
    public string Value { get; set; }
    public string Regex { get; set; }
  }
  class Program  {
    static void Main(string[] args)    {
      Console.WriteLine("Define invalid JSON");
      //var json = "{\"Value\":\"\",\"Regex\":false}"; //good JSON
      var json = "{Value\":\"\",\"Regex\":false}"; //bad JSON
      Console.WriteLine("Run Deserialize");
      var simple2 = NetJSON.NetJSON.Deserialize<Dummy>(json);
      Console.WriteLine("Done");
    }
  }
}
rpgmaker commented 4 years ago

Thanks for the find. I can add an additional check for the existing invalid json exception. Will take a look this evening. Thanks

rpgmaker commented 4 years ago

It will be tough to cover every case. I can count the numbers of quotes from open to close of each variable and make sure it is multiple of 2. But the other issue I am finding is that you can also have invalid json by not filling in the proper value for string data type. I will do more research to see what other do for validation and try to apply that.

Thanks,

rpgmaker commented 4 years ago

The solution of counting quotes does not work either. Since there can be quotes in quotes that are escaped. I was hoping that i can detect the escaped characters which will lead to not counting them then i could simply make sure that the numbers of quotes is evenly divided by 2. Still trying out other ideas without having to create a validator that will be expensive to call before each deserialization.

Micke90s commented 4 years ago

Thank you for your effort.

Just as an idea. How about try to detect the infinite loop? If there is a infinite loop the JSON is invalid. So you may could avoid to use an expensive validation.

May a solution like check the string length would be possible?

    if ( nextCharacterPosition > JSONStringLength )
        raise Exception;
rpgmaker commented 4 years ago

The problem is that I will have to add that to a lot of places. There is one in the main loop that checks for \0 already. But I have different loops that handles type conversion.
I will give it a try this evening and let you know if it works.

Thanks,

rpgmaker commented 4 years ago

I tried to do something similar to what you said by checking if i have null character when it overflows but that did not work. I need to debug it more to see what the source of infinite loop is.

// Throw invalid json if needed il.Emit(OpCodes.Ldc_I4, (int)'\0'); il.Emit(OpCodes.Ldloc, current); il.Emit(OpCodes.Bne_Un, invalidIfNull); il.Emit(OpCodes.Newobj, _invalidJSONCtor); il.Emit(OpCodes.Throw); il.MarkLabel(invalidIfNull);

rpgmaker commented 4 years ago

Do you have other ideas to solve this issue?

Micke90s commented 4 years ago

Unfortunately, I have no experience about using the ILGenerator classes. So, its hard to understand the dynamic source code generation.

Is there a way to detect or debug where the infinity loop occurs in the dynamic source code? I think to know the exact point would help to solve the problem. Is it possible to print the generated source code to console?

Are there other possible types of infinity loops? At the moment you tried to fix an infinity loop which could occur if reading the JSON string over the available length. Are there any other escape conditions e.g. “[{}]”? May this cause that the fix is not reached? Also, it would be possible that the loop occurs based on a switching pointer. Method one increases the pointer method two decreases it. Is there some coding which may could cause this?

wmjordan commented 4 years ago

@Micke90s It is not so "viable" to print the generated source code to console. Dynamic assemblies do not have source code.

@rpgmaker By downloading the latest source code and using @Micke90s 's code snippet as a test method, I reproduced the problem. I also found that the infinite loop has something to do with the IsCurrentAQuot method which is called by only one site: GenerateGetClassOrDictStringType line 5538. It appears that IsCurrentAQuot is called infinitely when dealing with the malformed JSON code. Your checking of terminating zero character doesn't work since the pointer does not appear to have ever left the string. Please check the code around there and see what's wrong.

wmjordan commented 4 years ago

After some more debugging, I thought that my previous assumption was not so correct. The fact was that the HasOverrideQuoteChar property was being infinitely accessed, when I broke into the debugger. The false was that IsCurrentAQuot method had something to do with the infinite loop. I saw that method uses that property and made it a suspect. However, when I set a break point within that method, it was not called as the infinite loop was going on. Thus we can wipeIsCurrentAQuot out of the suspects. And I found that the HasOverrideQuoteChar property can be called in multiple places, but all within the GenerateGetClassOrDictStringType method, fortunately. Maybe you shall check around those reference points.

rpgmaker commented 4 years ago

Thanks for digging deeper into the issue. I will look around that logic. I initial added the code above in the logic for the while loop that is generated for the ILFixed method that is used by both the list and dictionary method to check for the null character.

As for dumping the code out. You can set NetJSON.GenerateAssembly to true and it will create a dll with the serialization logic, and then I guess you can reference it manually in another test project. But not sure it will be debuggable without the debug symbol not been generated too.

rpgmaker commented 4 years ago

Sorry i have not worked much on this issue. I will let you know once i get some work done on it. unless @wmjordan wants to take care of it.

wmjordan commented 4 years ago

Neither do I have the time at this moment, unfortunately :P

rpgmaker commented 4 years ago

Np. I will take care of it.

rpgmaker commented 4 years ago

Been very busy and have not got a chance to look at this at the moment.

rpgmaker commented 4 years ago

@wmjordan , I figured it out. The solution was around the quotes.

Adding the code above this line fixes the issue: https://github.com/rpgmaker/NetJSON/blob/master/NetJSON/NetJSON.cs#L5636


 // Check quotes count if less than 2 then missing quotes (Throw Exception for missing quote)
            il.Emit(OpCodes.Ldc_I4_2);
            il.Emit(OpCodes.Ldloc, quotes);
            il.Emit(OpCodes.Bgt, inCompleteQuoteLabel);

            il.Emit(OpCodes.Newobj, _invalidJSONCtor);
            il.Emit(OpCodes.Throw);

            il.MarkLabel(inCompleteQuoteLabel);
rpgmaker commented 4 years ago

@Micke90s, can you please test your code again using the latest master code? I added similar test as your test which had the quote missing completely for the first property.

wmjordan commented 4 years ago

Oh, good to see this issue has gotten fixed.

Micke90s commented 4 years ago

@rpgmaker, thank you for your effort. Unfortunately, the example code from January still not work on my system with the current master code.

rpgmaker commented 4 years ago

So you are running the above code? Can you run the unit tests in the project itself? Do they fail?

Micke90s commented 4 years ago

The unit test project works fine: grafik

But the test project still is not able to finish. Running the following code should exit with an uncaught exception but nothing happens. The program still hangs like before.

using System;

namespace Test {
  public class Dummy {
    public string Value { get; set; }
    public string Regex { get; set; }
  }
  class Program {
    static void Main(string[] args) {
      Console.WriteLine("Define invalid JSON");
      //var json = "{\"Value\":\"\",\"Regex\":false}"; //good JSON
      var json = "{Value\":\"\",\"Regex\":false}"; //bad JSON
      Console.WriteLine("Run Deserialize");
      var simple2 = NetJSON.NetJSON.Deserialize<Dummy>(json);
      Console.WriteLine("Done");
    }
  }
}

The test project is a simple console application using .NET Framework 4.7.2

Is there anything I could check further?

rpgmaker commented 4 years ago

Great you mentioned 4.7.2. Maybe I need to test with that.

rpgmaker commented 4 years ago

Just verified that I am using 4.7.2 for the test already.

rpgmaker commented 4 years ago

I figured out the problem. My changes only fixed the code path for dictionary only and did not do same for using classes. If you switch to use Dictionary<string, object> then it will work and throw an exception. Let me make sure i have the code changes in both path.

rpgmaker commented 4 years ago

I just realize there is even a bigger problem now. It cause other failure if i try to inverse the condition check and breaks at least 53 tests. I need to figure out what is going on and find a better solution to what i currently have

rpgmaker commented 4 years ago

It should work now and i also added your test with the dummy class. I have fixed the cause of the problem. Test the master branch and let me know @Micke90s

Micke90s commented 4 years ago

@rpgmaker Now it works perfect. Thank you. @rpgmaker @wmjordan Thank you very much for your effort.

rpgmaker commented 4 years ago

I will try to publish it as a nuget package tomorrow when I get a chance. Thanks

rpgmaker commented 4 years ago

@Micke90s , i have publish the changes as 1.3.4. Should be available in nuget soon once it get indexed