gregsdennis / Manatee.Json

A fully object-oriented approach to JSON manipulation, validation, and serialization that focuses on modeling the JSON structure rather than mere string parsing and conversion.
MIT License
198 stars 32 forks source link

Concurrent validation of the same schema #271

Closed jack-ohara closed 4 years ago

jack-ohara commented 4 years ago

Describe the bug We have individual tests which check two separate APIs return a result that adheres to the same schema. When these tests are run in parallel, the ValidateSchema() method throws an exception. The exception is either of the below:

System.ArgumentOutOfRangeException: Index was out of range. Must be non-negative and less than the size of the collection. (Parameter 'index')
   at System.Collections.Generic.List`1.RemoveAt(Int32 index)
   at System.Collections.Generic.List`1.Remove(T item)
   at Manatee.Json.Schema.RecursiveRefKeyword.Validate(SchemaValidationContext context)
   at Manatee.Json.Schema.JsonSchema.Validate(SchemaValidationContext context)
   at Manatee.Json.Schema.AdditionalPropertiesKeyword.Validate(SchemaValidationContext context)
   at Manatee.Json.Schema.JsonSchema.Validate(SchemaValidationContext context)
   at Manatee.Json.Schema.PropertiesKeyword.Validate(SchemaValidationContext context)
   at Manatee.Json.Schema.JsonSchema.Validate(SchemaValidationContext context)
   at Manatee.Json.Schema.VocabularyKeyword.Validate(SchemaValidationContext context)
   at Manatee.Json.Schema.JsonSchema.Validate(SchemaValidationContext context)
   at Manatee.Json.Schema.JsonSchema.ValidateSchema(JsonSchemaOptions options)
   at ManateeConcurrencyTests.UnitTest1.<>c.<Test1>b__0_0() in C:\\Users\\Jack.ohara\\source\\repos\\ManateeConcurrencyTests\\ManateeConcurrencyTests\\UnitTest1.cs:line 24\r\n   at ManateeConcurrencyTests.UnitTest1.<>c__DisplayClass0_0.<Test1>b__1(Action action) in C:\\Users\\Jack.ohara\\source\\repos\\ManateeConcurrencyTests\\ManateeConcurrencyTests\\UnitTest1.cs:line 38
System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
   at System.Collections.Generic.List`1.Enumerator.MoveNextRare()
   at System.Linq.Enumerable.Any[TSource](IEnumerable`1 source, Func`2 predicate)
   at Manatee.Json.Schema.RecursiveRefKeyword.Validate(SchemaValidationContext context)
   at Manatee.Json.Schema.JsonSchema.Validate(SchemaValidationContext context)
   at Manatee.Json.Schema.AdditionalPropertiesKeyword.Validate(SchemaValidationContext context)
   at Manatee.Json.Schema.JsonSchema.Validate(SchemaValidationContext context)
   at Manatee.Json.Schema.PropertiesKeyword.Validate(SchemaValidationContext context)
   at Manatee.Json.Schema.JsonSchema.Validate(SchemaValidationContext context)
   at Manatee.Json.Schema.VocabularyKeyword.Validate(SchemaValidationContext context)
   at Manatee.Json.Schema.JsonSchema.Validate(SchemaValidationContext context)
   at Manatee.Json.Schema.JsonSchema.ValidateSchema(JsonSchemaOptions options)
   at ManateeConcurrencyTests.UnitTest1.<>c.<Test1>b__0_0() in C:\\Users\\Jack.ohara\\source\\repos\\ManateeConcurrencyTests\\ManateeConcurrencyTests\\UnitTest1.cs:line 24\r\n   at ManateeConcurrencyTests.UnitTest1.<>c__DisplayClass0_0.<Test1>b__1(Action action) in C:\\Users\\Jack.ohara\\source\\repos\\ManateeConcurrencyTests\\ManateeConcurrencyTests\\UnitTest1.cs:line 38

To Reproduce I have written a unit test that reproduces the error:

var validateSchemaAction = new Action(()
{
    var serializer = new JsonSerializer();
    var schemaText = File.ReadAllText($"{AppDomain.CurrentDomain.BaseDirectory}my-schema.json");
    var schemaJson = JsonValue.Parse(schemaText);
    var schema = serializer.Deserialize<JsonSchema>(schemaJson);

    schema.ValidateSchema();
});

var exceptions = new List<Exception>();

Parallel.ForEach(
    new[]
    {
        validateSchemaAction, validateSchemaAction, validateSchemaAction, validateSchemaAction,
        validateSchemaAction
    }, action =>
    {
        try
        {
            action();
        }
        catch (Exception ex)
        {
            exceptions.Add(ex);
        }
    });

Assert.Empty(exceptions);

The schema I was using for this test is:

{
  "description": "API Error",
  "type": "object",
  "required": [ "code", "message" ],
  "properties": {
    "code": { "type": "string" },
    "message": { "type": "string" },
    "correlation_id": { "type": "string" },
    "details": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "property": { "type": "string" },
          "code": { "type": "string" },
          "message": { "type": "string" }
        }
      }
    },
    "links": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "href": { "type": "string" },
          "rel": { "type": "string" },
          "reference": { "type": "string" },
          "type": { "type": "string" }
        }
      }
    }
  }
}

Expected behavior The schema should be validated successfully on all threads.

Desktop (please complete the following information):

gregsdennis commented 4 years ago

Thanks for the report and the sample test. I'll add it to the suite so I can fix the problem.

gregsdennis commented 4 years ago

I find it odd that there's no $recursiveRef in your schema, yet this is the keyword that's throwing. That leads me to believe it's happening in the meta-schema validation, but that's not part of the stack trace.

evman182 commented 4 years ago

It looks like it is from the MetaSchema Validation. The part of the trace in VocabularyKeyword is line 127. I'm guessing the issue is that the meta schemas are shared and are coming from a static ConcurrentDictionary.

gregsdennis commented 4 years ago

Hm..., yeah it's here: https://github.com/gregsdennis/Manatee.Json/blob/master/Manatee.Json/Schema/Keywords/RecursiveRefKeyword.cs#L72

So this keyword instance is on the metaschema, which is attempting to validate concurrently. This particular array is used to detect recursive loops in the schema by storing the instance locations that it's already evaluated.

Even if it didn't throw an exception, it's still not thread-safe because it could store a location from evaluating that same location on another instance on another thread.

I have a few solutions in mind. I'll play with those and see what comes of them.

jack-ohara commented 4 years ago

@gregsdennis thanks for the fix! 👍

olivercoad commented 4 years ago

Just came across this bug while validating the schema and validating objects in another test case.

Went to file an issue but was delighted to find it already fixed 👍. Thanks!

karenetheridge commented 4 years ago

FWIW -- I've come across similar issues in other JSON Schema evaluators, and have come to the conclusion that storing any sort of state information right in the schema structure is a Bad Idea with the potential to create all kinds of hidden bugs, and so have switched to storing state info in an out-of-band structure that only the current evaluator instance has access to in its own thread.