microsoft / jschema

Includes an implementation of JSON Schema Draft 4, an implementation of JSON pointer, and a JSON-schema-to-C# code generator
Apache License 2.0

Add Comparer code generator #146

Closed yongyan-gh closed 2 years ago

yongyan-gh commented 2 years ago

Description

Add comparer code generator to generate IComparer implementation for JSON entities. This enables deterministic sorting of log files, which is helpful in bytewise diff comparisons and other scenarios.
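
To illustrate why a comparer helps with bytewise diffs, here is a minimal Python sketch, not the generated C# code. The entity shape (`ruleId`, `message`), the choice to order missing values first, and the helper names are assumptions made for the example. The comparison function follows the same negative/zero/positive contract as `IComparer.Compare`.

```python
import json
from functools import cmp_to_key


def compare_values(left, right):
    """Three-way compare; a missing (None) value sorts before any value (assumed policy)."""
    if left is None and right is None:
        return 0
    if left is None:
        return -1
    if right is None:
        return 1
    return (left > right) - (left < right)


def compare_results(left, right):
    """Compare two hypothetical result entities field by field, in a fixed order."""
    for key in ("ruleId", "message"):
        outcome = compare_values(left.get(key), right.get(key))
        if outcome != 0:
            return outcome
    return 0


def canonical_bytes(results):
    """Sort with the comparer, then serialize, so equal content yields equal bytes."""
    ordered = sorted(results, key=cmp_to_key(compare_results))
    return json.dumps(ordered, indent=2).encode("utf-8")


# Two logs with the same results emitted in a different order.
run_a = [{"ruleId": "R2", "message": "b"}, {"ruleId": "R1", "message": "a"}, {"ruleId": "R1"}]
run_b = [{"ruleId": "R1"}, {"ruleId": "R1", "message": "a"}, {"ruleId": "R2", "message": "b"}]

identical = canonical_bytes(run_a) == canonical_bytes(run_b)
```

Without the sort step the two serializations would differ byte for byte even though the logs carry the same results.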

JSchema now generates, for each entity, a comparer class alongside the corresponding equality comparer class.

The code generated from the SARIF schema is in the attached zip file: Autogenerated.zip

I tested the newly generated code in my test Sarif.sdk branch: the existing comparer tests pass, and the sorting results look good. This is the branch I tested with the newly generated code: https://github.com/microsoft/sarif-sdk/tree/users/yongyan-gh/newComparersTests

This is for item #141.

michaelcfanning commented 2 years ago
        // }

Did you change the code emit? Is this comment out of sync?


In reply to: 1098548781


Refers to: src/Json.Schema.ToDotNet/EqualityComparerGenerator.cs:642 in abf9d24. [](commit_id = abf9d243172ddb39f314b168ca7e439dc8ede334, deletion_comment = False)

michaelcfanning commented 2 years ago

I thought about this more from the customer standpoint. The customer will be potentially reading our SARIF log files and so it would be helpful if the ordering matched some expected/fact that supports readability.

For example, the 'rules' table should first be sorted by rule id and/or rule name. Generally, URLs, particularly the relative paths, are useful for sorting. This allows people to navigate a log file easily.

I assume our autogenerated ordering is following the order in the schema? Or is it taking some other approach? i.e., what's our guarantee that the autogenerated code itself produces the same comparer run-over-run?


In reply to: 941636879

yongyan-gh commented 2 years ago

(Not able to comment from CodeFlow, replying in GitHub.) The order is based on the DeclarationOrder property of PropertyInfo (https://github.com/microsoft/jschema/blob/169283958c972aef3d8c55fdab552ac276cfb6a3/src/Json.Schema.ToDotNet/PropertyInfoDictionary.cs#L102-L110). The SARIF JSON schema file does not set DeclarationOrder, so the generator follows the order in which the properties appear in the "properties" object of each entity. E.g., for the translationMetadata entity below, the property order is name -> fullName -> shortDescription -> fullDescription, etc. Both the equality comparer generator and the comparer generator follow the same order. If the schema does not change, the generator should produce the same comparer code. If the property order changes, or new properties are added to the schema anywhere other than the end of the property list, we may need to define DeclarationOrder in the schema file.

    "translationMetadata": {
      "description": "Provides additional metadata related to translation.",
      "type": "object",
      "additionalProperties": false,
      "properties": {

        "name": {
          "description": "The name associated with the translation metadata.",
          "type": "string"
        },

        "fullName": {
          "description": "The full name associated with the translation metadata.",
          "type": "string"
        },

        "shortDescription": {
          "description": "A brief description of the translation metadata.",
          "$ref": "#/definitions/multiformatMessageString"
        },

        "fullDescription": {
          "description": "A comprehensive description of the translation metadata.",
          "$ref": "#/definitions/multiformatMessageString"
        },

        "downloadUri": {
          "description": "The absolute URI from which the translation metadata can be downloaded.",
          "type": "string",
          "format": "uri"
        },

        "informationUri": {
          "description": "The absolute URI from which information related to the translation metadata can be downloaded.",
          "type": "string",
          "format": "uri"
        },

        "properties": {
          "description": "Key/value pairs that provide additional information about the translation metadata.",
          "$ref": "#/definitions/propertyBag"
        }
      },
      "required": [ "name" ]
    },
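
As a rough illustration of "follow the order of the properties in the schema", the Python sketch below reads the property names from a trimmed copy of the fragment above and compares two instances property by property in that order. It is not the generated comparer: the function names are hypothetical, only string-valued properties are handled, and sorting a missing value first is an assumption.

```python
import json

# Trimmed copy of the translationMetadata fragment; only string properties are kept.
SCHEMA_FRAGMENT = """
{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "fullName": { "type": "string" },
    "downloadUri": { "type": "string", "format": "uri" }
  }
}
"""


def property_order(schema_text):
    """Return property names in the order they appear in the schema text.

    json.loads builds dicts that preserve the key order of the input document.
    """
    schema = json.loads(schema_text)
    return list(schema["properties"])


def compare_in_order(left, right, order):
    """Three-way compare of two instances, one property at a time, in the given order."""
    for name in order:
        left_value, right_value = left.get(name), right.get(name)
        if left_value == right_value:
            continue
        if left_value is None:  # assumed policy: a missing value sorts first
            return -1
        if right_value is None:
            return 1
        return -1 if left_value < right_value else 1
    return 0


order = property_order(SCHEMA_FRAGMENT)
first = {"name": "a", "fullName": "z"}
second = {"name": "a", "fullName": "b", "downloadUri": "https://example.com"}
outcome = compare_in_order(first, second, order)
```

Here `name` ties, so the result is decided by `fullName`; `downloadUri` is never consulted because an earlier property already differs.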

In reply to: 1103253931

yongyan-gh commented 2 years ago

DeclarationOrder is not a property supported by JSON Schema. It is a property of our own PropertyInfo class. The code generators use DeclarationOrder to decide the order of the generated properties. Its value is set to the order in which the properties appear in the JSON schema. So if we'd like to change the order of properties in the generated code, we can change the order of the properties defined in the JSON schema file.
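
The relationship described here can be sketched in a few lines of Python. This is an illustration only: `PropertyInfoSketch` and both helpers are hypothetical stand-ins, not the actual PropertyInfo class, and starting the index at 0 is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PropertyInfoSketch:
    """Hypothetical stand-in for the generator's per-property record."""
    name: str
    declaration_order: int


def assign_declaration_order(schema_property_names):
    """Number each property by its position in the schema (0-based here by assumption)."""
    return [PropertyInfoSketch(name, index)
            for index, name in enumerate(schema_property_names)]


def emit_order(infos):
    """Order in which a generator would visit the properties."""
    return [info.name for info in sorted(infos, key=lambda info: info.declaration_order)]


original = emit_order(assign_declaration_order(["name", "fullName", "shortDescription"]))
reordered = emit_order(assign_declaration_order(["fullName", "name", "shortDescription"]))
```

Reordering the names in the schema is the only input that changes, and the emitted order changes with it.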


In reply to: 1103253931

yongyan-gh commented 2 years ago
        // }

No change to this method. I only changed the method that deals with dictionaries. See the comments below in the method GenerateDictionaryHashCodeContribution.


In reply to: 1098548781


Refers to: src/Json.Schema.ToDotNet/EqualityComparerGenerator.cs:642 in abf9d24. [](commit_id = abf9d243172ddb39f314b168ca7e439dc8ede334, deletion_comment = False)