lutaml / lutaml-model

LutaML Model is the Ruby data modeler part of the LutaML data modeling suite. It supports creating serialization object models (XML, YAML, JSON, TOML) and mappings to and from them.

Add "diff" functionality between objects of same class #18

Open ronaldtse opened 2 months ago

ronaldtse commented 2 months ago

Often I want to compare model objects to see if two objects are identical.

The definition of two model objects being identical is:

I have written a comparator for Shale-based classes in the loc_mods gem.

These are the relevant files:

This task is to port the comparison functionality to lutaml-model, so that objects using lutaml-model are comparable by default.
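The sketch below assumes one plausible definition of "identical", inferred from the issue title rather than stated in it: both objects are instances of the same class, and every attribute is equal, recursively. `ModelEquality` and `Identifier` are hypothetical names in plain Ruby. This is not lutaml-model's API, and it assumes attributes are stored in instance variables.

```ruby
# Hypothetical mixin for illustration only; lutaml-model does not
# necessarily provide a module with this name.
module ModelEquality
  # Two objects are identical when they are instances of the same class and
  # every instance variable compares equal. Nested models and arrays are
  # compared recursively through their own #== methods. An instance
  # variable that was never assigned reads as nil.
  def ==(other)
    return false unless other.instance_of?(self.class)

    (instance_variables | other.instance_variables).all? do |ivar|
      instance_variable_get(ivar) == other.instance_variable_get(ivar)
    end
  end
end

# Example model class (hypothetical), loosely modelled on LocMods::Identifier.
class Identifier
  include ModelEquality

  attr_accessor :content, :type

  def initialize(content:, type: nil)
    @content = content
    @type = type
  end
end

a = Identifier.new(content: "994303379", type: "oclc")
b = Identifier.new(content: "994303379", type: "oclc")
c = Identifier.new(content: "994303379")

puts a == b      # true
puts a == c      # false: type differs ("oclc" vs nil)
puts [a] == [b]  # true: Array#== delegates to the elements' #==
```

Only `==` is overridden here. `eql?` and `hash` stay identity-based, so these objects would still need a matching `hash` before they can act as value-equal Hash keys.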

ronaldtse commented 4 weeks ago

Example:

Extract this archive, which contains the test and test2 directories: Archive.zip

Run the following in the loc_mods repository:

$ bundle exec exe/loc-mods detect-duplicates test/
Duplicate set #1 found for URL: https://doi.org/10.6028/NIST.TN.1630
  Comparison 1:
  File 1: test/allrecords-MODS-991000002489708106.xml
  File 2: test/allrecords-MODS-991000091869708106.xml
  Differences:
  └── LocMods::Record
      └── record_info (collection):
          └── [1] (LocMods::RecordInfo)
              └── [1] (LocMods::RecordIdentifier)
                  └── content (Shale::Type::String):
                      ├── - (String) "991000002489708106"
                      └── + (String) "991000091869708106"
  Similarity score: 99.91%
$ bundle exec exe/loc-mods detect-duplicates test2
Duplicate set #1 found for URL: https://doi.org/10.6028/NIST.IR.6659
  Comparison 1:
  File 1: test2/allrecords-MODS-991000009289708106.xml
  File 2: test2/allrecords-MODS-991000179879708106.xml
  Differences:
  └── LocMods::Record
      ├── identifier (collection):
      │   └── - [2] (LocMods::Identifier)
      │       ├── content (Shale::Type::String):
      │       │   └── (String) "994303379"
      │       ├── display_label (Shale::Type::String):
      │       │   └── (nil)
      │       ├── type (Shale::Type::String):
      │       │   └── (String) "oclc"
      │       ├── type_uri (Shale::Type::Value):
      │       │   └── (nil)
      │       ├── invalid (Shale::Type::Value):
      │       │   └── (nil)
      │       └── alt_rep_group (Shale::Type::String):
      │           └── (nil)
      ├── note (collection):
      │   ├── [2] (LocMods::Note)
      │   │   └── content (Shale::Type::String):
      │   │       ├── - (String) "July 1, 2010."
      │   │       └── + (String) "2010."
      │   └── [3] (LocMods::Note)
      │       └── content (Shale::Type::String):
      │           ├── - (String) "Title from PDF title page (viewed June 5, 2017)."
      │           └── + (String) "Title from PDF title page."
      ├── origin_info (collection):
      │   └── [2] (LocMods::OriginInfo)
      │       ├── [1] (LocMods::Place)
      │       │   └── [1] (LocMods::PlaceTerm)
      │       │       └── content (Shale::Type::String):
      │       │           ├── - (String) "Gaithersburg, MD:"
      │       │           └── + (String) "Gaithersburg, MD :"
      │       └── [1] (LocMods::Publisher)
      │           └── content (Shale::Type::String):
      │               ├── - (String) "U.S. Dept. of Commerce, National Institute of Standards and
                Technology;"
      │               └── + (String) "U.S. Dept. of Commerce, National Institute of Standards and
                Technology; "
      ├── record_info (collection):
      │   └── [1] (LocMods::RecordInfo)
      │       ├── [1] (LocMods::Date)
      │       │   └── content (Shale::Type::String):
      │       │       ├── - (String) "190912"
      │       │       └── + (String) "160922"
      │       ├── [1] (LocMods::Date)
      │       │   └── content (Shale::Type::String):
      │       │       ├── - (String) "20200114084208.0"
      │       │       └── + (String) "20160922100247.0"
      │       └── [1] (LocMods::RecordIdentifier)
      │           └── content (Shale::Type::String):
      │               ├── - (String) "991000009289708106"
      │               └── + (String) "991000179879708106"
      ├── subject (collection):
      │   ├── - [1] (LocMods::Subject)
      │   │   ├── id (Shale::Type::Value):
      │   │   │   └── (nil)
      │   │   ├── authority (Shale::Type::String):
      │   │   │   └── (String) "lcsh"
      │   │   ├── authority_uri (Shale::Type::Value):
      │   │   │   └── (nil)
      │   │   ├── value_uri (Shale::Type::Value):
      │   │   │   └── (nil)
      │   │   ├── lang (Shale::Type::String):
      │   │   │   └── (nil)
      │   │   ├── script (Shale::Type::String):
      │   │   │   └── (nil)
      │   │   ├── transliteration (Shale::Type::String):
      │   │   │   └── (nil)
      │   │   ├── display_label (Shale::Type::String):
      │   │   │   └── (nil)
      │   │   ├── alt_rep_group (Shale::Type::String):
      │   │   │   └── (nil)
      │   │   ├── usage (Shale::Type::Value):
      │   │   │   └── (nil)
      │   │   ├── topic (Shale::Type::String):
      │   │   │   └── (Array) [(String) "Jet planes", (String) "Fuel", (String) "Thermal properties"]
      │   │   ├── geographic (Shale::Type::String):
      │   │   │   └── (Array) 0 items
      │   │   ├── temporal (LocMods::Temporal):
      │   │   │   └── (Array) 0 items
      │   │   ├── title_info (LocMods::SubjectTitleInfo):
      │   │   │   └── (Array) 0 items
      │   │   ├── name (LocMods::SubjectName):
      │   │   │   └── (Array) 0 items
      │   │   ├── geographic_code (LocMods::GeographicCode):
      │   │   │   └── (Array) 0 items
      │   │   ├── hierarchical_geographic (LocMods::HierarchicalGeographic):
      │   │   │   └── (Array) 0 items
      │   │   ├── cartographics (LocMods::Cartographics):
      │   │   │   └── (Array) 0 items
      │   │   ├── occupation (LocMods::Occupation):
      │   │   │   └── (Array) 0 items
      │   │   ├── genre (LocMods::Genre):
      │   │   │   └── (Array) 0 items
      │   │   └── href (Shale::Type::String):
      │   │       └── (String) "https://id.loc.gov/authorities/subjects/sh2001009121"
      │   ├── - [2] (LocMods::Subject)
      │   │   ├── id (Shale::Type::Value):
      │   │   │   └── (nil)
      │   │   ├── authority (Shale::Type::String):
      │   │   │   └── (String) "fast"
      │   │   ├── authority_uri (Shale::Type::Value):
      │   │   │   └── (nil)
      │   │   ├── value_uri (Shale::Type::Value):
      │   │   │   └── (nil)
      │   │   ├── lang (Shale::Type::String):
      │   │   │   └── (nil)
      │   │   ├── script (Shale::Type::String):
      │   │   │   └── (nil)
      │   │   ├── transliteration (Shale::Type::String):
      │   │   │   └── (nil)
      │   │   ├── display_label (Shale::Type::String):
      │   │   │   └── (nil)
      │   │   ├── alt_rep_group (Shale::Type::String):
      │   │   │   └── (nil)
      │   │   ├── usage (Shale::Type::Value):
      │   │   │   └── (nil)
      │   │   ├── topic (Shale::Type::String):
      │   │   │   └── (Array) [(String) "Jet planes", (String) "Fuel", (String) "Thermal properties"]
      │   │   ├── geographic (Shale::Type::String):
      │   │   │   └── (Array) 0 items
      │   │   ├── temporal (LocMods::Temporal):
      │   │   │   └── (Array) 0 items
      │   │   ├── title_info (LocMods::SubjectTitleInfo):
      │   │   │   └── (Array) 0 items
      │   │   ├── name (LocMods::SubjectName):
      │   │   │   └── (Array) 0 items
      │   │   ├── geographic_code (LocMods::GeographicCode):
      │   │   │   └── (Array) 0 items
      │   │   ├── hierarchical_geographic (LocMods::HierarchicalGeographic):
      │   │   │   └── (Array) 0 items
      │   │   ├── cartographics (LocMods::Cartographics):
      │   │   │   └── (Array) 0 items
      │   │   ├── occupation (LocMods::Occupation):
      │   │   │   └── (Array) 0 items
      │   │   ├── genre (LocMods::Genre):
      │   │   │   └── (Array) 0 items
      │   │   └── href (Shale::Type::String):
      │   │       └── (String) "https://id.worldcat.org/fast/982434"
      │   └── - [3] (LocMods::Subject)
      │       ├── id (Shale::Type::Value):
      │       │   └── (nil)
      │       ├── authority (Shale::Type::String):
      │       │   └── (String) "fast"
      │       ├── authority_uri (Shale::Type::Value):
      │       │   └── (nil)
      │       ├── value_uri (Shale::Type::Value):
      │       │   └── (nil)
      │       ├── lang (Shale::Type::String):
      │       │   └── (nil)
      │       ├── script (Shale::Type::String):
      │       │   └── (nil)
      │       ├── transliteration (Shale::Type::String):
      │       │   └── (nil)
      │       ├── display_label (Shale::Type::String):
      │       │   └── (nil)
      │       ├── alt_rep_group (Shale::Type::String):
      │       │   └── (nil)
      │       ├── usage (Shale::Type::Value):
      │       │   └── (nil)
      │       ├── topic (Shale::Type::String):
      │       │   └── (Array) [(String) "Jet planes", (String) "Fuel"]
      │       ├── geographic (Shale::Type::String):
      │       │   └── (Array) 0 items
      │       ├── temporal (LocMods::Temporal):
      │       │   └── (Array) 0 items
      │       ├── title_info (LocMods::SubjectTitleInfo):
      │       │   └── (Array) 0 items
      │       ├── name (LocMods::SubjectName):
      │       │   └── (Array) 0 items
      │       ├── geographic_code (LocMods::GeographicCode):
      │       │   └── (Array) 0 items
      │       ├── hierarchical_geographic (LocMods::HierarchicalGeographic):
      │       │   └── (Array) 0 items
      │       ├── cartographics (LocMods::Cartographics):
      │       │   └── (Array) 0 items
      │       ├── occupation (LocMods::Occupation):
      │       │   └── (Array) 0 items
      │       ├── genre (LocMods::Genre):
      │       │   └── (Array) 0 items
      │       └── href (Shale::Type::String):
      │           └── (String) "https://id.worldcat.org/fast/982425"
      └── title_info (collection):
          └── [1] (LocMods::TitleInfo)
              └── [1] (String)
                  └── :
                      ├── - (String) "Thermodynamic, transport, and chemical properties of reference
                JP-8"
                      └── + (String) "Thermodynamic, transport, and chemical properties of reference JP-8
"
  Similarity score: 92.75%
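The issue does not say how the similarity score is computed. Purely as an illustrative assumption, one simple definition is the percentage of leaf attribute paths whose values agree in both objects. `flatten_leaves` and `similarity_score` are hypothetical helpers, not loc_mods or lutaml-model functions, and attributes are again assumed to live in instance variables.

```ruby
# Hypothetical similarity metric for illustration; NOT the formula used by
# loc_mods. Objects are flattened into { "path" => leaf value } pairs and
# the score is the percentage of paths whose values agree in both objects.
def flatten_leaves(value, path = "", out = {})
  case value
  when Array
    value.each_with_index do |item, i|
      flatten_leaves(item, "#{path}[#{i + 1}]", out)
    end
  when String, Numeric, Symbol, true, false, nil
    out[path] = value
  else
    # Treat any other object as a model whose attributes live in instance variables.
    value.instance_variables.each do |ivar|
      flatten_leaves(value.instance_variable_get(ivar), "#{path}.#{ivar.to_s.delete('@')}", out)
    end
  end
  out
end

def similarity_score(a, b)
  left = flatten_leaves(a)
  right = flatten_leaves(b)
  paths = left.keys | right.keys
  return 100.0 if paths.empty?

  same = paths.count { |p| left.key?(p) && right.key?(p) && left[p] == right[p] }
  (100.0 * same / paths.size).round(2)
end

# Hypothetical example classes.
class Note
  def initialize(content)
    @content = content
  end
end

class Record
  def initialize(ids, notes)
    @ids = ids
    @notes = notes
  end
end

r1 = Record.new(["991000002489708106"], [Note.new("2010.")])
r2 = Record.new(["991000091869708106"], [Note.new("2010.")])
puts similarity_score(r1, r2) # 50.0: one of two leaf paths matches
```

Every leaf carries equal weight in this sketch. A real tool would likely weight attributes differently, so these numbers are not comparable with the scores printed above.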

In Lutaml::Model, we cannot know in advance which classes will be parsed, so the comparison has to be generic: the user defines the model classes, parses files into them, and then compares the resulting objects to print a tree like the loc_mods output above.
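A minimal sketch of such a generic comparison, assuming plain Ruby objects whose attributes are stored in instance variables. This is an assumption, and the actual implementation may differ. `diff_lines` is a hypothetical helper. It recurses through arrays and attributes and emits only the branches that contain a difference, loosely mirroring the loc_mods tree.

```ruby
# Hypothetical helper for illustration; not lutaml-model's implementation.
# Returns indented lines describing every leaf value that differs between
# two objects, walking arrays element by element and other objects
# attribute by attribute (via their instance variables).
def diff_lines(a, b, label, depth = 0)
  indent = "  " * depth
  changed = ["#{indent}#{label}:", "#{indent}  - #{a.inspect}", "#{indent}  + #{b.inspect}"]
  # Different classes (including nil vs. a value) are reported as a leaf change.
  return changed unless a.class == b.class

  case a
  when String, Numeric, Symbol, true, false, nil
    return a == b ? [] : changed
  when Array
    # Compare element by element; a missing element reads as nil.
    children = (0...[a.size, b.size].max).flat_map do |i|
      diff_lines(a[i], b[i], "[#{i + 1}]", depth + 1)
    end
  else
    children = (a.instance_variables | b.instance_variables).flat_map do |ivar|
      diff_lines(a.instance_variable_get(ivar), b.instance_variable_get(ivar),
                 ivar.to_s.delete("@"), depth + 1)
    end
  end
  # Only emit a branch node when something below it differs.
  children.empty? ? [] : ["#{indent}#{label}"] + children
end

# Hypothetical example classes.
class Note
  def initialize(content)
    @content = content
  end
end

class Record
  def initialize(notes)
    @notes = notes
  end
end

left = Record.new([Note.new("2010."), Note.new("Title from PDF title page (viewed June 5, 2017).")])
right = Record.new([Note.new("2010."), Note.new("Title from PDF title page.")])
puts diff_lines(left, right, "Record")
# Record
#   notes
#     [2]
#       content:
#         - "Title from PDF title page (viewed June 5, 2017)."
#         + "Title from PDF title page."
```

Pruning unchanged branches keeps the output focused on differences. A missing array element is shown as a `nil` vs. object change rather than the fully expanded `- [n]` entries loc_mods prints.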

ronaldtse commented 3 weeks ago

The comparison code has been implemented in #34.

Documentation on how to use the comparison still needs to be written.