metafacture / metafacture-core

Core package of the Metafacture tool suite for metadata processing.
https://metafacture.org
Apache License 2.0

_elseNested loses array-key in JSON #374

Closed TobiasNx closed 2 years ago

TobiasNx commented 3 years ago

We have the following Flux script:

"testArray.json"
| open-file
| as-records
| decode-json
| morph("all.xml")
| encode-json(prettyPrinting="true")
| write("stdout");

The morph (all.xml) is:

<?xml version="1.0" encoding="UTF-8"?>
<metamorph xmlns="http://www.culturegraph.org/metamorph" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    version="1">
    <rules>
        <data source="_elseNested"/>
    </rules>
</metamorph>

The incoming JSON is e.g.:

{
    "author": [
        {
            "@type": "Person",
            "name": "Katja Königstein-Lüdersdorff"
        },
        {
            "@type": "Person",
            "name": "Corinna Peters"
        },
        {
            "@type": "Person",
            "name": "Oleg Tjulenev"
        },
        {
            "@type": "Person",
            "name": "Claudia Vogeler"
        }
    ]
}

It results in:

{
  "1" : {
    "@type" : "Person",
    "name" : "Katja Königstein-Lüdersdorff"
  },
  "2" : {
    "@type" : "Person",
    "name" : "Corinna Peters"
  },
  "3" : {
    "@type" : "Person",
    "name" : "Oleg Tjulenev"
  },
  "4" : {
    "@type" : "Person",
    "name" : "Claudia Vogeler"
  }
}

Without the morph it works. With this Flux script:

"testArray.json"
| open-file
| as-records
| decode-json
| encode-json(prettyPrinting="true")
| write("stdout");

The result is:

{
  "author" : [ {
    "@type" : "Person",
    "name" : "Katja Königstein-Lüdersdorff"
  }, {
    "@type" : "Person",
    "name" : "Corinna Peters"
  }, {
    "@type" : "Person",
    "name" : "Oleg Tjulenev"
  }, {
    "@type" : "Person",
    "name" : "Claudia Vogeler"
  } ]
}

The fault is due to _elseNested. With "_elseFlattened" we receive the following result:

{
  "author[].1.@type" : "Person",
  "author[].1.name" : "Katja Königstein-Lüdersdorff",
  "author[].2.@type" : "Person",
  "author[].2.name" : "Corinna Peters",
  "author[].3.@type" : "Person",
  "author[].3.name" : "Oleg Tjulenev",
  "author[].4.@type" : "Person",
  "author[].4.name" : "Claudia Vogeler"
}

I assume the [] array marker conflicts with the nested transformation.
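
To make the key convention in the flattened output concrete, here is a minimal Python sketch of how such keys could be folded back into nested JSON. It is an illustration only, not Metafacture code. It assumes that "." separates path segments, that a segment ending in "[]" names an array entity, and that the numeric segments below it are array positions; the function names unflatten and collapse are made up for this example.

```python
def unflatten(flat):
    """Fold flattened keys such as 'author[].1.name' into nested JSON data."""
    root = {}
    for path, value in flat.items():
        node = root
        parts = path.split(".")  # assumption: "." only ever separates segments
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return collapse(root)


def collapse(node):
    """Turn entities marked with '[]' into lists ordered by their numeric keys."""
    if not isinstance(node, dict):
        return node
    result = {}
    for key, child in node.items():
        if key.endswith("[]") and isinstance(child, dict):
            # Array entity: drop the marker and order children by position.
            result[key[:-2]] = [collapse(child[k]) for k in sorted(child, key=int)]
        else:
            result[key] = collapse(child)
    return result


flat = {
    "author[].1.@type": "Person",
    "author[].1.name": "Katja Königstein-Lüdersdorff",
    "author[].2.@type": "Person",
    "author[].2.name": "Corinna Peters",
}
print(unflatten(flat))
```

Under these assumptions the "author[]" segment is what carries the array key, so losing that segment leaves only the numbered entries "1", "2", and so on.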

blackwinter commented 2 years ago

The root cause seems to be the same as #378. Should be fixed by #392.

TobiasNx commented 2 years ago

The result with the fix from #392:

{
  "author" : [ {
    "@type" : "Person",
    "name" : "Katja Königstein-Lüdersdorff"
  }, {
    "@type" : "Person",
    "name" : "Corinna Peters"
  }, {
    "@type" : "Person",
    "name" : "Oleg Tjulenev"
  }, {
    "@type" : "Person",
    "name" : "Claudia Vogeler"
  } ]
}

Seems fine now! Great.

blackwinter commented 2 years ago

With _elseNested, entities are only output when an unhandled literal occurs. The issue was that only the last (current) entity was then taken into account, so in this example the outermost entity (author[]) was dropped (regardless of the array marker). This is now fixed.
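
The behaviour described above can be sketched with a small Python simulation. This is not the Metafacture implementation; the event tuples and the replay function are hypothetical and only model the idea that entity start events are held back until a literal occurs. With emit_all_open=False only the current entity is emitted (the old behaviour as described), with emit_all_open=True every open entity is emitted (the fixed behaviour as described).

```python
def replay(events, emit_all_open):
    """Replay stream events, emitting entity starts lazily on the first literal."""
    out = []
    stack = []  # [name, already_emitted] for each entity open in the input
    for event in events:
        kind = event[0]
        if kind == "start":
            stack.append([event[1], False])
        elif kind == "end":
            name, emitted = stack.pop()
            if emitted:
                out.append(("end", name))
        else:  # literal
            # Old behaviour: look at the current entity only.
            # Fixed behaviour: look at every entity that is still open.
            pending = stack if emit_all_open else stack[-1:]
            for entry in pending:
                if not entry[1]:
                    out.append(("start", entry[0]))
                    entry[1] = True
            out.append(event)
    return out


EVENTS = [
    ("start", "author[]"),
    ("start", "1"),
    ("literal", "@type", "Person"),
    ("literal", "name", "Katja Königstein-Lüdersdorff"),
    ("end",),
    ("start", "2"),
    ("literal", "@type", "Person"),
    ("literal", "name", "Corinna Peters"),
    ("end",),
    ("end",),
]


def started(out):
    """Names of the entities that made it into the output."""
    return [e[1] for e in out if e[0] == "start"]


print(started(replay(EVENTS, emit_all_open=False)))  # ['1', '2']
print(started(replay(EVENTS, emit_all_open=True)))   # ['author[]', '1', '2']
```

In the first run the outermost entity never reaches the output, which matches the top-level "1", "2", ... keys in the original report; the "[]" marker plays no role in the loss.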