metafacture / metafacture-core

Core package of the Metafacture tool suite for metadata processing.
https://metafacture.org
Apache License 2.0

_elseNested only outputs two hierarchy levels #378

Closed TobiasNx closed 3 years ago

TobiasNx commented 3 years ago

When processing a file with more than two hierarchy levels, _elseNested reduces the structure to two levels:

Input:

<?xml version="1.0" encoding="UTF-8"?>
<records>
    <record>
        <mods>
            <ID>duepublico_mods_00074526</ID>
            <name>
                <type>personal</type>
                <type>simple</type>
                <displayForm>
                    <value>Armbruster, André</value>
                </displayForm>
                <role>
                    <roleTerm>
                        <authority>marcrelator</authority>
                        <type>code</type>
                        <value>aut</value>
                    </roleTerm>
                    <roleTerm>
                        <authority>marcrelator</authority>
                        <type>text</type>
                        <value>Author</value>
                    </roleTerm>
                </role>
                <nameIdentifier>
                    <type>gnd</type>
                    <value>1081830107</value>
                </nameIdentifier>
                <namePart>
                    <type>family</type>
                    <value>Armbruster</value>
                </namePart>
                <namePart>
                    <type>given</type>
                    <value>André</value>
                </namePart>
            </name>
        </mods>
    </record>
</records>

FLUX:

default infile = FLUX_DIR + "mods.xml";

infile
| open-file
| decode-xml
| handle-generic-xml
| morph(FLUX_DIR + "all.xml")
| encode-xml
| write(FLUX_DIR + "resultNEsted.xml")
;

MORPH:

<?xml version="1.0" encoding="UTF-8"?>
<metamorph xmlns="http://www.culturegraph.org/metamorph" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    version="1">
    <rules>
        <data source="_elseNested"/>
    </rules>
</metamorph>

OUTPUT:

<?xml version="1.0" encoding="UTF-8"?>
<records>

    <record>
        <ID>
            <value>duepublico_mods_00074526</value>
        </ID>
        <type>
            <value>personal</value>
        </type>
        <type>
            <value>simple</value>
        </type>
        <value>
            <value>Armbruster, André</value>
        </value>
        <authority>
            <value>marcrelator</value>
        </authority>
        <type>
            <value>code</value>
        </type>
        <value>
            <value>aut</value>
        </value>
        <authority>
            <value>marcrelator</value>
        </authority>
        <type>
            <value>text</value>
        </type>
        <value>
            <value>Author</value>
        </value>
        <type>
            <value>gnd</value>
        </type>
        <value>
            <value>1081830107</value>
        </value>
        <type>
            <value>family</value>
        </type>
        <value>
            <value>Armbruster</value>
        </value>
        <type>
            <value>given</value>
        </type>
        <value>
            <value>André</value>
        </value>
    </record>

</records>

If encoded as JSON instead:

{
    "ID": {
        "value": "duepublico_mods_00074526"
    },
    "type": {
        "value": "personal"
    },
    "type": {
        "value": "simple"
    },
    "value": {
        "value": "Armbruster, André"
    },
    "authority": {
        "value": "marcrelator"
    },
    "type": {
        "value": "code"
    },
    "value": {
        "value": "aut"
    },
    "authority": {
        "value": "marcrelator"
    },
    "type": {
        "value": "text"
    },
    "value": {
        "value": "Author"
    },
    "type": {
        "value": "gnd"
    },
    "value": {
        "value": "1081830107"
    },
    "type": {
        "value": "family"
    },
    "value": {
        "value": "Armbruster"
    },
    "type": {
        "value": "given"
    },
    "value": {
        "value": "André"
    }
}

hagbeck commented 3 years ago

I can confirm this issue in another case.

<record>
    <contribution>
        <type>Contribution</type>
        <agent>
            <label>Halfbrodt, Michael</label>
            <type>Person</type>
            <gndIdentifier>1038509653</gndIdentifier>
            <id>https://d-nb.info/gnd/1038509653</id>
        </agent>
    </contribution>
</record>

results in

<record>
    <type>
        <value>Contribution</value>
    </type>
    <label>
        <value>Halfbrodt, Michael</value>
    </label>
    <type>
        <value>Person</value>
    </type>
    <gndIdentifier>
        <value>1038509653</value>
    </gndIdentifier>
    <id>
        <value>https://d-nb.info/gnd/1038509653</value>
    </id>
</record>

A solution would be nice.

TobiasNx commented 3 years ago

For a possible solution, repeated fields/subfields should not be overwritten; flattening is doing this.
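
To illustrate why repeated fields matter downstream, here is a minimal Python sketch. It says nothing about Metafacture internals; it only assumes a consumer that parses the JSON output above with Python's standard `json` module. The sample string is an abbreviated version of that output.

```python
import json

# Abbreviated version of the JSON output above, keeping three repeated "type" keys.
text = (
    '{"type": {"value": "personal"},'
    ' "type": {"value": "simple"},'
    ' "type": {"value": "code"}}'
)

# Default parsing builds a dict, so each repeated key overwrites the previous one.
as_dict = json.loads(text)

# object_pairs_hook receives every (key, value) pair, so repeated keys survive.
as_pairs = json.loads(text, object_pairs_hook=list)

print(as_dict)
print(as_pairs)
```

With the default parser only the last `type` remains, while the pair list keeps all three. This is one reason the flattened result loses information once it is read into a plain map.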

dr0i commented 3 years ago

Ping @blackwinter, if you want to have a look.

blackwinter commented 3 years ago

I'd like to, but I don't know when I will find the time...

It might be helpful if someone could prepare a (failing) test case in org.metafacture.metamorph.TestMetamorphBasics.

TobiasNx commented 3 years ago

I have set up a sample repo: https://github.com/TobiasNx/notWorkingFlux/tree/main/elseNested_Xml2JSON

@dr0i could create the test from this.

blackwinter commented 3 years ago

I've looked into this a bit over the weekend. Apparently, it's not like this hadn't been anticipated (sort of):

> Also, there's the question of nested entities (more than one entity marker in the path), but I won't get into it here...
>
> _Originally posted by @blackwinter in https://github.com/metafacture/metafacture-core/pull/333#discussion_r503248019_

Oops... :innocent:

I'll try to come up with a PR in the next couple of days.

blackwinter commented 3 years ago

With _elseNested, entities are only output when an unhandled literal occurs. The issue was that then only the last (current) entity was taken into account. Now all intermediate entities are included as well.
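
The difference can be sketched with a small Python simulation. This is not Metafacture's actual code: the event tuples, the `else_nested` function and its `include_intermediate` flag are hypothetical names made up for illustration, and the events are a simplified version of the `contribution` example above.

```python
def else_nested(events, include_intermediate):
    """Replay stream events, opening entities lazily when a literal occurs."""
    out = []
    stack = []  # one [name, already_emitted] entry per currently open entity
    for event in events:
        kind = event[0]
        if kind == "start":
            # Remember the entity, but do not output it yet.
            stack.append([event[1], False])
        elif kind == "literal":
            # Old behaviour: only the current (innermost) entity is considered.
            # New behaviour: every open entity that was not output yet.
            pending = stack if include_intermediate else stack[-1:]
            for entry in pending:
                if not entry[1]:
                    out.append(("start", entry[0]))
                    entry[1] = True
            out.append(event)
        elif kind == "end":
            name, emitted = stack.pop()
            if emitted:
                # Only close entities that were actually opened in the output.
                out.append(("end", name))
    return out


EVENTS = [
    ("start", "contribution"),
    ("start", "agent"),
    ("start", "label"),
    ("literal", "value", "Halfbrodt, Michael"),
    ("end", "label"),
    ("end", "agent"),
    ("end", "contribution"),
]

print(else_nested(EVENTS, include_intermediate=False))
print(else_nested(EVENTS, include_intermediate=True))
```

With `include_intermediate=False` only `label` wraps the literal, which mirrors the flattened output reported in this issue. With `include_intermediate=True` the `contribution` and `agent` levels are output as well.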