metafacture / metafacture-core

Core package of the Metafacture tool suite for metadata processing.
https://metafacture.org
Apache License 2.0

Optionally set key and/or value column. #471

Closed: blackwinter closed this 1 year ago

blackwinter commented 1 year ago

References hbz/lobid-resources#1461.

TobiasNx commented 1 year ago

Is this then also usable in metafacture-fix?

blackwinter commented 1 year ago

Only after we add the corresponding options to put_filemap().

blackwinter commented 1 year ago

> Seems good to me!

Thanks, but it wasn't ready for review yet. Can you have another look?

@TobiasNx: Can you do the functional review?

TobiasNx commented 1 year ago

I will do a functional review.

blackwinter commented 1 year ago

> Can you update ./metamorph/src/main/resources/schemata/metamorph.xsd?

Will do, thanks.

> This is the schema that helps, e.g., XML editors validate a Morph.

Not only editors: Metamorph itself uses it. I just didn't notice because I had skipped the Morph tests.

TobiasNx commented 1 year ago

I am still not sure about where to add the options:

    <maps>
        <filemap name="animals" files="multipleColumnLookup/animals.tsv" separator="\t" expectedColumns="-1"/>
    </maps>
    <rules>
        <data source="animal">
            <lookup in="animals" />
        </data>
    </rules>

I cannot run this yet: https://github.com/TobiasNx/notWorkingFlux/tree/main/multipleColumnLookup

Results in:

{ }
{ }
{ }

Expected:

{"animal":"Mammal"}
{"animal":"Bird"}
{"animal":"Insect"}

blackwinter commented 1 year ago

You have to specify the separator \t as the character reference &#09; (or leave it off, since tab is the default anyway).
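Why the literal \t fails can be checked with any standard XML parser: inside an XML attribute, backslash-t is not an escape sequence, while a numeric character reference is resolved to the actual tab character. The sketch below uses only the JDK's DOM parser (not Metafacture itself) to show what the separator attribute of a filemap element really contains in each case; the class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

public class SeparatorAttributeDemo {

    // Parses a single-element XML snippet and returns its "separator" attribute value.
    static String separatorOf(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return root.getAttribute("separator");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // XML has no backslash escapes: the value is the two characters '\' and 't'.
        String escaped = separatorOf("<filemap separator=\"\\t\"/>");
        // The character reference &#09; is resolved by the parser to one tab character.
        String reference = separatorOf("<filemap separator=\"&#09;\"/>");

        System.out.println(escaped.length());        // 2
        System.out.println(reference.equals("\t"));  // true
    }
}
```

Typing a raw tab character into the attribute would not help either: XML attribute-value normalization replaces literal whitespace characters with spaces, whereas a character reference is kept as the referenced character.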

blackwinter commented 1 year ago

> I am still not sure about where to add the options:

Options go on the filemap specification; Metamorph doesn't have the lookup "shortcut".
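To illustrate what "optionally set key and/or value column" means for a multi-column file like the animals.tsv used above, here is a small standalone Java sketch. It is not Metafacture's FileMap: the class ColumnLookup, its build methods and the sample rows are hypothetical, and the defaults (key from the first column, value from the second) are an assumption that mirrors a plain two-column lookup.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class ColumnLookup {

    // Hypothetical helper: builds a lookup table from separator-delimited lines,
    // using the given 0-based key and value column indexes.
    static Map<String, String> build(List<String> lines, String separator,
                                     int keyColumn, int valueColumn) {
        Map<String, String> map = new HashMap<>();
        for (String line : lines) {
            // Quote the separator so it is matched literally; limit -1 keeps empty trailing columns.
            String[] cols = line.split(Pattern.quote(separator), -1);
            // Skip rows that are too short to contain both columns.
            if (cols.length > Math.max(keyColumn, valueColumn)) {
                map.put(cols[keyColumn], cols[valueColumn]);
            }
        }
        return map;
    }

    // Assumed defaults when no columns are set: first column -> second column.
    static Map<String, String> build(List<String> lines, String separator) {
        return build(lines, separator, 0, 1);
    }

    public static void main(String[] args) {
        // Hypothetical rows: animal name, animal class, habitat
        List<String> rows = List.of(
                "dog\tMammal\tland",
                "eagle\tBird\tair",
                "bee\tInsect\tair");

        System.out.println(build(rows, "\t").get("dog"));          // Mammal (default columns)
        System.out.println(build(rows, "\t", 0, 2).get("eagle"));  // air (value column set)
        System.out.println(build(rows, "\t", 1, 0).get("Insect")); // bee (key column set)
    }
}
```

Selecting columns by index keeps a single file usable in several directions without preparing separate two-column copies.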

TobiasNx commented 1 year ago

Okay, I tested it. It seems to work. Great. +1

Since I played around with the same file in different directions (using different columns as key and value), sometimes with duplicate values, I would recommend documenting that the last match is always the one selected: https://github.com/TobiasNx/notWorkingFlux/blob/60a00e538569527d88b69cc518d61b4b9ab9410e/multipleColumnLookup

blackwinter commented 1 year ago

> But document the selection of last match.

Do you want to provide a suggestion? This behaviour has not changed; it has always been like this.
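The "last match wins" behaviour described above is what any loader produces when it reads rows in file order into a map whose put replaces existing keys. The following plain-Java sketch assumes, without having checked Metafacture's source, that the filemap is populated this way; the class LastMatchWins and the rows are hypothetical. It also reproduces the scenario from the test: reading the same file in the other direction turns a non-unique value column into duplicate keys.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LastMatchWins {

    // Hypothetical loader: reads tab-separated rows in file order.
    // Map.put replaces the value of an existing key, so for duplicate keys
    // the row that appears last in the file is the one that survives.
    static Map<String, String> load(List<String> rows, int keyColumn, int valueColumn) {
        Map<String, String> map = new HashMap<>();
        for (String row : rows) {
            String[] cols = row.split("\t", -1);
            map.put(cols[keyColumn], cols[valueColumn]);
        }
        return map;
    }

    public static void main(String[] args) {
        // Hypothetical rows: animal name, animal class
        List<String> rows = List.of("bee\tInsect", "eagle\tBird", "wasp\tInsect");

        // name -> class: every key is unique, nothing is overwritten
        System.out.println(load(rows, 0, 1).get("bee"));    // Insect
        // class -> name: "Insect" occurs twice, so the later row ("wasp") wins
        System.out.println(load(rows, 1, 0).get("Insect")); // wasp
    }
}
```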