metafacture / metafacture-fix

Work in progress towards an implementation of the Fix language for Metafacture
Apache License 2.0
6 stars 2 forks source link

`replace_all` with asterisk with optional subfield creates new subfield #377

Open TobiasNx opened 1 month ago

TobiasNx commented 1 month ago

See: https://github.com/metafacture/metafacture-fix/pull/376

https://github.com/metafacture/metafacture-fix/tree/d18a5035d2b74508d685ec863c91e4382aff2c26/metafix/src/test/resources/org/metafacture/metafix/integration/method/fromXml/toJson/replace_allInOptionalSubfieldOfRepeatedObjectsWithAsterisk

in:

<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
    <record>
        <datafield tag="650" ind1=" " ind2=" ">
            <subfield code="a">Nonprofit organizations</subfield>
            <subfield code="x">Management.</subfield>
        </datafield>
        <datafield tag="650" ind1=" " ind2=" ">
            <subfield code="a">Nonprofit organizations</subfield>
            <subfield code="x">Management</subfield>
            <subfield code="v">Case studies.</subfield>
        </datafield>
    </record>
</collection>
do list(path: "650??", "var": "$i")
  copy_field("$i.a", "RSWK[].$append.subjectTopicName")
  copy_field("$i.v", "RSWK[].$last.subjectGenre")
end
replace_all("RSWK[].*.subjectGenre", "[.]$", "")
retain("RSWK[]")

Out:

{
  "RSWK" : [ {
    "subjectTopicName" : "Nonprofit organizations",
    "subjectGenre" : "Case studies"
  }, {
    "subjectTopicName" : "Nonprofit organizations",
    "subjectGenre" : "Case studies."
  } ]
}

Expected:

{
  "RSWK" : [ {
    "subjectTopicName" : "Nonprofit organizations"
  }, {
    "subjectTopicName" : "Nonprofit organizations",
    "subjectGenre" : "Case studies"
  } ]
}
TobiasNx commented 1 month ago

When using set_array this seems to work. Not sure why:

set_array("RSWK[]")
do list(path: "650??", "var": "$i")
  copy_field("$i.a", "RSWK[].$append.subjectTopicName")
  copy_field("$i.v", "RSWK[].$last.subjectGenre")
end
replace_all("RSWK[].*.subjectGenre", "[.]$", "")
retain("RSWK[]")

Unfortunately the playground does not show the new behavior yet.

blackwinter commented 1 month ago

It seems to be caused by copy_field carrying over the old value's path (650 .v) to the new location (RSWK[].$last.subjectGenre). That way, replace_all gets confused and assumes that the target position for the transformed value must be the first array element.

It can be solved by explicitly telling copy_field (and other call sites in the same vein) to ignore the old value's path (.withPathSet(null); see #380). But I'm unsure whether this is the right approach; at the very least, it modifies the original value's path which might cause subsequent issues down the line.

I'm sorry to say this, but our path handling is quite a mess ;)