metafacture / metafacture-fix

Work in progress towards an implementation of the Fix language for Metafacture
Apache License 2.0
6 stars 2 forks source link

`copy_field` array of strings with a single value create internally no arrays #239

Open TobiasNx opened 2 years ago

TobiasNx commented 2 years ago

When copying an array of strings that has a single value, fix seems not to copy it as an array but an simple string-element.

This is not spotted since fix outputs simple stringelements that have [] at the end of their name as arrays even if they are not handled as arrays fix-internally. I try to come up with an integration test for this.

Here is an example for it.

This illustrates it even better:

It is disguised by the handling of the output by the json decoder.

This problem could be connected to: https://github.com/metafacture/metafacture-fix/issues/116 and https://github.com/metafacture/metafacture-fix/issues/195

It seems that array are not copied as arrays at all but each value individually. Therefore #195 empty arrays cannot be copied and arrays of strings with single values are not copied as simple string elements. With multiple values the behaviour therefore seems to be identical to the non-destructive behaviour of copy_field/move_field #116 .

blackwinter commented 2 years ago

copy_field() treats the value as a list (via Value.asList()) because retrieving a path might produce an array where there was none in the data. So it's impossible to preserve the original value's type in the general case.

E.g., with the following internal data:

{ "your": [ { "name": "max" }, { "name": "mo" } ] }

Retrieving your.name would yield an array even though the actual array was your (similar for wildcards, where there might not be any array in the data):

[ "max", "mo" ]

As for your original lookup() issue, there is a workaround of course. I'm just not sure if there's a universal fix to be found here.

TobiasNx commented 2 years ago

This is also connected then: https://github.com/metafacture/metafacture-fix/issues/148 and https://github.com/metafacture/metafacture-fix/issues/92. Am I right?

TobiasNx commented 2 years ago

Retrieving your.name would yield an array even though the actual array was your (similar for wildcards, where there might not be any array in the data):

[ "max", "mo" ]

I would say this is not the same case. But I do not know if this makes a difference.

While you are interested here in a single element of an array of objects: copy_field("your.*.name")

In my case you are interested in the array as a whole wether it be with a single element, multiple elements or none. copy_field("your[]") or may it be nested copy_field("i.am.nested.your[]")

Also how is catmandu behaving?

TobiasNx commented 2 years ago

Also how is catmandu behaving?

https://github.com/TobiasNx/Catmandu-Testing/tree/master/ArrayHandling

It seems that Catmandu is able to work with single and no value array of strings.

TobiasNx commented 2 years ago

copy_field() treats the value as a list (via Value.asList()) because retrieving a path might produce an array where there was none in the data. So it's impossible to preserve the original value's type in the general case. by @blackwinter

One idea I have conceptually is, that you introduce a conditional control if the original value type is an array. If so, copy_field would create an empty array (like set_array) before copying every single value. This would enable the possibility of copying empty arrays as well as arrays with single values.

fsteeg commented 2 years ago

When copying an array of strings that has a single value, fix seems not to copy it as an array but an simple string-element. [...] fix outputs simple stringelements that have [] at the end of their name as arrays [...]

While I see the conceptual point here, I don't quite see the practical problem. Why not add the []?

And while it might be possible to preserve the original type (based on the path), I don't think it's worth the effort here. We already deviate from Catmandu since we need to specify the array marker in the fix (as in #240), and there is an easy solution, adding the array marker. Internally, the values are always in an array, and the array marker is used precisely to control if we want to output as an array or not.

The empty array is a different topic, and we should fix that, but we have a separate issue for that (#195).

TobiasNx commented 1 year ago

copy_field() treats the value as a list (via Value.asList()) because retrieving a path might produce an array where there was none in the data. So it's impossible to preserve the original value's type in the general case. by @blackwinter

One idea I have conceptually is, that you introduce a conditional control if the original value type is an array. If so, copy_field would create an empty array (like set_array) before copying every single value. This would enable the possibility of copying empty arrays as well as arrays with single values.

Since I used the already established is_array conditional I think this really could be a simple solution for the problem to test if the copied field is_array and if so, create an "new" array first before copying values:

https://metafacture.org/playground/?flux=PG_DATA%0A%7Cas-lines%0A%7Cdecode-formeta%0A%7Cfix%0A%7Cencode-xml%28rootTag%3D%22collection%22%29%0A%7Cprint%0A%3B&fix=set_array%28%22test%22%2C%22grey%22%29%0Aif+is_array%28%22test%22%29%0A++++add_field%28%22test.%24append%22%2C%22black%22%29%0Aelsif++is_string%28%22test%22%29%0A++++add_field%28%22test.%24append%22%2C%22white%22%29%0Aend%0A&data=1%7Ba%3A+Faust%2C+b+%7Bn%3A+Goethe%2C+v%3A+JW%7D%2C+c%3A+Weimar%7D%0A2%7Ba%3A+R%C3%A4uber%2C+b+%7Bn%3A+Schiller%2C+v%3A+F%7D%2C+c%3A+Weimar%7D&active-editor=fix

TobiasNx commented 11 months ago

This again popped up. And I am still not sure why this is difficult:

https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A%7C+as-records%0A%7C+decode-json%0A%7C+fix%28transformationFile%29%0A%7C+encode-json%28prettyPrinting%3D%22true%22%29%0A%7C+print%3B&transformation=set_array%28%22indexCountHelper%5B%5D%22%29%0Ado+list%28path%3A%22GST%5B%5D%22%2C%22var%22%3A%22%24i%22%29%0A++++copy_field%28%22%24i.a%22%2C%22indexCountHelper%5B%5D.%24append%22%29%0A++++copy_field%28%22indexCountHelper%5B%5D%22%2C%22%24i.indexPosition%22%29%0A++++count%28%22%24i.indexPosition%22%29%0Aend&data=%7B%0A++%22GST%22+%3A+%5B%0A++++%7B%0A++++++%22a%22+%3A+%22TestA1%22%0A++++%7D%2C+%7B%0A++++++%22a%22+%3A+%22TestB1%22%2C%0A++++++%22b%22+%3A+%22TestB2%22%0A++++%7D%2C+%7B%0A++++++%22a%22+%3A+%22TestC1%22%2C%0A++++++%22b%22+%3A+%22TestC2%22%2C%0A++++++%22c%22+%3A+%22TestC3%22%0A++++%7D%0A++%5D%0A%7D%0A

A copied array/list of strings with a single element is not copied as such, but copied as a simple string element.

TobiasNx commented 11 months ago

@blackwinter you already introduced a test if the elememt is an array here with https://github.com/metafacture/metafacture-fix/commit/2427f2c18c455fe638dec1ff7fb9385cd042f029:

https://github.com/metafacture/metafacture-fix/blob/68104849904221b6d8fce38aa580945fe39cdd8b/metafix/src/main/java/org/metafacture/metafix/FixMethod.java#L170-L177

would it be possible to adjust the behaviour with single value arrays here too?

TobiasNx commented 4 months ago

Not sure why I am assigned here. I remember that @blackwinter suggested a solution that is not "universal" for every scenario in our meeting. @blackwinter do you remember?