blk_unpack: add new json output variants

klensy commented 4 years ago

Hi. I want to change the JSON output of blk_unpack, or add a new output variant.

Currently, when run with --format=json, the program writes the contents of a blk group (foo{}) as a JSON object if the names in it are unique within that group, and as an array otherwise.

This is bad:

When devs add duplicated values to a blk, large portions of the output file can change, which makes it hard to compare line by line.
It is hard to parse: you always have to check whether the current item is an object or a list.
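
To make the problem concrete, here is a minimal sketch of the behaviour described above. It is a simplified model, not the actual wt-tools code: `group_to_json_current` is a hypothetical name, and a group is assumed to be already parsed into a list of (name, value) pairs.

```python
from collections import Counter


def group_to_json_current(pairs):
    """Model of the described behaviour: an object if all names are unique,
    otherwise an array of single-property objects (hypothetical helper)."""
    counts = Counter(name for name, _ in pairs)
    if all(count == 1 for count in counts.values()):
        return dict(pairs)
    return [{name: value} for name, value in pairs]


before = [("A", 1), ("B", 17)]
after = before + [("B", 42)]  # one duplicated name is added

print(group_to_json_current(before))  # {'A': 1, 'B': 17}
print(group_to_json_current(after))   # [{'A': 1}, {'B': 17}, {'B': 42}]
```

A single added line changes the type of the whole group, which is why both the diff and the consuming code are affected.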

I suggest a few things:

Always write group members as arrays, so that adding a duplicated value changes only a few lines and does not convert an object into an array.
Sometimes values in a blk do not change but only move to different lines, so we can add optional sorting of values to prevent this effect.

@gszabi99 @VitaliiAndreev

VitaliiAndreev commented 4 years ago

Hi.

I have noticed that the rule of unique JSON properties is bypassed by converting the whole object into an array of single-property objects.

In my project I've resorted to reading those chunk arrays, detecting duplicated properties, and reassembling JSON objects from those chunks, collecting values only of duplicated properties into value arrays instead.

Example:

[
   {
      "A": 1
   },
   {
      "B": 17
   },
   {
      "B": 42
   }
]

Becomes:

{
   "A": 1,
   "B": [17, 42]
}

It's not being done to JSON text directly but is rather abstracted by the framework. These examples are just equivalent representations.
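
The reassembly step could look roughly like the sketch below. It is written in Python only for illustration (the project mentioned above may use a different language and framework), and `merge_chunks` is a hypothetical name. It assumes the chunk array has already been loaded as a list of single-property dicts.

```python
def merge_chunks(chunks):
    """Rebuild one object from an array of single-property objects,
    collecting only duplicated properties into value arrays."""
    merged = {}
    duplicated = set()
    for chunk in chunks:
        for name, value in chunk.items():
            if name not in merged:
                merged[name] = value            # first occurrence stays a plain value
            elif name in duplicated:
                merged[name].append(value)      # third and later occurrences
            else:
                merged[name] = [merged[name], value]  # second occurrence starts the array
                duplicated.add(name)
    return merged


chunks = [{"A": 1}, {"B": 17}, {"B": 42}]
print(merge_chunks(chunks))  # {'A': 1, 'B': [17, 42]}
```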

JSON objects are easy to deserialise once standardised, so it's best if the overall structure remains intact. All properties becoming arrays is problematic because duplicates are rather an exception than a norm.

As for sorting, yes, that would reduce the amount of noise. The only possible downside is that some properties are being placed close to each other by relation. Sorting might make it more difficult to look up relevant blocks by the naked eye. Some experiments may be required to see how it turns out.

klensy commented 4 years ago

collecting values only of duplicated properties into value arrays instead.

I thought about it long ago, but I didn't find a simple way to distinguish this from blk array values like foo:ip2= 1,2 (which in JSON is "foo": [1,2]) without adding types.

For example: adding a small skin object broke this file completely: https://github.com/gszabi99/War-Thunder-Datamine/commit/082a3cc73081852acd6e3d119991c63744d564d1#diff-147ad74683510a589adc6ed740ee4ab1

VitaliiAndreev commented 4 years ago

I didn't find a simple way to distinguish this from blk array values like foo:ip2= 1,2 (which in JSON is "foo": [1,2]) without adding types.

I don't have much experience with python, so can't really help there.

In my case I have to loop twice over each collection - once to gather all duplicate names, and once more for processing. Because all objects have to have the same schema to be deserialised, even values in non-duplicated cases of elsewhere duplicated properties have to be converted into arrays - that's what really necessitates the first loop.
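
The two loops described above can be sketched as follows. This is an illustration in Python under stated assumptions, not the code from either project: `duplicated_names` and `normalise` are hypothetical names, and each item of the collection is assumed to be a list of single-property dicts.

```python
def duplicated_names(collection):
    """First pass: names that occur more than once inside any single item."""
    names = set()
    for chunks in collection:
        seen = set()
        for chunk in chunks:
            for name in chunk:
                if name in seen:
                    names.add(name)
                seen.add(name)
    return names


def normalise(collection):
    """Second pass: build objects with one shared schema. A property that is
    duplicated anywhere becomes an array everywhere."""
    dup = duplicated_names(collection)
    result = []
    for chunks in collection:
        obj = {}
        for chunk in chunks:
            for name, value in chunk.items():
                if name in dup:
                    obj.setdefault(name, []).append(value)
                else:
                    obj[name] = value
        result.append(obj)
    return result


skins = [
    [{"name": "a"}, {"replace_tex": "x"}, {"replace_tex": "y"}],
    [{"name": "b"}, {"replace_tex": "z"}],  # not duplicated here, still wrapped
]
print(normalise(skins))
# [{'name': 'a', 'replace_tex': ['x', 'y']}, {'name': 'b', 'replace_tex': ['z']}]
```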

adding a small skin object broke this file completely

Yeah, that's the effect I'm talking about. It would end up like:

skin:
[
   {
      ...
      "replace_tex":
      [
         { ... },
         { ... },
      ]
   },
   {
      ...
      "replace_tex":
      [
         { ... },
         { ... },
      ]
   },
]

Though admittedly I haven't gotten to the point of deserialising from DM and FM files, so some adjustments may be required on my part to make it as described, if not up to scratch.

klensy commented 4 years ago

I don't have much experience with python, so can't really help there.

I'm not talking about Python, but about the JSON representation of the data. If you convert duplicates to arrays, you need some way to distinguish these duplicates.

What is "fresnel": [0.23,0.1,2.0]? Is it duplicated values converted to an array

fresnel:r=0.23
fresnel:r=0.1
fresnel:r=2.0

or was it an array from the start? fresnel:p3=0.23, 0.1, 2.0
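
The ambiguity can be reproduced with a small sketch. It assumes the blk lines are already parsed into (name, value) pairs, with a p3 value represented as a list of three floats; `collapse_duplicates` is a hypothetical helper that implements the "duplicates become an array" idea.

```python
from collections import defaultdict


def collapse_duplicates(pairs):
    """Keep single values as they are, turn duplicated names into arrays."""
    grouped = defaultdict(list)
    for name, value in pairs:
        grouped[name].append(value)
    return {
        name: values[0] if len(values) == 1 else values
        for name, values in grouped.items()
    }


# Three separate real values with the same name.
three_reals = [("fresnel", 0.23), ("fresnel", 0.1), ("fresnel", 2.0)]
# One p3 value that was an array from the start.
one_p3 = [("fresnel", [0.23, 0.1, 2.0])]

print(collapse_duplicates(three_reals))  # {'fresnel': [0.23, 0.1, 2.0]}
print(collapse_duplicates(one_p3))       # {'fresnel': [0.23, 0.1, 2.0]}
```

Both inputs give the same JSON, so the original form cannot be recovered from the output alone.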

VitaliiAndreev commented 4 years ago

Does the original state matter? I struggle to see the consequences.

Also what about creating a property like "fresnel_is_duplicated": true to keep track of such cases?

klensy commented 4 years ago

I want to keep the converted data (and code) as simple as possible; additional flags only complicate things. Anyway, I will try a few variants, maybe a good one will be found.

klensy commented 4 years ago

Added a --sort option for blk_unpack: it sorts keys in the JSON output. It gives good results, but only if both files have sorted keys; otherwise the diff is very bad, obviously.
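
The effect of key sorting on diffs can be illustrated with the standard library alone. This is not the implementation behind --sort, only a sketch of the idea using `json.dumps(..., sort_keys=True)`; the sample data and the `dump_sorted` helper are made up.

```python
import json

# The same values, written in a different order in two versions of a file.
old = {"mass": 10, "name": "x", "fresnel": [0.23, 0.1, 2.0]}
new = {"fresnel": [0.23, 0.1, 2.0], "mass": 10, "name": "x"}


def dump_sorted(data):
    """Serialise with keys sorted, so key order in the source does not matter."""
    return json.dumps(data, sort_keys=True, indent=2)


print(json.dumps(old) == json.dumps(new))    # False: order leaks into the text
print(dump_sorted(old) == dump_sorted(new))  # True: identical text, empty diff
```

Note that `sort_keys` reorders object keys only, including those of nested objects; the order of array elements is left untouched.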

klensy commented 4 years ago

Trying the idea of always placing values in arrays:

we always have unique keys
duplicates can be detected when an array holds more than one value
not pretty to read manually, as arrays are formatted on 3 lines instead of one. This can be solved by additional tuning of the jsonEncoder or with an external tool.

example: https://gist.github.com/klensy/f282f3a31a15d2a87dce3fd6b558a304
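
A minimal sketch of the always-arrays variant, assuming a group is already parsed into (name, value) pairs. `group_to_arrays` is a hypothetical name and the code is not taken from the linked gist.

```python
import json


def group_to_arrays(pairs):
    """Every name maps to an array of its values, in source order."""
    out = {}
    for name, value in pairs:
        out.setdefault(name, []).append(value)
    return out


pairs = [("A", 1), ("B", 17), ("B", 42), ("fresnel", [0.23, 0.1, 2.0])]
data = group_to_arrays(pairs)

print(data)
# {'A': [1], 'B': [17, 42], 'fresnel': [[0.23, 0.1, 2.0]]}

# Duplicates are exactly the names whose array holds more than one value.
print([name for name, values in data.items() if len(values) > 1])  # ['B']

# With indent set, each array element goes on its own line, which is the
# readability cost mentioned above; without indent the output is one line.
pretty = json.dumps(data, indent=2)
compact = json.dumps(data)
```

In this form a p3 value stays wrapped as a single element, so it no longer collides with three duplicated scalar values.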

klensy / wt-tools

blk_unpack: add new json output variants #53