zemirco / json2csv

Convert json to csv with column titles
http://zemirco.github.io/json2csv
MIT License

Empty column when unwinding nested array #494

Closed martinfrancois closed 4 years ago

martinfrancois commented 4 years ago

Version of json2csv: 5.0.3 Node version: 12.19.0

We encountered this bug during development, to make it reproducible I created a codesandbox: https://codesandbox.io/s/json2csv-bug-reproduction-fl5f9?file=/src/index.js

To make the bug easier to pinpoint, I added one example that works as expected and one where the bug occurs, with minimal differences between them to show exactly what triggers it.

The only difference between inputExpected and inputUnexpected is that in inputUnexpected, the items array is empty in the second input object.

Actual behavior: When parsing inputUnexpected, it results in a column being added to the header: ,"items" along with empty cells for each of the rows, resulting in the following columns: "id","inputName","items.itemId","items.itemName","items"

Expected behavior: When parsing inputUnexpected, only the following columns are present: "id","inputName","items.itemId","items.itemName" (as items is empty)

juanjoDiaz commented 4 years ago

Hi @martinfrancois ,

Thanks for the code example. I wish all our users created issues like this.

Unfortunately, this is not really a bug, but an edge case.

Unwinding is done row by row. So your example input:

[
  {
    id: 1,
    inputName: "Test Input 1",
    items: [
      {
        itemId: 10,
        itemName: "Test Item 10"
      },
      {
        itemId: 11,
        itemName: "Test Item 11"
      }
    ]
  },
  {
    id: 2,
    inputName: "Test Input 2",
    items: []
  }
];

is transformed by unwind and flatten into:

[
  {
    id: 1,
    inputName: "Test Input 1",
    "items.itemId": 10,
    "items.itemName": "Test Item 10"
  },
  {
    id: 1,
    inputName: "Test Input 1",
    "items.itemId": 11,
    "items.itemName": "Test Item 11"
  },
  {
    id: 2,
    inputName: "Test Input 2",
    items: []
  }
];
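For illustration, the unwind-and-flatten step can be sketched in plain JavaScript. This is a simplified model of what the transforms do, not json2csv's actual implementation; the function names `unwindRow` and `flattenRow` are made up for this sketch:

```javascript
// Sketch of unwind: one output row per element of the array at `path`.
// An empty array is kept as-is, which is the root of the reported issue.
function unwindRow(row, path) {
  const items = row[path];
  if (!Array.isArray(items) || items.length === 0) return [row];
  return items.map((item) => ({ ...row, [path]: item }));
}

// Sketch of flatten: nested objects become dotted keys ("items.itemId").
// Arrays are left untouched, so `items: []` survives as a key of its own.
function flattenRow(row, prefix = "") {
  const out = {};
  for (const [key, value] of Object.entries(row)) {
    const name = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      Object.assign(out, flattenRow(value, name));
    } else {
      out[name] = value;
    }
  }
  return out;
}

const input = [
  { id: 1, inputName: "Test Input 1", items: [{ itemId: 10, itemName: "Test Item 10" }] },
  { id: 2, inputName: "Test Input 2", items: [] }
];

const rows = input.flatMap((row) => unwindRow(row, "items")).map((r) => flattenRow(r));
// The second row still carries its empty `items` array, which is why a
// stray "items" column appears when fields are collected from every row.
```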

The sync parser first loops through all the items in the array to gather all the existing fields. Since the last item still contains the items field, it produces the undesired output. If you use the async API, it takes the fields only from the first item. In any case, the issue will still occur if the first element of the array has the empty items list.

There is really no way around this, since the field that you unwind (in this case, items) could contain objects with different fields, and you would still want all of them to be present in the resulting CSV.

For example, for an input:

[
  {
    id: 1,
    inputName: "Test Input 1",
    items: [
      {
        itemId: 10,
        itemName: "Test Item 10"
      },
      {
        itemId: 11,
        otherItemField: "Test Item 11"
      }
    ]
  }
]

the output should be:

"id","inputName","items.itemId","items.itemName","items.otherItemField"
1,"Test Input 1",10,"Test Item 10",
1,"Test Input 1",11,,"Test Item 11"

The only possible solution at the moment is to manually pass the fields option:

const parserOpts = {
  fields: ["id", "inputName", "items.itemId", "items.itemName"],
  transforms: [unwind({ paths: ["items"] }), flatten()]
};
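To see why pinning the fields works, here is a plain-JavaScript sketch (not json2csv's code) contrasting header inference over all rows with an explicit fields list:

```javascript
// The flattened rows from the example above; row 2 still carries `items: []`.
const rows = [
  { id: 1, inputName: "Test Input 1", "items.itemId": 10, "items.itemName": "Test Item 10" },
  { id: 2, inputName: "Test Input 2", items: [] }
];

// Without `fields`: scanning every row for keys lets the stray "items" leak in.
const inferred = [...new Set(rows.flatMap((r) => Object.keys(r)))];
// → ["id", "inputName", "items.itemId", "items.itemName", "items"]

// With `fields`: the caller fixes the columns, so "items" never appears
// and missing values become empty cells.
const fields = ["id", "inputName", "items.itemId", "items.itemName"];
const csvLine = (row) => fields.map((f) => JSON.stringify(row[f] ?? "")).join(",");
```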

And as I was about to send this, I thought better of it. This is indeed a bug, and the solution is trivial. 😅 I will send a PR asap.

As a side note, your comment:

// same as `parser` but new instance for clean state
// NOTE: When re-using `parser` here, the bug isn't reproducible,
// since probably when parsing the first time, it will "save" the fields

does indeed seem to be a bug. I thought we had fixed this ages ago, but it seems we didn't. I'll file a separate issue about it.

martinfrancois commented 4 years ago

Hi @juanjoDiaz,

You're welcome! Being a maintainer myself, I have more than enough experience with bad issue descriptions, so I'm trying to always submit issues in the same way that I would like to receive myself 😃 . Thanks for the quick response, it's much appreciated.

Thanks a lot for taking care of it in a PR, and I'm also glad I was able to subconsciously discover another bug, which I thought was expected behavior 😄 .