Closed Travellerme closed 7 months ago
Hi @Travellerme, thanks for opening this. I can definitely see the use case for wanting to have keys with nested dots without having them appear escaped in the output CSV header. The reason the dots are currently being escaped is because it was the only way that I could ensure that the correct values were being retrieved for edge cases like a document like this:
{
"a.b": { "c": 0 },
"a": { "b": { "c": 2 } }
}
While it seems unlikely to happen, it's a valid possibility and without escaping the nested dots, there was no way to guarantee the correct key path's value was retrieved. Specifically, would a.b.c
reference 0
or 2
? Whereas escaping the nested dot, a\\.b.c
provides some assurance that we're looking for 0
. Some libraries make assumptions that don't always pan out, but I relied on escaping the nested dot to be certain that the CSV would be correct. I agree that it doesn't look the prettiest in the output though.
I'm fairly certain that the functionality you're looking for is still entirely possible, but it is a bit more involved to implement since it will require reworking the deeks
module a bit to enable tracking the separate components of each key path that's identified in order to keep the same level of certainty. Additionally, the doc-path
module will need an update to add support for a string[]
to be specified with the key path components (eg. the key strings at each layer) instead of the string
that it currently expects for the key path.
I'm open to working to support this, but it's likely going to be a longer timeframe than some of the other open issues that have a much more concise scope, and my time to support this is limited to after my full-time work. If you, or someone else that would like this sooner, is up for it, then I'd also be happy to review and merge pull requests for any of the updates needed to add support for this.
Thanks for your understanding. š
Hello @mrodrig Thanks for your clarification! I understood the edge case you mentioned and it's more than valid.
I have a couple of quick ways to resolve it and would like to know your opinion about them:
escapeNestedDots
but in case we have several equal headers after it - throw an error. I think it will not be a breaking change, as the error could be thrown only if someone passes the new option for overriding escapeNestedDots
. The documentation can describe the error so that developers will be aware of it.escapeNestedDots = true
as it's done right now, so nothing will be changed internally for the provided example, but you can add one more option such as escapeHeaders = boolean
, and if this option is provided, before the final generation of csv file you can manually escape /
characters in every header something like const header = params.header.replaceAll('\\', '')
Also, the disclaimer about possible header name duplication can be added to the documentation for a new option, so developers should accept this risk in case they use it.
Please let me know what you think about it. Thank you
Hi @Travellerme, Thanks for your thoughtful response and additional options I hadn't considered. You've raised some great points with these additional paths forward.
For option 1 you mentioned, I think that is reasonable with the possibility that an error could be thrown for data where multiple keys have equivalent paths when not escaped. The only possible issue that I see with this, would be that the doc-path
module depends on any keys with nested dots to be escaped in order for it to navigate through the object and retrieve the associated value. For example:
const { evaluatePath } = require('doc-path')
const data = {
"a.b": {
"c": "2"
}
};
console.log( evaluatePath(data, 'a.b.c') ); // => prints: null
console.log( evaluatePath(data, 'a\\.b.c') ); // => prints: "2"
I think it's possible to update doc-path
to support that, but I'd want to be cautious about how it's implemented so there isn't a performance degradation that would impact dependent packages or apps.
Based on that, I've been giving more thought to option 2 that you outlined with an option. It seems this would be the quickest and probably the simplest way to achieve this with minimal risk. Since it would effectively be updating more of the CSV "presentation" post-processing of the provided JSON, this seems like it would be ideal given that there's less risk of incorrect data being retrieved. One possible downside that I see is that any keys that have \\
present elsewhere in them would be updated to have those instances removed. It seems like a reasonable mitigation to decrease that likelihood would be to do something like .replaceAll(/\\\./g, '.')
so only \\.
is replaced with .
inside the keys. This potential side effect could also be documented alongside the option flag to make the possibility clearer.
I think based on that I'm leaning towards option 2, but am open to option 1 if you think that's a better implementation of this for your use case. Thanks again!
Hi @mrodrig I agree that option 2 seems to be the most painless and easy to implement. I would appreciate it if you managed to implement this feature soon. Thank you!
Thanks @Travellerme for your collaboration on this! I was able to make the simple changes necessary to support this tonight and published version 5.2.0
with a new escapeHeaderNestedDots
option. When set to false
, it will replace the escaped dots in header keys with unescaped dots so the CSV is more readable.
Prior to this option, console.log
ging the generated CSV from your example would result in:
a\.a,a\.b.c,a\.b.c\.d
1,2,3
However, with version 5.2.0
with the new escapeHeaderNestedDots
option set to false
, the CSV output when console.log
ging it will look like this:
a.a,a.b.c,a.b.c.d
1,2,3
For others for future reference, here's an updated code snippet with the option specified:
import { json2csv } from 'json-2-csv'
const list = {
"a.a": "1",
"a.b": {
"c": "2",
"c.d": "3"
}
};
const csv = json2csv(list, {
escapeHeaderNestedDots: false,
unwindArrays: false,
emptyFieldValue: '',
checkSchemaDifferences: false,
expandNestedObjects: true,
expandArrayObjects: true
})
console.log(csv);
Hopefully this helps! š
Thanks @mrodrig a lot! It really helped me!
Hi @mrodrig
Sorry for resurrecting it, but I've just updated the version to the latest 5.4.0
and it seems the problem still exists.
[
{
"tempFile1": "blabla/tempFile1",
"a": 1,
"b": {
"b1": [
"b11",
"b12"
]
},
"c": {
"c1": {
"c2.c4": "c_value"
}
}
},
{
"tempFile2": "blabla/tempFile2",
"a": 1,
"b": {
"b1": [
"b11",
"b12"
]
},
"c": {
"c1": {
"c2.c4": "c_value"
}
}
},
{
"tempFile3": "blabla/tempFile3",
"a": 1,
"b": {
"b1": [
"b11",
"b12"
]
},
"c": {
"c1": {
"c2.c4": "c_value"
}
}
}
]
tempFile1,a,b.b1,c.c1.c2\.c4,tempFile2,tempFile3
blabla/tempFile1,1,"[""b11"",""b12""]",c_value,,
,1,"[""b11"",""b12""]",c_value,blabla/tempFile2,
,1,"[""b11"",""b12""]",c_value,,blabla/tempFile3
As you may see, the c.c1.c2\.c4
was not escaped.
thx
Hi @Travellerme, can you please confirm that you're specifying escapeHeaderNestedDots: false
in the options you're passing to the json2csv
method? The default value is true
, so setting it to false
will trigger the logic to remove any escaped dots in the header keys. I tried the data sample you provided with that option set to false
and I'm not seeing the escaped dot in the CSV output with the same example code from https://github.com/mrodrig/json-2-csv/issues/245#issuecomment-1960681667:
tempFile1,a,b.b1,c.c1.c2.c4,tempFile2,tempFile3
blabla/tempFile1,1,"[""b11"",""b12""]",c_value,,
,1,"[""b11"",""b12""]",c_value,blabla/tempFile2,
,1,"[""b11"",""b12""]",c_value,,blabla/tempFile3
Hi @mrodrig , sorry I made a mistake. For some reason, I thought escapeHeaderNestedDots
must be equal to true
to avoid escaping. My bad, everything works just fine, thx!
Just to notify, for some users, it's a breaking changes. Lot of my scripts were adapted to dotted headers (not my choice). So when update to latest, Pow! nothing works anymore. I think v6.0 would have been more indicated. My 2cts.
Fortunately, my CI has detected it before going to prod. But i think it was a breaking changes.
Hi @MathRobin, I'm happy to take a look if you have a sample that replicates your use case breaking from 5.0.1
to 5.4.0
, but the behavior for this is behind the escapeHeaderNestedDots
option, which has a default value set to preserve the existing behavior. All of my test cases that had nested dots in their header keys continued to pass with this change, and new test cases defined for this new option also worked as expected, so there's no indication from my testing that this had any breaking changes.
Background Information
5.0.1
20.11.0
The issue I'm reporting is with:
I have...
Expected Behavior
Please provide a possibility to define this option during csv generation.
escapeNestedDots = false results in: ['a.a', 'a.b.c', 'a.b.c.d']
Actual Behavior
The
escapeNestedDots
option is hardcoded for the https://www.npmjs.com/package/deeks library.As a result, it's not possible to generate csv without a preceding
\
character in case one of json key has a.
character in a key name.escapeNestedDots = true results in: ['a\\.a', 'a\\.b.c', 'a\\.b.c\\.d']
Data Sample
CSV:
Code Example