Thanks for your fantastically useful json2csv script - I've been using it to parse data from OpenLibrary dumps. It's working very well, even though the OL data is very inconsistently structured. One question, though, if I may...
In a case where a value is a list of several comma-separated items, e.g.
{"subjects": ["Books and reading -- Fiction.", "Storytelling -- Fiction.", "Death -- Fiction.", "Jews -- Germany -- History -- 1933-1945 -- Fiction."]}
json2csv appears to strip out the commas between the items, so the four different subjects all get merged into one. It comes out like this for -k subjects:
[Books and reading -- Fiction. Storytelling -- Fiction. Death -- Fiction. Jews -- Germany -- History -- 1933-1945 -- Fiction.]
Is there a straightforward way to get it to preserve those multiple items within a value? (I don't need them as separate fields in the CSV, but would like to preserve the distinction within the 'subjects' field, if you see what I mean - so they could be delimited by something other than a comma.)
(I tried using the -d flag to set a different field delimiter, e.g. semicolon, but it still stripped out the commas as above.)
Edit: another example...
"subject_places": ["United States", "China"]
comes out as
[United States China]
so it's not really practical to find some automated way of parsing that, alas.
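To make the desired output concrete, here is a rough Python sketch of the behaviour being asked for. It is not json2csv code and does not reflect how json2csv works internally. The function names `join_list_values` and `records_to_csv`, the `|` item separator, and the idea of both example keys sitting in one record are all assumptions made purely for illustration.

```python
import csv
import io
import json


def join_list_values(record, keys, item_sep="|"):
    """Build one CSV row from a parsed JSON record (hypothetical helper).

    List values are joined with item_sep so the individual items stay
    distinguishable inside a single CSV field.
    """
    row = []
    for key in keys:
        value = record.get(key, "")
        if value is None:
            value = ""
        if isinstance(value, list):
            value = item_sep.join(str(item) for item in value)
        row.append(str(value))
    return row


def records_to_csv(lines, keys, item_sep="|"):
    """Convert an iterable of JSON-lines strings to CSV text (hypothetical helper)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        writer.writerow(join_list_values(json.loads(line), keys, item_sep))
    return out.getvalue()


# Sample record combining the two examples above (the combination is assumed).
sample = [
    '{"subjects": ["Books and reading -- Fiction.", "Storytelling -- Fiction.", '
    '"Death -- Fiction.", "Jews -- Germany -- History -- 1933-1945 -- Fiction."], '
    '"subject_places": ["United States", "China"]}'
]

print(records_to_csv(sample, ["subjects", "subject_places"]), end="")
# Prints:
# Books and reading -- Fiction.|Storytelling -- Fiction.|Death -- Fiction.|Jews -- Germany -- History -- 1933-1945 -- Fiction.,United States|China
```

With output like this, the 'subjects' field could later be split on `|` to recover the four separate items, which is not possible with the space-merged form shown above.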