larsyencken / csvdiff

Generate a diff between two tabular datasets expressed in CSV files.
BSD 3-Clause "New" or "Revised" License

Create diff.csv from diff.json #48

Open wgoh opened 5 years ago

wgoh commented 5 years ago

Thanks for this library! However, I think there is currently no way to just output the diff as CSV? Something like `csvpatch --input=diff.json --output=diff.csv`. Let me know if there is a way! :)
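Until something like that exists, a small script can flatten the JSON diff into CSV. The sketch below assumes the JSON layout csvdiff emits: an `_index` list naming the key columns, `added` and `removed` lists of full rows, and a `changed` list whose entries carry a `key` list and a `fields` map of `{"from": ..., "to": ...}` pairs. The function names and the one-row-per-change output layout are hypothetical choices, not part of csvdiff.

```python
import csv
import io


def diff_to_csv(diff: dict, out) -> None:
    """Flatten a csvdiff-style JSON diff into CSV, one row per change.

    Assumed input layout (check it against your csvdiff version):
      {"_index": [key columns], "added": [row, ...], "removed": [row, ...],
       "changed": [{"key": [values], "fields": {col: {"from": a, "to": b}}}]}
    """
    index = diff["_index"]
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["status", *index, "field", "from", "to"])
    # Added and removed records: emit only the key, no per-field detail.
    for status in ("added", "removed"):
        for row in diff.get(status, []):
            writer.writerow([status, *(row[col] for col in index), "", "", ""])
    # Changed records: one output row per changed field.
    for change in diff.get("changed", []):
        for field, values in change["fields"].items():
            writer.writerow(
                ["changed", *change["key"], field, values["from"], values["to"]]
            )


def diff_to_csv_text(diff: dict) -> str:
    """Convenience wrapper that returns the CSV as a string."""
    buf = io.StringIO()
    diff_to_csv(diff, buf)
    return buf.getvalue()
```

To go from file to file, something like `with open("diff.json") as f, open("diff.csv", "w", newline="") as out: diff_to_csv(json.load(f), out)` should work. The long, one-row-per-field layout keeps the column set fixed no matter which fields changed.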

larsyencken commented 5 years ago

Ah, sorry, it's designed to make a patch format that you can reuse in other programs. There's no built-in way to output to CSV yet.

davidmreed commented 5 years ago

I'm also interested in CSV-format diffs, for example getting a CSV output that includes only added and changed records, with an option to populate either only the changed fields or all fields for changed records. Is that what you're after too, @wgoh? @larsyencken, would you entertain a pull request if I find some spare time?
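A rough sketch of that output, assuming the same csvdiff JSON layout (`_index`, `added`, `changed` with `key` and `fields`). Because the diff only stores the changed fields of a changed record, filling in all fields needs the rows of the new file as well. The function name and the `changed_fields_only` parameter are hypothetical.

```python
import csv
import io


def added_and_changed_csv(diff: dict, new_rows: list[dict],
                          changed_fields_only: bool = False) -> str:
    """Return CSV text holding only added and changed records.

    new_rows: rows of the new file, e.g. list(csv.DictReader(f)).
    With changed_fields_only=True, unchanged non-key fields are left blank.
    """
    index = diff["_index"]
    # Look up full rows of the new file by their key columns.
    by_key = {tuple(row[col] for col in index): row for row in new_rows}
    columns = list(new_rows[0]) if new_rows else list(index)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["status", *columns],
                            lineterminator="\n")
    writer.writeheader()
    for row in diff.get("added", []):
        writer.writerow({"status": "added", **row})
    for change in diff.get("changed", []):
        full = by_key[tuple(change["key"])]
        if changed_fields_only:
            # Keep the key plus the new values of the changed fields only;
            # DictWriter fills the missing columns with "".
            record = {col: full[col] for col in index}
            record.update({f: v["to"] for f, v in change["fields"].items()})
        else:
            record = full
        writer.writerow({"status": "changed", **record})
    return out.getvalue()
```

The `status` column lets a reader filter added versus changed rows in a spreadsheet without consulting the JSON.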

simbo1905 commented 5 years ago

We are interested in looking for breaks in very large CSV reports as part of both unit testing and regression testing. Showing the changed lines, rather than a JSON report, would better summarise the differences for non-technical users, so that they don't file false-positive bugs when the differences are correct due to enhancements.

I am thinking of writing a script that pipes the output of csvdiff to jq to select the keys that have changed, then pipes those keys to something that can extract the matching lines. It would be nice if csvdiff could make this easier to do.
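The same two steps can be done in plain Python without jq. This is a minimal sketch, assuming csvdiff's JSON stores each changed record's key as a list under `changed[].key` (roughly what `jq '.changed[].key'` would select). The helper names are hypothetical.

```python
import csv
import io


def changed_keys(diff: dict) -> set[tuple[str, ...]]:
    """Keys of changed records, as tuples of key-column values."""
    return {tuple(change["key"]) for change in diff.get("changed", [])}


def extract_rows(csv_file, index: list[str], keys: set[tuple[str, ...]]) -> str:
    """Copy the header plus every row whose key columns are in `keys`."""
    reader = csv.DictReader(csv_file)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if tuple(row[col] for col in index) in keys:
            writer.writerow(row)
    return out.getvalue()
```

Running `extract_rows` once on the old file and once on the new file (each opened with `newline=""`) gives before and after versions of just the changed lines, which may be easier for non-technical reviewers to compare than the JSON.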

simbo1905 commented 5 years ago

I have written a Python script that parses the output of csvdiff to generate csvsql queries that extract the added, removed and modified lines. You can then run those queries to get CSV versions of the differences. It is on GitHub at https://github.com/simbo1905/csvdiff2csvsql/blob/master/README.md
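For readers curious how such query generation might look, here is an independent sketch. It is not the code in that repository. It assumes the csvdiff JSON layout (`_index`, `added`, `removed`, `changed[].key`) and that each table is named after its CSV file, as csvkit's `csvsql --query` does. All function names are hypothetical.

```python
def sql_string(value: str) -> str:
    """SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def sql_name(name: str) -> str:
    """Double-quoted SQL identifier, so names with spaces still work."""
    return '"' + name.replace('"', '""') + '"'


def select_by_keys(table: str, index: list[str], keys: list[list[str]]) -> str:
    """SELECT the rows of `table` whose key columns match any of `keys`."""
    if not keys:
        return f"SELECT * FROM {sql_name(table)} WHERE 1 = 0"
    clauses = [
        "(" + " AND ".join(f"{sql_name(c)} = {sql_string(v)}"
                           for c, v in zip(index, key)) + ")"
        for key in keys
    ]
    return f"SELECT * FROM {sql_name(table)} WHERE " + " OR ".join(clauses)


def queries_from_diff(diff: dict, old_table: str, new_table: str) -> dict[str, str]:
    """One query per kind of difference; removed rows come from the old file."""
    index = diff["_index"]

    def key_of(row: dict) -> list[str]:
        return [row[col] for col in index]

    return {
        "removed": select_by_keys(old_table, index,
                                  [key_of(r) for r in diff.get("removed", [])]),
        "added": select_by_keys(new_table, index,
                                [key_of(r) for r in diff.get("added", [])]),
        "changed": select_by_keys(new_table, index,
                                  [c["key"] for c in diff.get("changed", [])]),
    }
```

With files `a.csv` (old) and `b.csv` (new), the "changed" query would be passed as `csvsql --query "..." b.csv`. Since csvsql infers column types, comparing a numeric key column against a quoted string literal is worth checking on your own data.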