Open mfripp opened 3 months ago
Thanks for the well thought-out feature request @mfripp !
Copying in @janriemer - csv-diff's maintainer...
Thank you, @mfripp, for the detailed description and thoughts on this feature (and @jqnatividad for making me aware of it)!
I really like the possible solution you've described, and I feel this should have the highest priority among the next features of the `diff` command.
@jqnatividad Can you please assign this issue to me? Thank you.
The possibility of getting the fields that are different is actually already in the implementation of `diff` - it is just not used yet (waiting on a feature request like yours 😉):
https://github.com/jqnatividad/qsv/blob/08cfda6383e6ff70e683df5a77b6b2ef6530c4d9/src/cmd/diff.rs#L245-L251
So it shouldn't be too difficult to implement your idea (famous last words?). 🙂
Unfortunately, I've been a bit busy lately, so I haven't had the time yet. 😢
However, I should have more time around mid/end August, so I can start implementing a prototype then. 🤞
`patch`: there is already an open issue in `csv-diff` itself (the lib that powers `diff`) describing the need to have a git-diff-style format - is this roughly what you have in mind?
`csvdiff` and their different output formats

Hey @jqnatividad @mfripp 👋
here is the current status of the feature requests in this issue:
`csv-diff` itself, because it is too costly (performance-wise) to implement it directly in the `diff` command

For the other feature requests it is probably best to create separate issues for them, so that we don't lose the overview.
Thanks, this is great to see!
Just merged #2114 ... just in time for qsv 0.134.0! Thanks @janriemer !
Is your feature request related to a problem? Please describe.
In CSV files with many columns, it can be difficult and unreliable to find the particular fields that differ between dropped and added rows. This requires carefully scanning across the output, using a grid-oriented CSV viewer.
Describe the solution you'd like
One possible solution would be to add a `--drop-identical-fields` flag (or something similar), which will cause identical fields between a "-" and "+" row to be replaced with either empty values or a flag like "(same)". Then, before outputting the results, any columns that don't have any changes (i.e., the column is entirely full of empty fields or "(same)" markers) will be dropped. So the output file will only contain the key columns and any data columns that actually have differences, and even in those, it will only show values when there are differences. This will make it easy to see exactly what data is different between the two files.

Describe alternatives you've considered
One alternative is to open the result in a spreadsheet and add flags to indicate where differences occur, but this is cumbersome. Currently I just scan visually across pairs of rows, but this is also cumbersome and error-prone.
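As a rough illustration of the `--drop-identical-fields` idea described above, here is a minimal Python sketch. It is not how qsv or `csv-diff` implements anything; the function name, the `diffresult`/`id` column names, and the sample data are all hypothetical. It assumes the first column holds the "-" or "+" marker and that the key columns uniquely identify a row in each file.

```python
def drop_identical_fields(header, rows, key_cols, marker="(same)"):
    """Mask fields that are equal within a "-"/"+" row pair, then drop
    data columns that contain nothing but the marker.

    Assumption: column 0 holds "-" or "+", and key_cols identify a row.
    """
    key_idx = [header.index(k) for k in key_cols]
    data_idx = [i for i in range(1, len(header)) if i not in key_idx]

    out = [list(r) for r in rows]  # work on copies, leave the input untouched

    # Group the "-" and "+" rows that share the same key.
    pairs = {}
    for row in out:
        key = tuple(row[i] for i in key_idx)
        pairs.setdefault(key, {})[row[0]] = row

    # Mask identical fields; unpaired rows (pure adds/deletes) stay as they are.
    for pair in pairs.values():
        if "-" in pair and "+" in pair:
            for i in data_idx:
                if pair["-"][i] == pair["+"][i]:
                    pair["-"][i] = pair["+"][i] = marker

    # Keep the marker column, the key columns, and any column with a real value.
    keep = [i for i in range(len(header))
            if i not in data_idx or any(r[i] != marker for r in out)]
    return [header[i] for i in keep], [[r[i] for i in keep] for r in out]


# Hypothetical diff output: "name" never changes, "city" and "age" sometimes do.
header = ["diffresult", "id", "name", "city", "age"]
rows = [
    ["-", "1", "Ann", "Oslo", "30"],
    ["+", "1", "Ann", "Bergen", "30"],
    ["-", "2", "Bob", "Rome", "41"],
    ["+", "2", "Bob", "Rome", "42"],
]

new_header, new_rows = drop_identical_fields(header, rows, ["id"])
# new_header == ["diffresult", "id", "city", "age"]
# new_rows[0] == ["-", "1", "Oslo", "(same)"]
```

In this sketch the `name` column disappears because it is identical in every pair, while `city` and `age` remain and show values only where they differ.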
Another option might be to output a sort of "patch" format, with one row per different field. This could be a table where the first n fields are the index values, the next field is called "column" and gets the name of the field that differed, the next field is called "left_value" and has the value of this field from the left file, and the final field is called "right_value" and has the value from the right file. That might be clearer (no risk of conflict with existing empty fields or fields that already say "(same)"), but I'm not sure it's better.
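The "patch" layout described above (key fields, then "column", "left_value", "right_value") could look roughly like the following Python sketch. Again, this is only an illustration under assumptions: the function name and sample data are hypothetical, the first column is assumed to hold the "-" or "+" marker, and "-" rows are assumed to come from the left file and "+" rows from the right file.

```python
def to_patch_rows(header, rows, key_cols):
    """Turn "-"/"+" row pairs into one output row per differing field.

    Assumption: column 0 holds "-" (left file) or "+" (right file).
    """
    key_idx = [header.index(k) for k in key_cols]
    data_idx = [i for i in range(1, len(header)) if i not in key_idx]

    # Group the "-" and "+" rows that share the same key.
    pairs = {}
    for row in rows:
        key = tuple(row[i] for i in key_idx)
        pairs.setdefault(key, {})[row[0]] = row

    patch_header = list(key_cols) + ["column", "left_value", "right_value"]
    patch = []
    for key, pair in pairs.items():
        if "-" in pair and "+" in pair:  # only modified rows yield patch rows
            left, right = pair["-"], pair["+"]
            for i in data_idx:
                if left[i] != right[i]:
                    patch.append(list(key) + [header[i], left[i], right[i]])
    return patch_header, patch


# Hypothetical diff output with two modified rows.
header = ["diffresult", "id", "name", "city", "age"]
rows = [
    ["-", "1", "Ann", "Oslo", "30"],
    ["+", "1", "Ann", "Bergen", "30"],
    ["-", "2", "Bob", "Rome", "41"],
    ["+", "2", "Bob", "Rome", "42"],
]

patch_header, patch = to_patch_rows(header, rows, ["id"])
# patch_header == ["id", "column", "left_value", "right_value"]
# patch == [["1", "city", "Oslo", "Bergen"], ["2", "age", "41", "42"]]
```

Because each differing field gets its own row, this layout avoids any clash with existing empty fields or fields that already contain "(same)". Rows that exist in only one file are skipped in this sketch; how to represent them would be a separate design decision.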
Another option that might be better would be to use color to highlight the columns that are actually different, at least when output is sent to a TTY. This would be similar to the display in the GNU version of the `diff` command, VS Code's diff view, Apple's FileMerge viewer or `vim -d file1 file2`.

Additional context
(none)