paulfitz / daff

align and compare tables
https://paulfitz.github.io/daff
MIT License

Daff to output row-level changes without before->after in-cell differences #199

Open dwrapson-arc opened 3 months ago

dwrapson-arc commented 3 months ago

tl;dr: can I get only row-level changes flagged in the @@ column of daff output, without the in-cell before->after?


Daff is (almost) exactly the tool I've been searching for, and I was overjoyed when I found it. I need to diff two full-export datasets day by day and generate a delta (for CDC, change data capture) to feed into an import framework.

However, while I do need the comparison made cell by cell (row by row, respecting keys), I don't need to know the specifics of those changes. I just need to know which rows were added, modified, or deleted, and the framework will take care of the rest (SCD2).

In fact, having before->after within a cell makes the output much harder to work with, because I would have to parse it out, which I'd rather avoid. It is possible to simulate the desired output with some code, by taking the daff output and merging it with the newer file in dataframes, but it would be amazing if daff could do it all directly.
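To make that workaround concrete, here is a rough sketch of the merge step. It uses plain dicts from the standard library in place of dataframes, and it assumes the diff has been saved as CSV text with the @@ action column first, that there are no schema changes (so no extra header rows), and that a single key column identifies each row. The function name `row_level_delta` and its parameters are made up for illustration and are not part of daff.

```python
import csv
import io


def row_level_delta(diff_csv: str, new_csv: str, key: str) -> list[dict]:
    """Tag each changed row with daff's @@ action, taking values from the newer file.

    Hypothetical helper, not part of daff. Assumes a CSV-formatted diff
    with an '@@' column and one key column named `key`.
    """
    # Index the newer file by key so added/modified rows can be looked up.
    new_rows = {r[key]: r for r in csv.DictReader(io.StringIO(new_csv))}

    delta = []
    for d in csv.DictReader(io.StringIO(diff_csv)):
        action = d["@@"]
        if action == "---":
            # Deleted rows exist only in the diff, so keep their values from there.
            values = {k: v for k, v in d.items() if k != "@@"}
        elif action == "+++" or action.endswith("->"):
            # Added or modified: ignore the diff cells, use the newer file's row.
            values = new_rows[d[key]]
        else:
            # Context rows, "..." gap rows and anything else are skipped.
            continue
        delta.append({"@@": action, **values})
    return delta


diff_text = (
    "@@,bridge,designer,length\n"
    "+++,Manhattan,G. Lindenthal,1470\n"
    "->,Williamsburg,D. Duck->L. L. Buck,1600\n"
    "---,Spamspan,S. Spamington,10000\n"
)
new_text = (
    "bridge,designer,length\n"
    "Manhattan,G. Lindenthal,1470\n"
    "Williamsburg,L. L. Buck,1600\n"
)
for row in row_level_delta(diff_text, new_text, key="bridge"):
    print(row)
```

This never parses the before->after cells; it only reads the @@ column and the key, which is why it needs the newer file alongside the diff.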

I've gone through the options and maybe there's something I'm missing, but can I get a purely row-level output, with modified rows showing only their new values in the patch file?

So if I run daff 1.csv 2.csv, I would get +++ for adds, --- for deletes, and just -> in the @@ column for modifies, with no in-cell change markers elsewhere on the row.

I should note that I've mostly been looking at the CLI, and have not yet tried interfacing from Python or another language. If more options are available there, I haven't been able to track them down from the specification.

dwrapson-arc commented 3 months ago

Example for clarity:

$ daff 1.csv 2.csv --context 0

Regular Output

@@   bridge        designer             length
+++  Manhattan     G. Lindenthal        1470
->   Williamsburg  D. Duck->L. L. Buck  1600
---  Spamspan      S. Spamington        10000

Desired Output

@@   bridge        designer       length
+++  Manhattan     G. Lindenthal  1470
->   Williamsburg  L. L. Buck     1600
---  Spamspan      S. Spamington  10000
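For comparison, the regular output can be turned into the desired output by post-processing alone. The sketch below assumes the diff is available as CSV text, and relies on my reading of the diff format: that the tag in the @@ column of a modified row is the same separator used inside that row's cells (and is lengthened, e.g. to -->, when the data itself contains ->). The name `collapse_modified_cells` is made up for illustration; it is not a daff function or option.

```python
import csv
import io


def collapse_modified_cells(diff_csv: str) -> str:
    """Rewrite 'before->after' cells in modified rows to just 'after'.

    Hypothetical post-processing step, not part of daff. Assumes the first
    column is the @@ action column and that a modified row's tag doubles as
    the in-cell separator for that row.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(diff_csv)):
        tag = row[0] if row else ""
        if tag.endswith("->"):
            # Keep only the text after the separator; cells without it are unchanged.
            row = [tag] + [cell.split(tag)[-1] for cell in row[1:]]
        writer.writerow(row)
    return out.getvalue()


regular = (
    "@@,bridge,designer,length\n"
    "+++,Manhattan,G. Lindenthal,1470\n"
    "->,Williamsburg,D. Duck->L. L. Buck,1600\n"
    "---,Spamspan,S. Spamington,10000\n"
)
print(collapse_modified_cells(regular), end="")
```

Unlike merging against the newer file, this needs only the diff itself, but it does depend on the separator convention holding for every modified row.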