ariedeleen closed this issue 8 years ago.
The current release doesn't allow different delimiters, but I just put together a quick patch to allow it with --sep, for example csvdiff --sep=';' id a.csv b.csv. You'd have to check out master to try it out, though.
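csvdiff's internals aren't shown in this thread. As a hedged illustration, a --sep value would most likely end up as the delimiter for Python's standard csv module. The sketch below uses a hypothetical helper, read_rows (not part of csvdiff), to parse semicolon-separated text with csv.DictReader, and shows what happens when the wrong separator is used.

```python
import csv
import io


def read_rows(text, sep=","):
    """Parse CSV text into a list of plain dicts, using `sep` as the delimiter.

    Hypothetical helper for illustration only; not part of csvdiff.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=sep)
    return [dict(row) for row in reader]


semicolon_data = "id;name\n1;apple\n2;pear\n"

# With the matching separator, each line splits into its columns.
print(read_rows(semicolon_data, sep=";"))
# [{'id': '1', 'name': 'apple'}, {'id': '2', 'name': 'pear'}]

# With the default comma, each whole line is read as a single field.
print(read_rows(semicolon_data))
# [{'id;name': '1;apple'}, {'id;name': '2;pear'}]
```

The second call explains why a semicolon file compared with the default comma separator looks like one giant column.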
Let me know if you need any help with it.
Wow, thanks Lars for the speedy implementation ;-) I'll look into it and let you know. Arie
Quick question: can I do a pip install git+https://github.com/larsyencken/tests/csvdiff.git to 'pull' master, so that the --sep option is available?
You can install it with:
pip install -e git+https://github.com/larsyencken/csvdiff.git#egg=csvdiff
Then you can check you have the right version:
$ csvdiff --help
Usage: csvdiff [OPTIONS] INDEX_COLUMNS FROM_CSV TO_CSV

  Compare two csv files to see what rows differ between them. The files are
  each expected to have a header row, and for each row to be uniquely
  identified by one or more indexing columns.

Options:
  --style [compact|pretty|summary]
                        Instead of the default compact output,
                        pretty-print or give a summary instead
  -o, --output PATH     Output to a file instead of stdout
  -q, --quiet           Don't output anything, just use exit codes
  --sep TEXT            Separator to use between fields [default:
                        comma]
  --help                Show this message and exit.
Notice that the --sep option is now listed among the documented options.
I tried csvdiff --sep=';' 'Manufacturer partnumber' ykoonold.csv ykoonnew.csv -o difference.csv
but got this typical Python error: TypeError: "delimiter" must be string, not unicode
Console output:
(pyenv)[arie@dev temp]$ csvdiff --sep=';' 'Manufacturer partnumber' ykoonold.csv ykoonnew.csv -o difference.csv
Traceback (most recent call last):
File "/home/arie/pyenv/bin/csvdiff", line 9, in
Ah, some tests were failing under Python 2.7, which tox didn't bring up. I've found and patched the problem.
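The thread doesn't show the actual patch. One plausible cause: on Python 2.7 the csv module only accepts a native byte string (str) as its delimiter, while the option parser apparently hands over a unicode value. A common workaround, sketched here with a hypothetical helper name (native_delimiter, not csvdiff's code), is to coerce the separator to the native str type before passing it to csv. The usage lines assume Python 3, where the coercion is a no-op and only the single-character check matters.

```python
import csv
import io


def native_delimiter(sep):
    """Return `sep` as the native `str` type that the csv module expects.

    On Python 2.7, csv rejects unicode delimiters with
    'TypeError: "delimiter" must be string, not unicode'; str() turns an
    ASCII unicode value into a byte string there. On Python 3, str()
    returns the value unchanged. Hypothetical helper for illustration.
    """
    sep = str(sep)
    if len(sep) != 1:
        raise ValueError("separator must be a single character, got %r" % sep)
    return sep


# Python 3 usage: a u'' literal is just an ordinary str here.
rows = list(csv.reader(io.StringIO("a;b\n1;2\n"), delimiter=native_delimiter(u";")))
print(rows)  # [['a', 'b'], ['1', '2']]
```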
Want to try again?
Thanks, I will look into it ASAP. I'm busy with another project right now ;-) In the Netherlands it is spring break, and today we celebrate our King's birthday. Everything is dressed up in orange. Funny Dutch people.
Question: can it diff large CSV files? Say 750,000 lines and 12 columns: is that possible, or could it take days to get a result?
Haha, sounds nice :) In Stockholm it snowed a few days in a row, then rained for a few days. No happy spring weather yet.
With large CSV files, both files just have to fit into memory. If you were diffing two files of the size you mentioned and there were no changes, it might take 30s. If there are lots of changes, maybe a few minutes? It should still work.
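To make the memory point concrete, here is a minimal sketch of an in-memory keyed diff. It assumes the approach is to load both files into dictionaries keyed by the index columns and then compare key sets; this is an illustration, not csvdiff's own code, and the names load_keyed and diff_rows are hypothetical. Memory grows with the combined size of both files, while the comparison is roughly a single pass over the keys.

```python
import csv
import io


def load_keyed(text, index_columns, sep=","):
    """Read CSV text into a dict mapping index-column tuples to row dicts."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text), delimiter=sep):
        key = tuple(row[col] for col in index_columns)
        if key in rows:
            # Each row must be uniquely identified by its index columns.
            raise ValueError("duplicate index key: %r" % (key,))
        rows[key] = dict(row)
    return rows


def diff_rows(from_rows, to_rows):
    """Classify keys as added, removed or changed between two keyed row dicts."""
    from_keys, to_keys = set(from_rows), set(to_rows)
    removed = sorted(from_keys - to_keys)
    added = sorted(to_keys - from_keys)
    changed = sorted(
        key for key in from_keys & to_keys if from_rows[key] != to_rows[key]
    )
    return {"added": added, "removed": removed, "changed": changed}


old = "id;price\n1;10\n2;20\n3;30\n"
new = "id;price\n1;10\n2;25\n4;40\n"
result = diff_rows(load_keyed(old, ["id"], sep=";"), load_keyed(new, ["id"], sep=";"))
print(result)
# {'added': [('4',)], 'removed': [('3',)], 'changed': [('2',)]}
```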
Works like a charm ;) It csvdiff-ed the two large files I mentioned before pretty fast, in about 52s. Is it also possible to get CSV output back, with an extra column at the end marking each row as removed, added or changed (r, a, c for short)?
Glad it works! Unfortunately, I can't look at the extra column idea right now. But if you have a programming background, you could try using the csvdiff API and generating it yourself. Otherwise, you might have to rely on the statistics from --style=summary, or just read the JSON output. Best of luck!
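The csvdiff API itself isn't shown in this thread, so the sketch below sidesteps it and works directly from two lists of row dicts (for example, as produced by csv.DictReader) using only the standard library. The output layout is an assumption for illustration: one row per added, removed or changed record, plus a trailing status column holding r, a or c as Arie proposed. The function name write_status_csv is hypothetical.

```python
import csv
import io


def write_status_csv(fieldnames, from_rows, to_rows, index_columns, out, sep=","):
    """Write added, removed and changed rows to `out` with a trailing status column.

    Status codes follow the thread's suggestion: r = removed, a = added,
    c = changed. Removed rows show their old values; added and changed rows
    show their new values. Unchanged rows are skipped.
    """
    def key(row):
        return tuple(row[col] for col in index_columns)

    old = {key(row): row for row in from_rows}
    new = {key(row): row for row in to_rows}

    writer = csv.DictWriter(
        out, fieldnames=list(fieldnames) + ["status"], delimiter=sep, lineterminator="\n"
    )
    writer.writeheader()
    for k in sorted(old.keys() | new.keys()):
        if k not in new:
            writer.writerow(dict(old[k], status="r"))
        elif k not in old:
            writer.writerow(dict(new[k], status="a"))
        elif old[k] != new[k]:
            writer.writerow(dict(new[k], status="c"))


old_rows = [{"id": "1", "price": "10"}, {"id": "2", "price": "20"}, {"id": "3", "price": "30"}]
new_rows = [{"id": "1", "price": "10"}, {"id": "2", "price": "25"}, {"id": "4", "price": "40"}]
buf = io.StringIO()
write_status_csv(["id", "price"], old_rows, new_rows, ["id"], buf)
print(buf.getvalue())
# id,price,status
# 2,25,c
# 3,30,r
# 4,40,a
```

Passing sep=";" writes the result in the same semicolon format as the input files.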
Is it possible to change the default separator from a comma to, for example, a semicolon?