larsyencken / csvdiff

Generate a diff between two tabular datasets expressed in CSV files.
BSD 3-Clause "New" or "Revised" License

Documentation for non-command line usage #25

Closed gerpsh closed 5 years ago

gerpsh commented 7 years ago

Currently the documentation only references command line usage. Would it be possible to get some documentation/examples on how to use csvdiff within Python code?

larsyencken commented 7 years ago

Hi, sorry we don't have much in the way of Python docs.

The core of the module works on lists of dictionaries, which we call records. Import the patch module and call it like this:

from csvdiff import patch

diff = patch.create(from_records, to_records, index_columns)

See also patch.apply() and patch.save() in the code. Things are at least documented there.
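
A minimal sketch of how records might be built from CSV text before calling patch.create(). The helper names load_records and diff_csv_text are hypothetical and not part of csvdiff; only patch.create(from_records, to_records, index_columns) comes from the comment above. Reading rows with the standard library's csv.DictReader is an assumption about one convenient way to get a list of dictionaries.

```python
import csv
import io


def load_records(csv_text):
    """Parse CSV text into a list of dicts, one per row (the 'records' shape)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def diff_csv_text(from_text, to_text, index_columns):
    """Hypothetical wrapper: diff two CSV strings via csvdiff's patch.create()."""
    # Imported lazily so load_records() works even without csvdiff installed.
    from csvdiff import patch

    from_records = load_records(from_text)
    to_records = load_records(to_text)
    return patch.create(from_records, to_records, index_columns)
```

Note that csv.DictReader yields every value as a string, so "1" and 1 would not compare equal if the two sides were loaded differently.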

karakutu001 commented 6 years ago

How should I think about "index_columns"? What kind of value is index_columns, and why does it expect an array?

I am trying to use the module with the following line:

diff = csvdiff.diff_files('output2.txt', 'input2.txt',[] )

The script does not find all the lines that changed, and I cannot work out what is wrong.

Thanks

larsyencken commented 5 years ago

The index columns are the columns that form the primary key for this dataset. It could be one column like ['id'], or multiple, like ['firstname', 'lastname']. It's going to depend on your data, on what uniquely identifies a row.

We use this to determine when two rows represent the same thing, so that we can detect changes.
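
To make the primary-key idea concrete, here is a small pure-Python sketch of keyed matching. It is not csvdiff's implementation; index_records and simple_diff are hypothetical names, and the output shape is chosen only for illustration. Each row is keyed by the tuple of its index column values, and rows with the same key on both sides are compared.

```python
def index_records(records, index_columns):
    """Map each row's key (a tuple of its index column values) to the row."""
    indexed = {}
    for row in records:
        key = tuple(row[col] for col in index_columns)
        if key in indexed:
            # The chosen columns do not uniquely identify a row.
            raise ValueError(f"duplicate key {key!r} for columns {index_columns!r}")
        indexed[key] = row
    return indexed


def simple_diff(from_records, to_records, index_columns):
    """Illustrative diff: rows added, rows removed, and keys of changed rows."""
    old = index_records(from_records, index_columns)
    new = index_records(to_records, index_columns)
    return {
        "added": [new[k] for k in new if k not in old],
        "removed": [old[k] for k in old if k not in new],
        "changed": [k for k in old if k in new and old[k] != new[k]],
    }
```

In this sketch, an empty list of index columns gives every row the same key, the empty tuple, so any dataset with more than one row is rejected as having duplicate keys. That is why the key columns need to be named explicitly.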

larsyencken commented 5 years ago

I should add that the README has basic usage now, and using it without providing index columns raises an error on the current master.