larsyencken / csvdiff

Generate a diff between two tabular datasets expressed in CSV files.
BSD 3-Clause "New" or "Revised" License

Create on-the-fly index column if you don't explicitly specify one #36

Open friederschueler opened 6 years ago

friederschueler commented 6 years ago

Could you point me to a starting point for implementing this, and are there any caveats to think of?

Problem: I generate CSV files from a database view, and they don't have a unique identifier that could be used as an index. Idea: use the line number of the current row - 1 as the index (like adding a virtual column to the CSV).
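A minimal sketch of the idea, using only Python's standard `csv` module rather than csvdiff's own internals. The function name `read_with_row_index` and the column name `_row` are hypothetical, and the sketch assumes the file has a header on line 1, so "line number - 1" gives the first data row the index 1.

```python
import csv
import io


def read_with_row_index(handle, sep=";", index_name="_row"):
    """Read CSV rows as dicts and attach the row number as a virtual index.

    Assumes a header on line 1, so the first data row (line 2) gets index 1.
    `index_name` is a made-up column name; pick one that cannot clash
    with a real column in the data.
    """
    reader = csv.DictReader(handle, delimiter=sep)
    records = []
    for row_number, row in enumerate(reader, start=1):
        record = dict(row)
        record[index_name] = row_number  # the virtual index column
        records.append(record)
    return records


# Small in-memory example instead of a real file.
sample = io.StringIO("name;city\nAnna;Berlin\nBen;Hamburg\n")
records = read_with_row_index(sample)
```

One caveat with a positional index: if a row is inserted or deleted in the middle of a file, every later row shifts by one, so a diff keyed on the row number would likely report all of those rows as changed.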

With the current implementation, this use case fails silently, as no changes are reported: `from csvdiff import *` followed by `diff_files("e.txt", "f.txt", [], ";")`. I would like to implement this functionality and provide a pull request for this feature if you think that is a good idea.

Attachments: e.txt, f.txt (I had to rename the files to .txt, as GitHub doesn't support .csv attachments.)

karakutu001 commented 6 years ago

I have the same problem. Did you resolve it?

I would like to compare two different CSV files, but unfortunately csvdiff can't find all the lines that have changed.

friederschueler commented 6 years ago

@karakutu001 No, I needed a quick fix, so I just prefixed my data files with a "rowID" column and used that. But you are more than welcome to provide some code. I am busy right now, but I could help with it in the coming weeks.
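The workaround described above could look roughly like the following sketch, which uses only Python's standard `csv` module. The helper name `add_row_id` is hypothetical, and the `;` separator and header row are assumptions taken from the example call earlier in the thread.

```python
import csv
import io


def add_row_id(src, dst, sep=";", column="rowID"):
    """Copy a CSV from `src` to `dst`, prefixing each row with a row number.

    Assumes the first line is a header. Returns the number of data rows
    written. `column` is the name of the new leading column.
    """
    reader = csv.reader(src, delimiter=sep)
    writer = csv.writer(dst, delimiter=sep, lineterminator="\n")
    header = next(reader, None)
    if header is None:  # empty input: nothing to write
        return 0
    writer.writerow([column] + header)
    count = 0
    for count, row in enumerate(reader, start=1):
        writer.writerow([count] + row)
    return count


# Small in-memory example instead of real files.
src = io.StringIO("name;city\nAnna;Berlin\nBen;Hamburg\n")
dst = io.StringIO()
written = add_row_id(src, dst)
```

With real files, `src` and `dst` would be handles opened with `newline=""`. The prefixed files could then presumably be compared by passing `["rowID"]` as the index columns where the original call passed `[]`.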