okfn / specs

Lightweight Standards and Patterns for (Open) Data
http://specs.okfnlabs.org/
5 stars 0 forks source link

Row numbers in the highlighter diff? #2

Open danfowler opened 8 years ago

danfowler commented 8 years ago

From @jordigh on January 27, 2015 17:25

I am trying to use this to help some people at work compare CSV files. They have expressed desire to have row numbers in order to more clearly see which rows have changed. It can be difficult for a human reader to find the context rows.

Hunks in unified diffs have range information. Why is the range information not present in the highlighter diff format?

(Apologies if this has been discussed before, but I couldn't find such a discussion. Feel free to point me to it if it has.)

Copied from original issue: frictionlessdata/specs#162

danfowler commented 8 years ago

From @paulfitz on January 27, 2015 18:59

@jordigh it would definitely be good to add row numbers in highlighter diffs. In the implementations of the spec that I've done (daff and coopy) I've stuck in support for this in an ad-hoc way. For example daff has a --index flag that produces output like this:

selection_003

The first row and column are added by the --index flag and give the row/column in the "local" version and the row/column in the "remote" version of the table, separated by a colon. If there's a three-way comparison, with a parent, the cells are of the form N|N:N.

Do you think something like that would be adequate? It seems a bit weird to stick two integers into a single cell when they could just go in separate rows/columns. On balance I personally preferred it, but it'd be great to sort something out and add it to the spec. Do you have ideas?

danfowler commented 8 years ago

From @jordigh on January 27, 2015 19:9

Yes, that looks perfect! I do think it should be an extra pair of columns, rather than separated by colons... but I don't know what the implications of adding two columns on the left could be. This also seems to add another row at the very top. Wouldn't this break parsers? I don't know how stable the format is.

Does daff expose this feature in the HTML renderer?

danfowler commented 8 years ago

From @jordigh on January 27, 2015 19:18

As to my second question about HTML, I found it:

    flags.always_show_order = true;
    flags.never_show_order = false;
danfowler commented 8 years ago

From @paulfitz on January 27, 2015 19:26

Yes, there's a row at the top, giving column changes. There are none in the example I gave, but in general columns could get renamed or added/removed or moved, just like the rows. This would absolutely break parsers that expect "@@" in the top left. That is another argument for getting something like it in the spec so people have warning...

danfowler commented 8 years ago

From @jordigh on January 29, 2015 21:30

Thinking about this more, I think I much prefer what you call an ad-hoc way. Could this become part of the actual standard? With some syntax highlighting, the colon-separated numbers are very easy to read. I did patch them up a little in my client code to start counting rows from 1 and columns in base 26, since that's what spreadsheets look like.

danfowler commented 8 years ago

From @paulfitz on January 30, 2015 3:31

We could certainly take a shot at getting it in, all it takes is a PR. Suppose we stick with daff's syntax. You make an interesting point about using spreadsheet row/column numbering. Would it make sense to support that directly in the spec? Row/columns numbers are only really useful for spreadsheets in the first place, they aren't relevant to database tables for example, so it makes some sense to optimize for them. So you do the whole A-Z AA - ZZ ... thing on the columns?

danfowler commented 8 years ago

From @jordigh on February 2, 2015 16:45

Yeah, I convert column numbers to base 26. Let me save you a few seconds of thinking with this bit of code:

var alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
function base26(num) {
    if (num == "-") {
        return "-";
    }

    num = parseInt(num, 10);

    var out = "";
    do {
        out = alpha[num % 26] + out;
        num = Math.floor(num / 26) - 1;
    } while (num >= 0)

    return out;
}

I think A, B, ..., Z, AA, AB, AC, ... AZ, BA, BB, ... is what Excel does? It's certainly what LO and Gnumeric do.

So for putting this into the spec, could it be an optional bit of syntax, just like section names in hunk headers are optional in traditional unified diffs?

danfowler commented 8 years ago

From @paulfitz on February 4, 2015 16:21

Thanks @jordigh. This could certainly be optional in generation of diffs. Programs that do the equivalent of patch would then in general need to understand the syntax though. I think that is ok, if they are trying to factor explicit row/column numbers into patching then they are doing more than coopy/daff currently do and this will be a minor issue.

So then the proposal would perhaps be that:

Now, what about the counting-from-0 and counting-from-1 issue? This affects the row number. From your perspective, the most natural thing would be for the first row after the header to be row number 2, right? Again, databases don't have rows row numbers (well, there's oid/rowid etc but not the same thing), so I could persuade myself just to follow spreadsheet behavior slavishly.

danfowler commented 8 years ago

From @paulfitz on February 4, 2015 16:23

One approach is to say that row numbers / column numbers are defined by the source data, and just wash our hands of assigning any meaning to them.

danfowler commented 8 years ago

From @paulfitz on February 11, 2015 21:0

Started sketching out a spec https://github.com/paulfitz/dataprotocols/compare/row_numbers / http://paulfitz.github.io/dataprotocols/tabular-diff-format/#the-index-row-and-column. The spec contains a little twist that daff doesn't yet do, using the header row as a reference point for counting.

danfowler commented 8 years ago

From @rgrp on March 7, 2016 14:57

@paulfitz is this ready to be closed or should it stay open?

danfowler commented 8 years ago

From @paulfitz on March 7, 2016 17:20

@rgrp I implemented this in daff https://github.com/paulfitz/daff/pull/17 and it did prove handy. Forgot to propose an update to spec though - will do that asap, thanks for reminder.