paulfitz / daff

align and compare tables
https://paulfitz.github.io/daff
MIT License

option to set line endings #80

Closed semio closed 7 years ago

semio commented 7 years ago

From the RFC for CSV and the Tabular Data Package definition, we can see that both CRLF and LF should be allowed in CSV files. For now daff only outputs CSV files with CRLF line endings, which may cause problems when working with files that use LF. (For example, a program I am working on broke recently because it assumed LF line endings, but they changed to CRLF after a git merge.)

I think it would be good to have an option for setting line endings, either letting the user choose or automatically picking the EOL according to the operating system. What do you think?

paulfitz commented 7 years ago

Yes, I think allowing the user to choose would be very sensible. I wanted the default to pass CSV linters, but CRLF is definitely a bit of a pain in practice.

semio commented 7 years ago

Great, good to know. In the meantime, do you know of any workarounds for fixing the EOL after a git merge? For now we have to fix it manually after the merge or make another commit, like this one: https://github.com/semio/ddf--gapminder--systema_globalis/commit/caf1c89290ac0bc15fa4d8cec79c5fce7a671873
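
One possible stopgap, sketched here as an assumption rather than anything daff provides, is a small post-merge script that rewrites the merged CSV back to LF. The helper names `normalize_eol` and `normalize_file` are hypothetical. Note that this byte-level replacement also changes any line breaks embedded inside quoted CSV fields, so it only suits files without multi-line cells.

```python
def normalize_eol(data: bytes, eol: bytes = b"\n") -> bytes:
    """Rewrite every CRLF or LF line ending in data to eol (hypothetical helper)."""
    # Collapse CRLF to LF first so existing CRLF endings are not doubled up.
    unified = data.replace(b"\r\n", b"\n")
    if eol == b"\n":
        return unified
    return unified.replace(b"\n", eol)


def normalize_file(path: str, eol: bytes = b"\n") -> bool:
    """Normalize a file in place; return True if its contents changed."""
    # Binary mode avoids Python's own newline translation.
    with open(path, "rb") as handle:
        original = handle.read()
    fixed = normalize_eol(original, eol)
    if fixed == original:
        return False
    with open(path, "wb") as handle:
        handle.write(fixed)
    return True
```

Calling `normalize_file("merged.csv")` after a merge (the file name is only an example) would leave an LF-only file untouched and rewrite a CRLF one.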

jheeffer commented 7 years ago

Hi Paul!

I work with Semio at Gapminder, where we promote a fact-based world view using data-viz to make the facts understandable : ). We use daff in git to merge all kinds of datasources to create one big dataset describing the world in time series. We love daff and we use it with joy! So, to reiterate on the problem described by @semio :

daff merge outputs CSVs with CRLF, in line with the RFC. This confuses git on Linux, because it expects LF, and now there's an extra CR character before each LF.

code: https://github.com/paulfitz/daff/blob/053d14c79f4d52c18ce3651dfd08d842f4681086/coopy/Csv.hx#L54

CRLF in csv RFC: https://tools.ietf.org/html/rfc4180#section-2

options as we see them now:

So, on the line-ending support of daff, I guess it'd be best if daff would copy git behaviour: use the line endings that were originally in the files before daff merge. That way, whatever is set in gitattributes for line ending handling will be observed (e.g. linux with crlf forced).

What do you think of this specific way of handling? I'd love to help, but my knowledge of haxe is kind of non-existent.

paulfitz commented 7 years ago

@jheeffer @semio using the same line-endings on output as discovered in input seems like a good default behavior to me. Your use-case is definitely one I want to support. Working on this now in #81. I understand that haxe makes the barrier to entry for contributions higher than it otherwise might be.

jheeffer commented 7 years ago

Looks good! So if I understand correctly, the EOL of the first line of one of the two files will be used? I guess that's a good enough solution, as long as it's clear which file the EOL is taken from (if they conflict). Most likely there's no need to do it on a per-line basis.
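
To make the "first line decides" idea concrete, here is a minimal sketch in Python. It is not daff's actual Haxe implementation, and the names `detect_eol` and `write_rows` are hypothetical. It assumes the first line contains no quoted field with an embedded line break, and it falls back to CRLF (the default mentioned earlier in this thread) when the input has no line ending at all.

```python
from typing import Iterable, Sequence


def detect_eol(text: str, default: str = "\r\n") -> str:
    """Return the line ending that terminates the first line of text."""
    index = text.find("\n")
    if index < 0:
        # No line break found: fall back to the assumed default.
        return default
    if index > 0 and text[index - 1] == "\r":
        return "\r\n"
    return "\n"


def write_rows(rows: Iterable[Sequence[str]], eol: str) -> str:
    """Join rows into CSV-like text using eol (no quoting; illustration only)."""
    return "".join(",".join(row) + eol for row in rows)


# Reuse the input's line ending for the output.
source = "a,b\n1,2\n"
output = write_rows([["a", "b"], ["1", "3"]], detect_eol(source))
```

Only the first line is inspected, so a file with mixed endings is written back with whatever ending its first line used.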

paulfitz commented 7 years ago

Yes, there's also an --eol crlf / --eol lf option now to manually specify behavior if desired. Let me know if the automatic behavior proves insufficient - I stuck with the simplest solution for now but happy to elaborate if needed.
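
As a rough illustration of how a manual setting and the automatic behavior could combine, the sketch below maps the two option values mentioned above (`crlf`, `lf`) to line endings and otherwise falls back to a line ending found in the input. The function `resolve_eol` and its parameters are hypothetical, and this is not a description of daff's internals.

```python
from typing import Optional

# The two values named for the --eol option in this thread.
EOL_BY_NAME = {"crlf": "\r\n", "lf": "\n"}


def resolve_eol(option: Optional[str], detected: str) -> str:
    """Pick the output line ending: an explicit option wins, else the detected one."""
    if option is None:
        return detected
    try:
        return EOL_BY_NAME[option.lower()]
    except KeyError:
        raise ValueError(f"unsupported eol option: {option!r}") from None
```

With this shape, `resolve_eol("lf", "\r\n")` forces LF even for CRLF input, while `resolve_eol(None, "\r\n")` keeps the input's CRLF.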