ssadler / hawk

Awk for Hoodlums
BSD 3-Clause "New" or "Revised" License

tab-separated columns #30

Open gelisam opened 11 years ago

gelisam commented 11 years ago

hawk currently outputs tuples as space-separated columns by default. I think tabs should be the default.

melrief commented 11 years ago

I agree, but maybe not just for tuples. Every container that holds more than one value should separate its values with \t. For instance:

> hawk -e '[(1, 2)]'
1\t2
> hawk -e '[["foo bar", "foo bar"]]'
foo bar\tfoo bar
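
A minimal sketch of the proposed rendering in plain Haskell, for illustration only. The names `renderRow` and `renderRows` are hypothetical and are not taken from hawk or hsp; the sketch assumes the outer list holds lines and each inner list holds the fields of one line.

```haskell
import Data.List (intercalate)

-- Hypothetical helper (not hawk's actual code):
-- join the fields of a single row with tab characters.
renderRow :: [String] -> String
renderRow = intercalate "\t"

-- Render a list of rows, one output line per row.
renderRows :: [[String]] -> String
renderRows = unlines . map renderRow
```

With these definitions, `renderRows [["foo bar", "foo bar"]]` yields a single line whose two fields are separated by one tab, so the spaces inside each field stay unambiguous.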

A bonus is that you can print on the same line many tab-separated strings that are easy to parse:

> hawk -e '[["hello world","foo bar"]]'
hello world\tfoo bar
> hawk -e '[["hello world","foo bar"]]' | hawk -m "L.map (L.length . words) . split '\t'"
2\t2

melrief commented 11 years ago

Since I have already written the code to do this for hsp, I can push it if you both agree.

gelisam commented 11 years ago

I have concerns about allowing a list to represent a single line, as this will interact badly with --magic.

As you know, the whole point of --magic is to use the type of the user's expression to infer whether the expression is supposed to be a map...

> printf "hi\nworld" | hawk --magic B.length
2
5

a fold...

> seq 3 | hawk --magic '(+)'
6

or an operation on a list of lines:

> seq 10 | hawk --magic 'L.drop 7'
8
9
10

But if we allow lists to represent the content of a single line, then it is no longer clear whether 'L.drop 7' is supposed to drop the first 7 lines or, interpreted as a map, the first 7 columns.
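
The ambiguity can be sketched in plain Haskell. This is only an illustration under the assumption that the input is modelled as a list of lines, each a list of columns; `Table`, `onLines`, and `onColumns` are hypothetical names, not part of hawk.

```haskell
-- Hypothetical model of the input: a list of lines, each a list of columns.
type Table = [[String]]

-- Interpretation 1: the expression acts on the whole list of lines.
onLines :: ([[String]] -> [[String]]) -> Table -> Table
onLines f = f

-- Interpretation 2: the expression is mapped over the columns of each line.
onColumns :: ([String] -> [String]) -> Table -> Table
onColumns f = map f
```

Because `drop 7` has the polymorphic type `[a] -> [a]`, both `onLines (drop 7)` and `onColumns (drop 7)` type-check, so the type alone cannot tell the two interpretations apart.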

I think the first interpretation is going to be more common, so if we do allow lists to represent lines, magic shouldn't complain that the type is ambiguous; it should instead try to do the most-probably-correct thing:

> hawk --magic '[[1..3], [4..6], [7..9]]'
1   2   3
4   5   6
7   8   9
> hawk --magic '[1..3]'
1
2
3
> hawk --magic '1'
1

Therefore, I am slightly worried that if we teach the user that a list of lists represents lines of tab-separated items, then the user will expect to be able to use --map to omit the outer list, and if we allow that, then the user will expect --magic to detect that a map is expected, and that won't work. It's a slippery slope.

> printf "1\t2\t3\n4\t5\t6\n7\t8\t9" | hawk 'map L.reverse'
3   2   1
6   5   4
9   8   7
> printf "1\t2\t3\n4\t5\t6\n7\t8\t9" | hawk --map L.reverse
3   2   1
6   5   4
9   8   7
> printf "1\t2\t3\n4\t5\t6\n7\t8\t9" | hawk --magic L.reverse
7   8   9
4   5   6
1   2   3

Notice that the third attempt has a different behaviour than the first two.

ssadler commented 11 years ago

I prefer the distinction between lists and tuples, i.e., list items are separate lines and tuple elements are tab-separated.

melrief commented 11 years ago

@gelisam this is a delicate topic. I see the input of a unix command as a list of lists of strings, where the outer list is the list of lines and each inner list is the list of words in a single line. So for me the example hawk --magic L.reverse has the correct output. But --magic can still be very useful, if not essential, when the user specifies a correct expression:

> printf "1\t2\t3\n4\t5\t6\n7\t8\t9" | hawk 'L.map sum'
Couldn't match type `[[b0]]' with `Data.ByteString.Lazy.Internal.ByteString'
Expected type: Data.ByteString.Lazy.Internal.ByteString -> GHC.Types.IO ()
Actual type: [[b0]] -> GHC.Types.IO ()
> printf "1\t2\t3\n4\t5\t6\n7\t8\t9" | hawk --magic 'L.map sum'
6
15
24
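
A rough sketch of what the second command computes, written as plain Haskell. It assumes that --magic splits the input into lines and each line into tab-separated numeric fields, which is what the output above suggests; `splitTabs`, `parseTable`, and `rowSums` are hypothetical names, not hawk functions.

```haskell
-- Hypothetical helper: split one line on tab characters.
splitTabs :: String -> [String]
splitTabs s = case break (== '\t') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitTabs rest

-- Assumed parsing step: lines, then tab-separated integer fields.
-- Uses `read`, so it fails on non-numeric fields.
parseTable :: String -> [[Int]]
parseTable = map (map read . splitTabs) . lines

-- The user's expression, `map sum`, applied to the parsed table.
rowSums :: String -> [Int]
rowSums = map sum . parseTable
```

Applied to the input `"1\t2\t3\n4\t5\t6\n7\t8\t9"`, `rowSums` gives `[6, 15, 24]`, matching the three lines printed above.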

For me, the user must understand the types involved in hawk. This is not optional: --magic can relax the requirement when it is clear what the user wants, but it doesn't remove it. As for your last example, magic cannot (and for me must not) automatically infer a map on a list of lists. The user must specify it:

> printf "1\t2\t3\n4\t5\t6\n7\t8\t9" | hawk --magic L.reverse
7   8   9
4   5   6
1   2   3
> printf "1\t2\t3\n4\t5\t6\n7\t8\t9" | hawk --magic 'L.map L.reverse'
3   2   1
6   5   4
9   8   7

I like it this way, to be honest. If needed, a solution could be to let magic work with -d and -m:

> printf "1\t2\t3\n4\t5\t6\n7\t8\t9" | hawk --magic L.reverse
7   8   9
4   5   6
1   2   3
> printf "1\t2\t3\n4\t5\t6\n7\t8\t9" | hawk -m --magic L.reverse
3   2   1
6   5   4
9   8   7

@ssadler hsp shows tuples as lists, but hsl considers tuples very different from lists. We should discuss this more, perhaps by opening a new issue just about tuples to clarify their use cases? The system that shows a type is very easy to change.

gelisam commented 11 years ago

I'm fine with the behaviour of the above 6 examples.

gelisam commented 11 years ago

How about tab-separated tuples, but whitespace-separated lists?
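
One way this suggestion could look, sketched as a small type class. This is an assumption about a possible design, not hawk's actual rendering code; the class `Line` and its method `toLine` are hypothetical names.

```haskell
-- Hypothetical class: render a value as a single output line.
class Line a where
  toLine :: a -> String

-- Tuples: elements separated by a tab.
instance (Show a, Show b) => Line (a, b) where
  toLine (a, b) = show a ++ "\t" ++ show b

-- Lists: elements separated by a single space.
instance Show a => Line [a] where
  toLine = unwords . map show
```

Under this sketch, `toLine (1 :: Int, 2 :: Int)` puts a tab between the two numbers, while `toLine [1, 2, 3 :: Int]` separates them with spaces. Note that `show` quotes strings, so a real implementation would need a different way to display string fields.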