ssadler / hawk

Awk for Hoodlums
BSD 3-Clause "New" or "Revised" License

--eval by default? #47

Closed gelisam closed 11 years ago

gelisam commented 11 years ago

Hsp's default (and now hawk's) was to treat stdin as a ByteString, while hsl treated stdin as a [ByteString]. Which one do we want?

I suggest supporting both, defaulting to "-e".

> hawk [1..3]
1
2
3
> echo -n "hello" | hawk --stream 'B.length'
5
> seq 3 | hawk --lines 'L.reverse'
3
2
1
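A minimal Haskell sketch of the three modes proposed above, assuming the user expression has already been compiled into an ordinary function. The names `evalMode`, `streamMode` and `linesMode` are hypothetical, not hawk's actual internals; `B` is assumed to be `Data.ByteString.Lazy.Char8` and `L` to be `Data.List`, matching the prefixes in the examples. Printing each result on its own line is left out.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import qualified Data.List as L  -- only so usage examples can write L.reverse

-- --eval: the expression is evaluated on its own; stdin is ignored.
evalMode :: a -> B.ByteString -> a
evalMode result _input = result

-- --stream: the expression receives all of stdin as one ByteString
-- (hsp's behaviour, per the comment above).
streamMode :: (B.ByteString -> a) -> B.ByteString -> a
streamMode f input = f input

-- --lines: the expression receives stdin split into lines, i.e. a
-- [ByteString] (hsl's behaviour, per the comment above).
linesMode :: ([B.ByteString] -> a) -> B.ByteString -> a
linesMode f input = f (B.lines input)
```

For instance, `streamMode B.length (B.pack "hello")` evaluates to 5 and `linesMode L.reverse (B.pack "1\n2\n3\n")` to the lines 3, 2, 1, mirroring the second and third commands above.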
gelisam commented 11 years ago

Wait, -d already does what I suggest for --lines.

But wouldn't it make more sense if -d only changed the delimiter, while -m or -f would determine whether we auto-map or auto-split?

melrief commented 11 years ago

+1

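I like the idea of using -e as the default (we should remove -e and say that hawk's default mode is evaluate) and -d only for the delimiter. The other modes would be:

- -s --stream takes a function f :: ByteString -> a and applies it to the entire input. For example: hawk -s length
- -l --lines takes a function f :: [ByteString] -> a and applies it to the entire input split into lines. It is a shortcut for hawk -s "f . lines". For example: hawk -l L.reverse
- -m --map takes a function f :: ByteString -> a and applies it line by line. It is a shortcut for hawk -l "L.map f". For example: hawk -m 'L.head . words'

The last example is so frequent (splitting each line into words) that we could add another mode, -w --words, that takes a function f :: [ByteString] -> a and applies it line by line, with each line split into words. It would be a shortcut for hawk -m "f . words". For example: hawk -w L.head. Note that if we add this, we should define another delimiter for splitting words (not necessarily on spaces), for example -D.

I'm not sure how useful these modes are, because they are just shortcuts for Haskell functions. But, to be honest, I prefer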

> hawk -w L.head

to

> hawk -l "L.map (L.head . words) . lines"

even if the second one is more "haskellian".
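The shortcut chain described in this comment can be sketched as plain function composition, so that every mode reduces to --stream. This is only an illustration under assumptions: the function names are hypothetical, and `B` is assumed to be `Data.ByteString.Lazy.Char8`.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B

-- -s: f sees the entire input.
streamMode :: (B.ByteString -> a) -> B.ByteString -> a
streamMode f = f

-- -l: shortcut for  -s "f . lines"
linesMode :: ([B.ByteString] -> a) -> B.ByteString -> a
linesMode f = streamMode (f . B.lines)

-- -m: shortcut for  -l "L.map f"  (f is applied to each line)
mapMode :: (B.ByteString -> b) -> B.ByteString -> [b]
mapMode f = linesMode (map f)

-- -w: shortcut for  -m "f . words"  (f sees each line's words)
wordsMode :: ([B.ByteString] -> b) -> B.ByteString -> [b]
wordsMode f = mapMode (f . B.words)
```

With these definitions, `wordsMode (!! 1)` picks the second whitespace-separated field of every line, which is roughly the idea behind the `hawk -w '!! 1'` example further down the thread.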

gelisam commented 11 years ago

+1

If we want to configure two different kinds of delimiters, could we use -d for word delimiters and something else for line delimiters? That would be consistent with cut, where -d sets the field delimiter.

I'm not sure which flag we could use to change the line delimiter, though. What do other tools use?

– Samuel

On 2013-08-13, at 5:20 AM, Mario Pastorelli notifications@github.com wrote:

+1

I like the idea of using -e as default (we should remove -e and say that hawk default mode is evaluate) and -d only for delimiter.

- -s --stream that takes a function f :: ByteString -> a and applies it to the entire input. For example hawk -s length
- -l --lines that takes a function f :: [ByteString] -> a and applies it to the entire input. It is a shortcut for hawk -s "f . lines". For example hawk -l L.reverse
- -m --map that takes a function f :: ByteString -> a and applies it line by line. It is a shortcut for hawk -l "L.map f". For example hawk -m 'L.head . words'

The last example is so frequent (splitting each line in words) that we could add another mode -w --words that takes a function f :: [ByteString] -> a and applies it line by line with each line split in words. It should be a shortcut for hawk -m "f . words". For example hawk -w L.head. Note that if we want to add this, we should define another delimiter that splits by word (not necessarily space). For example -D.

I don't know exactly how much those modes are useful, because they are just shortcuts for haskell functions. But, to be honest, I prefer

hawk -w L.head

to

hawk -l "L.map (L.head . words) . lines"

even if the second one is more "haskellian".


gelisam commented 11 years ago

I looked it up: Perl uses $/ for the input record (line) separator, while awk uses RS.

Now that we have a field separator, do you think changing the line separator is going to be very common? If not, it could be one of those flags which don't have a one-letter abbreviation.

gelisam commented 11 years ago

Oh, I just noticed you suggested -D. Works for me!

melrief commented 11 years ago

I have added this to the first public release. I want to settle the options now so that the user experience won't change in the future.

melrief commented 11 years ago

I can work on this, but first we should consider merging @gelisam's branch qualify_bytestring into merge, because it changes many important functions in Hawk.hs. This issue depends on issue #40. What do you think?

gelisam commented 11 years ago

merged.

gelisam commented 11 years ago

But that branch still doesn't import the Prelude by default, so #40 is not done yet.

melrief commented 11 years ago

True, but the part about the type of the user expression was a separate issue; I guess you just put it there because it was related.

gelisam commented 11 years ago

Ah! It's quite confusing to have #40 refer to two separate problems. I will try to split my issues into smaller parts in the future.

melrief commented 11 years ago

OK, this was a big change. I'm not closing this yet because I would like your feedback. On my side it is working very well:

> ps aux | hawk -w '!! 1'
...
gelisam commented 11 years ago

Looks good!

Could we exchange the -D and -d flags, though? I expect that changing the word separator from space to tabs is going to be a lot more common than changing the line delimiter to something else.

Speaking of tabs, I noticed that consecutive delimiters are being collapsed into one. That's typically fine for spaces, but for other delimiters this behaviour is quite surprising:

> printf "1\t\t2\n3\t4\t5\n6\t\t7" | hawk -D'\t' -w 'reverse'
2   1
5   4   3
7       6

whereas I would have expected this:

> printf "1\t\t2\n3\t4\t5\n6\t\t7" | hawk -D'\t' -w 'reverse'
2       1
5   4   3
7       6
melrief commented 11 years ago

Yes to swapping -D and -d, and yes to not collapsing consecutive delimiters. The default splitter should probably split on every character that matches isSpace.

melrief commented 11 years ago

Fixed the problem of empty strings being filtered out when a word delimiter is defined; the old behaviour is kept when splitting on spaces. In the future, I propose we think about a more powerful way to split.
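The splitting rule settled on here can be sketched as follows: with no explicit delimiter, a line is split on runs of whitespace (`B.words` splits on `isSpace` and drops empty pieces), while with an explicit delimiter every occurrence separates two fields, so empty fields survive. `splitWords` and `wordsModeWith` are hypothetical names, `B` is assumed to be `Data.ByteString.Lazy.Char8`, and rejoining the fields with the delimiter (a space by default) is an assumption about output formatting, not necessarily what hawk does.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B

-- No delimiter: collapse runs of whitespace, like 'words'.
-- Explicit delimiter: split on every occurrence, keeping empty fields.
splitWords :: Maybe Char -> B.ByteString -> [B.ByteString]
splitWords Nothing      = B.words
splitWords (Just delim) = B.split delim

-- Apply a per-line function to each line's fields, then rejoin the
-- fields with the delimiter (hypothetical formatting choice).
wordsModeWith :: Maybe Char -> ([B.ByteString] -> [B.ByteString]) -> B.ByteString -> B.ByteString
wordsModeWith delim f = B.unlines . map (B.intercalate sep . f . splitWords delim) . B.lines
  where
    sep = B.singleton (maybe ' ' id delim)
```

Applied to the tab-separated example above with `Just '\t'` and `reverse`, this yields the lines `2\t\t1`, `5\t4\t3` and `7\t\t6`, which is the output gelisam expected.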

melrief commented 11 years ago

And fixed the -D and -d stuff: -d is now the word delimiter, since changing it probably happens more often than changing the line delimiter. I'm closing this issue because --eval is now the default mode.

gelisam commented 11 years ago

Youhoo! I should be done with my other thing tonight, so I'll soon be able to close some issues myself.

melrief commented 11 years ago

The first release is not so far :-D...

ssadler commented 11 years ago

So the default is not lines anymore? I still haven't run into a case where I didn't want line input :z.

gelisam commented 11 years ago

Correct, the default is now to evaluate the expression, ignoring the input. But it's not too late to change it! I also think I would process lines most of the time.

I had suggested making --eval the default while working on its --help message. I was trying to come up with a short way of explaining that this mode ignores the input, and I realized that I was thinking in terms of the implementation instead of the user-visible behaviour. The --eval option stood out in that all the other modes process the input in some way, while --eval just evaluates the expression. Making it the default made it stand out a lot less.

melrief commented 11 years ago

I think using evaluation as the default mode is more user-friendly, because if you want to work on the stream from stdin, you can decide how. It gives the user complete control over the stream processing. Some examples:

Note that, while all modes can be expressed using just -s, in my opinion the three most useful modes are eval (default), lines (-l) and words (-w). Modes are just shorthands for common transformations; you are not forced to use them if you don't need them. awk, too, has many ways to define transformations over the whole stream rather than line by line.