ErichDonGubler closed this 4 years ago
I've added a description to this issue! :)
Looks like #45 handles hex, so I'll note that.
Using `b` to mean `block` could be confusing if we also add (explicit) support for `byte`/`B`, `kilobyte`/`kB`, etc. I'd rather go with `block` as a unit.
@sharkdp: Okay, so when you say "`b`" are you referring to the short name of the flag specifying block size, the unit used in a size-typed option, or both? I'm going to assume the second, but just wanted to make sure I'm not missing an ambiguity there. :)
> Okay, so when you say "`b`" are you referring to the short name of the flag specifying block size,

no, using `-b` to mean `--block-size` is fine for me.

> the unit used in a size-typed option

yes. I think we should reserve `--length 64b` to mean "64 bytes" instead of "64 blocks" because we might also add `--length 64kB` or `--length 64kb`.
So, if I were to model the byte units with a regex, I'm thinking something like:
`(?P<count>\d+)(?P<magnitude_unit>[kKmM]?)(?P<bits_or_bytes>[bB])`

Questions/clarifications about the above:

- `magnitude_unit` (bikeshed please!) is case-insensitive in my example, but that might not be desirable considering my next question.
- `bits_or_bytes` is its own group because, in general, `b` vs. `B` is normally a significant distinction. Do we care here?

For block, I'm just thinking of accepting:

`(?P<count>\d+)blocks?`

Questions about this one:

- Should the `s` be optional at the end (for grammar nerds like myself), or should we just stick to either `s` or no `s`?

I've updated the OP with conservative requirements for now.
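To make the first pattern concrete, here is a minimal std-only sketch that splits an input into the same three groups without deciding what they mean. `SizeParts` and `split_size` are hypothetical names, not part of hexyl, and the sketch accepts only ASCII digits, whereas `\d` in a Unicode-aware regex engine can match more.

```rust
/// Components of a size such as `64kB`, mirroring the three named groups
/// of the pattern discussed above (hypothetical type, not hexyl's API).
#[derive(Debug, PartialEq)]
pub struct SizeParts {
    pub count: u64,
    pub magnitude_unit: Option<char>, // one of k, K, m, M
    pub bits_or_bytes: char,          // 'b' or 'B', kept exactly as written
}

/// Hand-rolled equivalent of an anchored `(\d+)([kKmM]?)([bB])`;
/// returns `None` when the input does not match.
pub fn split_size(input: &str) -> Option<SizeParts> {
    // The count is the leading run of ASCII digits.
    let digits_end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    let (digits, rest) = input.split_at(digits_end);
    // Fails on an empty digit run or on overflow.
    let count = digits.parse::<u64>().ok()?;

    let mut chars = rest.chars();
    let first = chars.next()?;
    let (magnitude_unit, bits_or_bytes) = match first {
        'k' | 'K' | 'm' | 'M' => (Some(first), chars.next()?),
        other => (None, other),
    };
    // Exactly one trailing `b`/`B`, and nothing after it.
    if !matches!(bits_or_bytes, 'b' | 'B') || chars.next().is_some() {
        return None;
    }
    Some(SizeParts { count, magnitude_unit, bits_or_bytes })
}
```

Keeping `bits_or_bytes` as written leaves the `b` vs. `B` question open: a caller can treat the two cases identically or differently once that is decided.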
@sharkdp - it's been over a year, but would you be interested in a PR for some flavor of this feature?
I propose supporting roughly the same set of suffixes as GNU coreutils programs like `head` or `dd`:
Custom block sizes could be added as discussed above. I don't see much use adding bits as a unit since a hexdump is fairly byte-centric.
@aswild: I started working on this yesterday, and I have a branch that I'll hopefully be sending as a PR either today or tomorrow.
ah cool, I wrote my own version in https://github.com/aswild/hexyl/commit/1c116b0be764080835b7911c69e77fcd8e4e0bfb and https://github.com/aswild/hexyl/commit/9191489ec2412554927111bdba15ae64936d4f81 too, just haven't squashed them to a PR branch
@aswild @ErichDonGubler Thank you very much for your work on this.
A few comments / questions:
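- Allowing case-insensitive parsing seems okay to me, as there is no room for ambiguity (I think) - because we only care about the "byte" unit. In my Insect general purpose scientific calculator, I would not allow things like `mB` (would be parsed as "millibyte") or `KB` (could be confused with `Kelvin · Byte`). But for `hexyl`, I guess it's fine.
- How about the `k`, `M`, `G`, and `T` shorthand notation for multiples of 2^10? Isn't it really confusing that `k = 2^10 byte` but `kB = 10^3 byte`? Can we leave this notation out? Or is it too common in other Unix tools?
- #90 proposes to add a `-b` short option. I didn't think about this previously, but `hexdump` has a `-b` option that has a completely different meaning. Should we maybe keep `-b` free for now?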
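The `k` vs. `kB` mismatch raised above is easy to see when the two families sit side by side. The table below is a sketch based only on this thread's description of the GNU-style convention (bare letters are powers of 2^10, letters followed by `B` are powers of 10^3). The function name is hypothetical, and extending the `kB` rule to `MB`, `GB`, and `TB` is an assumption.

```rust
/// Multiplier for a GNU-style size suffix (assumed subset, hypothetical helper).
pub fn gnu_style_multiplier(suffix: &str) -> Option<u64> {
    let multiplier = match suffix {
        "" => 1,
        // Bare letters: powers of 2^10.
        "k" => 1024_u64,
        "M" => 1024_u64.pow(2),
        "G" => 1024_u64.pow(3),
        "T" => 1024_u64.pow(4),
        // Letters followed by `B`: powers of 10^3.
        "kB" => 1000_u64,
        "MB" => 1000_u64.pow(2),
        "GB" => 1000_u64.pow(3),
        "TB" => 1000_u64.pow(4),
        _ => return None,
    };
    Some(multiplier)
}
```

With this table, `64k` and `64kB` differ by 1536 bytes (65536 vs. 64000), which is the kind of surprise the second question is about.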
> - Allowing case-insensitive parsing seems okay to me, as there is no room for ambiguity (I think) - because we only care about the "byte" unit. In my Insect general purpose scientific calculator, I would not allow things like `mB` (would be parsed as "millibyte") or `KB` (could be confused with `Kelvin · Byte`). But for `hexyl`, I guess it's fine.
This doesn't seem worth discussing to death to me, so no pressure from me one way or another here. I did it because it was a straightforward thing to add and the user experience seemed to outweigh that of case sensitivity.
> - How about the `k`, `M`, `G`, and `T` shorthand notation for multiples of 2^10? Isn't it really confusing that `k = 2^10 byte` but `kB = 10^3 byte`? Can we leave this notation out? Or is it too common in other Unix tools?
How about we pull this into a separate PR for discussion? To kickstart that discussion: We could defer in forwards-compatible ways. I see a few strategies that may actually be orthogonal approaches:

* We could start with the most strict user interface and relax as we decide what's acceptable and give ourselves enough time to consider the design space? Perhaps I've been lurking on `rustc` development for too long, heh...
* A `--gnu-units` flag or similar could be a way of parsing legacy unit specs for people porting scripts to `hexyl`; that would allow `hexyl` to forge its own path in the unit design space if we want to think of something more "intuitive"?

I think in general `hexyl` doesn't have very good diagnostics for its interface right now. Invalid units, for instance, get totally thrown away if they don't parse correctly, which I consider a subpar user experience at best. Mind if I open another issue about diagnostics in general?
> - #90 proposes to add a `-b` short option. I didn't think about this previously, but `hexdump` has a `-b` option that has a completely different meaning. Should we maybe keep `-b` free for now?
Sure, I can pull this out.
> This doesn't seem worth discussing to death to me, so no pressure from me one way or another here. I did it because it was a straightforward thing to add and the user experience seemed to outweigh that of case sensitivity.

:+1:

> * We could start with the most strict user interface and relax as we decide what's acceptable and give ourselves enough time to consider the design space? Perhaps I've been lurking on `rustc` development for too long, heh...

Sounds good to me. Let's leave these shorthand notations out for now.

> Sure, I can pull this out.

:+1:

> I think in general `hexyl` doesn't have very good diagnostics for its interface right now. Invalid units, for instance, get totally thrown away if they don't parse correctly, which I consider a subpar user experience at best. Mind if I open another issue about diagnostics in general?

Absolutely. I didn't know that and would consider it a bug. Looks like the error handling can be improved a lot.
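As an illustration of what better diagnostics could look like, the sketch below returns a descriptive error instead of discarding an unparseable unit. Everything here is hypothetical (`SizeError`, `parse_size_checked`, the message wording), it handles decimal counts only, and it assumes `KB` and `MB` mean 10^3 and 10^6 bytes.

```rust
use std::fmt;

/// Hypothetical error type for a size argument; not hexyl's actual API.
#[derive(Debug, PartialEq)]
pub enum SizeError {
    Empty,
    InvalidNumber(String),
    UnknownUnit(String),
    Overflow(String),
}

impl fmt::Display for SizeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SizeError::Empty => write!(f, "size is empty"),
            SizeError::InvalidNumber(n) => write!(f, "invalid number `{}`", n),
            SizeError::UnknownUnit(u) => write!(
                f,
                "unknown unit `{}` (expected `B`, `KB`, `MB`, or `block`)",
                u
            ),
            SizeError::Overflow(s) => write!(f, "size `{}` does not fit in 64 bits", s),
        }
    }
}

impl std::error::Error for SizeError {}

/// Parses `<decimal count><unit>`, reporting why the input was rejected.
pub fn parse_size_checked(text: &str, block_size: u64) -> Result<u64, SizeError> {
    if text.is_empty() {
        return Err(SizeError::Empty);
    }
    // Split into the leading ASCII digits and whatever follows them.
    let digits_end = text
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(text.len());
    let (digits, unit) = text.split_at(digits_end);
    let count: u64 = digits
        .parse()
        .map_err(|_| SizeError::InvalidNumber(digits.to_string()))?;
    let multiplier = match unit {
        "" | "B" => 1,
        "KB" => 1_000,     // assumption: decimal kilobytes
        "MB" => 1_000_000, // assumption: decimal megabytes
        "block" => block_size,
        other => return Err(SizeError::UnknownUnit(other.to_string())),
    };
    count
        .checked_mul(multiplier)
        .ok_or_else(|| SizeError::Overflow(text.to_string()))
}
```

Because the error implements `Display`, a caller can print it directly, so an input like `5q` produces a message naming the offending unit rather than being silently ignored.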
> Let's leave these shorthand notations out for now.

Sounds good. Did you want to create a separate issue for `{k,m,g,t}` to be discussed?

> Sounds good. Did you want to create a separate issue for `{k,m,g,t}` to be discussed?

I'd rather wait until someone complains that they are missing :smile:. I personally like it better without them.
closed via #90 by @ErichDonGubler
- Decimal numbers, e.g. `23`, `1024`, parsed with `u64::from_str(...)`.
- Hexadecimal numbers prefixed with `0x`, e.g. `0x17`, `0x100`, parsed with `u64::from_str_radix(...)`.
- Block units, e.g. `-b 512 -n 1block`. N.B.: one cannot use a block unit to define the block size.
  - Recognize `block` when parsing numbers. Multiply by block size.
- Byte units: `B` at the end of the count, and can include an optional magnitude spec like kilobytes (`K`) or megabytes (`M`).
  - `23B`: 23 bytes
  - `9KB`: 9 kilobytes
  - Pattern: `(?P<count>\d+)(?P<magnitude_unit>[KM]?)B`.

Other open questions

- Should a `-` or `+` sign be supported?
  - It is unclear whether `+` is useful: `xxd`'s manual states that for the `-s` option, `+` is useful only for `stdin`. Not sure what that means, though.
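The number and block-unit requirements above could be sketched roughly as follows. This is a minimal illustration, not the code merged in #90: `parse_count` and `parse_size` are hypothetical names, and only the `0x` prefix and the singular `block` suffix are handled.

```rust
use std::num::ParseIntError;
use std::str::FromStr;

/// Parses a bare count: `0x`-prefixed input as hexadecimal,
/// anything else as decimal.
pub fn parse_count(text: &str) -> Result<u64, ParseIntError> {
    match text.strip_prefix("0x") {
        Some(hex_digits) => u64::from_str_radix(hex_digits, 16),
        None => u64::from_str(text),
    }
}

/// Parses a size that may carry a `block` suffix, e.g. `1block`.
/// `block_size` is the value given via `-b`; that value would itself be
/// parsed with `parse_count` alone, so a block unit cannot define the
/// block size.
pub fn parse_size(text: &str, block_size: u64) -> Option<u64> {
    match text.strip_suffix("block") {
        // `checked_mul` turns an overflowing product into `None`.
        Some(count) => parse_count(count).ok()?.checked_mul(block_size),
        None => parse_count(text).ok(),
    }
}
```

One detail relevant to the sign question: Rust's unsigned integer parsing already accepts a leading `+` and rejects a leading `-`, so this sketch would take `+23` as 23 unless the sign is checked for explicitly.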