sonjeheon / lz4

Automatically exported from code.google.com/p/lz4
0 stars 0 forks source link

Make argument handling compatible with tar #74

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
lz4c could, with some changes, be used in conjunction with tar(1) to produce 
.tar.lz4 (or .tlz4) compressed archives.  This would be done with 'tar -I lz4c 
-[ctx]f ...' etc., but in order for that to work, lz4c needs to be 
argument-compatible with gzip/bzip2/xz/etc.

The attached patch makes these changes:

* -y => -f
* -c => -z
* -c1 => -9
* -h => -9
* Change: -c to use stdout
* Change: -h for short help
* Add new -v to control verbosity, and make non-error DISPLAY calls dependent 
thereon
* Make unlz4 (symlink) synonymous with lz4c -d
* Make lz4cat (symlink) synonymous with lz4c -cd
* If no files are specified, use stdin/stdout
* Accept "-" as synonym for stdin/stdout

The -c => -z change would require a patch for the Linux kernel; I am prepared 
to propose a backwards-compatible patch there should this be accepted.

Original issue reported on code.google.com by yselkow...@gmail.com on 17 Jul 2013 at 5:09

Attachments:

GoogleCodeExporter commented 9 years ago
Well, it seems the kernel is using -c1 right now, which makes it harder to make 
the patch backwards-compatible.  If you're willing to accept this patch, we'll 
have to check with kernel devs to see if they need the compatibility, or if 
they're willing to require a rev with this patch.

Original comment by yselkow...@gmail.com on 17 Jul 2013 at 5:22

GoogleCodeExporter commented 9 years ago
Thanks Yaakov

While being argument-compatible with zlib seems overall a pretty good idea,
(and btw I'm very likely to accept your proposed modifications which do not 
break compatibility with existing lz4 scripts)
is that really a necessity ? 
After all, it seems the command line with tar and zlib or lz4 could be built on 
purpose.
What would justify complete argument compatibility ?

Original comment by yann.col...@gmail.com on 17 Jul 2013 at 9:02

GoogleCodeExporter commented 9 years ago

Original comment by yann.col...@gmail.com on 17 Jul 2013 at 9:02

GoogleCodeExporter commented 9 years ago
OK, I looked into this a little further.  The following modifications are the 
minimum requirements for compatibility with tar(1):

* -d to decompress (already done)
* no filenames implies stdin/stdout
* no non-error messages

The attached patch does that without change the meaning of any existing flags.  
It seems to work correctly with "tar -I lz4c -[ctx]f".  It also allows for 
lz4cat and unlz4 aliases to match zcat/bzcat/xzcat/etc. and 
gunzip/bunzip/unxz/etc.

If you accept this patch, the next step will be to patch tar(1) to recognize 
lz4-compressed archives automatically; I'll try to keep you posted.

(That being said, I still think that it would be best to be fully 
command-compatible with gzip, as other archivers are, and that now would be a 
preferable time to do that, BEFORE lz4 starts hitting the distros, as it may 
once Linux 3.11 is out.  Perhaps that is best left for a separate enhancement 
request.)

Original comment by yselkow...@gmail.com on 19 Jul 2013 at 6:18

Attachments:

GoogleCodeExporter commented 9 years ago
Yes, it's a great patch. It gets all benefits.

For the second point (full zlib compatibility, breaking compatibility with 
current lz4c), I agree it would be a better choice to track it into a separate 
enhancement request.

Original comment by yann.col...@gmail.com on 19 Jul 2013 at 8:21

GoogleCodeExporter commented 9 years ago

Original comment by yann.col...@gmail.com on 19 Jul 2013 at 8:21

GoogleCodeExporter commented 9 years ago
Hello Yaakov

I'm currently finding some time to implement your proposed patch.
On average it looks good.
I've got some small questions though.

1) You seem to prefer disabling the following check :
if (argc<2) { badusage(exename); return 1; }

Could you please describe 
in which valid circumstance is argc supposed to be <2 ?

2) Likely related to above :
The following check is also removed :
-    // No input filename ==> Error
-    if(!input_filename) { badusage(exename); return 1; }

Same question, in which valid circumstance is this supposed to happen ?
I guess it's related to unlz4 & lz4cat commands, but I'm struggling to 
understand why command line arguments are not provided. (and also why these 
names would be provided, do you intend to compile the program under different 
names ? or is this how links work ?)

I guess that, the final expected behavior is that, if no argument is provided, 
then the application defaults to stdin as input, and stdout as output.

As a consequence, if someone just types the name of the application, with 
nothing else, it does no longer receive a help message as before, but instead 
output a rather messy :
♦"M↑dp╣    ♣]╠☻Compressed 0 bytes into 15 bytes ==> 1.#J%

I guess it might be possible to also handle this case by detecting a 0-byte 
stdin input, and redirecting towards badusage(). Just a bit more complex to do.

Original comment by yann.col...@gmail.com on 23 Jul 2013 at 12:48

GoogleCodeExporter commented 9 years ago
You'll find in attached file
an attempt at integrating most of your patch changes
while keeping a few key properties of the original lz4c.

The most important difference is that,
if the user just types the name of the application,
instead of getting nothing,
it will be directed towards bad_usage() help message.
As a consequence, should the application receives an stdin input which is 
empty, instead of generating a valid LZ4S output describing an empty stream, it 
does not generate anything, and instead output an help message into stderr.

Another difference is that the utility is not completely silent.
The number of lines displayed by default is greatly reduced,
but not tamed to zero.
I feel it is important to inform users with basic information, such as a 
successful compression/decompression and the size of the output.

However, when the application is called by using "unlz4" or "lz4cat", it is 
completely silent.

Everything else should be identical to your proposed patch.

Let me know if these modifications do not mess with your attempt at getting 
lz4c "gzip compatible" with tar.

Regards

Original comment by yann.col...@gmail.com on 23 Jul 2013 at 4:23

Attachments:

GoogleCodeExporter commented 9 years ago
My patch, save for the two added lines mentioning unlz4/lz4cat, *is* the bare 
minimum required for tar compatibility.  tar uses pipes with the compressor, so 
no filenames are passed to it, only sometimes the -d flag (for tar -t and tar 
-x):

compression:
tar -I lz4c -cf foo.tar.lz4 [files]
really means:
tar -c [files] | lz4c > foo.tar.lz4

decompression/testing:
tar -I lz4c -[tx]f foo.tar.lz4
really means:
lz4c -d < foo.tar.lz4 | tar -[tx]

So dealing with no filename arguments (which means allowing argc<2, AND 
assuming stdin/stdout) is essential for compatibility with tar.

This *also* allows unlz4 and lz4cat (which would take at most one argument); 
those would be symlinks to lz4c, just as e.g. gunzip and zcat are symlinks to 
gzip.  

The quieting of all messages without -v is also a must, as non-error messages 
would corrupt the display of e.g. tar -I lz4c -tf or tar -I lz4c -[ctx]vf.  
Note that gzip/bzip2/xz/etc. are also silent by default.

As for issuing a bare lz4c command, I hadn't thought of that, but you are 
correct that it needs to be handled.  Revised patch attached.

Original comment by yselkow...@gmail.com on 23 Jul 2013 at 10:19

Attachments:

GoogleCodeExporter commented 9 years ago
Perhaps I wasn't clear that the previous rc version was supporting "pipe as 
default" (no argument) as proposed by your patch, and was compatible with the 
example command lines you proposed.

The main difference was how to handle an erroneous usage of the command line 
utility. I was modifying the behavior towards isatty() when I received your new 
patch, and since I feel your proposal is much cleaner, I integrated it into the 
new rc.

The other, more minor, difference, is about displaying information on screen.
Quite frankly, I do dislike utilities which do not provide any feedback to the 
user. Keeping the user in the dark just because it makes my code simpler is not 
something I would advocate.

In the interest of ensuring a common behavior with past utilities, I've 
disabled all output when lz4c is used in conjonction with tar. The heuristic is 
fairly simple : the silence mode is automatically enabled when input or output 
is redirected to stdin or stdout. (maybe using tar+lz4c requires both input And 
output to be redirected ?)

I've tested your command line, and they work on my test system (an Ubuntu 
64-bits). There is no output at all. It's not exactly my taste. I feel the 
default user would have preferred to know the amount of bytes saved during the 
compression operation.

Original comment by yann.col...@gmail.com on 24 Jul 2013 at 8:58

GoogleCodeExporter commented 9 years ago
forgot the attached file...

Original comment by yann.col...@gmail.com on 24 Jul 2013 at 8:58

Attachments:

GoogleCodeExporter commented 9 years ago
minor update, reducing information displayed by default,
and modifying help content.

Original comment by yann.col...@gmail.com on 24 Jul 2013 at 12:57

Attachments:

GoogleCodeExporter commented 9 years ago
Integrated into r99

Original comment by yann.col...@gmail.com on 27 Jul 2013 at 11:29