thrau / jarchivelib

A simple archiving and compression library for Java
https://github.com/thrau/jarchivelib
Apache License 2.0

Add Archiver.stream(ArchiveStream) support #51

Open netwolfuk opened 7 years ago

netwolfuk commented 7 years ago

Debian archive files (.deb) are an "ar" file containing a set of tar.gz files.

I'd love to be able to pass the ArchiveStream from the "ar" into a new Archiver.stream() call to extract a file from the embedded tar.gz.

I'm not familiar enough with Streams to know if that would work. Archiver.stream() appears to take a File only.

Thoughts?

thrau commented 7 years ago

i've never thought much about nesting archive streams, it's certainly not possible in any way with the current API.

but it sounds like a fun thing to implement. on first glance i think this will involve some wrapping mechanism of the stream in CommonsArchiveEntry. it also means extending ArchiveEntry. i have some time on my hands tomorrow, i'll have a look at it.

thrau commented 7 years ago

actually, this should work

    Archiver arArchiver = ArchiverFactory.createArchiver("ar");
    Archiver tarGzArchiver = ArchiverFactory.createArchiver("tar", "gz");

    ArchiveStream stream = arArchiver.stream(new File("/home/thomas/bar.ar"));

    ArchiveEntry entry;
    while ((entry = stream.getNextEntry()) != null) {
        if (entry.getName().endsWith(".tar.gz")) {
            tarGzArchiver.extract(stream, new File("/tmp/")); // will extract the contents of the nested archive
            // will close the stream! see #52
        }
    }

    stream.close();

the problem in the current version is that Archiver.extract(InputStream, File) will close the input stream. therefore any subsequent calls to the stream will throw an IOException. you can extract one file. which is pretty stupid i admit.
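
To make the failure mode above concrete, here is a minimal JDK-only sketch (Java 9+). It does not use jarchivelib: `readAndClose` is a hypothetical stand-in for any consumer that closes the stream it is handed, and `NonClosingInputStream` is an illustrative wrapper, not a class from any library mentioned here and not necessarily how #52 was fixed.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CloseShieldDemo {

    /** Hypothetical helper: forwards reads to the wrapped stream but ignores close(). */
    static final class NonClosingInputStream extends FilterInputStream {
        NonClosingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // Intentionally empty: whoever opened the underlying stream closes it.
        }
    }

    /** Stand-in for a consumer that closes its input when it is done. */
    static String readAndClose(InputStream in, int length) throws IOException {
        try (InputStream s = in) {
            byte[] buffer = new byte[length];
            int read = s.readNBytes(buffer, 0, length);
            return new String(buffer, 0, read, StandardCharsets.UTF_8);
        }
    }

    /** Hands the same stream to the consumer twice and reports what each call saw. */
    public static String readTwice(boolean shield) {
        // BufferedInputStream throws IOException on reads after close(), like a real file stream.
        // It is backed by memory here, so leaving it unclosed at the end is harmless.
        InputStream outer = new BufferedInputStream(
                new ByteArrayInputStream("firstsecond".getBytes(StandardCharsets.UTF_8)));
        List<String> chunks = new ArrayList<>();
        for (int length : new int[] {5, 6}) {
            InputStream in = shield ? new NonClosingInputStream(outer) : outer;
            try {
                chunks.add(readAndClose(in, length));
            } catch (IOException e) {
                chunks.add("closed");
            }
        }
        return String.join(",", chunks);
    }

    public static void main(String[] args) {
        System.out.println(readTwice(false)); // prints "first,closed"
        System.out.println(readTwice(true));  // prints "first,second"
    }
}
```

Without the wrapper the second read fails because the first consumer closed the shared stream; with it, the caller that opened the stream stays responsible for closing it.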

i'll fix #52 and deploy a snapshot

thrau commented 7 years ago

you can use 0.8.0-SNAPSHOT to try it

netwolfuk commented 7 years ago

Wow, thanks for the speedy response. I've been thinking about this over the weekend and realise now that my request was very poorly worded. Inside the second stream, I am streaming the file out to a ByteArrayOutputStream, which means I may not even need to go near the filesystem. I know I did say "extract", but I am really streaming it.

My current code looks like this:

    @Rule
    public TemporaryFolder folder = new TemporaryFolder();
    @Test
    public void getControlFileAsStringTest() throws IOException {
        File controlTarGz = getControlTarGzFromDeb(new File("src/test/resources/build-essential_11.6ubuntu6_amd64.deb"),
                folder.getRoot());
        String controlFileContents = getControlFromControlTarGz(controlTarGz);
        System.out.println(controlFileContents);
    }

    public File getControlTarGzFromDeb(File debFile, File tmpLocation) throws IOException {

        Archiver archiver = ArchiverFactory.createArchiver(ArchiveFormat.AR);
        ArchiveStream stream = archiver.stream(debFile);
        ArchiveEntry entry;

        File controlTarGzFile = null;

        while((entry = stream.getNextEntry()) != null) {
            // access each archive entry individually using the stream
            // or extract it using entry.extract(destination)
            // or fetch meta-data using entry.getName(), entry.isDirectory(), ...
            System.out.println(entry.getName());
            if (entry.getName().equals("control.tar.gz")){
                controlTarGzFile = entry.extract(tmpLocation);
            }
        }
        stream.close();

        return controlTarGzFile;
    }

    public String getControlFromControlTarGz(File controlTarGzFile) throws IOException {
        Archiver archivertgz = ArchiverFactory.createArchiver(ArchiveFormat.TAR, CompressionType.GZIP);
        ArchiveStream stream = archivertgz.stream(controlTarGzFile);
        ArchiveEntry entry;
        ByteArrayOutputStream baos= new ByteArrayOutputStream();

        while((entry = stream.getNextEntry()) != null) {
            if (entry.getName().equals("./control")){
                IOUtils.copy(stream, baos);
            }
        }

        return baos.toString( StandardCharsets.UTF_8.toString() );
    }

Ultimately, it would be really nice to stream from the entry. Something like this:

    public String getControlStringFromArFile(File arFile) throws IOException {
        Archiver archiverAr = ArchiverFactory.createArchiver(ArchiveFormat.AR);
        Archiver archivertgz = ArchiverFactory.createArchiver(ArchiveFormat.TAR, CompressionType.GZIP);
        ArchiveStream stream = archiverAr.stream(arFile);
        ArchiveEntry entry, entry2;
        ByteArrayOutputStream baos= new ByteArrayOutputStream();

        while((entry = stream.getNextEntry()) != null) {
            // The ar contains a tgz file named control.tar.gz
            if (entry.getName().equals("control.tar.gz")) {
                ArchiveStream stream2 = archivertgz.stream(entry);
                while((entry2 = stream2.getNextEntry()) != null) {
                    //The control.tar.gz contains a text file named control 
                    if (entry2.getName().equals("./control")){
                        IOUtils.copy(stream2, baos);
                    }
                }
            }
        }

        return baos.toString( StandardCharsets.UTF_8.toString() );
    }

In the meantime, I'll test out your changes.
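
The nesting idea in the wished-for snippet above (decorate the outer stream while it is positioned on an entry) can be tried with nothing but the JDK (Java 9+). This is only an analogy: the JDK ships no ar or tar reader, so a zip inside a zip stands in for the tar.gz inside the ar, and `zipOf` and `readNested` are hypothetical helper names. It says nothing about how a jarchivelib `stream(entry)` overload would or should be implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class NestedStreamDemo {

    /** Builds a zip holding a single entry, entirely in memory. */
    public static byte[] zipOf(String entryName, byte[] content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Streams innerName out of the archive stored as outerName; returns null if not found. */
    public static String readNested(byte[] archive, String outerName, String innerName) {
        try (ZipInputStream outer = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry e = outer.getNextEntry(); e != null; e = outer.getNextEntry()) {
                if (!e.getName().equals(outerName)) {
                    continue;
                }
                // While positioned on an entry, 'outer' yields only that entry's bytes,
                // so it can be decorated directly. 'inner' is deliberately not closed:
                // closing it would also close 'outer'.
                ZipInputStream inner = new ZipInputStream(outer);
                for (ZipEntry e2 = inner.getNextEntry(); e2 != null; e2 = inner.getNextEntry()) {
                    if (e2.getName().equals(innerName)) {
                        return new String(inner.readAllBytes(), StandardCharsets.UTF_8);
                    }
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] inner = zipOf("control", "Package: demo\n".getBytes(StandardCharsets.UTF_8));
        byte[] outer = zipOf("control.zip", inner);
        System.out.print(readNested(outer, "control.zip", "control")); // prints "Package: demo"
    }
}
```

No temporary file is involved: the inner archive is read straight from the outer stream's current entry.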

thrau commented 7 years ago

the point of jarchivelib is to make it convenient to handle archives as File objects. for what you are trying to achieve, i would suggest using commons-compress directly, as it already has an excellent archive/compression stream API, which jarchivelib only makes use of.

netwolfuk commented 7 years ago

Thanks Thomas for taking the time to respond. I will look into that. It's just that your API is much nicer to deal with ;-) Sorry to have wasted your time. I do appreciate your help thus far.

abarsov commented 7 years ago

One thing that is completely missing in commons-compress is restoring the unix file mode when extracting an archive from a stream. ZipFile resolves the file mode from the archive, and this library does a good job of restoring it with the help of the FileModeMapper class. Restoring the unix file mode from ZipArchiveInputStream, however, is not possible, because it simply skips the entire Central Directory Record. So it seems a custom re-implementation of ZipArchiveInputStream would be required. If implemented, that would be a really great feature; I haven't managed to find any library that can extract from a stream and restore the unix file mode at the same time. (The main idea is to iterate through the entries and, once all of them have been processed, parse the additional file information from the Central Directory Record. That information could then be used to restore the file mode even after all files have already been extracted.)

thrau commented 7 years ago

thanks for the input! I've been mulling over file permissions for a while, and they're tricky because a) not all archive formats support them properly, which makes it hard to generalize. b) java's support for portable file permissions isn't very good, and will require resorting to hacks like the FileModeMapper. i have some ideas for a major release, but the API will be completely new, and i don't plan to fiddle with them in 0.x.x
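
For readers wondering what restoring a file mode involves on the Java side, here is a minimal JDK-only sketch. It is not jarchivelib's FileModeMapper; `toPermissions` and `applyIfSupported` are hypothetical names. It maps the nine permission bits of a unix mode to `PosixFilePermission` values and applies them only where the default file system advertises a POSIX attribute view, which is the portability gap mentioned in point b).

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.EnumSet;
import java.util.Set;

public class UnixModeDemo {

    /** Maps the lower nine bits of a unix mode (e.g. 0755) to POSIX permissions. */
    public static Set<PosixFilePermission> toPermissions(int mode) {
        // The enum constants are declared from OWNER_READ to OTHERS_EXECUTE,
        // which matches the bit order 0400, 0200, 0100, ..., 0001.
        PosixFilePermission[] all = PosixFilePermission.values();
        Set<PosixFilePermission> result = EnumSet.noneOf(PosixFilePermission.class);
        for (int i = 0; i < all.length; i++) {
            if ((mode & (0400 >> i)) != 0) {
                result.add(all[i]);
            }
        }
        return result;
    }

    /** Applies the mode if the default file system supports POSIX attributes; returns whether it did. */
    public static boolean applyIfSupported(Path file, int mode) throws IOException {
        if (!FileSystems.getDefault().supportedFileAttributeViews().contains("posix")) {
            return false; // e.g. on Windows, where a different fallback would be needed
        }
        Files.setPosixFilePermissions(file, toPermissions(mode));
        return true;
    }

    public static void main(String[] args) {
        System.out.println(PosixFilePermissions.toString(toPermissions(0755))); // prints "rwxr-xr-x"
        System.out.println(PosixFilePermissions.toString(toPermissions(0644))); // prints "rw-r--r--"
    }
}
```

Because the mapping is independent of when the mode becomes known, it could also be applied after extraction has finished, as the deferred approach suggested above would require.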