ulikunitz / xz

Pure golang package for reading and writing xz-compressed files

limit reached error #19

Closed: screeley44 closed this issue 6 years ago

screeley44 commented 6 years ago

@ulikunitz - we are sporadically getting a limit reached error and are wondering what the possible reasons for this might be, or if there is a way to increase the initial setting of the N? Maybe there is something we can do with the props or WriterConfig to increase this limit? The other weird thing is that this only seems to happen on *.tar.xz files.

ulikunitz commented 6 years ago

Can you confirm that this error happens during compression? Your message doesn't tell me that. The error shouldn't occur and is very likely a bug. It hasn't been reported so far and has never occurred in the tests I run before a release. Can you provide me with a single sample file of the data that triggers the error? That would allow me to reproduce the problem and analyze it.

jeffvance commented 6 years ago

We have our own issue tracking this problem with more details. See https://github.com/kubevirt/containerized-data-importer/issues/335, especially the comments near the end...

jeffvance commented 6 years ago

We are wondering if we should increase the DictCap or Size fields?
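
For reference, the package does expose a dictionary-capacity knob through WriterConfig. A minimal sketch, using the field and constructor names from the package's WriterConfig API; the 32 MiB value is an arbitrary example, not a recommendation (and, as the maintainer explains below, configuration turned out not to be the cause):

package compress // hypothetical package name for this sketch

import (
    "io"
    "os"

    "github.com/ulikunitz/xz"
)

// newConfiguredWriter creates an xz.Writer with an explicit dictionary
// capacity via WriterConfig instead of relying on NewWriter's default.
// The 32 MiB value here is arbitrary.
func newConfiguredWriter(f *os.File) (io.WriteCloser, error) {
    cfg := xz.WriterConfig{DictCap: 32 * 1024 * 1024}
    return cfg.NewWriter(f)
}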

copejon commented 6 years ago

Hi @ulikunitz. Our sample data is generated on the fly for tests from a single known source (tinyCore.iso: https://github.com/kubevirt/containerized-data-importer/tree/master/tests/images).

In the instances where we see the limit reached error, our generator package tars and xz-compresses tinyCore.iso for consumption in the test suite. The operations are performed using archive/tar and this project for xz.

The code that produces this error is a little long to paste in its entirety here, so I'll snippet just the relevant function (the package is at https://github.com/kubevirt/containerized-data-importer/blob/master/tests/utils/fileConversion.go). Hopefully it's just a matter of mis-configuring the writer on our part.

func toXz(src, tgtDir string) (string, error) {
    // createTargetFile opens a new file in the target directory, using the basename
    // of src and appending ".xz", e.g. /path/to/tgtDir/srcBaseName.xz.
    // It returns the *File and its path.
    tgtFile, tgtPath, err := createTargetFile(src, tgtDir, image.ExtXz)
    if err != nil {
        return "", errors.Wrapf(err, "Error creating target file in %s", tgtDir)
    }
    defer tgtFile.Close()

    w, err := xz.NewWriter(tgtFile)
    if err != nil {
        return "", errors.Wrapf(err, "Error getting xz writer for file %s", tgtPath)
    }
    defer w.Close()

    srcFile, err := os.Open(src)
    if err != nil {
        return "", errors.Wrapf(err, "Error opening file %s", src)
    }
    defer srcFile.Close()

    _, err = io.Copy(w, srcFile)
    if err != nil {
        return "", errors.Wrapf(err, "Error writing to file %s", tgtPath)
    }
    return tgtPath, nil
}

ulikunitz commented 6 years ago

Thanks. Which file creates the "limit reached" error? I assume I can find it in tinyCore.iso.

It is not a configuration error. The XZ format has blocks with a maximum size, and the compressor has to stop before this maximum size is reached. ErrLimit ("limit reached") indicates that the maximum size was reached without the encoder stopping beforehand. Currently I don't know how this can happen, and for that reason I'm asking for the file that triggers the error so I can reproduce the issue.
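
To make the failure mode concrete: the error surfaces from the io.Copy into the xz.Writer, so a caller can currently only recognize it by its message. A hedged sketch follows; matching on the error text is an assumption, since the package documents no exported sentinel for it, and the function name is hypothetical:

package compress // hypothetical package name for this sketch

import (
    "io"
    "log"
    "strings"
)

// copyDetectingLimit wraps io.Copy and flags the sporadic "limit reached"
// error discussed in this issue. String matching on the message is an
// assumption; no exported sentinel error is documented for it.
func copyDetectingLimit(dst io.Writer, src io.Reader) (int64, error) {
    n, err := io.Copy(dst, src)
    if err != nil && strings.Contains(err.Error(), "limit reached") {
        log.Printf("xz block limit reached after %d bytes copied", n)
    }
    return n, err
}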

copejon commented 6 years ago

The file is tinyCore.iso.tar.xz that we generate on the fly from tinyCore.iso. Is that what you're referring to? If so, I'll create it with our generator package, upload it to a personal git repo and post the link.

jeffvance commented 6 years ago

@ulikunitz Thanks for your help and interest in this! At a high level, we take a base file, tinyCore.iso, convert it on the fly to various tar'd and compressed formats, and use these converted files to test that we can reverse the process with our CDI importer code. The limit error occurs intermittently (but is not rare), and only when we are using xz.Writer as the Writer in the io.Copy call where the src file is the just-converted tinyCore.iso.tar. So tinyCore.xz works and tinyCore.tar works, but tinyCore.tar.xz occasionally fails. This occurs in our Travis CI environment, although Jon has also seen the error in his local dev environment (though much more rarely there).
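
For readers following the thread, the failing combination is archive/tar feeding this package's writer. Below is a simplified single-pass sketch of that composition; the actual CDI code does it in two steps (tar to disk first, then xz), and all names here are placeholders:

package compress // hypothetical package name for this sketch

import (
    "archive/tar"
    "io"
    "os"

    "github.com/ulikunitz/xz"
)

// tarXz streams the file at srcPath through a tar.Writer wrapped in an
// xz.Writer, producing dstPath. A simplified sketch of the conversion
// pipeline described above, not the actual CDI code.
func tarXz(srcPath, dstPath string) error {
    src, err := os.Open(srcPath)
    if err != nil {
        return err
    }
    defer src.Close()

    fi, err := src.Stat()
    if err != nil {
        return err
    }

    dst, err := os.Create(dstPath)
    if err != nil {
        return err
    }
    defer dst.Close()

    xw, err := xz.NewWriter(dst)
    if err != nil {
        return err
    }
    tw := tar.NewWriter(xw)

    hdr, err := tar.FileInfoHeader(fi, "")
    if err != nil {
        return err
    }
    if err := tw.WriteHeader(hdr); err != nil {
        return err
    }
    if _, err := io.Copy(tw, src); err != nil {
        return err
    }
    // Close order matters: the tar trailer must be flushed into the
    // xz stream before the xz footer is written.
    if err := tw.Close(); err != nil {
        return err
    }
    return xw.Close()
}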

ulikunitz commented 6 years ago

Please provide me with the link to the tar file. I will look at it over the weekend.

copejon commented 6 years ago

@ulikunitz Here is a .tar that we create with archive/tar, as well as the .xz that we generate from it:

https://s3.amazonaws.com/cdi-xz-debug/tinyCore.iso.tar
https://s3.amazonaws.com/cdi-xz-debug/tinyCore.iso.tar.xz

Thanks for your interest in helping us debug this!

ulikunitz commented 6 years ago

Hi, I tested against the master and dev branches and I don't get any error for the tar file. Are you sure that this file triggers the issue? Apparently you were able to create an xz file out of it.

Can you send me the output of go version? I want to check whether this is specific to a Go version.

copejon commented 6 years ago

Okay, thanks for giving it a shot. We see the error in maybe 1 in 20 builds, so it's not a constant event. It's difficult to provide the specific .tar that produces the limit reached error, since it's created dynamically by our CI and destroyed almost immediately when the test ends. We don't have access to the CI environment to reach in and grab it, either.

    [root@b260faf6ac3f /]# go version
    go version go1.10 linux/amd64

ulikunitz commented 6 years ago

Thank you. I will leave the issue open and spend some hours reviewing the code around "limit reached" to find the problem. The file that creates the error would, however, be extremely helpful.

copejon commented 6 years ago

@ulikunitz I can provide the docker container and/or Dockerfile in which this error occurs. It would probably be helpful, since it provides the identical environment. Running the container in a for loop until it errors would be one way to catch it.

ulikunitz commented 6 years ago

I would run it if the files causing the error were written into a specific directory.
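
A sketch of instrumentation along those lines: tee the input while compressing and, on failure, dump the exact bytes that were fed to the writer into a directory so the offending file survives the CI run. The function name and dump scheme are hypothetical:

package compress // hypothetical package name for this sketch

import (
    "bytes"
    "io"
    "io/ioutil"
    "log"

    "github.com/ulikunitz/xz"
)

// compressCapturing xz-compresses src into dst while teeing the input
// into memory; if compression fails, the captured input is saved under
// dumpDir for later analysis. Suitable only for inputs that fit in RAM.
func compressCapturing(dst io.Writer, src io.Reader, dumpDir string) error {
    var captured bytes.Buffer
    w, err := xz.NewWriter(dst)
    if err != nil {
        return err
    }
    _, cerr := io.Copy(w, io.TeeReader(src, &captured))
    if cerr != nil {
        w.Close() // best effort; the copy error is the interesting one
    } else {
        cerr = w.Close()
    }
    if cerr != nil {
        f, ferr := ioutil.TempFile(dumpDir, "failing-")
        if ferr == nil {
            f.Write(captured.Bytes())
            f.Close()
            log.Printf("input saved to %s: %v", f.Name(), cerr)
        }
    }
    return cerr
}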

ulikunitz commented 6 years ago

I've made a change to the dev branch that should minimize the risk of an ErrLimit. Can you please check whether it prevents the error from recurring?

copejon commented 6 years ago

@ulikunitz We'll pull in the branch to our test env and see if we can reproduce the error. Thank you!

ulikunitz commented 6 years ago

I have not received any feedback for 17 days now. No file that actually triggers the reported behavior has ever been produced, I have never observed this behavior myself, and nobody else has reported it. Since there is nothing I can do right now, I'm closing the issue.

dellgreen commented 6 years ago

I am seeing this issue today, 100% of the time, with the following file: https://drive.google.com/file/d/19eMvQaalynVlGj4PtuGKXhRg4WTkcilH/view?usp=sharing

The command I use is:

    gxz -k -c -9 rootfs-sumo-fslc-mx6.tar > rootfs-sumo-fslc-mx6.tar.xz

Strangely, other files don't have the problem.

ulikunitz commented 6 years ago

Many thanks for the file. I could reproduce the issue on the master branch. Please test the dev branch (commit d1e248f); you may use it as a workaround.

dellgreen commented 6 years ago

Cool, I'll give it a test on Monday. Many thanks :)

ulikunitz commented 6 years ago

I increased the margin constant in master. I can now compress the sample file without an issue, and I have released the fix as v0.5.5. Please report back whether it works for you.

dellgreen commented 6 years ago

I re-ran the test this morning and all is working now, many thanks :)