icsharpcode / SharpZipLib

#ziplib is a Zip, GZip, Tar and BZip2 library written entirely in C# for the .NET platform.
http://icsharpcode.github.io/SharpZipLib/
MIT License
3.73k stars 976 forks source link

`WriteEntry(TarEntry, bool)` writes the entry without bytes if the file is used by another process #819

Open HofmeisterAn opened 1 year ago

HofmeisterAn commented 1 year ago

Describe the bug

This issue makes it inconvenient to use SharpZipLib, because disposing the tar archive will throw a TarException that hides the underlying issue. When a file is used by another process, WriteEntry(TarEntry, bool) writes the entry, but obviously not the bytes (which is fine). Due to the dangling tar entry, disposing (leaving the using block) the tar archive throws:

Entry closed at '0' before the 'x' bytes specified in the header were written.

It is not clear which file is responsible for the error. Following line throws an IOException:

https://github.com/icsharpcode/SharpZipLib/blob/c51ef18c55988e3251b15f038ced1d9064b6da2f/src/ICSharpCode.SharpZipLib/Tar/TarArchive.cs#L891

if a file is used by another process, but a few lines before, the entry is already added:

https://github.com/icsharpcode/SharpZipLib/blob/c51ef18c55988e3251b15f038ced1d9064b6da2f/src/ICSharpCode.SharpZipLib/Tar/TarArchive.cs#L876

Reproduction Code

No response

Steps to reproduce

-

Expected behavior

WriteEntry(TarEntry, bool) fails with a clear error message that considers the context.

Operating System

Windows, macOS, Linux

Framework Version

.NET 7, .NET 6

Tags

Tar

Additional context

As workaround I check in advance if the process can read the file.

piksel commented 1 year ago

When a file is used by another process, WriteEntry(TarEntry, bool) writes the entry, but obviously not the bytes (which is fine).

That doesn't seem right. If the file is locked, it should throw an IO error when adding it. We don't have anything that catches such an error and proceeds without writing.

Perhaps whatever is using the file truncates it (setting the size to 0)?

HofmeisterAn commented 1 year ago

It throws an IOException first. Due to the exception, it leaves the using block and disposes the tar archiv, which throws TarException:

TarOutputStream.CloseEntryAsync(CancellationToken cancellationToken, Boolean isAsync)
TarOutputStream.FinishAsync(CancellationToken cancellationToken, Boolean isAsync)
TarOutputStream.Finish()
TarOutputStream.Dispose(Boolean disposing)
TarArchive.Dispose(Boolean disposing)
TarArchive.Dispose()

https://github.com/icsharpcode/SharpZipLib/blob/c51ef18c55988e3251b15f038ced1d9064b6da2f/src/ICSharpCode.SharpZipLib/Tar/TarOutputStream.cs#L398-L404

piksel commented 1 year ago

What using block are you referring to here? Wouldn't just putting a try { WriteEntry(...) } catch (IOException) { ... } work?

piksel commented 1 year ago

I think what you want is to be using TarOutputStream directly instead of using the TarArchive helper. It's meant for handling all the processing of files/paths etc and only requiring to be pointed at a directory. That also means that it cannot provide separate exceptions for IO errors (since they are opened internally and shouldn't be left open on errors).
Since you are manually iterating files (and fighting the automatic path naming), it's probably much easier to skip the helper.

See https://github.com/icsharpcode/SharpZipLib/wiki/GZip-and-Tar-Samples#-create-a-tar-or-tgz-with-control-over-filenames-and-data-source

HofmeisterAn commented 1 year ago

What using block are you referring to here? Wouldn't just putting a try { WriteEntry(...) } catch (IOException) { ... } work?

I am referring to the TarArchive instance.

I think what you want is to be using TarOutputStream directly instead of using the TarArchive helper.

Thanks, I will look into it.

It's meant for handling all the processing of files/paths etc and only requiring to be pointed at a directory. That also means that it cannot provide separate exceptions for IO errors (since they are opened internally and shouldn't be left open on errors).

I still think there is room for improvements. Even in the use case you mention here, the TarException hides the underlying exception. The exception is not helpful, developers do not understand what went wrong. The current implementation does not consider CA1065: Do not raise exceptions in unexpected locations too.

piksel commented 1 year ago

I still think there is room for improvements.

It should be possible to just move the tarOut.PutNextEntry(entry) inside the File.Open closure (and the directory case) in this case.