Closed daviddassau closed 4 years ago
Yeah, the library does not care about the contents, so I am not sure what is going on. Are you processing the lines somehow? Are you splitting the rows on "|" and then assigning the values to the corresponding header keys or something like that? It does seem outside the scope of the library, but if you could provide the code you are using I can take a look.
@piksel thank you so much for replying! I would be more than happy to supply you with some of my code. Hopefully it will give you a better idea of what the issue may be. For reference, I always get the error on the first line of the try block: BZip2.Decompress(fileToDecompressAsStream, decompressedStream, true);
private static void DecompressBZ2File()
{
    string bz2FilePath = $"C:\\temp\\PandoraData\\pandoraData.txt.bz2";
    string txtFilePath = @"C:\temp\PandoraData\pandoraData.txt";
    FileInfo zipFileName = new FileInfo(bz2FilePath);
    using (FileStream fileToDecompressAsStream = zipFileName.OpenRead())
    {
        using (FileStream decompressedStream = File.Create(txtFilePath))
        {
            try
            {
                BZip2.Decompress(fileToDecompressAsStream, decompressedStream, true);
                Console.WriteLine("Successfully decompressed BZ2 file!");
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }
        }
    }
}
Okay, this should have nothing to do with the contents (header vs value count). Rather, it would seem like the bz2 format in the file is either incorrect, or incorrectly read by the library. Could you provide the full stacktrace of the error?
It's in ex.StackTrace if you're not debugging through Visual Studio.
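For anyone following along, here is a minimal sketch of capturing the full trace from a catch block when no debugger is attached. The class and method names (StackTraceDemo, Describe) are made up for illustration and are not part of SharpZipLib. It relies on ex.ToString(), which returns the exception type, the message, and the stack trace in one string.

```csharp
using System;

static class StackTraceDemo
{
    // Hypothetical helper: runs an action and returns either "ok" or the
    // full exception text (type, message, and stack trace).
    public static string Describe(Action action)
    {
        try
        {
            action();
            return "ok";
        }
        catch (Exception ex)
        {
            // ex.Message alone drops the trace; ex.ToString() keeps it.
            return ex.ToString();
        }
    }
}
```

In the code above, replacing Console.WriteLine(ex.Message) with Console.WriteLine(ex.ToString()) would have the same effect.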
@piksel Sure thing! Here's what came out of the catch block when I changed it to Console.WriteLine(ex.StackTrace);
Index was outside the bounds of the array.
at ICSharpCode.SharpZipLib.BZip2.BZip2InputStream.RecvDecodingTables() in C:\projects\sharpziplib\src\ICSharpCode.SharpZipLib\BZip2\BZip2InputStream.cs:line 466
at ICSharpCode.SharpZipLib.BZip2.BZip2InputStream.GetAndMoveToFrontDecode() in C:\projects\sharpziplib\src\ICSharpCode.SharpZipLib\BZip2\BZip2InputStream.cs:line 579
at ICSharpCode.SharpZipLib.BZip2.BZip2InputStream.InitBlock() in C:\projects\sharpziplib\src\ICSharpCode.SharpZipLib\BZip2\BZip2InputStream.cs:line 379
at ICSharpCode.SharpZipLib.BZip2.BZip2InputStream..ctor(Stream stream) in C:\projects\sharpziplib\src\ICSharpCode.SharpZipLib\BZip2\BZip2InputStream.cs:line 112
at ICSharpCode.SharpZipLib.BZip2.BZip2.Decompress(Stream inStream, Stream outStream, Boolean isStreamOwner) in C:\projects\sharpziplib\src\ICSharpCode.SharpZipLib\BZip2\BZip2.cs:line 27
at StreamingUsageConsole.Services.Pandora.GetObjectTest.DecompressBZ2File(String pathAndFileName, String jsonFile) in C:\NaxosRepos\utility-data-streamingdatadownload\StreamingUsageConsole\Services\Pandora\GetObjectTest.cs:line 128
That's odd. That line just initializes an array. I have no idea how that could be throwing an IndexOutOfRangeException. Are you using the NuGet package? What is your environment?
Yes, I am most definitely using the nuget package. Here are my using statements that are currently being utilized:
using System;
using System.IO;
using System.Threading.Tasks;
using Amazon;
using Amazon.S3;
using Amazon.S3.Model;
using ICSharpCode.SharpZipLib.BZip2;
Regarding my environment, I'm using Visual Studio 2019, coding with .NET Framework 4.7.2. I'm definitely willing to share more of my code, as well as the downloaded .bz2 file, if it would potentially be helpful to you. I am 100% grateful for all the help you've given me thus far, though!
I really have to go to bed, but if you could provide the bzip2 file I would probably have enough to reproduce. I'll take a look at it as soon as I can.
@piksel Ok, thank you so much for your help! Let me know if you have any issues downloading/viewing this file. I had to zip it up in order to upload it to GitHub. naxos_US_2019-07-07.txt.zip
fwiw, I gave this a quick go with the latest source and the above file and didn't see any exception, but the extracted file was tiny (only included the headers I think). (7-Zip pulled a lot more out of it).
Could this be related to multi-stream BZip2 files (and the lack of support for them)?
@Numpsy I'm running into the same issue as well, in regards to it successfully extracting, but only containing the headers.
That definitely sounds like the issue. It would also make sense that a file like this would use multistreams. I found an interesting wrapper in a blog post: https://chaosinmotion.blog/2011/07/29/and-another-curiosity-multi-stream-bzip2-files/ The same approach should be possible to do with BZip2InputStream. Adding support for it in the library shouldn't be too hard either, but that's not a short term solution.
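A rough sketch of what that wrapper idea could look like on top of BZip2InputStream. This is not the code from the blog post or from any library fix. MultiStreamBZip2 and DecompressAll are hypothetical names. It assumes the input stream is seekable (a FileStream is), and that the decoder pulls bytes from the base stream one at a time, so that when one bzip2 stream ends the base stream is positioned at the start of the next one. It has not been run against the file attached above.

```csharp
using System.IO;
using ICSharpCode.SharpZipLib.BZip2;

static class MultiStreamBZip2
{
    // Hypothetical helper, not part of SharpZipLib.
    // Decodes every concatenated bzip2 stream in `input` into `output`.
    public static void DecompressAll(Stream input, Stream output)
    {
        var buffer = new byte[81920];

        // Assumption: `input` is seekable, so Position/Length tell us
        // whether more compressed data follows the stream just decoded.
        while (input.Position < input.Length)
        {
            // Do not let the decoder close the shared base stream.
            var bz = new BZip2InputStream(input) { IsStreamOwner = false };

            int read;
            while ((read = bz.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
        }
    }
}
```

With the code posted earlier, the call would replace BZip2.Decompress(...) with MultiStreamBZip2.DecompressAll(fileToDecompressAsStream, decompressedStream). Trailing bytes that are not a bzip2 stream are not handled specially here.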
The (somewhat old) issue #162 references that same blog post.
Yeah, it was literally the first google result :D
That totally works!
SharpZipLib Issue 413
Decompressing WITHOUT MultiStreams:
Successfully decompressed BZ2 file.
Output file size: 206 byte(s) (1 line(s))
Decompression time: 0.041s
Decompressing WITH MultiStreams:
Successfully decompressed BZ2 file.
Output file size: 172220713 byte(s) (969524 line(s))
Decompression time: 16.279s
Source: https://gist.github.com/piksel/7ade2571713b992e4c532a93385067f8
I am currently working on a PR to fix this inside BZip2InputStream instead.
Sorry I'm a little late getting back to the conversation. Thank you both so much for taking a look into this issue for me! I was, however, able to find a workaround that utilizes 7-Zip. All I had to do was reference the 7zip .exe from my project, call the Command Prompt, and pass in a single argument with the BZ2 file and where exactly it should be decompressed to. And it worked! However, I am looking forward to seeing the solution you come up with @piksel . Thank you once again!
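For reference, a minimal sketch of that kind of 7-Zip workaround, in case it helps someone else. It is not the exact code used above. The class and method names are hypothetical, and the path to 7z.exe has to be supplied by the caller. It uses the 7-Zip command-line switches e (extract), -o (output directory, written with no space before the path), and -y (answer yes to prompts).

```csharp
using System.Diagnostics;

static class SevenZipWorkaround
{
    // Hypothetical helper: builds the argument string for "7z e".
    public static string BuildArguments(string archivePath, string outputDir)
    {
        // A trailing backslash before the closing quote would escape the
        // quote on Windows, so strip trailing separators first.
        string dir = outputDir.TrimEnd('\\', '/');
        return $"e \"{archivePath}\" -o\"{dir}\" -y";
    }

    // Runs 7-Zip and returns its exit code (0 means no error).
    public static int Extract(string sevenZipExe, string archivePath, string outputDir)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = sevenZipExe,
            Arguments = BuildArguments(archivePath, outputDir),
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

A call could look like SevenZipWorkaround.Extract(pathTo7z, @"C:\temp\PandoraData\pandoraData.txt.bz2", @"C:\temp\PandoraData"), where pathTo7z is wherever 7z.exe lives on the machine.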
Steps to reproduce
Expected behavior
When running BZip2.Decompress(fileToDecompressAsStream, decompressedStream, true), it should obviously be writing the data from the .bz2 file to the .txt file.
Actual behavior
I'm getting an error stating: Index was outside the bounds of the array
I know that this issue is more of a "bad data" problem, rather than a problem with SharpZipLib. However, I was hoping you could help with finding a solution. Ideally, I would like to decompress the .bz2 file, and either remove the extra header column or give all the rows a NULL value. But I can't find a way to do this. Any help you could provide would be very much appreciated!
Version of SharpZipLib
1.2.0
Obtained from (only keep the relevant lines)