ironfede / openmcdf

Microsoft Compound File .net component - pure C# - netstandard 2.0
Mozilla Public License 2.0
297 stars 73 forks source link

OpenMcdf.CFCorruptedFileException: The file is corrupted. #67

Closed freezy closed 4 years ago

freezy commented 4 years ago

I'm getting the above message when writing data. I've spent a fair amount of debugging and am still clueless how this could happen.

First of all, it doesn't happen all the time. We're writing to hundreds of streams that work all fine, but sometimes it fails with this message (here a 190M file). The stack trace looks like this:

OpenMcdf.CFCorruptedFileException : The file is corrupted.
   at OpenMcdf.CompoundFile.EnsureUniqueSectorIndex(Int32 nextSecID, HashSet`1 processedSectors)
   at OpenMcdf.CompoundFile.GetFatSectorChain()
   at OpenMcdf.CompoundFile.GetNormalSectorChain(Int32 secID)
   at OpenMcdf.CompoundFile.GetSectorChain(Int32 secID, SectorType chainType)
   at OpenMcdf.CompoundFile.WriteData(CFItem cfItem, Byte[] buffer, Int64 position, Int32 offset, Int32 count)
   at OpenMcdf.CompoundFile.WriteData(CFItem cfItem, Int64 position, Byte[] buffer)
   at OpenMcdf.CompoundFile.WriteData(CFItem cfItem, Byte[] buffer)
   at OpenMcdf.CFStream.SetData(Byte[] data)
   at VisualPinball.Engine.IO.BiffData.WriteData(CFStorage gameStorage, HashWriter hashWriter) in VisualPinball.Engine\VisualPinball.Engine\IO\BiffData.cs:line 49
   at VisualPinball.Engine.VPT.Table.TableWriter.WriteImages() in VisualPinball.Engine\VisualPinball.Engine\VPT\Table\TableWriter.cs:line 105
   at VisualPinball.Engine.VPT.Table.TableWriter.WriteTable(String fileName) in VisualPinball.Engine\VisualPinball.Engine\VPT\Table\TableWriter.cs:line 40
   at VisualPinball.Engine.VPT.Table.Table.Save(String fileName) in VisualPinball.Engine\VisualPinball.Engine\VPT\Table\Table.cs:line 165
   at VisualPinball.Engine.Test.VPT.DebugTests.ShouldWriteChecksum() in VisualPinball.Engine\VisualPinball.Engine.Test\VPT\DebugTests.cs:line 13

The project is an open source pinball simulator, and it's kind of awkward to reproduce, but here we go nevertheless:

  1. Clone freezy/VisualPinball.Engine
  2. Download the test file writefail.vpx here
  3. Open DebugTests.cs and add this method:
    
    [Test]
    public void ShouldWrite()
    {
    var table = Engine.VPT.Table.Table.Load(@"path to writefail.vpx");
    table.Save( @"some other path");
    }
  4. Execute the test and verify it works
  5. Open ItemData.cs and remove string.Empty from EditorLayerName on line 24, setting its default value to null
  6. Re-run the test to reproduce the error

Thing is, it's just binary data. Setting that field to null just doesn't write it into a binary blob. How this could trigger such an error is beyond me, but I admit I don't entirely understand the compound doc format.

Any hints or suggestions would be greatly appreciated.

ironfede commented 4 years ago

Hi @freezy , I'm following your instruction to reproduce error but by using vpx you pointed me to I obtain this exception (BadImageFormatException) in file loading - sorry I've an Italian locale, I will switch to EN asap -

image

here's the inner exception stacktrace

in NetMiniZ.Interop.WinLibraries.wrapper_tinfl_decompressor_size() in NetMiniZ.Interop.WinLibraries.tinfl_decompressor_size() in NetMiniZ.NetMiniZUtils.Decompress(Stream inputStream, Stream outputStream) in VisualPinball.Engine.IO.BiffZlib.Decompress(Byte[] bytes) in C:\Users\Federico\Source\Repos\VisualPinball.Engine\VisualPinball.Engine\IO\BiffZlib.cs: riga 22 in VisualPinball.Engine.VPT.Primitive.BiffVerticesAttribute.Parse[T](T obj, BinaryReader reader, Int32 len) in C:\Users\Federico\Source\Repos\VisualPinball.Engine\VisualPinball.Engine\VPT\Primitive\PrimitiveData.cs: riga 206

and here the wrapping exception point

image

It looks like something doesnt'like to the Decompression routine while parsing a specific "tag".

Please, let me know if I'm doing something wrong or if you think that there's something dependent on my configuration/locale. I'm on Windows 10 64 bit, Visual Studio Community 2019 16.6.1

Kind Regards, Federico

freezy commented 4 years ago

Dude, you're awesome! Thanks for having a look at this!

I've just merged a PR that properly deals with x64/x86, and since the error you're having seems to be related to the (native) MiniZ library we're using, that might resolve the problem.

Can you pull master and try again? Sorry for the trouble!

ironfede commented 4 years ago

@freezy Now I'm able to reproduce the behaviour you described above. I'm trying to investigate the issue but it looks like something strange happens deep in openmcdf difat sectors... Just a question, do you have same issues using ver3 of compound files?

freezy commented 4 years ago

@ironfede glad you got it working!

To be honest I'm not even sure which version I'm using. Is it version 4 per default? Are they compatible?

ironfede commented 4 years ago

I could be wrong but I think you load a v4 file and save as v3. Now, this shouldn't be a problem but could possibly expose some underlying hidden issue. Default is v3 at the moment. Please, give me some time to investigate... Kind regards, Federico

freezy commented 4 years ago

Thanks a lot for having a look.

About v3 vs v4, you probably know better than I do. I simply instantiate a new CompoundFile here, and then add storages and streams to it (the CompoundFile instance from the source isn't used when writing).

The writefail.vpx example file was not created by OpenMcdf, if that helps.

ironfede commented 4 years ago

@freezy many thanks for reporting this issue . There was an issue in handling a limit condition while loading DIFAT sectors when exactly 508 bytes were written AND only one sector exists (I think that I really couldn't create such a 'perfect' test ;-) ) I've committed the fix but I need time to extensively recheck potential regressions from this change before publishing a nuget package. Please, try to build and reference the new library from source to verify your use case: it looks ok to me. Best Regards, Federico

freezy commented 4 years ago

Cheers! Will build and have a look later today.

Thanks again for your great support.

freezy commented 3 years ago

Apologies for the delay, I've written a test that loops through all the .vpx files I have and stumbled upon other bugs that needed fixing first.

The good news is that there's no corruption errors anymore! Bad news is I got stack overflow errors on two files (over 280mb) when writing. Will dig more into it, it's probably on my side. No stack trace unfortunately.

So that was about a hundred files (around 10gb).

ironfede commented 3 years ago

Ok. If you think there' are other issues, please open a new tracker and, if you can, give me a use case to test in order to speed up problem analysis. Kind regards, Federico

freezy commented 3 years ago

Yes, of course. Cheers!

On Mon, Jul 20, 2020, 15:41 Federico Blaseotto notifications@github.com wrote:

Ok. If you think there' are other issues, please open a new tracker and, if you can, give me a use case to test in order to speed up problem analysis. Kind regards, Federico

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ironfede/openmcdf/issues/67#issuecomment-661046904, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAARGGTU2OSAONA247F4TH3R4RCRPANCNFSM4O27RKLQ .

freezy commented 3 years ago

Thanks a lot for the release.

I've created freezy/VisualPinball.Engine#131 for the other issue and let you know if it's relevant to OpenMcdf.

ironfede commented 3 years ago

@freezy thank you for letting me know. Please notice that nuget pkg for openmcdf 2.2.1.6 is source link enabled to allow a deeper analysis and debug of suspected issues. Best regards, Federico