ironfede / openmcdf

Microsoft Compound File .net component - pure C# - netstandard 2.0
Mozilla Public License 2.0
295 stars 72 forks source link

CompoundFile looses storages in very rare cases #90

Closed szimmer-dap closed 1 year ago

szimmer-dap commented 1 year ago

Hi there,

I think there is a bug in CFStorage's Delete() method. Basically, whenever a storage is deleted, the DirEntry.Child is set to the tree's Root before the element is deleted from the RBTree. In all other scenarios I could find, the Child property is set after the operation on the tree (which seems more correct).

In cases where

any CompoundFile created from that stream will be loosing some storages, because the new tree will take a subnode from the original tree as new root instead of the original's root.

Code to reproduce:

[TestMethod]
public void Test_BUG_CompoundFile_Delete_Storages()
{
    using (var compoundFile = new CompoundFile())
    {
        var root = compoundFile.RootStorage;
        var storageNames = new HashSet<string>();

        // add 99 storages to root
        for (int i = 1; i <= 99; i++)
        {
            var name = "Storage " + i;
            root.AddStorage(name);
            storageNames.Add(name);
        }

        // remove storages until tree becomes unbalanced and its Root changes
        var rootChild = root.DirEntry.Child;
        var newChild = rootChild;
        var j = 1;
        while (newChild == rootChild && j <= 99)
        {
            var name = "Storage " + j;
            root.Delete(name);
            storageNames.Remove(name);
            newChild = ((DirectoryEntry)(root.Children.Root)).SID; // stop as soon as root.Children has a new Root
            j++;
        }

        // check if all remaining storages are still there
        foreach (var storageName in storageNames)
        {
            Assert.IsTrue(root.TryGetStorage(storageName, out var storage)); // <- no problem here
        }

        // write CompundFile into MemoryStream... 
        using (var memStream = new MemoryStream())
        {
            compoundFile.Save(memStream);

            // ....and read new CompundFile from that stream
            using (var newCf = new CompoundFile(memStream))
            {
                // check if all storages can be found in to copied CompundFile
                foreach (var storageName in storageNames)
                {
                    Assert.IsTrue(newCf.RootStorage.TryGetStorage(storageName, out var storage)); //<- we will see some missing storages here
                }
            }
        }
    }
}

As written above, the problems seem to reside in OpenMcdf.CFStorage.Delete(String entryName), where this

case StgType.StgStorage:

    CFStorage temp = new CFStorage(this.CompoundFile, ((IDirectoryEntry)foundObj));

    // This is a storage. we have to remove children items first
    foreach (IRBNode de in temp.Children)
    {
        IDirectoryEntry ded = de as IDirectoryEntry;
        temp.Delete(ded.Name);
    }

    // ...then we need to rethread the root of siblings tree...
    if (this.Children.Root != null)
        this.DirEntry.Child = (this.Children.Root as IDirectoryEntry).SID;
    else
        this.DirEntry.Child = DirectoryEntry.NOSTREAM;

    // ...and finally Remove storage item from children tree...
    this.Children.Delete(foundObj, out altDel);

    // ...and remove directory (storage) entry

    if (altDel != null)
    {
        foundObj = altDel;
    }

    this.CompoundFile.InvalidateDirectoryEntry(((IDirectoryEntry)foundObj).SID);

    break;

should probably look more like this:

case StgType.StgStorage:

    CFStorage temp = new CFStorage(this.CompoundFile, ((IDirectoryEntry)foundObj));

    // This is a storage. we have to remove children items first
    foreach (IRBNode de in temp.Children)
    {
        IDirectoryEntry ded = de as IDirectoryEntry;
        temp.Delete(ded.Name);
    }

    // ...then we Remove storage item from children tree...
    this.Children.Delete(foundObj, out altDel);

    // ...after which we need to rethread the root of siblings tree...
    if (this.Children.Root != null)
        this.DirEntry.Child = (this.Children.Root as IDirectoryEntry).SID;
    else
        this.DirEntry.Child = DirectoryEntry.NOSTREAM;

    // ...and remove directory (storage) entry

    if (altDel != null)
    {
        foundObj = altDel;
    }

    this.CompoundFile.InvalidateDirectoryEntry(((IDirectoryEntry)foundObj).SID);

    break;

In other words: always update DirEntry.Child after an operation on the corresponding Children tree.

aunverdorben-dap commented 1 year ago

Hi there,

I've added the unit test and the implementation fix as laid out by @szimmer-dap above.

ironfede commented 1 year ago

Thnaks to @aunverdorben-dap for fix