microsoft / ManagedEsent

MIT License
Exception flowing out of PersistentDictionary`2.Dispose(Boolean) prevents handle close and subsequent re-open? #41

Open rtestardi opened 4 years ago

rtestardi commented 4 years ago

Hi,

We have a test case where clustered storage fails transiently. We then try to close the persistent dictionary so we can re-open it once the storage has healed. The close throws an exception after apparently doing only part of the close work, and apparently never gets to closing the handles (perhaps GC will eventually close them, at some indeterminate time?). Trying to reopen the database then fails because the database handles from the previous instance are still open.

We see the exception at close time looking like:

<EsentInstanceUnavailableException> This instance cannot be used because it encountered a fatal error HResult: (0x80131500)
   PersistentDictionaryCursorCache`2.Dispose()
   PersistentDictionary`2.Dispose(Boolean)

And then our subsequent re-open fails with:

<EsentSystemPathInUseException> System path already used by another database instance HResult: (0x80131500)
   Api.JetInit2(JET_INSTANCE&, InitGrbit)
   Instance.Init(InitGrbit)
   PersistentDictionary`2..ctor(String, IConfigSet, IEnumerable`1)
   PersistentDictionary`2..ctor(String, IConfigSet)

It seems the problem is here:

try
{
    this.disposeLock.EnterWriteLock();
    writeLocked = true;

    if (this.alreadyDisposed)
    {
        return;
    }

    this.cursors.Dispose();
    this.database.Dispose();
    this.instance.Dispose();
}
finally

If this.cursors.Dispose() throws (I am guessing so from the PersistentDictionaryCursorCache frame in the stack above), then the following two lines never run, leaving the database and instance handles open (until GC?).

Shouldn't we be guaranteed that handles are closed immediately after dispose, regardless of whether writes can be flushed to disk, so that the database can be reopened?
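
To make the question concrete, here is a minimal sketch of the behavior I would expect: each resource is disposed in its own try/catch, so a failure in one does not skip the others. SafeDispose and FakeResource are hypothetical names for illustration and are not part of ManagedEsent. I am also assuming the cursors, database, and instance fields can all be treated as IDisposable, and I have not verified that Instance.Dispose() itself succeeds once the instance is in the fatal-error state.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a ManagedEsent API): disposes every resource in
// order and keeps going even if an earlier Dispose() throws.
public static class SafeDispose
{
    public static void DisposeAll(params IDisposable[] resources)
    {
        List<Exception> failures = null;
        foreach (IDisposable resource in resources)
        {
            try
            {
                if (resource != null)
                {
                    resource.Dispose();
                }
            }
            catch (Exception ex)
            {
                // Record the failure, but continue so later handles still close.
                if (failures == null)
                {
                    failures = new List<Exception>();
                }
                failures.Add(ex);
            }
        }

        // Surface all failures only after every resource has been attempted.
        if (failures != null)
        {
            throw new AggregateException("One or more resources failed to dispose.", failures);
        }
    }
}

// Hypothetical stand-in for a resource whose Dispose() may fail.
public sealed class FakeResource : IDisposable
{
    private readonly bool throwOnDispose;

    public FakeResource(bool throwOnDispose)
    {
        this.throwOnDispose = throwOnDispose;
    }

    public bool Disposed { get; private set; }

    public void Dispose()
    {
        if (this.throwOnDispose)
        {
            throw new InvalidOperationException("simulated dispose failure");
        }
        this.Disposed = true;
    }
}
```

Under those assumptions, the three calls in the Dispose(Boolean) excerpt above could become something like SafeDispose.DisposeAll(this.cursors, this.database, this.instance), so the caller still sees the cursor-cache failure (wrapped in an AggregateException) but only after the database and instance have also been given a chance to close.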

It seems our only option with the current implementation is to exit the whole process and have cluster manager restart it, which we'd prefer not to do.

Is there something we are doing wrong here? Is there a way to close and reopen our database when storage fails transiently, without exiting our process?

Thanks for any insights!

-- Rich (richardt@microsoft.com)