litedb-org / LiteDB

LiteDB - A .NET NoSQL Document Store in a single data file
http://www.litedb.org
MIT License
8.67k stars 1.25k forks source link

[BUG] Corrupt / damaged / unreadable records in table #1742

Open DanTwomey opened 4 years ago

DanTwomey commented 4 years ago

Version LiteDB v5.0.7 .NET Framework 4.8 Windows

Describe the bug

We have encountered a strange situation where it would appear that part of the one table in our LiteDB has become corrupt/unreadable for reasons unknown.

Our database has 7 tables, 6 can be read and written to fine, the other 1 cannot be read, it however seems to be able to be written to just fine.

From our application logs, I know that we should have 14 records in the table, however we are unable to read any data from it, via our own applications and via the LiteDB Studio (v1.0.1.0). Both throw the same exception as below.

I was able to export data to JSON from LiteDB Studio, however there were only 4 records exported and as I said, there should be 14.

Is there a way this data can be recovered/repaired?

Do you know what may have caused this exception and therefore loss of data if it is not recoverable?

Expected behavior Return all data in the table

Screenshots/Stacktrace

System.NotSupportedException: BSON type not supported
   at LiteDB.Engine.BufferReader.ReadElement(HashSet`1 remaining, String& name)
   at LiteDB.Engine.BufferReader.ReadDocument(HashSet`1 fields)
   at LiteDB.Engine.BufferReader.ReadElement(HashSet`1 remaining, String& name)
   at LiteDB.Engine.BufferReader.ReadDocument(HashSet`1 fields)
   at LiteDB.Engine.BufferReader.ReadElement(HashSet`1 remaining, String& name)
   at LiteDB.Engine.BufferReader.ReadArray()
   at LiteDB.Engine.BufferReader.ReadElement(HashSet`1 remaining, String& name)
   at LiteDB.Engine.BufferReader.ReadDocument(HashSet`1 fields)
   at LiteDB.Engine.DatafileLookup.Load(PageAddress rawId)
   at LiteDB.Engine.DatafileLookup.Load(IndexNode node)
   at LiteDB.Engine.BasePipe.<LoadDocument>d__6.MoveNext()
   at LiteDB.Engine.BasePipe.<Filter>d__8.MoveNext()
   at LiteDB.Engine.QueryPipe.<Select>d__2.MoveNext()
   at LiteDB.Engine.QueryExecutor.<>c__DisplayClass10_0.<<ExecuteQuery>g__RunQuery|0>d.MoveNext()
   at LiteDB.BsonDataReader..ctor(IEnumerable`1 values, String collection)
   at LiteDB.Engine.QueryExecutor.ExecuteQuery(Boolean executionPlan)
   at LiteDB.Engine.QueryExecutor.ExecuteQuery()
   at LiteDB.Engine.LiteEngine.Query(String collection, Query query)
   at LiteDB.SharedEngine.Query(String collection, Query query)
   at LiteDB.LiteQueryable`1.<ToDocuments>d__26.MoveNext()
   at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
   at System.Linq.Buffer`1..ctor(IEnumerable`1 source)
   at System.Linq.OrderedEnumerable`1.<GetEnumerator>d__1.MoveNext()
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
lbnascimento commented 4 years ago

@DanTwomey Could you send me your datafile?

DanTwomey commented 4 years ago

@lbnascimento We would love to send you the datafile to assist. Unfortunately we are unable to share it as it contains corporate information from a company within the government sector. We are happy to host a remote session at some point in time for you to access the file providing that is something you would be willing to do however we have managed to narrow it down to the 5th record (not that that means anything to you without the data file) and remove it via a DELETE command and can get the rest of the data from the table.

Is there a way we can dump all the data from the table etc so we could see the problematic data?

lbnascimento commented 4 years ago

@DanTwomey You could use the Debug mode in the latest version of LiteDB.Studio (it's a bit hard to use if you don't know much about the structure of a LiteDB datafile though). And you could run select $ from $dump(pageId) to get info and a binary dump of a specific page.

DanTwomey commented 4 years ago

I'm going to add to this as the bug report is still valid.

We have had a second case of a document in a collection becoming "corrupt/damaged" rendering the whole collection unreadable by normal means.

From what I have found, both times we have had damaged documents it has come down to updating one or both of two properties within the document. Both of these properties are of nullable DateTime public DateTime? prop_name {get; set;}. This doesn't happen with every document and it definitely doesn't happen all the time. Multiple development machines have not seen this issue but a client running on a customer's live system has, resulting in the collection being unreadable. We would then need to take that database file and figure out which one is the bad document, remove it and then return it to the customer.

Background/Context

The "job" object that makes up a collection of jobs (the collection that becomes unreadable) has multiple DateTime fields being updated at different times.

public class job
{
    ...
    other properties
    ...
    public DateTime created { get; set; }
    public DateTime? completed_datetime { get; set; } = null;
    public DateTime last_save { get; set; }
    ...
    other properties
    ...
    public DateTime? deleted_datetime { get; set; } = null;
    public DateTime? delete_when { get; set; } = null;
}

The vast majority of the fields are only ever updated by the WPF client. However, a user is able to perform a "delete" action which doesn't actually delete the document, it changes a status property and updates the delete_when field. At which point, the Windows Service that is running in the background can pick it up and further process that job. Once processed, it will update the status once again and populate the deleted_datetime property.

Problem Solving Steps

I have downloaded the LiteDB project from the master and spent a couple of days trying to figure out what the issue is and how we could recover data that was initially unreadable.

Importing the LiteDB project into my solution and being able to step through each section as the document gets read in has been quite valuable as I was able to recover data from the damaged document.

The BSON type not supported exceptions were being thrown in BufferReader.cs#L488.

Stepping through this, five additional "types" were being returned by var type = this.ReadByte(); which were not being handled, resulting in throw new NotSupportedException("BSON type not supported");

The additional types returned were as follows:

Dec - Hex 0 - 0x00 5 - 0x05 52 - 0x34 115 - 0x73 116 - 0x74

This was just part of the problem, once I had handled these within the function, just returning null for these particular types I ended up finding the DateTime type 0x09 was also causing an issue.

It would appear that the properties deleted_datetime and delete_when were being read twice, once with a "broken" value and once with a correct value.

The "broken" values were well above the ts being read in below and were then causing an exception during the BsonValue.UnixEpoch.AddMilliseconds(ts)

else if (type == 0x09) // DateTime
{
    var ts = this.ReadInt64();

    // catch specific values for MaxValue / MinValue #19
    if (ts == 253402300800000) return DateTime.MaxValue;
    if (ts == -62135596800000) return DateTime.MinValue;

    var date = BsonValue.UnixEpoch.AddMilliseconds(ts);

    return _utcDate ? date : date.ToLocalTime();
}

Corrupt ts value read from document deleted_datetime 104734198035640081 delete_when 3512174320820416273

Actual ts value also being read from document (after corrupt ones) deleted_datetime 1598117039633 delete_when 1598549039633

I modified the MinValue and MaxValue return lines to use >= and <= instead of ==, this stopped the exceptions and allowed the document to be read

if (ts >= 253402300800000) return DateTime.MaxValue;
if (ts <= -62135596800000) return DateTime.MinValue;

I don't know exactly what's going on here or why as I don't understand the structure of the LiteDB file but hopefully you can make sense of the information above and it's somewhat useful to you. Unfortunately, we still can't provide the datafile as it contains sensitive information, even in the damaged document. Trying to remove the properties with sensitive data and update the document with the above modifications inadvertently fixes it as I guess the values are being written back correctly, I don't know.

If there's any more info you need or clarification etc just say and I'll do my best to provide it.

curtisshipley commented 3 years ago

@DanTwomey Do you know what was leading to the corruption?

superware commented 3 years ago

I'm also experiencing this issue with "BSON type not supported" after what appears to be some random data corruption, any news?

sqbitz commented 1 year ago

We are seeing this in one of our LiteDB applications too. We are also seeing the issue around nullable DateTime fields if that helps. I think I've got a corrupted database backed up somewhere. I'll upload it as soon as I find it.

sqbitz commented 1 year ago

Here is a copy of the database we're using. I removed all the collections except for where the problem actually lies (ProjectItem). Application.zip

Just do a SELECT $ FROM ProjectItem; and you'll get the BSON type not supported error. I really hope this issue can be fixed. We see this issue crop up around once a month randomly and it completely screws up our data.

new-php commented 2 weeks ago

I encountered a similar issue with corrupted or unreadable records in a MySQL table, and here's how I tackled it to get things back on track: ### Steps to Fix Corruption

Identifying the Problem:
I ran CHECK TABLE on the affected table to confirm the corruption. In my case, the issue was related to damaged index trees.

Repairing the Table:
The REPAIR TABLE command worked for simpler issues:

REPAIR TABLE your_table_name;  

If the repair failed, I exported the data, dropped the table, and recreated it using a clean structure, then re-imported the data.

Preventative Measures:
    Adjusted database settings in my.cnf for better stability:
        innodb_buffer_pool_size for memory optimization.
        innodb_file_per_table=ON to isolate table issues.
    Scheduled regular maintenance tasks like OPTIMIZE TABLE and CHECK TABLE for early detection of issues.

Backup and Restore Process

Since data loss can be critical, I ensured robust MySQL backup and restore workflows:

Backups: Using mysqldump before making any changes:

mysqldump -u root -p database_name > backup.sql

Restore: Restoring from the backup if things went wrong:

mysql -u root -p database_name < backup.sql Improvements After Fix

To avoid future issues, I migrated the setup to a high-performance environment with a reliable hosting provider that ensured better database handling and uptime. I also ensured my MySQL installation was fully optimized for production loads.

These steps helped me resolve the table corruption and prevented further issues down the line. Regular backups and routine maintenance are key to database health.

tjmoore commented 1 week ago

@new-php - this is LiteDB, not MySQL