mbdavid / LiteDB

LiteDB - A .NET NoSQL Document Store in a single data file
http://www.litedb.org
MIT License
8.52k stars 1.24k forks

[BUG] OrderBy call may return "LiteDB ENSURE: buffer size must be PAGE_SIZE" #2300

Closed matsakiv closed 1 year ago

matsakiv commented 1 year ago

Version: LiteDB 5.0.16
OS: macOS (Darwin kernel 21.2.0)
.NET: 5.0.17

Describe the bug
When calling OrderBy while reading data from the collection, an error occurs with the message LiteDB ENSURE: buffer size must be PAGE_SIZE. The database uses a password (encryption) and the collection is large (over 10,000 entries).

It looks like the problem is that in some cases the SortService class uses multiple containers and writes to a file stream in its Insert method. By default the container size is 8192 * 100 = 819,200 bytes. But when encryption is used, AesStream.Write enforces a hard restriction that the size of the incoming buffer must be exactly PAGE_SIZE = 8192.

If the number of records multiplied by the size of the sort key exceeds the container size (819,200 bytes), SortService is guaranteed to spill to the file system, and the error occurs.
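To make the threshold concrete, here is the arithmetic behind that claim (all numbers are taken from this report; the constant names are illustrative, not LiteDB's):

```csharp
using System;

const int PAGE_SIZE = 8192;
const int ContainerPages = 100;                       // default pages per sort container, per the report
const int ContainerSize = PAGE_SIZE * ContainerPages; // 819,200 bytes
const int KeySize = 14;                               // size of the 'Date' sort key in the repro below

// maximum number of entries that fit in a single in-memory container;
// one entry more than this forces SortService to write to disk
int maxInMemory = ContainerSize / KeySize;
Console.WriteLine(maxInMemory); // 58514
```

So the 60,000-document reproduction below comfortably exceeds the in-memory limit and always hits the disk path.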

This is the location of the call to write to disk in SortService:

if (_done.Running == false && _containers.Count == 1)
{
    container.InitializeReader(null, _buffer, _pragmas.UtcDate);
}
else
{
    // store in disk
    container.Position = _disk.GetContainerPosition();

    _disk.Write(container.Position, _buffer); // <= call sort disk write here

    container.InitializeReader(_reader.Value, null, _pragmas.UtcDate);
}

This results in a call to the Write method of the SortDisk class:

public void Write(long position, BufferSlice buffer)
{
    var writer = _pool.Writer;

    // there is only a single writer instance, must be lock to ensure only 1 single thread are writing
    lock(writer)
    {
        writer.Position = position;
        writer.Write(buffer.Array, buffer.Offset, _containerSize); // <= here writer can be AesStream, _containerSize = 819200
    }
}

Ultimately this causes a failure in AesStream:

public override void Write(byte[] array, int offset, int count)
{
    ENSURE(count == PAGE_SIZE, "buffer size must be PAGE_SIZE"); // <= Fail, count = 819200, but PAGE_SIZE = 8192
    ENSURE(this.Position % PAGE_SIZE == 0, $"AesWrite: position must be in PAGE_SIZE module. Position={this.Position}, File={_name}");

    _writer.Write(array, offset, count);
}

If I understand correctly, this can be fixed by requiring containerSize % PAGE_SIZE == 0 and issuing multiple writer calls, one per PAGE_SIZE block.
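A minimal sketch of that idea, assuming containerSize is a multiple of PAGE_SIZE (the class, method name, and signature here are hypothetical, not LiteDB's actual SortDisk code):

```csharp
using System;
using System.IO;

static class SortDiskPatch
{
    public const int PAGE_SIZE = 8192;

    // Hypothetical replacement for the single big writer.Write call in
    // SortDisk.Write: split the container into PAGE_SIZE chunks so that
    // each call satisfies AesStream's ENSURE(count == PAGE_SIZE) check.
    public static void WriteChunked(Stream writer, long position,
                                    byte[] array, int offset, int containerSize)
    {
        if (containerSize % PAGE_SIZE != 0)
            throw new ArgumentException("containerSize must be a multiple of PAGE_SIZE");

        writer.Position = position;

        for (var i = 0; i < containerSize; i += PAGE_SIZE)
        {
            writer.Write(array, offset + i, PAGE_SIZE);
        }
    }
}
```

Since the Position advances with each Write, this also keeps the second ENSURE (Position % PAGE_SIZE == 0) satisfied on every call, as long as the starting position is page-aligned.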

StackTrace from Sentry logs:

System.Exception: LiteDB ENSURE: buffer size must be PAGE_SIZE
  ?, in void Constants.ENSURE(bool conditional, string message)
  ?, in void AesStream.Write(byte[] array, int offset, int count)
  ?, in void SortDisk.Write(long position, BufferSlice buffer)
  ?, in void SortService.Insert(IEnumerable<KeyValuePair<BsonValue, PageAddress>> items)
  ?, in IEnumerable<BsonDocument> BasePipe.OrderBy(IEnumerable<BsonDocument> source, BsonExpression expr, int order, int offset, int limit)+MoveNext()
  ?, in IEnumerable<BsonDocument> BasePipe.Include(IEnumerable<BsonDocument> source, BsonExpression path)+MoveNext()
  ?, in IEnumerable<BsonDocument> QueryPipe.Select(IEnumerable<BsonDocument> source, BsonExpression select)+MoveNext()
  ?, in BsonDataReader QueryExecutor.ExecuteQuery(bool executionPlan)+RunQuery()
  ?, in new BsonDataReader(IEnumerable<BsonValue> values, string collection)
  ?, in BsonDataReader QueryExecutor.ExecuteQuery(bool executionPlan)
  ?, in IBsonDataReader LiteEngine.Query(string collection, Query query)
  ?, in IEnumerable<BsonDocument> LiteQueryable<T>.ToDocuments()+MoveNext()
  ?, in List<TResult> SelectEnumerableIterator<TSource, TResult>.ToList()

Code to reproduce

using var db = new LiteDatabase("Filename=fail.db;Password=12345678");

var collection = db.GetCollection("test");

var docsCount = 60000; // 819200 / 14 ≈ 58,514 records, where 14 is the size of the 'Date' key

var docsToUpsert = new List<BsonDocument>();

for (var i = 0; i < docsCount; ++i)
{
    docsToUpsert.Add(new BsonDocument
    {
        ["Date"] = DateTime.UtcNow
    });
}

collection.Upsert(docsToUpsert);

var docs = collection.Query()
    .OrderBy("Date")
    .Offset(0)
    .Limit(20)
    .ToList();
mbdavid commented 1 year ago

Hi @matsakiv, thanks for your investigation and the PR with a fix. I believe there is no problem with the AES algorithm working with multiples of PAGE_SIZE.

I will run some tests, but in this case I am almost sure that this ENSURE is wrong, and simply removing it can work fine. I added a lot of these ENSURE checks to test internal conditions and track bugs better. Sometimes an ENSURE can be wrong for some specific case.
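That point can be sanity-checked with .NET's standard Aes type: the algorithm itself only requires the input length to be a multiple of the 16-byte AES block size, so an entire 819,200-byte container encrypts in one call. The PAGE_SIZE restriction lives in LiteDB's AesStream wrapper, not in AES (this sketch uses the standalone Aes API, not LiteDB's internal stream):

```csharp
using System;
using System.Security.Cryptography;

// A buffer the size of a full sort container (100 pages of 8192 bytes).
var data = new byte[8192 * 100];

using var aes = Aes.Create(); // default CBC mode with PKCS7 padding
aes.GenerateKey();
aes.GenerateIV();

using var encryptor = aes.CreateEncryptor();

// 819,200 is a multiple of the 16-byte AES block size, so a single
// call over the whole container succeeds without any error.
var cipher = encryptor.TransformFinalBlock(data, 0, data.Length);

// PKCS7 padding appends one extra block when input is an exact multiple
Console.WriteLine(cipher.Length); // 819216
```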