mbdavid / LiteDB

LiteDB - A .NET NoSQL Document Store in a single data file
http://www.litedb.org
MIT License
8.36k stars 1.22k forks

[BUG] LiteFileStream SetReadStreamPosition causes incorrect seeking #2458

Open CSDSP opened 3 months ago

CSDSP commented 3 months ago

Version: Current master (commit 4f77be4cc096653472bf06ebcf95eb399b1a4ff5).

Describe the bug: LiteFileStream.SetReadStreamPosition, part of how seeking is implemented, has a few bugs. First of all,

if (newPosition < 0 && newPosition > Length)
{
    throw new ArgumentOutOfRangeException();
}

This is clearly wrong; the && was probably meant to be an ||. As written, the condition can never be true, so you can seek to positions below 0 or beyond Length, and you get exceptions other than ArgumentOutOfRangeException.
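To make the logic error concrete, here is a small language-neutral sketch (in Python, with illustrative function names) showing that the original condition can never fire, while the or-variant rejects both out-of-range sides:

```python
def broken_check(new_position: int, length: int) -> bool:
    # Mirrors the original C#: if (newPosition < 0 && newPosition > Length)
    # No value can be below 0 AND beyond the length at the same time.
    return new_position < 0 and new_position > length

def fixed_check(new_position: int, length: int) -> bool:
    # The presumably intended check: out of range on either side.
    return new_position < 0 or new_position > length

length = 100
# The broken check never triggers, even for clearly invalid positions.
assert not any(broken_check(p, length) for p in (-5, 0, 50, 100, 105))
# The fixed check rejects both sides and accepts valid positions.
assert fixed_check(-5, length) and fixed_check(105, length)
assert not fixed_check(50, length)
```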

Secondly, and more importantly for my purposes, is the block below:

// calculate new chunk position
_currentChunkIndex = (int)_streamPosition / MAX_CHUNK_SIZE;
_positionInChunk = (int)_streamPosition % MAX_CHUNK_SIZE;

This assumes that all chunks aside from the last one are exactly MAX_CHUNK_SIZE bytes. However, that is not the case. If it is meant to be the case, then the bug is with writing, because various files in my database (uploaded with LiteFileStorage.Upload) contain chunks that are randomly much smaller.
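A small Python sketch of the arithmetic (the names and the tiny MAX_CHUNK_SIZE value are illustrative, not LiteDB's actual code) shows how division mis-locates a byte as soon as one chunk is short, while walking the actual chunk lengths finds the right one:

```python
MAX_CHUNK_SIZE = 5        # illustrative; LiteDB's real constant is much larger
chunk_lengths = [5, 2, 5] # the middle chunk is short, as can happen on write

def locate_by_division(pos):
    # Mirrors the buggy logic: assumes every chunk is exactly MAX_CHUNK_SIZE.
    return pos // MAX_CHUNK_SIZE, pos % MAX_CHUNK_SIZE

def locate_by_walking(pos):
    # Correct: subtract each chunk's actual length until the position fits.
    for i, n in enumerate(chunk_lengths):
        if pos < n:
            return i, pos
        pos -= n
    raise ValueError("position past end of stream")

# Byte 7 is really the first byte of chunk 2 (chunks cover 0-4, 5-6, 7-11)...
assert locate_by_walking(7) == (2, 0)
# ...but division points 2 bytes into chunk 1, which only holds 2 bytes.
assert locate_by_division(7) == (1, 2)
```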

CSDSP commented 3 months ago

On the write side, MemoryStream.Read isn't guaranteed to return the maximum available number of bytes. Currently a chunk is written for every buffer read, so whenever a read happens to return fewer than MAX_CHUNK_SIZE bytes, a short chunk is still written. This could be fixed, though fixing only the write side would leave the read issue in place for all existing databases.
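One possible write-side fix is to buffer partial reads and only emit full-size chunks. This is a hedged sketch of that idea in Python (write_chunks and the small MAX_CHUNK_SIZE are hypothetical, not LiteDB's API):

```python
MAX_CHUNK_SIZE = 4  # illustrative only

def write_chunks(reads):
    """Accumulate possibly-short reads into a buffer and emit only
    full-size chunks, so short reads never produce short chunks
    mid-file; only the final chunk may be short."""
    chunks, buf = [], b""
    for data in reads:  # each 'data' is one (possibly short) Read() result
        buf += data
        while len(buf) >= MAX_CHUNK_SIZE:
            chunks.append(buf[:MAX_CHUNK_SIZE])
            buf = buf[MAX_CHUNK_SIZE:]
    if buf:             # flush the tail at the end of the stream
        chunks.append(buf)
    return chunks

# Short reads of 3, 3, and 2 bytes still yield uniform 4-byte chunks.
assert write_chunks([b"abc", b"def", b"gh"]) == [b"abcd", b"efgh"]
```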

alexbereznikov commented 3 months ago

@CSDSP @SpaceCheetah May I ask how this issue shows itself? We have some new db corruption issues after upgrading to v5.0.19 so just wanted to know whether your fix could help us :)

SpaceCheetah commented 3 months ago

@alexbereznikov It shows up when you try to seek in a file read stream. Just downloading directly to a file or copying to a memory stream works fine, but if you try to seek there's a chance depending on the file for it to seek to the wrong position. I noticed it when I was getting exceptions where it was trying to read more data from a chunk than actually existed.

Seeking to the beginning still works, seeking anywhere else in the file can fail if any chunks are smaller than max length, which for me is fairly common.

bnuzhouwei commented 2 months ago

Seek throws an error, so the file can't support enableRangeProcessing in an MVC file result. How can this be fixed?

bnuzhouwei commented 2 months ago

It also means LiteFileStream can't be used to read documents as a stream with iTextSharp or other document libraries. Hope this is fixed as soon as possible.

JKamsker commented 3 weeks ago

Hello @CSDSP, you say there are a few bugs, can you please provide a repro unit test for the bugs you found?

SpaceCheetah commented 3 weeks ago

Created branch https://github.com/SpaceCheetah/LiteDB/tree/FailingTests. Notably, PR #2459 does not yet fully pass these tests, and I'll probably be adding more. Here's the repro code for the main issue I've found:

[Fact]
public void SeekShortChunks()
{
    using var db = new LiteDatabase(":memory:");
    var fs = db.FileStorage;
    using(Stream writeStream = fs.OpenWrite("test", "test"))
    {
        writeStream.WriteByte(0);
        writeStream.Flush(); //Create single-byte chunk just containing a 0
        writeStream.WriteByte(1);
        writeStream.Flush();
        writeStream.WriteByte(2);
    }
    using Stream readStream = fs.OpenRead("test");
    readStream.Position = 2;
    Assert.Equal(2, readStream.ReadByte());
}

JKamsker commented 3 weeks ago

@SpaceCheetah Nice find! Can you add the tests to the pr then?

SpaceCheetah commented 3 weeks ago

@JKamsker I have now, and it passes all the tests I've added.

JKamsker commented 3 weeks ago

@SpaceCheetah Ah nice, I will look into it asap :) Is the PR good to go from your side?

SpaceCheetah commented 3 weeks ago

@JKamsker Well, a better solution may be possible (for instance, if the chunk lengths could be read without reading through all of the chunk data). This does fix the issue, though, as far as I can tell.