mbdavid / LiteDB

LiteDB - A .NET NoSQL Document Store in a single data file
http://www.litedb.org
MIT License
8.62k stars 1.25k forks source link

[SUGGESTION] Improve query optimization #1706

Open mikhail-khalizev opened 4 years ago

mikhail-khalizev commented 4 years ago

Version 5.0.8

Describe the bug It's not really bug. But suggestion for query optimization. In code below I start 2 equivalent queries. In first condition is Where(x => x.Id >= low && x.Id <= high). In second condition flips: Where(x => x.Id <= high && x.Id >= low).

But time of execution those queries differ in 2 times!

Code to Reproduce

public class MyTests
{
    private ITestOutputHelper _output;

    public MyTests(ITestOutputHelper output)
    {
        _output = output;
    }

    public class StateModel
    {
        [BsonId]
        public long Id { get; set; }
    }

    [Fact]
    public void MyTest()
    {
        // Init.

        using var db = new LiteDatabase(":memory:");
        var col = db.GetCollection<StateModel>("col");

        var count = 50000;
        var low = count / 2 - 100;
        var high = count / 2 + 100;
        col.InsertBulk(Enumerable.Range(1, count).Select(x => new StateModel {Id = x}));

        // Act.

        var sw1 = new Stopwatch();
        var sw2 = new Stopwatch();

        for (var i = 0; i < 10; i++)
        {
            var query1 = col.Query()
                .Where(x => x.Id >= low && x.Id <= high)
                .OrderByDescending(x => x.Id)
                .Limit(10);

            var plan1 = query1.GetPlan().ToString();

            sw1.Restart();
            var list1 = query1.ToList();
            sw1.Stop();

            _output.WriteLine(
                $"Condition 'x.Id >= low && x.Id <= high' elapsed {sw1.ElapsedMilliseconds}ms with plan {plan1}.");

            var query2 = col.Query()
                .Where(x => x.Id <= high && x.Id >= low) // Flip condition.
                .OrderByDescending(x => x.Id)
                .Limit(10);

            var plan2 = query2.GetPlan().ToString();

            sw2.Restart();
            var list2 = query2.ToList();
            sw2.Stop();

            _output.WriteLine(
                $"Condition 'x.Id <= high && x.Id >= low' elapsed {sw2.ElapsedMilliseconds}ms with plan {plan2}.");

            list1.Select(x => x.Id).Should().Equal(list2.Select(x => x.Id));
        }

        // Check.

        (sw1.ElapsedMilliseconds > 1.5 * sw2.ElapsedMilliseconds || sw2.ElapsedMilliseconds > 1.5 * sw1.ElapsedMilliseconds).Should().BeFalse("Bad optimization.");
    }
}

Expected behavior I expect that time of queries execution to be no more than 5% different.

Additional context

Test Output:


Expected (sw1.ElapsedMilliseconds > 1.5 * sw2.ElapsedMilliseconds || sw2.ElapsedMilliseconds > 1.5 * sw1.ElapsedMilliseconds) to be false because Bad optimization., but found True.
   at FluentAssertions.Execution.XUnit2TestFramework.Throw(String message)
   at FluentAssertions.Execution.TestFrameworkProvider.Throw(String message)
   at FluentAssertions.Execution.DefaultAssertionStrategy.HandleFailure(String message)
   at FluentAssertions.Execution.AssertionScope.FailWith(Func`1 failReasonFunc)
   at FluentAssertions.Execution.AssertionScope.FailWith(Func`1 failReasonFunc)
   at FluentAssertions.Execution.AssertionScope.FailWith(String message, Object[] args)
   at FluentAssertions.Primitives.BooleanAssertions.BeFalse(String because, Object[] becauseArgs)
   at LiteDB.Tests.Database.MyTests.MyTest() in C:\Users\micky\Documents\Source\LiteDB\LiteDB.Tests\Database\MultiKey_Mapper_Tests.cs:line 83

Condition 'x.Id >= low && x.Id <= high' elapsed 130ms with plan {"collection":"col","snaphost":"read","pipe":"queryPipe","index":{"name":"_id","expr":"$._id","order":-1,"mode":"INDEX SCAN(_id >= 24900)","cost":20},"lookup":{"loader":"document","fields":"$"},"filters":["($._id<=@p1)"],"limit":10,"select":{"expr":"$","all":false}}.
Condition 'x.Id <= high && x.Id >= low' elapsed 41ms with plan {"collection":"col","snaphost":"read","pipe":"queryPipe","index":{"name":"_id","expr":"$._id","order":-1,"mode":"INDEX SCAN(_id <= 25100)","cost":20},"lookup":{"loader":"document","fields":"$"},"filters":["($._id>=@p1)"],"limit":10,"select":{"expr":"$","all":false}}.
Condition 'x.Id >= low && x.Id <= high' elapsed 77ms with plan {"collection":"col","snaphost":"read","pipe":"queryPipe","index":{"name":"_id","expr":"$._id","order":-1,"mode":"INDEX SCAN(_id >= 24900)","cost":20},"lookup":{"loader":"document","fields":"$"},"filters":["($._id<=@p1)"],"limit":10,"select":{"expr":"$","all":false}}.
Condition 'x.Id <= high && x.Id >= low' elapsed 39ms with plan {"collection":"col","snaphost":"read","pipe":"queryPipe","index":{"name":"_id","expr":"$._id","order":-1,"mode":"INDEX SCAN(_id <= 25100)","cost":20},"lookup":{"loader":"document","fields":"$"},"filters":["($._id>=@p1)"],"limit":10,"select":{"expr":"$","all":false}}.
Condition 'x.Id >= low && x.Id <= high' elapsed 93ms with plan {"collection":"col","snaphost":"read","pipe":"queryPipe","index":{"name":"_id","expr":"$._id","order":-1,"mode":"INDEX SCAN(_id >= 24900)","cost":20},"lookup":{"loader":"document","fields":"$"},"filters":["($._id<=@p1)"],"limit":10,"select":{"expr":"$","all":false}}.
Condition 'x.Id <= high && x.Id >= low' elapsed 42ms with plan {"collection":"col","snaphost":"read","pipe":"queryPipe","index":{"name":"_id","expr":"$._id","order":-1,"mode":"INDEX SCAN(_id <= 25100)","cost":20},"lookup":{"loader":"document","fields":"$"},"filters":["($._id>=@p1)"],"limit":10,"select":{"expr":"$","all":false}}.
Condition 'x.Id >= low && x.Id <= high' elapsed 85ms with plan {"collection":"col","snaphost":"read","pipe":"queryPipe","index":{"name":"_id","expr":"$._id","order":-1,"mode":"INDEX SCAN(_id >= 24900)","cost":20},"lookup":{"loader":"document","fields":"$"},"filters":["($._id<=@p1)"],"limit":10,"select":{"expr":"$","all":false}}.
Condition 'x.Id <= high && x.Id >= low' elapsed 40ms with plan {"collection":"col","snaphost":"read","pipe":"queryPipe","index":{"name":"_id","expr":"$._id","order":-1,"mode":"INDEX SCAN(_id <= 25100)","cost":20},"lookup":{"loader":"document","fields":"$"},"filters":["($._id>=@p1)"],"limit":10,"select":{"expr":"$","all":false}}.
Condition 'x.Id >= low && x.Id <= high' elapsed 78ms with plan {"collection":"col","snaphost":"read","pipe":"queryPipe","index":{"name":"_id","expr":"$._id","order":-1,"mode":"INDEX SCAN(_id >= 24900)","cost":20},"lookup":{"loader":"document","fields":"$"},"filters":["($._id<=@p1)"],"limit":10,"select":{"expr":"$","all":false}}.
Condition 'x.Id <= high && x.Id >= low' elapsed 44ms with plan {"collection":"col","snaphost":"read","pipe":"queryPipe","index":{"name":"_id","expr":"$._id","order":-1,"mode":"INDEX SCAN(_id <= 25100)","cost":20},"lookup":{"loader":"document","fields":"$"},"filters":["($._id>=@p1)"],"limit":10,"select":{"expr":"$","all":false}}.
Condition 'x.Id >= low && x.Id <= high' elapsed 81ms with plan {"collection":"col","snaphost":"read","pipe":"queryPipe","index":{"name":"_id","expr":"$._id","order":-1,"mode":"INDEX SCAN(_id >= 24900)","cost":20},"lookup":{"loader":"document","fields":"$"},"filters":["($._id<=@p1)"],"limit":10,"select":{"expr":"$","all":false}}.
Condition 'x.Id <= high && x.Id >= low' elapsed 43ms with plan {"collection":"col","snaphost":"read","pipe":"queryPipe","index":{"name":"_id","expr":"$._id","order":-1,"mode":"INDEX SCAN(_id <= 25100)","cost":20},"lookup":{"loader":"document","fields":"$"},"filters":["($._id>=@p1)"],"limit":10,"select":{"expr":"$","all":false}}.
Condition 'x.Id >= low && x.Id <= high' elapsed 78ms with plan {"collection":"col","snaphost":"read","pipe":"queryPipe","index":{"name":"_id","expr":"$._id","order":-1,"mode":"INDEX SCAN(_id >= 24900)","cost":20},"lookup":{"loader":"document","fields":"$"},"filters":["($._id<=@p1)"],"limit":10,"select":{"expr":"$","all":false}}.
Condition 'x.Id <= high && x.Id >= low' elapsed 47ms with plan {"collection":"col","snaphost":"read","pipe":"queryPipe","index":{"name":"_id","expr":"$._id","order":-1,"mode":"INDEX SCAN(_id <= 25100)","cost":20},"lookup":{"loader":"document","fields":"$"},"filters":["($._id>=@p1)"],"limit":10,"select":{"expr":"$","all":false}}.
Condition 'x.Id >= low && x.Id <= high' elapsed 91ms with plan {"collection":"col","snaphost":"read","pipe":"queryPipe","index":{"name":"_id","expr":"$._id","order":-1,"mode":"INDEX SCAN(_id >= 24900)","cost":20},"lookup":{"loader":"document","fields":"$"},"filters":["($._id<=@p1)"],"limit":10,"select":{"expr":"$","all":false}}.
Condition 'x.Id <= high && x.Id >= low' elapsed 49ms with plan {"collection":"col","snaphost":"read","pipe":"queryPipe","index":{"name":"_id","expr":"$._id","order":-1,"mode":"INDEX SCAN(_id <= 25100)","cost":20},"lookup":{"loader":"document","fields":"$"},"filters":["($._id>=@p1)"],"limit":10,"select":{"expr":"$","all":false}}.
Condition 'x.Id >= low && x.Id <= high' elapsed 80ms with plan {"collection":"col","snaphost":"read","pipe":"queryPipe","index":{"name":"_id","expr":"$._id","order":-1,"mode":"INDEX SCAN(_id >= 24900)","cost":20},"lookup":{"loader":"document","fields":"$"},"filters":["($._id<=@p1)"],"limit":10,"select":{"expr":"$","all":false}}.
Condition 'x.Id <= high && x.Id >= low' elapsed 41ms with plan {"collection":"col","snaphost":"read","pipe":"queryPipe","index":{"name":"_id","expr":"$._id","order":-1,"mode":"INDEX SCAN(_id <= 25100)","cost":20},"lookup":{"loader":"document","fields":"$"},"filters":["($._id>=@p1)"],"limit":10,"select":{"expr":"$","all":false}}.
Condition 'x.Id >= low && x.Id <= high' elapsed 82ms with plan {"collection":"col","snaphost":"read","pipe":"queryPipe","index":{"name":"_id","expr":"$._id","order":-1,"mode":"INDEX SCAN(_id >= 24900)","cost":20},"lookup":{"loader":"document","fields":"$"},"filters":["($._id<=@p1)"],"limit":10,"select":{"expr":"$","all":false}}.
Condition 'x.Id <= high && x.Id >= low' elapsed 41ms with plan {"collection":"col","snaphost":"read","pipe":"queryPipe","index":{"name":"_id","expr":"$._id","order":-1,"mode":"INDEX SCAN(_id <= 25100)","cost":20},"lookup":{"loader":"document","fields":"$"},"filters":["($._id>=@p1)"],"limit":10,"select":{"expr":"$","all":false}}.
lbnascimento commented 4 years ago

@mikhail-khalizev In LiteDB, only one index can be used per query. All other filter expressions are evaluated after the results are retrieved, even if the expression uses indexes fields. If there are multiple clauses eligible for indexed search, the query execution pipeline tries to select the one that will give the best performance. In this scenario, it probably evaluates both clauses equally, and always ends up selecting the first one (the left clause).

I ran a slightly modified version of your test: I converted the Linq expressions to Bson expressions and also added a third query that uses the expression $._id between @0 and @1 (which is not available through Linq expressions).

What I noticed is that queries that use only Limit perform better when Id >= low is the index expression, and the reverse is true for queries that use only Offset. And the query that uses between outperformed both by a significant margin – which is to be expected, given that its filter is executed at once (index mode: INDEX RANGE SCAN(_id BETWEEN 24900 AND 25100)).

mikhail-khalizev commented 4 years ago

Wow. Then I suggest make optimization - if LiteDb meets Where(x => x.Id >= low && x.Id <= high) it replace it to between.

mikhail-khalizev commented 4 years ago

Also it possible optimize

Where(x => x.Id > low && x.Id < high)

as

($._id between @0 and @1 and $._id > low and $._id < high).