redis / NRedisStack

Redis Stack .Net client
MIT License
225 stars 37 forks source link

Wrong query when aggregating with filter #230

Closed lem0na closed 6 months ago

lem0na commented 9 months ago

Thanks for reporting an issue in NRedisStack! Please update the appropriate text below, as much data as possible really helps!

NRedisStack Version: 0.11.0

Redis Stack Version: redis_version:7.2.3 from official docker image MODULE LIST name timeseries ver 11009 path /opt/redis-stack/lib/redistimeseries.so args name redisgears_2 ver 20014 path /opt/redis-stack/lib/redisgears.so args v8-plugin-path /opt/redis-stack/lib/libredisgears_v8_plugin.so name search ver 20809 path /opt/redis-stack/lib/redisearch.so args MAXSEARCHRESULTS 10000 MAXAGGREGATERESULTS 10000 name bf ver 20608 path /opt/redis-stack/lib/redisbloom.so args name RedisCompat ver 1 path /opt/redis-stack/lib/rediscompat.so args name ReJSON ver 20607 path /opt/redis-stack/lib/rejson.so args

Description: I am trying to evaluate Redis for aggregations. I saved a few JSON records and tried to aggregate using filter on one of the properties. The generated query throws an exception that a property is not loaded or in the pipeline.

example code

using NRedisStack.Search;
using NRedisStack;
using StackExchange.Redis;
using NRedisStack.RedisStackCommands;
using NRedisStack.Search.Literals.Enums;
using RedisTest3;
using NRedisStack.Search.Aggregation;
using NRedisStack.Literals.Enums;
using static NRedisStack.Search.Query;
using System.Text.RegularExpressions;

string session = "user";

ConfigurationOptions options = new ConfigurationOptions
{
    EndPoints = {
            { "localhost", 6379 },
        },
};

ConnectionMultiplexer cluster = ConnectionMultiplexer.Connect(options);
IDatabase db = cluster.GetDatabase();
SearchCommandBuilder.DropIndex("idx:users", dd: true);
db.Execute("FLUSHALL");

ISearchCommands ft = db.FT();
IJsonCommands json = db.JSON();

var schema = new Schema()
    .AddNumericField(new FieldName("$.UserId", "UserId"), sortable: true)
    .AddTagField(new FieldName("$.CreatedDay", "CreatedDay"))
    .AddNumericField(new FieldName("$.StatusId", "StatusId"), sortable: true);

ft.Create(
    "idx:users",
    new FTCreateParams().On(IndexDataType.JSON).Prefix($"{session}:"),
    schema);

var parameters = FTCreateParams.CreateParams().Filter("@StatusId>1").Prefix($"{session}:");
ft.Create("idx:users2", parameters, schema);

BacktestResultRedis res1 = new BacktestResultRedis
{
    CreatedDay = "2023-10-01",
    StatusId = 1,
    UserId = 1000
};

BacktestResultRedis res2 = new BacktestResultRedis
{
    CreatedDay = "2023-11-01",
    StatusId = 2,
    UserId = 1001
};

BacktestResultRedis res3 = new BacktestResultRedis
{
    CreatedDay = "2023-10-01",
    StatusId = 3,
    UserId = 1002
};

BacktestResultRedis res4 = new BacktestResultRedis
{
    CreatedDay = "2023-12-01",
    StatusId = 2,
    UserId = 1003
};

BacktestResultRedis res5 = new BacktestResultRedis
{
    CreatedDay = "2023-12-11",
    StatusId = 1,
    UserId = 1003
};

json.Set($"{session}:1", "$", res1);
json.Set($"{session}:2", "$", res2);
json.Set($"{session}:3", "$", res3);
json.Set($"{session}:4", "$", res4);
json.Set($"{session}:5", "$", res5);

var request = new AggregationRequest("*", 3).Filter("@StatusId==1")
                .GroupBy("@CreatedDay", Reducers.CountDistinct("@UserId"), Reducers.Count().As("count")) ;
var result = ft.Aggregate("idx:users", request);
    public class BacktestResultRedis
    {
        public string CreatedDay { get; set; }
        public byte StatusId { get; set; }
        public long UserId { get; set; }
    }

I expect the generated query to be

FT.AGGREGATE idx:users * FILTER @StatusId==1 GROUPBY 1 @CreatedDay REDUCE COUNT_DISTINCT 1 @UserId REDUCE COUNT 0 AS count DIALECT 3

which executed in the CLI returns expected results. But the actual query is

FT.AGGREGATE idx:users * GROUPBY 1 @CreatedDay REDUCE COUNT_DISTINCT 1 @UserId REDUCE COUNT 0 AS count FILTER @StatusId==1 DIALECT 3

which executed in the CLI throws an error "Property StatusId not loaded nor in pipeline".

In AggregationRequest class I see that there is a function SerializeRedisArgs which looks like generates the query with a specific order of commands which probably is the issue.

I am doing something wrong or this is a bug?

shacharPash commented 9 months ago

Hey @lem0na, thank for the detailed issue :) After checking with RediSearch team, FILTER or any operation in the pipeline after a GROUPBY can only relate to fields which the GROUPBY will define AS If you relate to other fields you will get an error that the Property is not loaded or not in the pipeline. FILTER before the GROUPBY can relate to any available Property

As you said, I arranged the query so that it is in the order that appears in documentation of FT.AGGREGATE, there the FILTER comes after GROUPBY.

Try to add the @StatusId to the GROUPBY and tell me if you got the result you expected to get. you can do it like this:

var request = new AggregationRequest("*", 3)
    .Filter("@StatusId==1")
    .GroupBy(
        new List<string> { "@CreatedDay", "@StatusId" },
        new List<Reducer> 
        { 
            Reducers.CountDistinct("@UserId"), 
            Reducers.Count().As("count") 
        }
    );

Beyond that, A little thing I noticed is this line:

SearchCommandBuilder.DropIndex("idx:users", dd: true);

Builds the command and returns it to you, but not sending the command to Redis, if you want to sent it to redis you need to do like this:

ISearchCommands ft = db.FT();
ft.DropIndex("idx:users", dd: true);

In any case, don'd do it at the start, because if you send the command when Redis does not know the index yet, you will get an error.

lem0na commented 9 months ago

Hi @shacharPash Thanks for the quick response. I am a newbie using Redis outside the KV store - a few days ago, I discovered that Redis now supports a lot of features that I had thought of. The code that you provided returns the expected results. But I am wondering why TotalResults is 5, not 2 as this is the real number of results. When I execute it in CLI it also returns 5 which is probably related to Redis and not to the library. Why I am asking - as far as I understood I have two options to get the data:

  1. get all results at once which in case I for example have 500,000+ results will require a lot memory usage
  2. read the records one by one using a cursor and calculte the unique number of userId and sum of records with StatusId == 1. But in this case, TotalResults should have the correct value.

About

SearchCommandBuilder.DropIndex("idx:users", dd: true);

it is leftover from some previous tests. Thanks for mentioning it.

shacharPash commented 9 months ago

Hi @lem0na, after researching the topic and reading the documentation, it turns out that the order of the arguments within the FT.AGGREGATE command has meaning, and also that you can send more than one argument. Currently NRedisStack does not allow these two things and I will fix it. Thanks for bringing it to my attention.