redis / redis-om-dotnet

Object mapping, and more, for Redis and .NET
MIT License

Error on insert in RedisCollection #443

Open GraySerg opened 2 months ago

GraySerg commented 2 months ago

This error occurs frequently on our production server. The collection contains about 6000 entities.

    StackExchange.Redis.RedisServerException: EXECABORT Transaction discarded because of previous errors.
       at Redis.OM.RedisConnection.ExecuteInTransactionAsync(Tuple`2[] commandArgsTuples)
       at Redis.OM.RedisCommands.JsonSetAsync(IRedisConnection connection, String key, String path, String json, TimeSpan timeSpan)
       at Redis.OM.RedisCommands.JsonSetAsync(IRedisConnection connection, String key, String path, Object obj, TimeSpan timeSpan)
       at Redis.OM.RedisCommands.SetAsync(IRedisConnection connection, Object obj, TimeSpan timeSpan)
       at Redis.OM.Searching.RedisCollection`1.InsertAsync(T item, TimeSpan timeSpan)

There is no information about the "previous errors". Updating entities in the collection works fine.

slorello89 commented 2 months ago

Hi @GraySerg that's a new one here.

First, let's note what this means: "EXECABORT Transaction discarded because of previous errors." means that the transaction failed, and that it failed because of some error: not an error in the command itself, but some other kind of system error.

After a bit of googling, this looks like something that can happen if Redis falls into a bad state (a commonly reported cause is the Redis server running out of memory).

Can you validate the:

  1. Redis Version
  2. Redis OM Version
  3. Server health (particularly interested to see the state of INFO MEMORY)

GraySerg commented 2 months ago

  1. Redis - 7.2.4
  2. Redis OM Version - 0.6.1
  3. Server health:

        used_memory:23159240
        used_memory_human:22.09M
        used_memory_rss:49270784
        used_memory_rss_human:46.99M
        used_memory_peak:23414168
        used_memory_peak_human:22.33M
        used_memory_peak_perc:98.91%
        used_memory_overhead:3046900
        used_memory_startup:1646856
        used_memory_dataset:20112340
        used_memory_dataset_perc:93.49%
        allocator_allocated:23497584
        allocator_active:36503552
        allocator_resident:39976960
        total_system_memory:33654874112
        total_system_memory_human:31.34G
        used_memory_lua:54272
        used_memory_vm_eval:54272
        used_memory_lua_human:53.00K
        used_memory_scripts_eval:632
        number_of_cached_scripts:1
        number_of_functions:0
        number_of_libraries:0
        used_memory_vm_functions:32768
        used_memory_vm_total:87040
        used_memory_vm_total_human:85.00K
        used_memory_functions:184
        used_memory_scripts:816
        used_memory_scripts_human:816B
        maxmemory:32212254720
        maxmemory_human:30.00G
        maxmemory_policy:noeviction
        allocator_frag_ratio:1.55
        allocator_frag_bytes:13005968
        allocator_rss_ratio:1.10
        allocator_rss_bytes:3473408
        rss_overhead_ratio:1.23
        rss_overhead_bytes:9293824
        mem_fragmentation_ratio:2.13
        mem_fragmentation_bytes:26132216
        mem_not_counted_for_evict:13472
        mem_replication_backlog:1048580
        mem_total_replication_buffers:1066208
        mem_clients_slaves:17632
        mem_clients_normal:72192
        mem_cluster_links:10720
        mem_aof_buffer:0
        mem_allocator:jemalloc-5.3.0
        active_defrag_running:0
        lazyfree_pending_objects:0
        lazyfreed_objects:0

slorello89 commented 2 months ago

Are these stats from a Redis instance where this failure is actively happening?

The oddity here is that "EXECABORT Transaction discarded because of previous errors." means that Redis discarded the entire transaction without ever trying to execute it. Unfortunately, this means that it also discards the actual errors that caused it. There are a few reasons that could happen.

  1. There's some major syntactic issue in one of the commands. This seems unlikely given we're doing a JSON set here, and the object is explicitly being serialized to a JSON string.
  2. A given command is missing. Again, this seems unlikely given that you are executing against a Redis Stack instance, unless you have some non-Redis-Stack instance somewhere in your topology that this could be getting routed to.
  3. Some external condition (e.g. a memory issue) that Redis would be able to know about explicitly ahead of time. But your memory info says that your system memory is about 32 GB, your maxmemory setting is 30 GB, and your currently used memory is 22 MB, so unless this instance is not representative of your Redis deployment, that seems unlikely.
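
To make the memory reasoning in the third point concrete, here is a minimal sketch that parses `INFO MEMORY`-style `key:value` output and computes how much of `maxmemory` is in use. `InfoMemorySketch`, `Parse`, and `UsedFraction` are hypothetical names for illustration only; they are not part of Redis OM or StackExchange.Redis.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper for illustration; not part of any Redis client library.
public static class InfoMemorySketch
{
    // Splits INFO output into key/value pairs. Tokens without a colon
    // (such as the "# Memory" section header) are ignored.
    public static Dictionary<string, string> Parse(string info)
    {
        var fields = new Dictionary<string, string>();
        var separators = new[] { '\r', '\n', ' ' };
        foreach (var token in info.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            int colon = token.IndexOf(':');
            if (colon > 0)
            {
                fields[token.Substring(0, colon)] = token.Substring(colon + 1);
            }
        }
        return fields;
    }

    // Returns used_memory / maxmemory, or 0 when maxmemory is 0 (no limit configured).
    public static double UsedFraction(string info)
    {
        var fields = Parse(info);
        long used = long.Parse(fields["used_memory"], CultureInfo.InvariantCulture);
        long max = long.Parse(fields["maxmemory"], CultureInfo.InvariantCulture);
        return max == 0 ? 0.0 : (double)used / max;
    }
}
```

With the values reported above (`used_memory:23159240`, `maxmemory:32212254720`), the fraction comes out well under 0.1%, which is why memory pressure looks like an unlikely cause on this instance.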

Do you have an exact command/object/timeout this is failing on?

GraySerg commented 2 months ago

Yes, they are from the production server. The code looks like this:

            var events = _provider.RedisCollection<EntityToCache>();
            var eventItem = await events.FindByIdAsync(message.Key.ToString());
            if (eventItem == null)
            {
                eventItem = new EntityToCache();
                MapData(eventData, eventItem);
                await events.InsertAsync(eventItem, ttl); // The exception occurs here after about 200-400 insert calls
            }
            else
            {
                MapData(eventData, eventItem);
                await events.UpdateAsync(eventItem); // No exceptions here
            }

slorello89 commented 2 months ago

@GraySerg - do you happen to have a copy of the ttl and the eventData that you're trying to insert when this is happening? (if the eventData is sensitive, please sanitize)

GraySerg commented 2 months ago

The TTL is 2 days: var ttl = TimeSpan.FromDays(2); The class looks like this:

    [Document(StorageType = StorageType.Json, Prefixes = new[] { "EventData" })]
    public class EntityToCache
    {
        [RedisIdField, Indexed]
        public int Id { get; set; }

        public bool IsAlive { get; set; }

        public bool IsActive { get; set; }

        public int TopValue { get; set; }

        public DateTime StartTime { get; set; }

        public int[] Visibility { get; set; }

        [Indexed(CascadeDepth = 1)]
        public NameTranslations LeagueNames { get; set; }

        public int TypeId { get; set; }

        public Score Score { get; set; }

        [Indexed(CascadeDepth = 1)]
        public NameTranslations Team1 { get; set; }

        [Indexed(CascadeDepth = 1)]
        public NameTranslations Team2 { get; set; }

        public bool HasCoefs { get; set; }

        public bool IsBetAccepted { get; set; }

        public int MinBetLimit { get; set; }

        public bool IsVisible { get; set; }

        public Dictionary<int, int> Limits { get; set; }
        public int MinCount { get; set; }
        public int MaxCount { get; set; }
    }

    public class NameTranslations
    {
        public NameTranslations()
        {
            Languages = new Dictionary<int, string>(2);
        }

        public Dictionary<int, string> Languages { get; set; }
    }

    public class Score
    {
        [Indexed(CascadeDepth = 1)]
        public ScoreUnit General { get; set; }

        [Indexed(CascadeDepth = 1)]
        public ScoreUnit[] Detailed { get; set; }

        public string DetailedString { get; set; }

        [Indexed(CascadeDepth = 1)]
        public ScoreUnit Points { get; set; }

        public int Innings { get; set; }

        public int Advantage { get; set; }
    }

    public class ScoreUnit
    {
        public int Team1 { get; set; }

        public int Team2 { get; set; }
    }

slorello89 commented 2 months ago

Hi @GraySerg - I am unable to reproduce this error on my end with similar objects and the same timeout, even under much heavier loads than what you're quoting. Are you able to share a reproduction of this issue outside of your production environment?

IMO the likeliest cause is still environmental (there just isn't any reason the commands you quoted should be failing in this way)

GraySerg commented 2 months ago

It happens only with Redis Cluster. If Redis runs on a single node, it works fine.

slorello89 commented 1 month ago

Oh, interesting... Out of curiosity, how many shards are in your Redis Cluster? Is it possible that one of those shards isn't running Redis Stack, but rather some vanilla version of Redis without module support? A missing command would cause the above error.

GraySerg commented 1 month ago

We have 3 shards by default, and all 3 run Redis Stack.

granit1986 commented 1 month ago

Hi @slorello89, we found the problem: when we send the JSON.SET command, Redis Cluster returns a MOVED hash slot redirection on every request. This happens because Redis OM passes the key as a plain string. If the string is cast to RedisKey, all requests execute successfully without MOVED, because StackExchange.Redis can calculate the correct hash slot for the RedisKey type. Can you change this behavior in the Redis OM library?
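
For context on why the key type matters: Redis Cluster assigns every key to one of 16384 hash slots using CRC16 of the key (or of the hash tag between `{` and `}`, if present), and a client has to compute that slot to send the command to the right shard. The following is a minimal, self-contained sketch of that calculation as described in the Redis Cluster specification. `HashSlotSketch` and `Slot` are hypothetical names for illustration; this is not the code StackExchange.Redis or Redis OM actually uses.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of Redis OM or StackExchange.Redis.
public static class HashSlotSketch
{
    // CRC16, XMODEM variant (polynomial 0x1021, initial value 0),
    // which is the variant the Redis Cluster specification names.
    private static int Crc16(ReadOnlySpan<byte> data)
    {
        int crc = 0;
        foreach (byte b in data)
        {
            crc ^= b << 8;
            for (int i = 0; i < 8; i++)
            {
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF; // keep the register at 16 bits
            }
        }
        return crc;
    }

    // Returns the cluster hash slot (0..16383) for a key.
    public static int Slot(string key)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(key);

        // Hash tag rule: if the key contains "{...}" with at least one byte
        // between the first '{' and the first '}' after it, hash only that part.
        int open = Array.IndexOf(bytes, (byte)'{');
        if (open >= 0)
        {
            int close = Array.IndexOf(bytes, (byte)'}', open + 1);
            if (close > open + 1)
            {
                return Crc16(bytes.AsSpan(open + 1, close - open - 1)) % 16384;
            }
        }

        return Crc16(bytes) % 16384;
    }
}
```

Calling `HashSlotSketch.Slot("EventData:42")` (a made-up key in the style of the `EventData` prefix above) gives the slot that owns that key. When a client cannot tell which argument is the key, it cannot do this calculation and may send the command to a node that does not own the slot, which is what produces a MOVED reply.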
