microsoftarchive / redis

Redis is an in-memory database that persists on disk. The data model is key-value, but many different kinds of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes
http://redis.io

Slave crashed with a Bug report #299

Closed: tellan55 closed this issue 9 years ago

tellan55 commented 9 years ago

Hi Enrico,

I know I read about something similar recently, but I don't know what the final answer was: whether the new 2.8.21.02 build is the fix, or whether the fix is still on hold.

Our configuration is one master server running Redis and Sentinel, one slave server running Redis and Sentinel, and one witness server running only Sentinel.
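
For reference, the Sentinel side of a layout like this boils down to the same monitor block on every node. The sketch below uses the master name, addresses, and quorum that appear in the logs later in this thread; treat it as an illustration, not the actual deployment config:

    # sentinel.conf sketch (one per node; only the port differs per host)
    port 10020
    # Watch the master named "APICloud"; quorum 2 means two Sentinels must agree
    # before the master is flagged objectively down (+odown) and a failover can start.
    sentinel monitor APICloud 10.115.69.76 6379 2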

These three servers were freshly deployed yesterday. I ran some benchmarking after installing all the bits, and everything was successful up to that point.

Today I found the slave in an error state, cycling through the error below over and over.

Here is the first error report:

=== REDIS BUG REPORT START: Cut & paste starting from here ===
[3828] 26 Aug 02:44:14.203 # --- EXCEPTION_ACCESS_VIOLATION
[3828] 26 Aug 02:44:14.203 # --- STACK TRACE
redis-server.exe!((null):0)(0x0118E980, 0x0118FF90, 0x00000001, 0x40105AA8)
redis-server.exe!((null):0)(0x00000001, 0x00000000, 0x00000001, 0xFD6AC3A0)
kernel32.dll!UnhandledExceptionFilter((null):0)(0x0118E980, 0x00000006, 0x00000000, 0x00000001)
ntdll.dll!MD5Final((null):0)(0x0118F5C0, 0x40016A36, 0x40108BF8, 0x00000000)
ntdll.dll!_C_specific_handler((null):0)(0x01190000, 0x0118FF90, 0x0118FF90, 0x77442DD0)
ntdll.dll!RtlDecodePointer((null):0)(0x01190000, 0x771EDDA0, 0x00012F00, 0x00000046)
ntdll.dll!RtlUnwindEx((null):0)(0x0118F5C0, 0x0118F0D0, 0x00000000, 0x00000000)
ntdll.dll!KiUserExceptionDispatcher((null):0)(0x00000000, 0x00000007, 0xD1E11640, 0xD1223960)
redis-server.exe!KiUserExceptionDispatcher((null):0)(0x00000000, 0x00000000, 0x00000000, 0x00000000)
redis-server.exe!KiUserExceptionDispatcher((null):0)(0xD121F6D0, 0x00000000, 0x00000000, 0xD1223A50)
redis-server.exe!KiUserExceptionDispatcher((null):0)(0x00000000, 0xD1DE8870, 0xD1DE8870, 0x00000008)
redis-server.exe!KiUserExceptionDispatcher((null):0)(0xD1DE8870, 0x00000000, 0x00000000, 0x400BEE67)
redis-server.exe!KiUserExceptionDispatcher((null):0)(0x00000000, 0x002DFE8F, 0xD1E11640, 0xD24BDD40)
redis-server.exe!KiUserExceptionDispatcher((null):0)(0x55DD6036, 0xD1DE8870, 0x0118F880, 0x00000000)
redis-server.exe!KiUserExceptionDispatcher((null):0)(0x00000007, 0xD1DE8870, 0x00000008, 0x400705FC)
redis-server.exe!KiUserExceptionDispatcher((null):0)(0x55DD6036, 0x40087F60, 0x00000001, 0x00000065)
redis-server.exe!KiUserExceptionDispatcher((null):0)(0xD1DE8870, 0x00000065, 0x0000008C, 0x00000001)
redis-server.exe!KiUserExceptionDispatcher((null):0)(0xD1DE8870, 0x00000000, 0x0000008C, 0x00000000)
redis-server.exe!KiUserExceptionDispatcher((null):0)(0xFFF30CB0, 0x00000001, 0xD1206F10, 0x000000CC)
redis-server.exe!KiUserExceptionDispatcher((null):0)(0x55DCD72A, 0xD11B6968, 0x4013417C, 0x001CA6E0)
redis-server.exe!KiUserExceptionDispatcher((null):0)(0x00000000, 0x00000004, 0x55DCD72A, 0x00000004)
redis-server.exe!KiUserExceptionDispatcher((null):0)(0x00000008, 0xFFFFFFFF, 0xFFFFFFFF, 0x00000000)
redis-server.exe!KiUserExceptionDispatcher((null):0)(0x00000000, 0x00000000, 0x00000000, 0x00000000)
kernel32.dll!BaseThreadInitThunk((null):0)(0x00000000, 0x00000000, 0x00000000, 0x00000000)
ntdll.dll!RtlUserThreadStart((null):0)(0x00000000, 0x00000000, 0x00000000, 0x00000000)
ntdll.dll!RtlUserThreadStart((null):0)(0x00000000, 0x00000000, 0x00000000, 0x00000000)
[3828] 26 Aug 02:44:14.890 # --- INFO OUTPUT

Server

redis_version:2.8.2101
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:9d5f6488ff5c66dd
redis_mode:standalone
os:Windows
arch_bits:64
multiplexing_api:winsock_IOCP
process_id:3828
run_id:f538ce170674850d251e5dffd4ecb8db336267b0
tcp_port:6379
uptime_in_seconds:35084
uptime_in_days:0
hz:10
lru_clock:14508086
config_file:d:\redis\redis.windows.conf

Clients

connected_clients:6
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

Memory

used_memory:1142440
used_memory_human:1.09M
used_memory_rss:1108648
used_memory_peak:15698192
used_memory_peak_human:14.97M
used_memory_lua:36864
mem_fragmentation_ratio:0.97
mem_allocator:dlmalloc-2.8

Persistence

loading:0
rdb_changes_since_last_save:3014287
rdb_bgsave_in_progress:0
rdb_last_save_time:1440536362
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok

Stats

total_connections_received:3018
total_commands_processed:4239061
instantaneous_ops_per_sec:5
total_net_input_bytes:452450492
total_net_output_bytes:0
instantaneous_input_kbps:0.28
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:2011896
pubsub_channels:1
pubsub_patterns:0
latest_fork_usec:0

Replication

role:master
connected_slaves:0
master_repl_offset:111294655
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

CPU

used_cpu_sys:9.25
used_cpu_user:13.13
used_cpu_sys_children:0.00
used_cpu_user_children:0.00

Commandstats

cmdstat_get:calls=402372,usec=46875,usec_per_call=0.12
cmdstat_set:calls=201192,usec=109375,usec_per_call=0.54
cmdstat_del:calls=3,usec=31250,usec_per_call=10416.67
cmdstat_incr:calls=201192,usec=62500,usec_per_call=0.31
cmdstat_lpush:calls=402384,usec=125000,usec_per_call=0.31
cmdstat_lpop:calls=201192,usec=93750,usec_per_call=0.47
cmdstat_lrange:calls=1609524,usec=515625,usec_per_call=0.32
cmdstat_sadd:calls=2,usec=0,usec_per_call=0.00
cmdstat_srem:calls=2,usec=0,usec_per_call=0.00
cmdstat_mset:calls=200832,usec=468750,usec_per_call=2.33
cmdstat_select:calls=7,usec=0,usec_per_call=0.00
cmdstat_keys:calls=6,usec=0,usec_per_call=0.00
cmdstat_auth:calls=3018,usec=31250,usec_per_call=10.35
cmdstat_ping:calls=906930,usec=46875,usec_per_call=0.05
cmdstat_multi:calls=1,usec=0,usec_per_call=0.00
cmdstat_info:calls=10351,usec=0,usec_per_call=0.00
cmdstat_config:calls=7,usec=0,usec_per_call=0.00
cmdstat_subscribe:calls=3,usec=0,usec_per_call=0.00
cmdstat_publish:calls=100030,usec=31250,usec_per_call=0.31
cmdstat_client:calls=13,usec=0,usec_per_call=0.00

Keyspace

[3828] 26 Aug 02:44:14.890 # === REDIS BUG REPORT END. Make sure to include from START to END. ===

The STDOUT file is 6.5 GB and I saved it; let me know if you need any extra information from it.

Thanks in advance.

enricogior commented 9 years ago

Hi @tellan55, your bug report looks like this one: https://github.com/MSOpenTech/redis/issues/293. Can you please post a few log lines from before the crash (20 lines should be enough)? Thank you.

tellan55 commented 9 years ago

Here are the extra lines...

[3828] 26 Aug 02:41:29.406 - 7 clients connected (0 slaves), 1075880 bytes in use
[3828] 26 Aug 02:41:34.875 - 7 clients connected (0 slaves), 1075880 bytes in use
[3828] 26 Aug 02:41:40.343 - 7 clients connected (0 slaves), 1108648 bytes in use
[3828] 26 Aug 02:41:45.812 - 7 clients connected (0 slaves), 1075880 bytes in use
[3828] 26 Aug 02:41:51.281 - 7 clients connected (0 slaves), 1075880 bytes in use
[3828] 26 Aug 02:41:56.750 - 7 clients connected (0 slaves), 1108648 bytes in use
[3828] 26 Aug 02:42:02.218 - 7 clients connected (0 slaves), 1075880 bytes in use
[3828] 26 Aug 02:42:07.687 - 7 clients connected (0 slaves), 1075880 bytes in use
[3828] 26 Aug 02:42:13.156 - 7 clients connected (0 slaves), 1075880 bytes in use
[3828] 26 Aug 02:42:18.625 - 7 clients connected (0 slaves), 1075880 bytes in use
[3828] 26 Aug 02:42:24.093 - 7 clients connected (0 slaves), 1108648 bytes in use
[3828] 26 Aug 02:42:29.562 - 7 clients connected (0 slaves), 1108648 bytes in use
[3828] 26 Aug 02:42:35.031 - 7 clients connected (0 slaves), 1075880 bytes in use
[3828] 26 Aug 02:42:40.500 - 7 clients connected (0 slaves), 1075880 bytes in use
[3828] 26 Aug 02:42:45.968 - 7 clients connected (0 slaves), 1108648 bytes in use
[3828] 26 Aug 02:42:51.437 - 7 clients connected (0 slaves), 1075880 bytes in use
[3828] 26 Aug 02:42:56.906 - 7 clients connected (0 slaves), 1075880 bytes in use
[3828] 26 Aug 02:43:02.375 - 7 clients connected (0 slaves), 1108648 bytes in use
[3828] 26 Aug 02:43:07.843 - 7 clients connected (0 slaves), 1075880 bytes in use
[3828] 26 Aug 02:43:13.312 - 7 clients connected (0 slaves), 1141416 bytes in use
[3828] 26 Aug 02:43:18.781 - 7 clients connected (0 slaves), 1075880 bytes in use
[3828] 26 Aug 02:43:24.250 - 7 clients connected (0 slaves), 1075880 bytes in use
[3828] 26 Aug 02:43:29.718 - 7 clients connected (0 slaves), 1075880 bytes in use
[3828] 26 Aug 02:43:35.187 - 7 clients connected (0 slaves), 1075880 bytes in use
[3828] 26 Aug 02:43:40.656 - 7 clients connected (0 slaves), 1075880 bytes in use
[3828] 26 Aug 02:43:46.125 - 7 clients connected (0 slaves), 1075880 bytes in use
[3828] 26 Aug 02:43:51.593 - 7 clients connected (0 slaves), 1108648 bytes in use
[3828] 26 Aug 02:43:57.062 - 7 clients connected (0 slaves), 1141416 bytes in use
[3828] 26 Aug 02:44:02.531 - 7 clients connected (0 slaves), 1075880 bytes in use
[3828] 26 Aug 02:44:06.406 # Connection with master lost.
[3828] 26 Aug 02:44:14.203 * Caching the disconnected master state.
[3828] 26 Aug 02:44:14.203 * Discarding previously cached master state.
[3828] 26 Aug 02:44:14.203 # === REDIS BUG REPORT START: Cut & paste starting from here ===
[3828] 26 Aug 02:44:14.203 # --- EXCEPTION_ACCESS_VIOLATION
[3828] 26 Aug 02:44:14.203 # --- STACK TRACE
redis-server.exe!((null):0)(0x0118E980, 0x0118FF90, 0x00000001, 0x40105AA8)

Thanks,

enricogior commented 9 years ago

@tellan55 can you please confirm that the Redis instance that crashed is the slave? I'm asking because the error report shows "role:master", which is not consistent with the log message "Connection with master lost." Can you please check the master log and see if there is a failure around 02:44? If the slave lost the connection to the master, I would expect that the master crashed as well. Thanks.
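
A quick way to double-check the role each instance reports is INFO replication from redis-cli; the addresses below are the ones that appear later in the thread, and -a is only needed if requirepass is set (the command stats show AUTH being used):

    redis-cli -h 10.115.69.76 -p 6379 -a <password> info replication   # expected: role:master
    redis-cli -h 10.115.69.77 -p 6379 -a <password> info replication   # expected: role:slave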

tellan55 commented 9 years ago

@enricogior:

Yes, I confirm that what crashed was the slave.

I also ran the memory test, and it just finished with no errors.
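
(For context, this presumably refers to the built-in memory check; assuming the Windows build supports the same flag as upstream Redis, a typical run over, say, 4 GB looks like this:)

    redis-server.exe --test-memory 4096    # size to test, in megabytes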

Here are all the logs: the STDOUT from the master, and the Sentinel logs from the master and the slave. The third server (10.115.69.78) in the Sentinel group is just a witness for the quorum.

Master STDOUT at that time:

[4092] 26 Aug 02:43:51.674 - 6 clients connected (1 slaves), 2409952 bytes in use
[4092] 26 Aug 02:43:57.142 - 6 clients connected (1 slaves), 2409952 bytes in use
[4092] 26 Aug 02:44:02.611 - 6 clients connected (1 slaves), 2377184 bytes in use
[4092] 26 Aug 02:44:14.345 - Reading from client: An existing connection was forcibly closed by the remote host.
[4092] 26 Aug 02:44:14.345 - Reading from client: An existing connection was forcibly closed by the remote host.
[4092] 26 Aug 02:44:14.345 # Connection with slave 10.115.69.77:6379 lost.
[4092] 26 Aug 02:44:14.345 - Reading from client: An existing connection was forcibly closed by the remote host.
[4092] 26 Aug 02:44:14.345 - Reading from client: An existing connection was forcibly closed by the remote host.
[4092] 26 Aug 02:44:14.345 - Accepted 10.115.69.76:10869
[4092] 26 Aug 02:44:14.345 - Reading from client: An existing connection was forcibly closed by the remote host.
[4092] 26 Aug 02:44:14.345 - Reading from client: An existing connection was forcibly closed by the remote host.
[4092] 26 Aug 02:44:14.345 - Accepted 10.115.69.77:12343
[4092] 26 Aug 02:44:14.345 - Accepted 10.115.69.78:9213
[4092] 26 Aug 02:44:14.345 - Accepted 10.115.69.77:12344
[4092] 26 Aug 02:44:14.345 - Accepted 10.115.69.78:9214
[4092] 26 Aug 02:44:19.814 - 6 clients connected (0 slaves), 2359800 bytes in use
[4092] 26 Aug 02:44:25.283 - 6 clients connected (0 slaves), 2392568 bytes in use
[4092] 26 Aug 02:44:30.752 - 6 clients connected (0 slaves), 2359800 bytes in use
[4092] 26 Aug 02:44:36.220 - 6 clients connected (0 slaves), 2359800 bytes in use
[4092] 26 Aug 02:44:41.689 - 6 clients connected (0 slaves), 2359800 bytes in use

Sentinel log Master:

[2980] 25 Aug 17:01:22.424 # Sentinel runid is f8c57ff9b8de9399cd3db0ccc7efd9dfd9b3a1e4
[2980] 25 Aug 17:01:22.424 # +monitor master APICloud 10.115.69.76 6379 quorum 2
[2980] 25 Aug 17:01:22.439 * +slave slave 10.115.69.77:6379 10.115.69.77 6379 @ APICloud 10.115.69.76 6379
[2980] 25 Aug 17:02:12.080 * +sentinel sentinel 10.115.69.77:10001 10.115.69.77 10001 @ APICloud 10.115.69.76 6379
[2980] 25 Aug 17:17:31.736 * +sentinel sentinel 10.115.69.78:10020 10.115.69.78 10020 @ APICloud 10.115.69.76 6379
[2980] 26 Aug 02:44:05.658 # +sdown master APICloud 10.115.69.76 6379
[2980] 26 Aug 02:44:06.267 # +new-epoch 1
[2980] 26 Aug 02:44:06.267 # +vote-for-leader 6385d37a8cce0dad8369ddffdd8762bb4ea65136 1
[2980] 26 Aug 02:44:06.736 # +odown master APICloud 10.115.69.76 6379 #quorum 3/2
[2980] 26 Aug 02:44:14.720 # Next failover delay: I will not start a failover before Wed Aug 26 02:45:06 2015
[2980] 26 Aug 02:44:14.955 # +tilt #tilt mode entered
[2980] 26 Aug 02:44:44.986 # -tilt #tilt mode exited
[2980] 26 Aug 02:44:44.986 # -sdown master APICloud 10.115.69.76 6379
[2980] 26 Aug 02:44:44.986 # -odown master APICloud 10.115.69.76 6379
[2980] 26 Aug 03:43:00.361 # +sdown master APICloud 10.115.69.76 6379
[2980] 26 Aug 03:43:00.424 # +odown master APICloud 10.115.69.76 6379 #quorum 2/2
[2980] 26 Aug 03:43:00.424 # +new-epoch 2
[2980] 26 Aug 03:43:00.424 # +try-failover master APICloud 10.115.69.76 6379
[2980] 26 Aug 03:43:00.424 # +vote-for-leader f8c57ff9b8de9399cd3db0ccc7efd9dfd9b3a1e4 2
[2980] 26 Aug 03:43:00.424 # 10.115.69.77:10001 voted for f8c57ff9b8de9399cd3db0ccc7efd9dfd9b3a1e4 2
[2980] 26 Aug 03:43:00.502 # +elected-leader master APICloud 10.115.69.76 6379
[2980] 26 Aug 03:43:00.502 # +failover-state-select-slave master APICloud 10.115.69.76 6379
[2980] 26 Aug 03:43:00.564 # -failover-abort-no-good-slave master APICloud 10.115.69.76 6379
[2980] 26 Aug 03:43:24.424 # +sdown sentinel 10.115.69.78:10020 10.115.69.78 10020 @ APICloud 10.115.69.76 6379
[2980] 26 Aug 03:43:24.502 # +tilt #tilt mode entered
[2980] 26 Aug 03:43:54.549 # -tilt #tilt mode exited
[2980] 26 Aug 03:43:54.549 # -sdown master APICloud 10.115.69.76 6379
[2980] 26 Aug 03:43:54.549 # -odown master APICloud 10.115.69.76 6379
[2980] 26 Aug 03:43:54.549 # -sdown sentinel 10.115.69.78:10020 10.115.69.78 10020 @ APICloud 10.115.69.76 6379
[2980] 26 Aug 05:44:00.689 # +sdown master APICloud 10.115.69.76 6379
[2980] 26 Aug 05:44:01.424 # +new-epoch 3
[2980] 26 Aug 05:44:01.424 # +vote-for-leader 008023d4e341763a070f4cb6e839607c6b5c6ae5 3
[2980] 26 Aug 05:44:01.549 # -sdown master APICloud 10.115.69.76 6379
[2980] 26 Aug 06:08:48.174 # +sdown master APICloud 10.115.69.76 6379
[2980] 26 Aug 06:08:48.252 # +odown master APICloud 10.115.69.76 6379 #quorum 2/2
[2980] 26 Aug 06:08:48.252 # +new-epoch 4
[2980] 26 Aug 06:08:48.252 # +try-failover master APICloud 10.115.69.76 6379
[2980] 26 Aug 06:08:50.455 # +vote-for-leader f8c57ff9b8de9399cd3db0ccc7efd9dfd9b3a1e4 4
[2980] 26 Aug 06:08:50.720 # 10.115.69.78:10020 voted for 008023d4e341763a070f4cb6e839607c6b5c6ae5 4
[2980] 26 Aug 06:08:50.799 # +tilt #tilt mode entered
[2980] 26 Aug 06:08:50.830 # 10.115.69.77:10001 voted for 008023d4e341763a070f4cb6e839607c6b5c6ae5 4
[2980] 26 Aug 06:09:20.892 # -tilt #tilt mode exited
[2980] 26 Aug 06:09:20.892 # -sdown master APICloud 10.115.69.76 6379
[2980] 26 Aug 06:09:20.892 # -odown master APICloud 10.115.69.76 6379
[2980] 26 Aug 06:09:20.892 # -failover-abort-not-elected master APICloud 10.115.69.76 6379
[2980] 26 Aug 06:45:58.455 # +sdown master APICloud 10.115.69.76 6379
[2980] 26 Aug 06:45:58.564 # +odown master APICloud 10.115.69.76 6379 #quorum 2/2
[2980] 26 Aug 06:45:58.564 # +new-epoch 5
[2980] 26 Aug 06:45:58.564 # +try-failover master APICloud 10.115.69.76 6379
[2980] 26 Aug 06:46:01.611 # +vote-for-leader f8c57ff9b8de9399cd3db0ccc7efd9dfd9b3a1e4 5
[2980] 26 Aug 06:46:01.674 # 10.115.69.78:10020 voted for f8c57ff9b8de9399cd3db0ccc7efd9dfd9b3a1e4 5
[2980] 26 Aug 06:46:01.705 # 10.115.69.77:10001 voted for 6385d37a8cce0dad8369ddffdd8762bb4ea65136 5
[2980] 26 Aug 06:46:01.752 # +tilt #tilt mode entered
[2980] 26 Aug 06:46:31.799 # -tilt #tilt mode exited
[2980] 26 Aug 06:46:31.799 # -sdown master APICloud 10.115.69.76 6379
[2980] 26 Aug 06:46:31.799 # -odown master APICloud 10.115.69.76 6379
[2980] 26 Aug 06:46:31.799 # +elected-leader master APICloud 10.115.69.76 6379
[2980] 26 Aug 06:46:31.799 # +failover-state-select-slave master APICloud 10.115.69.76 6379
[2980] 26 Aug 06:46:31.861 # -failover-abort-no-good-slave master APICloud 10.115.69.76 6379
[2980] 26 Aug 07:05:39.824 # +tilt #tilt mode entered
[2980] 26 Aug 07:06:09.902 # -tilt #tilt mode exited
[2980] 26 Aug 07:20:31.480 # +sdown sentinel 10.115.69.78:10020 10.115.69.78 10020 @ APICloud 10.115.69.76 6379
[2980] 26 Aug 07:20:33.074 # -sdown sentinel 10.115.69.78:10020 10.115.69.78 10020 @ APICloud 10.115.69.76 6379
[2980] 26 Aug 11:25:49.715 # +sdown slave 10.115.69.77:6379 10.115.69.77 6379 @ APICloud 10.115.69.76 6379
[2980] 26 Aug 11:33:20.230 * +reboot slave 10.115.69.77:6379 10.115.69.77 6379 @ APICloud 10.115.69.76 6379
[2980] 26 Aug 11:33:20.324 # -sdown slave 10.115.69.77:6379 10.115.69.77 6379 @ APICloud 10.115.69.76 6379
[2980] 26 Aug 11:36:32.996 # +sdown slave 10.115.69.77:6379 10.115.69.77 6379 @ APICloud 10.115.69.76 6379
[2980] 26 Aug 12:05:59.918 * +reboot slave 10.115.69.77:6379 10.115.69.77 6379 @ APICloud 10.115.69.76 6379
[2980] 26 Aug 12:05:59.996 # -sdown slave 10.115.69.77:6379 10.115.69.77 6379 @ APICloud 10.115.69.76 6379

Sentinel log Slave:

[Redis ASCII-art startup banner: Redis 2.8.2101 (00000000/0) 64 bit, running in sentinel mode, Port: 10001, PID: 1492, http://redis.io]

[1492] 25 Aug 17:02:09.828 # Sentinel runid is 6385d37a8cce0dad8369ddffdd8762bb4ea65136
[1492] 25 Aug 17:02:09.828 # +monitor master APICloud 10.115.69.76 6379 quorum 2
[1492] 25 Aug 17:02:10.828 * +slave slave 10.115.69.77:6379 10.115.69.77 6379 @ APICloud 10.115.69.76 6379
[1492] 25 Aug 17:02:11.640 * +sentinel sentinel 10.115.69.76:10000 10.115.69.76 10000 @ APICloud 10.115.69.76 6379
[1492] 25 Aug 17:17:31.656 * +sentinel sentinel 10.115.69.78:10020 10.115.69.78 10020 @ APICloud 10.115.69.76 6379
[1492] 26 Aug 02:29:16.250 # +sdown master APICloud 10.115.69.76 6379
[1492] 26 Aug 02:29:16.250 # +sdown slave 10.115.69.77:6379 10.115.69.77 6379 @ APICloud 10.115.69.76 6379
[1492] 26 Aug 02:29:16.343 # -sdown master APICloud 10.115.69.76 6379
[1492] 26 Aug 02:29:16.625 # -sdown slave 10.115.69.77:6379 10.115.69.77 6379 @ APICloud 10.115.69.76 6379
[1492] 26 Aug 02:44:06.109 # +sdown master APICloud 10.115.69.76 6379
[1492] 26 Aug 02:44:06.187 # +odown master APICloud 10.115.69.76 6379 #quorum 2/2
[1492] 26 Aug 02:44:06.187 # +new-epoch 1
[1492] 26 Aug 02:44:06.187 # +try-failover master APICloud 10.115.69.76 6379
[1492] 26 Aug 02:44:06.187 # +vote-for-leader 6385d37a8cce0dad8369ddffdd8762bb4ea65136 1
[1492] 26 Aug 02:44:06.187 # 10.115.69.76:10000 voted for 6385d37a8cce0dad8369ddffdd8762bb4ea65136 1
[1492] 26 Aug 02:44:06.187 # 10.115.69.78:10020 voted for 6385d37a8cce0dad8369ddffdd8762bb4ea65136 1
[1492] 26 Aug 02:44:06.265 # +elected-leader master APICloud 10.115.69.76 6379
[1492] 26 Aug 02:44:06.265 # +failover-state-select-slave master APICloud 10.115.69.76 6379
[1492] 26 Aug 02:44:06.343 # +selected-slave slave 10.115.69.77:6379 10.115.69.77 6379 @ APICloud 10.115.69.76 6379
[1492] 26 Aug 02:44:06.343 * +failover-state-send-slaveof-noone slave 10.115.69.77:6379 10.115.69.77 6379 @ APICloud 10.115.69.76 6379
[1492] 26 Aug 02:44:06.406 * +failover-state-wait-promotion slave 10.115.69.77:6379 10.115.69.77 6379 @ APICloud 10.115.69.76 6379
[1492] 26 Aug 02:44:10.265 # +sdown sentinel 10.115.69.76:10000 10.115.69.76 10000 @ APICloud 10.115.69.76 6379
[1492] 26 Aug 02:44:14.718 # +tilt #tilt mode entered
[1492] 26 Aug 02:44:44.718 # -tilt #tilt mode exited
[1492] 26 Aug 02:44:44.718 # -sdown master APICloud 10.115.69.76 6379
[1492] 26 Aug 02:44:44.718 # -odown master APICloud 10.115.69.76 6379
[1492] 26 Aug 02:44:44.718 # -failover-abort-slave-timeout master APICloud 10.115.69.76 6379
[1492] 26 Aug 02:44:44.718 # -sdown sentinel 10.115.69.76:10000 10.115.69.76 10000 @ APICloud 10.115.69.76 6379
[1492] 26 Aug 02:56:30.162 # +tilt #tilt mode entered
[1492] 26 Aug 02:57:00.178 # -tilt #tilt mode exited
[1492] 26 Aug 03:43:01.912 # +new-epoch 2
[1492] 26 Aug 03:43:01.912 # +vote-for-leader f8c57ff9b8de9399cd3db0ccc7efd9dfd9b3a1e4 2
[1492] 26 Aug 03:43:02.162 # +sdown master APICloud 10.115.69.76 6379
[1492] 26 Aug 03:43:05.303 # +sdown sentinel 10.115.69.76:10000 10.115.69.76 10000 @ APICloud 10.115.69.76 6379
[1492] 26 Aug 03:43:25.897 # +sdown sentinel 10.115.69.78:10020 10.115.69.78 10020 @ APICloud 10.115.69.76 6379
[1492] 26 Aug 03:43:25.975 # +tilt #tilt mode entered
[1492] 26 Aug 03:43:56.006 # -tilt #tilt mode exited
[1492] 26 Aug 03:43:56.006 # -sdown master APICloud 10.115.69.76 6379
[1492] 26 Aug 03:43:56.006 # -sdown sentinel 10.115.69.76:10000 10.115.69.76 10000 @ APICloud 10.115.69.76 6379
[1492] 26 Aug 03:43:56.006 # -sdown sentinel 10.115.69.78:10020 10.115.69.78 10020 @ APICloud 10.115.69.76 6379
[1492] 26 Aug 05:44:02.256 # +new-epoch 3
[1492] 26 Aug 05:44:02.256 # +vote-for-leader 008023d4e341763a070f4cb6e839607c6b5c6ae5 3
[1492] 26 Aug 05:44:02.272 # +sdown master APICloud 10.115.69.76 6379
[1492] 26 Aug 05:44:02.381 # -sdown master APICloud 10.115.69.76 6379
[1492] 26 Aug 06:08:48.897 # +sdown master APICloud 10.115.69.76 6379
[1492] 26 Aug 06:08:49.209 # +new-epoch 4
[1492] 26 Aug 06:08:49.209 # +vote-for-leader 008023d4e341763a070f4cb6e839607c6b5c6ae5 4
[1492] 26 Aug 06:08:50.037 # +odown master APICloud 10.115.69.76 6379 #quorum 2/2
[1492] 26 Aug 06:08:51.647 # Next failover delay: I will not start a failover before Wed Aug 26 06:09:49 2015
[1492] 26 Aug 06:08:51.725 # -sdown master APICloud 10.115.69.76 6379
[1492] 26 Aug 06:08:51.725 # -odown master APICloud 10.115.69.76 6379
[1492] 26 Aug 06:45:59.334 # +sdown master APICloud 10.115.69.76 6379
[1492] 26 Aug 06:45:59.397 # +odown master APICloud 10.115.69.76 6379 #quorum 3/2
[1492] 26 Aug 06:45:59.397 # +new-epoch 5
[1492] 26 Aug 06:45:59.397 # +try-failover master APICloud 10.115.69.76 6379
[1492] 26 Aug 06:46:02.116 # +vote-for-leader 6385d37a8cce0dad8369ddffdd8762bb4ea65136 5
[1492] 26 Aug 06:46:02.491 # 10.115.69.76:10000 voted for f8c57ff9b8de9399cd3db0ccc7efd9dfd9b3a1e4 5
[1492] 26 Aug 06:46:02.491 # 10.115.69.78:10020 voted for f8c57ff9b8de9399cd3db0ccc7efd9dfd9b3a1e4 5
[1492] 26 Aug 06:46:02.584 # +tilt #tilt mode entered
[1492] 26 Aug 06:46:32.647 # -tilt #tilt mode exited
[1492] 26 Aug 06:46:32.647 # -sdown master APICloud 10.115.69.76 6379
[1492] 26 Aug 06:46:32.647 # -odown master APICloud 10.115.69.76 6379
[1492] 26 Aug 06:46:32.647 # -failover-abort-not-elected master APICloud 10.115.69.76 6379
[1492] 26 Aug 07:05:38.893 # +tilt #tilt mode entered
[1492] 26 Aug 07:06:08.940 # -tilt #tilt mode exited
[1492] 26 Aug 11:25:49.596 # +sdown slave 10.115.69.77:6379 10.115.69.77 6379 @ APICloud 10.115.69.76 6379
[1492] 26 Aug 11:33:20.284 * +reboot slave 10.115.69.77:6379 10.115.69.77 6379 @ APICloud 10.115.69.76 6379
[1492] 26 Aug 11:33:20.393 # -sdown slave 10.115.69.77:6379 10.115.69.77 6379 @ APICloud 10.115.69.76 6379
[1492] 26 Aug 11:36:32.753 # +sdown slave 10.115.69.77:6379 10.115.69.77 6379 @ APICloud 10.115.69.76 6379
[1492] 26 Aug 12:05:59.565 * +reboot slave 10.115.69.77:6379 10.115.69.77 6379 @ APICloud 10.115.69.76 6379
[1492] 26 Aug 12:05:59.628 # -sdown slave 10.115.69.77:6379 10.115.69.77 6379 @ APICloud 10.115.69.76 6379

Thanks,

enricogior commented 9 years ago

@tellan55 the logs show that both the slave and the Sentinel running on 10.115.69.77 lost the connection to the master. Right after losing the connection, the slave and the Sentinel on 10.115.69.77 tried to reconnect, but the connections were lost again. It's really strange that connections are "forcibly closed by the remote host" when running on a local network; it may be something external to Redis that is causing it. Do you have any antivirus or internet security software running on the servers?

tellan55 commented 9 years ago

@enricogior

Yeah, we have antivirus on all our servers. What files should I put in the exclusion list?

enricogior commented 9 years ago

@tellan55 I can't be sure the antivirus is at fault, but it's a possibility, since the crash log doesn't show any application error. You should exclude redis-server.exe, the folder that contains the configuration files, the heapdir folder (where the memory-mapped file RedisQFork_####.dat is saved), and the dir folder (where the RDB file is saved).
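
Purely as an illustration: if the antivirus in question happened to be Windows Defender, the exclusions described above could be added with PowerShell roughly as follows. The paths are hypothetical (d:\redis is borrowed from the config_file path in the bug report), and other AV products have their own exclusion mechanisms:

    Add-MpPreference -ExclusionProcess "redis-server.exe"    # the Redis / Sentinel executable
    Add-MpPreference -ExclusionPath "D:\redis"                # configuration files and logs
    Add-MpPreference -ExclusionPath "D:\redis\heapdir"        # RedisQFork_####.dat location (hypothetical path)
    Add-MpPreference -ExclusionPath "D:\redis\data"           # rdb "dir" folder (hypothetical path)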

tellan55 commented 9 years ago

@Enricogior

We aren't doing any persistence, so there is no heap folder or RDB file. I will add the other files to the antivirus exclusion list and report back.
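
For context, "no persistence" in redis.windows.conf typically amounts to something like the following (a sketch; the exact settings depend on the shipped config file):

    save ""          # disable RDB snapshots
    appendonly no    # disable the append-only file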

Thanks for your prompt reply.

tellan55 commented 9 years ago

@Enricogior

Good morning,

This is the follow-up to #299.

We have excluded from the antivirus all the folders that contain Redis files, executables, or logs, since the night of 08/25. The slave server started that memory-report cycle again yesterday morning, and again today. The behavior is exactly the same: a memory error reported after the crash, over and over. However, the Windows service didn't stop or crash.

The master server instance never crashed.

After the slave starts that memory-report cycle, the master instance starts having performance issues. The only way to bring the Redis slave back is to stop the Windows service, erase the STDOUT file, and start the service again.
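
That recovery procedure can be scripted; here is a rough PowerShell sketch, assuming a hypothetical service name and log path (the real ones depend on how the service was installed):

    Stop-Service -Name "Redis"                                          # hypothetical service name
    Remove-Item "D:\redis\stdout.log" -ErrorAction SilentlyContinue     # hypothetical STDOUT path
    Start-Service -Name "Redis"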

I have 13 MB of zipped files containing:

redis.windows.conf (master), redis.windows.conf (slave), STDOUT (master), and STDOUT (slave, latest from yesterday).

I can't attach it here for some reason, probably because of the size.

Let me know what information you would like me to send over for analysis.

Thanks

enricogior commented 9 years ago

Hi @tellan55, please send the files to --edited-- using the subject "Issue #299". Can you please add the Sentinel config files as well? Thank you.

tellan55 commented 9 years ago

Hi @Enricogior

The files have been sent to the above email address in a zip file.

Thanks

enricogior commented 9 years ago

@tellan55 thanks for the files. I'm investigating the issue and I'll let you know if I find what is causing the crash. In the meantime, would you consider running a private build with some extra debug information? If so, let me know and I'll send you the bits. Thank you.

tellan55 commented 9 years ago

@Enricogior

Yes, please send it and I will install it.

Thanks

tellan55 commented 9 years ago

Closing this issue. In my case the detection time for a master-to-slave failover can't be less than 15 seconds; it comes down to a network issue we are investigating and to timing issues among the Redis Sentinels that push several of them into TILT mode.
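
For reference, that 15-second detection floor corresponds to the Sentinel down-after-milliseconds setting; a sketch of the relevant lines in each sentinel config, using the master name from the logs above:

    sentinel down-after-milliseconds APICloud 15000   # don't mark the master subjectively down before 15 s
    sentinel failover-timeout APICloud 60000          # upper bound on a single failover attempt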

Thanks