Closed indrekj closed 4 years ago
Seeing:
Assertion failed: data.shm_chid->len >= 1 (/usr/src/nchan-1.1.14/src/store/memory/ipc-handlers.c: memstore_ipc_send_get_message: 528)
There doesn't seem to be anything relevant before that.
Also sometimes I've seen:
Assertion failed: 0 (/usr/src/nchan-1.1.14/src/store/memory/ipc.c: ipc_write_alert_fd: 197)
Assertion failed: 0 (/usr/src/nchan-1.1.14/src/store/memory/ipc.c: ipc_write_alert_fd: 197)
Assertion failed: spool->msg_status == MSG_CHANNEL_NOTREADY || spool->msg_status == MSG_INVALID (/usr/src/nchan-1.1.14/src/store/spool.c: its_time_for_a_spooling: 1087)
nchan version: 1.1.14 with redis store.
We have nchan deployed to multiple environments and it seems to happen in each environment. Usually every few days or so, sometimes twice a day.
nginx info:
/ # nginx -V
nginx version: nginx/1.13.8
built by gcc 6.4.0 (Alpine 6.4.0)
built with OpenSSL 1.0.2n 7 Dec 2017
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_stub_status_module --with-http_auth_request_module --with-http_xslt_module=dynamic --with-http_image_filter_module=dynamic --with-http_geoip_module=dynamic --with-http_perl_module=dynamic --with-threads --with-stream --with-stream_ssl_module --with-http_slice_module --with-mail --with-mail_ssl_module --with-file-aio --with-http_v2_module --with-ipv6 --add-dynamic-module=/usr/src/nchan-1.1.14
nchan configuration:
daemon off;
user nginx;
worker_processes 3;

error_log /dev/stderr info;
pid /var/run/nginx.pid;

load_module "modules/ngx_nchan_module.so";

events {
  worker_connections 1024;
}

http {
  include /etc/nginx/mime.types;
  default_type application/octet-stream;

  access_log /dev/stdout;

  upstream redis_cluster {
    nchan_redis_server redis://redis-master-svc:6379;
  }

  server {
    listen 80;

    proxy_read_timeout 1d;
    nchan_websocket_ping_interval 10s;

    location /nchan-status {
      nchan_stub_status;
    }

    location /nginx-status {
      stub_status;
    }

    location = /publish {
      nchan_publisher;
      nchan_channel_id $arg_channel_id;
      nchan_message_buffer_length 50;
      nchan_message_timeout 1h;
      nchan_redis_pass redis_cluster;
    }

    location = / {
      nchan_pubsub;
      nchan_subscriber_channel_id $arg_channel_id;
      nchan_channel_id_split_delimiter ",";
      nchan_subscriber_last_message_id $arg_last_message_id;
      nchan_publisher_channel_id ackchannel;
      nchan_redis_pass redis_cluster;
      nchan_max_channel_id_length 9434;
    }
  }
}
So far our only workaround is an e2e check (liveness probe) that restarts the service when publish/subscribe starts to fail.
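For anyone wanting a similar stopgap, here is a minimal sketch of the publish half of such a check. It is not the probe actually deployed here. The function name `publish_ok`, the channel name `liveness-probe`, and the 5-second timeout are assumptions made for illustration; only the `/publish?channel_id=...` endpoint comes from the configuration above. It treats any 2xx response as healthy and everything else (refused connection, timeout, error status) as a failure.

```python
import http.client
import urllib.error
import urllib.request


def publish_ok(base_url, channel_id="liveness-probe", timeout=5.0):
    """Return True if a test publish to /publish is answered with a 2xx status.

    base_url is the address of the nginx/nchan server, e.g. "http://127.0.0.1:80".
    channel_id and timeout are illustrative defaults, not values from the issue.
    """
    url = f"{base_url}/publish?channel_id={channel_id}"
    req = urllib.request.Request(url, data=b"ping", method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Refused or dropped connection, timeout, or an HTTP error status:
        # report the service as unhealthy.
        return False
```

A wrapper script could exit non-zero when `publish_ok(...)` returns False so that the orchestrator restarts the container. A full e2e check would also subscribe and confirm that the message is delivered, which this sketch leaves out.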
Please see if you can reproduce this with a build from master. If so, I'd like to see a coredump backtrace. Let me know if this bug persists in master and we'll figure out the rest from there.