Closed rmihael closed 8 years ago
@rmihael this message is not related to shared memory; it is an allocation on normal memory. How is your server memory? Do you have swap? How big is the message being posted? Try replacing the line `ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, "push stream module: unable to allocate memory for payload");` with `ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, "push stream module: unable to allocate memory for payload for %d bytes", ctx->frame->payload_len);` to help investigate the message size.
Server memory seems to be in good shape:
    # free -m
                 total       used       free     shared    buffers     cached
    Mem:          1655       1470        184          0        329        566
    -/+ buffers/cache:        573       1081
    Swap:         1919          6       1913
I don't think that we are sending any messages bigger than 20 KB. I'll recompile nginx with the change you proposed and get back with accurate measurements.
Actually, I think it may be useful to have the full payload that failed allocation. Can you suggest a log line that would dump it? Sorry for asking, but it's been a long time since I last used C for anything.
At this point the module does not have the payload yet; it is allocating memory to read it from the socket.
That's what I got in the logs after changing the line as you suggested:
2015/12/16 10:56:14 [error] 5312#0: *2450 push stream module: unable to allocate memory for payload for -2146963871 bytes
It's quite strange; I don't think we have any message whose size exceeds the signed integer range.
Here are more logs from the last few hours:
2015/12/16 10:56:14 [error] 5312#0: *2450 push stream module: unable to allocate memory for payload for -2146963871 bytes
2015/12/16 11:01:17 [error] 5310#0: *3495 push stream module: unable to allocate memory for payload for 201314684 bytes
2015/12/16 11:25:07 [error] 5310#0: *6125 push stream module: unable to allocate memory for payload for -2145296493 bytes
2015/12/16 11:28:00 [error] 5311#0: *6407 push stream module: unable to allocate memory for payload for 1653362954 bytes
2015/12/16 11:37:25 [error] 5310#0: *7405 push stream module: unable to allocate memory for payload for -2090981240 bytes
2015/12/16 11:37:39 [error] 5313#0: *6489 push stream module: unable to allocate memory for payload for -1628155000 bytes
2015/12/16 12:02:50 [error] 5310#0: *9403 push stream module: unable to allocate memory for payload for 1773878779 bytes
2015/12/16 12:17:30 [error] 5311#0: *11165 push stream module: unable to allocate memory for payload for 839370131 bytes
For completeness' sake, I'll add logging for payload size on our application side.
Profiled payload sizes on the application side. There are some bigger messages around 64 KB, but there are fewer than 10 of them per hour. The vast majority of messages are under 1 KB. Nothing that scales up to megabytes.
@rmihael I'm guessing that this is related to the partial (fragmented) frames that the websocket RFC describes but that are not yet supported by the module. I will prepare some log messages and ask you to add them on your server to help debug. Can you do that for me?
Definitely. Looking forward to these messages.
@rmihael can you try the code on the payload_check branch? It will log the received size and the one it chose to use.
@rmihael Did you have opportunity to test this?
We deployed the payload_check branch today but haven't seen relevant logs yet. I'll update this issue as soon as we have something.
After deploying the payload_check branch we had zero allocation errors for three full days. Usually we see ~10 per day. Either something in the branch fixed the problem or something broke error logging.
Great.
Just to be sure: you are using commit 876508f83676122ffbd4253ae11ae8ef8968601b, right? If so, "roll back" to baa1e9adda4c3114838369375c9416b02f5e4aa7. If the errors return, it means the problem was fixed by the "fragmented frames" support added in the last commit.
Strange, we used baa1e9a but still see no allocation errors. There are some errors, but they seem irrelevant:
2016/01/11 17:18:43 [error] 1112#0: *3304 push stream module: Don't have at least one of the mandatory headers: Connection, Upgrade, Sec-WebSocket-Key and Sec-WebSocket-Version, client: 158.96.4.13, server: , request: "GET /push/ws/c50a921c032cc4e4b8c92f9d0/c50a921c032cc4e4b8c92f9d0u562f8f297443765254f5dad1?_=1452532713640&tag=&time=&eventid= HTTP/1.1", host: "push.nimble.com"
2016/01/11 23:56:40 [error] 1110#0: *12460 push stream module: Don't have at least one of the mandatory headers: Connection, Upgrade, Sec-WebSocket-Key and Sec-WebSocket-Version, client: 158.96.4.13, server: , request: "GET /push/ws/c50a921c032cc4e4b8c92f9d0/c50a921c032cc4e4b8c92f9d0u562f8f297443765254f5dad1?_=1452556599993&tag=&time=&eventid= HTTP/1.1", host: "push.nimble.com"
2016/01/12 14:39:49 [error] 1111#0: *28350 push stream module: Don't have at least one of the mandatory headers: Connection, Upgrade, Sec-WebSocket-Key and Sec-WebSocket-Version, client: 162.218.214.1, server: , request: "GET /push/ws/c565862ab7eb0256e34a5c58e/c565862ab7eb0256e34a5c58eu568d65282007d00bc67aaf4d?_=1452609588384&tag=&time=&eventid= HTTP/1.1", host: "push.nimble.com"
2016/01/12 14:47:34 [error] 1113#0: *28382 push stream module: Don't have at least one of the mandatory headers: Connection, Upgrade, Sec-WebSocket-Key and Sec-WebSocket-Version, client: 162.218.214.1, server: , request: "GET /push/ws/c565862ab7eb0256e34a5c58e/c565862ab7eb0256e34a5c58eu568d65282007d00bc67aaf4d?_=1452610053759&tag=&time=&eventid= HTTP/1.1", host: "push.nimble.com"
2016/01/12 16:29:03 [error] 1113#0: *31824 push stream module: Don't have at least one of the mandatory headers: Connection, Upgrade, Sec-WebSocket-Key and Sec-WebSocket-Version, client: 158.96.4.13, server: , request: "GET /push/ws/c50a921c032cc4e4b8c92f9d0/c50a921c032cc4e4b8c92f9d0u562f8f297443765254f5dad1?_=1452616142105&tag=&time=&eventid= HTTP/1.1", host: "push.nimble.com"
2016/01/12 16:30:14 [error] 1113#0: *31861 push stream module: Don't have at least one of the mandatory headers: Connection, Upgrade, Sec-WebSocket-Key and Sec-WebSocket-Version, client: 158.96.4.13, server: , request: "GET /push/ws/c50a921c032cc4e4b8c92f9d0/c50a921c032cc4e4b8c92f9d0u562f8f297443765254f5dad1?_=1452616211531&tag=&time=&eventid= HTTP/1.1", host: "push.nimble.com"
baa1e9a continues to work without "unable to allocate memory for payload" errors, but today we got a bunch of errors related to shared memory. They persisted until we restarted the nginx process.
2016/01/19 23:04:01 [crit] 1112#0: ngx_slab_alloc() failed: no memory
2016/01/19 23:04:01 [error] 1112#0: *216475 push stream module: unable to allocate message in shared memory, client: 111.222.113.114, server: , request: "POST /push/pub?id=c4d1be88768aa51a15f1beb0eu5386568bf93a5f2f261efb93 HTTP/1.1", host: "push.host.com"
Hi @rmihael, the lack of memory can mean that you configured too little memory for the needs of your application, or it can be due to a misinterpretation of the payload size that results in a big but non-negative number. We can set the module to always log the payload size and check whether all values are in the expected range. If they are, it may be necessary to adjust your configuration. If not, it may be that just moving back to 876508f really solves the wrong parsing of the payload size. What do you think? Can we do this test?
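For reference, the module's shared memory pool is sized by its `push_stream_shared_memory_size` directive in the `http` block; a minimal sketch of the relevant settings (the values here are illustrative, not a recommendation):

```nginx
http {
    # slab pool used to store published messages
    push_stream_shared_memory_size 128M;

    # how long messages stay in shared memory before cleanup;
    # a long TTL with large payloads can exhaust the pool
    push_stream_message_ttl 5m;
}
```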
During the past few days we saw the following errors:
2016/01/21 18:42:31 [crit] 20706#0: ngx_slab_alloc() failed: no memory
2016/01/21 18:42:31 [alert] 20706#0: ngx_slab_free(): pointer to wrong chunk
2016/01/21 18:42:31 [error] 20706#0: *51819 push stream module: unable to allocate message in shared memory, client: 10.83.10.24, server: , request: "POST /push/pub?id=c4c2118ad54397f271b000000u4c21187154397f270a000001 HTTP/1.1", host: "host.com"
2016/01/23 00:15:50 [crit] 20707#0: ngx_slab_alloc() failed: no memory
2016/01/23 00:15:50 [alert] 20707#0: ngx_slab_free(): outside of pool
2016/01/23 00:15:50 [alert] 20707#0: ngx_slab_free(): pointer to wrong chunk
2016/01/23 00:15:50 [alert] 20707#0: ngx_slab_free(): pointer to wrong page
2016/01/23 00:15:50 [error] 20707#0: *91021 push stream module: unable to allocate message in shared memory, client: 10.30.9.176, server: , request: "POST /push/pub?id=c4c2118ad54397f271b000000u4c21187154397f270a000001 HTTP/1.1", host: "host.com"
2016/01/27 13:46:04 [crit] 20706#0: ngx_slab_alloc() failed: no memory
2016/01/27 13:46:04 [error] 20706#0: *200800 push stream module: unable to allocate message in shared memory, client: 10.204.142.145, server: , request: "POST /push/pub?id=c4c65bc2106d795162e000038u4c65bc2106d795162e000036 HTTP/1.1", host: "host.com"
2016/01/27 13:46:04 [error] 20706#0: *200801 push stream module: unable to allocate message in shared memory, client: 10.204.142.145, server: , request: "POST /push/pub?id=c4c65bc2106d795162e000038u4c65bc2106d795162e000036 HTTP/1.1", host: "host.com"
2016/01/27 13:47:32 [alert] 20707#0: ngx_slab_free(): page is already free
That's happening on baa1e9a.
I would be happy to help track it down. We can deploy a version with more verbose logging if you provide one.
Hi @rmihael, can you test commit 8d70918c14a8a74fcebe0f21b3a7019aff905ade? I fixed a potential error point in the message template apply process and added some log messages to help us investigate the problem. What value have you set for the shared memory size? My guess is that it is not compatible with the publish rate of your application. Do you have a Hangouts or Skype account?
Hi @wandenberg, sorry for not responding, I was away for a few days.
Today we deployed 8d70918. No errors so far and no relevant log messages. Shared memory size is set to 64M. We can increase it, but our message traffic is fairly low (peaking at 100 per 5 minutes), so I don't think we are really hitting this limit. Payload size is small: all messages are below 40K and 90% are below 2K.
I'm mykhailo.korbakov on Skype and available for a chat during European day and evening.
Sometimes we see errors like these in NGINX logs:
That correlates with the failures to deliver messages to clients that we sometimes observe.
Here is some additional information:
The NGINX server is built in the following way:
Our memory configuration looks like this:
That's what we have from channel stats: