Open dubrovine opened 2 years ago
Not enough info I'm afraid. I don't know where this message comes from. Which pool are you using? What server? Any way I can reproduce?
Pool is erlPool (10sockets for each PoolName) Server is cowboy 2.9.0 (If I understand you correctly)
`open(PoolName, PoolSize, Host, Port, Opts)->
Opts2 = Opts#{
connect_timeout => 5000,
retry => 200,
retry_timeout => 500 % 500ms
},
PoolRes =
erlpool:start_pool(PoolName, [
{size, PoolSize},
{group, http2_pools},
{start_mfa, {gun, start_link, [self(), util:to_list(Host), Port, Opts2]}},
{supervisor_period, 1},
{supervisor_intensity, 1000},
{supervisor_restart, permanent}
]),
case PoolRes of
ok ->
log:info("[HTTP2] START_POOL ~p ~p ~p", [self(), PoolName, PoolSize]),
ok;
{error,{already_started,_}} ->
log:error("[HTTP2] START_POOL already_started ~p ~p ~p", [self(), PoolName, PoolSize]),
ok;
PoolErr ->
log:error("[HTTP2] START_POOL_ERR ~p ~p ~p ~p", [self(), PoolName, PoolSize, PoolErr]),
{error, PoolErr}
end.`
error message I receive directly from gun:await(ConnPid, StreamRef, ?TIMEOUT)
unfortunately I can not reproduce the issue locally of on dev environment (only in production env)
we have like 30-50k requests to apns per minute
please tell me if you need more specific information
Oh it's against APNS? If I recall they were doing something weird where they only allow 1 stream initially and then increase that to 100 after the first request succeeded. You might be running into that. Try to trace the HTTP/2 settings that Gun receives.
Okay thank you. I will close the ticket for now and get back to you when I will be able to trace settings on production env
Here is what I found
connected to active production node make some manual tests
OK TEST
`(channelapi@16.38.26.28)6> ConnPid = erlpool:pid('apns_privat24-2.0-voip-prod').
<0.2411.0>
(channelapi@16.38.26.28)7>
(channelapi@16.38.26.28)7> StreamRef = gun:request(ConnPid, <<"POST">>, <<"/3/device/***">>, [{<<"apns-topic">>, <<"ua.hidden">>}, {<<"apns-push-type">>,<<"voip">>}], <<"{\"aps\" : {\"badge\" : 9,\"sound\" : \"bingbong.aiff\"},\"messageID\" : \"ABCDEFGHIJ\"}">>, #{reply_to => self()}).
#Ref<0.1992702093.3131834371.253809>
(channelapi@16.38.26.28)8> gun:info(ConnPid).
#{cookie_store => undefined,intermediaries => [],
origin_host => "api.push.apple.com",origin_port => 443,
origin_scheme => <<"https">>,owner => <0.2178.0>,
protocol => http2,
sock_ip => {16,38,26,28},
sock_port => 52616,
socket =>
{sslsocket,{gen_tcp,#Port<0.453357>,tls_connection,
undefined},
[<0.21000.64>,<0.17325.65>]},
transport => tls}
(channelapi@16.38.26.28)9> gun:stream_info(ConnPid , StreamRef).
{ok,#{ref => #Ref<0.1992702093.3131834371.253809>,
reply_to => <0.27133.65>,state => running}}
(channelapi@16.38.26.28)10> gun:await(ConnPid, StreamRef, 5000).
{response,fin,200,
[{<<"apns-id">>,
<<"016B51EA-EF0D-E5C9-772E-E3FD48F6B1AE">>}]}`
ERROR TEST
`(channelapi@16.38.26.28)6> ConnPid1 = erlpool:pid('apns_privat24-2.0-voip-prod').
<0.2413.0>
(channelapi@16.38.26.28)7>
(channelapi@16.38.26.28)7> StreamRef1 = gun:request(ConnPid1, <<"POST">>, <<"/3/device/***">>, [{<<"apns-topic">>, <<"ua.hidden">>}, {<<"apns-push-type">>,<<"voip">>}], <<"{\"aps\" : {\"badge\" : 9,\"sound\" : \"bingbong.aiff\"},\"messageID\" : \"ABCDEFGHIJ\"}">>, #{reply_to => self()}).
#Ref<0.1992702093.3132620803.58733>
(channelapi@16.38.26.28)8> gun:info(ConnPid1).
#{cookie_store => undefined,intermediaries => [],
origin_host => "api.push.apple.com",origin_port => 443,
origin_scheme => <<"https">>,owner => <0.2178.0>,
protocol => http2,
sock_ip => {16,38,26,28},
sock_port => 50478,
socket =>
{sslsocket,{gen_tcp,#Port<0.491351>,tls_connection,
undefined},
[<0.5892.68>,<0.19792.67>]},
transport => tls}
(channelapi@16.38.26.28)9> gun:stream_info(ConnPid1 , StreamRef1).
{ok,undefined}
(channelapi@16.38.26.28)10> gun:await(ConnPid1, StreamRef1, 5000).
{error,{stream_error,closed}}`
all the data seems like the same except connection pid and stream ref but gun:stream_info
respond with undefined
All I know about this is that it likely happens because the client was disconnected. Most likely it received an {error, closed}
back after trying to do a socket operation. You would get a more verbose stream_error
tuple otherwise. I do not know why that happens or what to do about it. Having a trace of what Gun is doing when this happens may help.
Hi, I have the same, with 1.3.0 I had {error,{stop,{goaway,215,no_error,<<>>},'Client is going away.'}} {error,{stop,{goaway,145,no_error,<<>>},'Client is going away.'}} {error,{stop,{goaway,143,no_error,<<>>},'Client is going away.'}}
And after update to 2.0.1 these problems are not exist but I have had new error {stream_error,{closed,{error,closed}}} And this I can catch in condition:
`
{response, nofin, HttpCode = 200, _Headers} ->
case gun:await_body(ConnPid, StreamRef, ?TIMEOUT) of
{ok, Body} ->
FinishTime = time_util:militime() - StartTime,
metrics:instrument(FinishTime, UrlForMetrics, true, HttpCode),
DecodeBody = json:decode(Body),
{prepare_response(DecodeBody), FinishTime};
{error,{stream_error,{closed,{error,closed}}}} = Error ->
log:error("{"?MODULE_STRING"} - stream_error, error ~p", [Error]),
FinishTime = time_util:militime() - StartTime,
metrics:instrument(FinishTime, UrlForMetrics, false, stream_error),
?ErrorResp(FinishTime)
end;
`
I have noticed that this error turns up on highload services and almost at the same time. 2 screens from 2 different services
Could be happening if the network has hiccups. You should see TCP connections getting closed at the same time.
Hello
Gun vsn - 2.0.0-rc.2
I faced some stream_error issue with the latest version of gun
connection pool options for gun
Opts2 = Opts#{ connect_timeout => 5000, retry => 200, retry_timeout => 500 % 500ms }
here is the part of the code
a lot of time I receive the error
` <0.16797.7>@: [HTTP2_AF] ERR_AWAIT 'apns_voip-prod' {error,{stream_error,closed}}
<0.16797.7>@: [HTTP2_AF] ERR_AWAIT 'apns_voip-prod' {error,{stream_error,closed}}
<0.16797.7>@: [HTTP2_AF] ERR_AWAIT 'apns_voip-prod' {error,{stream_error,closed}}
<0.16797.7>@: [HTTP2_AF] ERR_AWAIT 'apns_voip-prod' {error,{stream_error,closed}}
<0.16797.7>@: [HTTP2_AF] ERR_AWAIT 'apns_voip-prod' {error,{stream_error,closed}}
<0.16797.7>@: [HTTP2_AF] ERR_AWAIT 'apns_voip-prod' {error,{stream_error,closed}}
<0.16797.7>@: [HTTP2_AF] ERR_AWAIT 'apns_voip-prod' {error,{stream_error,closed}}
<0.16797.7>@: [HTTP2_AF] ERR_AWAIT 'apns_voip-prod' {error,{stream_error,closed}}
<0.16797.7>@: [HTTP2_AF] ERR_AWAIT 'apns_voip-prod' {error,{stream_error,closed}}
<0.16797.7>@: [HTTP2_AF] ERR_AWAIT 'apns_voip-prod' {error,{stream_error,closed}} `
Yes it is working as it should after retrying but I wanted to ask if it's good decision to make it work this way ?
I also tried to use settings like max_concurent_streams (200) but had no luck (Still reproducing such error messages)
Could you please point me to the right direction to (maybe) avoid this type of messages and re-try function (sometimes I need to execute function 15 times before it pass correctly)
thanks in advance