Open Shemeikka opened 3 years ago
Would be good to trace the Cowboy connection process to see what it's doing. Or get a process_info(Pid)
as a start.
Hi,
Here's a process info for the cowboy supervisor
[
registered_name: :cowboy_sup,
current_function: {:gen_server, :loop, 7},
initial_call: {:proc_lib, :init_p, 5},
status: :waiting,
message_queue_len: 0,
links: [#PID<0.240.0>, #PID<0.242.0>],
dictionary: [
"$ancestors": [#PID<0.240.0>],
"$initial_call": {:supervisor, :cowboy_sup, 1}
],
trap_exit: true,
error_handler: :error_handler,
priority: :normal,
group_leader: #PID<0.239.0>,
total_heap_size: 986,
heap_size: 610,
stack_size: 12,
reductions: 1530,
garbage_collection: [
max_heap_size: %{error_logger: true, kill: true, size: 0},
min_bin_vheap_size: 46422,
min_heap_size: 233,
fullsweep_after: 65535,
minor_gcs: 4
],
suspending: []
]
Only change to the process info during the failing upload is slightly larger 'reductions'. Don't know how I could get process info for the actual connection process since the connection closes usually within 2-3 seconds.
Here's an link to a Gist which has output of Observer trace of Cowboy during the failed upload. I had to remove large amount of lines, mostly binary data, from the output.
No not cowboy_sup
, the connection process. If it gets stuck in a function it'll tell you which but you have to find the process first. But nevermind that, it sounds like the connection is being closed, rather than it being stuck?
If it's a crash then you're going to have a crash report. If it's "normal" then you won't. Your best bet then would be to investigate what happens in terminate
, it'll give you the close reason.
Should the tracing show if the connection process closes? The linked Gist only shows when the connection is spawned.
Any ideas how to catch cowboy connection process termination? I'm not receiving any crash reports or other messages to log when using standard plug_cowboy with added logging.
I have tried logging with plug by setting setting
config :plug_cowboy,
log_exceptions_with_status_code: [1..599]
I also tried using PhoenixCowboyLogging which uses protocol_options: :onresponse
but that didn't have any effect.
OK I have missed the gist. You can trace for proc
events exclusively. That should be enough. No need for calls or messages.
Hi,
Here's a link to a Gist. This time it's more interesting than the previous.
Here's a process info of PID 0.3924.0
during the upload. Not sure if that's the connection process.
> Process.info(pid("0.3924.0"))
[
current_function: {:gen_statem, :loop_receive, 3},
initial_call: {:proc_lib, :init_p, 5},
status: :waiting,
message_queue_len: 0,
links: [#PID<0.3923.0>, #Port<0.704>, #PID<0.132.0>],
dictionary: [
"$initial_call": {:ssl_gen_statem, :init, 1},
ssl_pem_cache: :ssl_pem_cache,
"$ancestors": [:tls_connection_sup, :tls_sup, :ssl_connection_sup, :ssl_sup,
#PID<0.121.0>],
ssl_manager: :ssl_manager
],
trap_exit: true,
error_handler: :error_handler,
priority: :normal,
group_leader: #PID<0.120.0>,
total_heap_size: 15143,
heap_size: 4185,
stack_size: 18,
reductions: 29117680,
garbage_collection: [
max_heap_size: %{error_logger: true, kill: true, size: 0},
min_bin_vheap_size: 46422,
min_heap_size: 233,
fullsweep_after: 65535,
minor_gcs: 1350
],
suspending: []
]
I did some more testing and noticed that this upload failure happens only when using HTTP/2 with TLS. Uploading large files works when client forces the use of HTTP/1.1 with TLS.
There might have been interesting events but they have been dropped. The exit reason is normal
and that doesn't help.
OK then I suppose figuring out what cowboy_http2
is doing is the next step. So again a trace this time dbg:tpl(cowboy_http2, []). dbg:p(all, c).
- trace to file, and see at what point it goes into the terminate
functions. What happens immediately before is likely the reason why the upload fails.
Hi,
I had to try several different settings due to the large amount of data but I finally got something interesting
08:12:30:156917 (<0.525.0>) in {code_server,call,1}
08:12:30:156987 (<0.525.0>) spawn <0.530.0> as erlang:apply(#Fun<gen_statem.0.120824088>,[])
08:12:30:156999 (<0.530.0>) spawned <0.525.0> {erlang,apply,[#Fun<gen_statem.0.120824088>,[]]}
08:12:30:157010 (<0.525.0>) out {gen_statem,call_clean,4}
08:12:30:157177 (<0.527.0>) in {cowboy_http2,terminate_linger_loop,3}
08:12:30:157200 (<0.527.0>) exit {shutdown,{connection_error,enhance_your_calm,
'Frame rate larger than configuration allows. Flood? (CVE-2019-9512, CVE-2019-9515, CVE-2019-9518)'}}
08:12:30:157210 (<0.527.0>) out_exited 0
08:12:30:157291 (<0.530.0>) in {erlang,apply,2}
08:12:30:157298 (<0.530.0>) out {gen,do_call,4}
08:12:30:157315 (<0.525.0>) in {gen_statem,call_clean,4}
08:12:30:157321 (<0.525.0>) out {gen_statem,call_clean,4}
08:12:30:157329 (<0.524.0>) in {gen_statem,loop_receive,3}
08:12:30:157363 (<0.524.0>) out {gen_statem,loop_receive,3}
08:12:30:157369 (<0.530.0>) in {gen,do_call,4}
08:12:30:157374 (<0.530.0>) exit normal
08:12:30:157377 (<0.530.0>) out_exited 0
08:12:30:157383 (<0.524.0>) in {gen_statem,loop_receive,3}
08:12:30:157386 (<0.524.0>) out {gen_statem,loop_receive,3}
08:12:30:157390 (<0.525.0>) in {gen_statem,call_clean,4}
08:12:30:157407 (<0.525.0>) spawn <0.531.0> as erlang:apply(#Fun<tls_connection.0.3952457>,[])
08:12:30:157410 (<0.531.0>) spawned <0.525.0> {erlang,apply,[#Fun<tls_connection.0.3952457>,[]]}
08:12:30:157417 (<0.525.0>) exit {shutdown,peer_close}
08:12:30:157422 (<0.525.0>) out_exited 0
08:12:30:157457 (<0.531.0>) in {erlang,apply,2}
08:12:30:157463 (<0.531.0>) out {tls_connection,'-ensure_sender_terminate/2-fun-0-',1}
08:12:30:157467 (<0.524.0>) in {gen_statem,loop_receive,3}
08:12:30:157472 (<0.524.0>) exit {shutdown,peer_close}
08:12:30:157478 (<0.524.0>) out_exited 0
08:12:35:158429 (<0.531.0>) in {tls_connection,'-ensure_sender_terminate/2-fun-0-',1}
08:12:35:158452 (<0.531.0>) exit normal
08:12:35:158457 (<0.531.0>) out_exited 0
This seems to be same problem as in #1497 ?
I tried setting max_received_frame
like protocol_options: [max_received_frame_rate: {150_000, 10_000}]
and with that value, the upload works with HTTP/2 and TLS.
Our use-case should be able to handle large files (10-20GB) and I'm thinking if we enable large (500_000+?) values for max_received_frame
, then we are vulnerable against those mentioned CVEs. Are there any other mitigations in Cowboy for those CVEs?
I guess we should increase it again. I had increased to 10k/10s but if that's not enough then it should be increased again.
Those CVEs are not super important, especially in Erlang, and it's basically impossible to get a value that works out for everyone. So I think we could increase the default to whatever works out for you.
Alternatively, increasing max_frame_size_received
should reduce the number of frames you're getting. The default is very conservative (it's the protocol's default). It might make sense to provide a different default for Cowboy.
A third solution would be to produce a warning log for these types of connection errors (enhance_your_calm
) with the message included and perhaps some help to remedy to this situation.
Okay, I think it's enough if we just set both max_received_frame
and max_frame_size_received
to some good values in our application. Will have to test to figure out what those values could be.
I once had the a similar issue, and fixed it by setting idle_timeout
larger.
I guess the connection terminated before the file transfer finished.
Hello everyone, I am looking for feedback as to what the max_received_frame
and max_frame_size_received
values should be. What did you end up using? Thank you.
For idle_timeout
there is now an option reset_idle_timeout_on_send
that should help. It will default to false
for backward compatibility reasons but the default will likely change in 3.0.
Nginx allows 16MB (the maximum possible): https://github.com/nginx/nginx/blob/6ceef19/src/http/v2/ngx_http_v2.h#L21 and https://github.com/nginx/nginx/blob/6ceef19/src/http/v2/ngx_http_v2.c#L2758
I don't think we want to increase it that much but we can definitely use a higher default value. If we increase it to 1MB then this doesn't invalidate the CVE protection and we can still receive large files. I think I will also increase the max_receive_frame_sent
so that we send responses faster by default. Using 1MB (1048576) for both.
OK the max frame size sent is already infinity
so nothing to do there. For the max_frame_size_received
I will only update the manual to indicate the importance of higher values. The default will have to wait for Cowboy 2.12 though because I am not updating Cowlib in 2.11. So this ticket will be left open for now.
Documentation update is done. Leaving this ticket open, thanks!
Hi,
We are experiencing failed file uploads when using TLS and Cowboy 2.7+ (2.7 - 2.9.0). Large file uploads (1GB+) will be shown as failed and stalled in Chrome and Firefox, and when using Curl, the response is curl: (56) Unexpected EOF.
Uploading smaller files about 100MB works with TLS and uploading large files works if TLS is not used. However, if we use TLS and Cowboy 2.6.3, then uploading large files works.
Test file can be generated using command
fallocate -l 10G gentoo_root.img
. Uploading the file can be done using commandcurl -F 'file=@gentoo_root.img' <URL>
.How we configure Cowboy in our Elixir project. Certificates are generated using mkcert.
We use Plug.Parser for reading the multipart file upload. Those options i.e. length and others, don't have any effect on this problem.
Versions:
Also tried with Erlang 23.3 versions and same experience.