inhabitedtype / httpaf

A high performance, memory efficient, and scalable web server written in OCaml
Other
534 stars 44 forks source link

Segfault in lwt_echo_post #139

Open Lupus opened 5 years ago

Lupus commented 5 years ago

Might be related with #64, but I'm experiencing segfault in lwt_echo_post compiled to native binary.

Steps to reproduce:

Observed behavior:

Probably process ran out of stack. Is shift_buffers function expected to be subject to tailcall optimization, or the issue is due to excessive buffering?

Lupus commented 5 years ago

Hm, last build was using some weird environment. I've ran dune clean to get rid of old artifacts and rebuild from scratch. Backtrace changed a bit:

Program received signal SIGSEGV, Segmentation fault.                                                                                                                          
0x00005555557f01d3 in caml_modify (fp=0x7ffff1d2a028, val=93825026271160) at memory.c:658                                                                                     
658     memory.c: No such file or directory.                                                                                                                                  
(gdb) bt                                                                                                                                                                      
#0  0x00005555557f01d3 in caml_modify (fp=0x7ffff1d2a028, val=93825026271160) at memory.c:658                                                                                 
#1  0x000055555576ec6d in camlFaraday__dequeue_exn_1186 () at lib/faraday.ml:102
#2  0x000055555576ffd0 in camlFaraday__shift_buffers_1564 () at lib/faraday.ml:371
#3  0x000055555576ffee in camlFaraday__shift_buffers_1564 () at lib/faraday.ml:372
#4  0x000055555576ffee in camlFaraday__shift_buffers_1564 () at lib/faraday.ml:372
#5  0x000055555576ffee in camlFaraday__shift_buffers_1564 () at lib/faraday.ml:372
#6  0x000055555576ffee in camlFaraday__shift_buffers_1564 () at lib/faraday.ml:372
#7  0x000055555576ffee in camlFaraday__shift_buffers_1564 () at lib/faraday.ml:372
#8  0x000055555576ffee in camlFaraday__shift_buffers_1564 () at lib/faraday.ml:372
#9  0x000055555576ffee in camlFaraday__shift_buffers_1564 () at lib/faraday.ml:372
#10 0x000055555576ffee in camlFaraday__shift_buffers_1564 () at lib/faraday.ml:372
#11 0x000055555576ffee in camlFaraday__shift_buffers_1564 () at lib/faraday.ml:372
#12 0x000055555576ffee in camlFaraday__shift_buffers_1564 () at lib/faraday.ml:372
<...>
#174595 0x000055555576ffee in camlFaraday__shift_buffers_1564 () at lib/faraday.ml:372                                                                                       
#174596 0x000055555576ffee in camlFaraday__shift_buffers_1564 () at lib/faraday.ml:372                                                                                       
#174597 0x000055555576ffee in camlFaraday__shift_buffers_1564 () at lib/faraday.ml:372                                                                                       
#174598 0x000055555576ffee in camlFaraday__shift_buffers_1564 () at lib/faraday.ml:372                                                                                       
#174599 0x000055555576ffee in camlFaraday__shift_buffers_1564 () at lib/faraday.ml:372                                                                                       
#174600 0x00005555557694f2 in camlHttpaf__Body__fun_1699 () at lib/faraday.ml:408                                                                                            
#174601 0x00005555557700ca in camlFaraday__shift_flushes_1569 () at lib/faraday.ml:402                                                                                       
#174602 0x00005555557377f6 in camlHttpaf_lwt_unix__fun_2310 () at lwt-unix/httpaf_lwt_unix.ml:168                                                                            
#174603 0x0000555555756a8c in camlLwt__catch_24942 () at src/core/lwt.ml:2025                                                                                                
#174604 0x0000555555759345 in camlLwt__async_53201 () at src/core/lwt.ml:2463                                                                                                
#174605 0x00005555557690f1 in camlHttpaf__Body__do_execute_read_1452 () at lib/body.ml:115                                                                                   
#174606 0x000055555576a06b in camlHttpaf__Parse__fun_2248 () at lib/parse.ml:137                                                                                             
#174607 0x00005555557715f3 in camlAngstrom__Parser__succ$27_1437 () at lib/parser.ml:53                                                                                      
#174608 0x000055555576aaf9 in camlHttpaf__Parse__read_with_more_1808 () at lib/parse.ml:300                                                                                  
#174609 0x000055555576d252 in camlHttpaf__Server_connection__read_with_more_1824 () at lib/server_connection.ml:236                                                          
#174610 0x0000555555736c4b in camlHttpaf_lwt_unix__get_1168 () at lwt-unix/httpaf_lwt_unix.ml:66                                                                             
#174611 0x00005555557374d5 in camlHttpaf_lwt_unix__fun_2241 () at lwt-unix/httpaf_lwt_unix.ml:133                                                                            
#174612 0x0000555555755c4e in camlLwt__callback_13871 () at src/core/lwt.ml:1866                                                                                             
#174613 0x0000555555754748 in camlLwt__iter_callback_list_4529 () at src/core/lwt.ml:1209                                                                                    
#174614 0x00005555557548b9 in camlLwt__run_in_resolution_loop_4573 () at src/core/lwt.ml:1275                                                                                
#174615 0x0000555555754a67 in camlLwt__resolve_4591 () at src/core/lwt.ml:1311                                                                                               
#174616 0x0000555555755c6f in camlLwt__callback_13871 () at src/core/lwt.ml:1880                                                                                             
#174617 0x0000555555754748 in camlLwt__iter_callback_list_4529 () at src/core/lwt.ml:1209                                                                                    
#174618 0x00005555557548b9 in camlLwt__run_in_resolution_loop_4573 () at src/core/lwt.ml:1275                                                                                
#174619 0x0000555555754a67 in camlLwt__resolve_4591 () at src/core/lwt.ml:1311                                                                                               
#174620 0x0000555555756d0c in camlLwt__callback_24954 () at src/core/lwt.ml:2041                                                                                             
#174621 0x0000555555754748 in camlLwt__iter_callback_list_4529 () at src/core/lwt.ml:1209                                                                                    
#174622 0x00005555557548b9 in camlLwt__run_in_resolution_loop_4573 () at src/core/lwt.ml:1275                                                                                
#174623 0x0000555555754a67 in camlLwt__resolve_4591 () at src/core/lwt.ml:1311                                                                                               
#174624 0x0000555555755c6f in camlLwt__callback_13871 () at src/core/lwt.ml:1880                                                                                             
#174625 0x0000555555754748 in camlLwt__iter_callback_list_4529 () at src/core/lwt.ml:1209                                                                                    
#174626 0x00005555557548b9 in camlLwt__run_in_resolution_loop_4573 () at src/core/lwt.ml:1275                                                                                
#174627 0x0000555555754a67 in camlLwt__resolve_4591 () at src/core/lwt.ml:1311                                                                                               
#174628 0x0000555555754cbc in camlLwt__wakeup_general_4627 () at src/core/lwt.ml:1385                                                                                        
#174629 0x000055555575308d in camlLwt_sequence__loop_1061 () at src/core/lwt_sequence.ml:128                                                                                 
#174630 0x000055555575308d in camlLwt_sequence__loop_1061 () at src/core/lwt_sequence.ml:128                                                                                 
#174631 0x0000555555783881 in camlList__iter_1083 () at list.ml:100                                                                                                          
#174632 0x0000555555739d24 in camlLwt_engine__fun_3023 () at src/unix/lwt_engine.ml:357                                                                                      
#174633 0x000055555573c34c in camlLwt_main__run_1134 () at src/unix/lwt_main.ml:33                                                                                           
#174634 0x00005555556d15b6 in camlLwt_echo_post__entry () at examples/lwt/lwt_echo_post.ml:32                                                                                
#174635 0x00005555556ce5e9 in caml_program ()                                                                                                                                
#174636 0x00005555558060d0 in caml_start_program ()
#174637 0x00005555557eae15 in caml_startup_common (argv=0x7fffffffd748, pooling=<optimized out>, pooling@entry=0) at startup.c:156                                           
#174638 0x00005555557eae7b in caml_startup_exn (argv=<optimized out>) at startup.c:161                                                                                       
#174639 caml_startup (argv=<optimized out>) at startup.c:166                                                                                                                 
#174640 0x00005555556cd85c in main (argc=<optimized out>, argv=<optimized out>) at main.c:44                                   

Curl first uploads 3620M (out of 7.2G file), then upload stops and it proceeds to downloading 3620M back. Once downloaded server process receives segfault.

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7240M    0 3620M  100 3620M  21.5M  21.5M  0:02:48  0:02:48 --:--:--     0^C

It points to this part of the code of faraday.ml in my _opam directory:

369 let rec shift_buffers t written =                                                                                                                                         
370   try                                                                                                                                                                     
371     let { len; _ } as iovec = Buffers.dequeue_exn t.scheduled in                                                                                                          
372     if len <= written then begin                                                                                                                                          
373       shift_buffers t (written - len)                                                                                                                                     
374     end else                                                                                                                                                              
375       Buffers.enqueue_front (IOVec.shift iovec written) t.scheduled                                                                                                       
376   with Dequeue_empty ->                                                                                                                                                   
377     assert (written = 0);                                                                                                                                                 
378     if t.scheduled_pos = t.write_pos then begin                                                                                                                           
379       t.scheduled_pos <- 0;                                                                                                                                               
380       t.write_pos <- 0                                                                                                                                                    
381     end
hannesm commented 5 years ago

FWIW some top-level interaction:

# let rec shift_buffers t written =
    try
      let len = 42 in
      if len <= written then begin
        (shift_buffers[@tailcall]) t (written - len)
      end else
        Printf.printf "bla\n"
    with _ ->
      assert (written = 0)
  ;;
Warning 51: expected tailcall
# let rec shift_buffers t written =
  (*  try *)
      let len = 42 in
      if len <= written then begin
        (shift_buffers[@tailcall]) t (written - len)
      end else
        Printf.printf "bla\n"
  (*  with _ ->
      assert (written = 0) *);;
val shift_buffers : 'a -> int -> unit = <fun>

~> calling ourselves within the exception handler leads to stack allocation (of the exception handler AFAICT), you'll have to move the recursive self-call out of the exception handler (the tailcall annotation is pretty good at spotting this)

Lupus commented 5 years ago

Cool! For some reason I was trying to add tailcall annotation to pinned faraday repo clone on my laptop and make dune stop on this warning, without any results. Should've tried with top level :)

Actually I've suggested to get rid of that exception altogether in faraday/51, but for other reasons.

hannesm commented 5 years ago

(another option is to have the let rec (and call to it) inside the ... of try ... catch, installing the exception handler only once)

Lupus commented 5 years ago

Just rebuilt with my fork of faraday and it does not segfault any more, although odd behavior with buffering of 3+GB in memory is still in place.

Lupus commented 5 years ago

Excessive buffering was due to asynchronous write in example echo handler. Can be fixed like this:

--- a/examples/lib/httpaf_examples.ml
+++ b/examples/lib/httpaf_examples.ml
@@ -39,8 +39,9 @@ module Server = struct
       let request_body  = Reqd.request_body reqd in
       let response_body = Reqd.respond_with_streaming reqd response in                                                                                                      
       let rec on_read buffer ~off ~len =                                                                                                                                    
-        Body.write_bigstring response_body buffer ~off ~len;                                                                                                                
-        Body.schedule_read request_body ~on_eof ~on_read;                                                                                                                   
+        Body.schedule_bigstring response_body buffer ~off ~len;                                                                                                             
+        Body.flush response_body (fun () ->                                                                                                                                 
+        Body.schedule_read request_body ~on_eof ~on_read);                                                                                                                  
       and on_eof () =                                                                                                                                                       
         Body.close_writer response_body                                                                                                                                     
       in