rails / rails

Ruby on Rails
https://rubyonrails.org
MIT License
55.75k stars 21.59k forks source link

ActionController::Live hangs with Deadlock after upgrading to rails 5.1.4 #31200

Closed gurix closed 6 years ago

gurix commented 6 years ago

In a project i recently upgraded from rails 4.2.8 to 5.1.4. There is a CSV export using ActionController::Live to stream line for line.

class UsersController < ApplicationController
  include ActionController::Live
  respond_to :csv

  #before_action :authenticate_administrator

  def index
    response_header
    response_data
  ensure
    response.stream.close
  end

  private

  def response_header
    response.headers['Content-Disposition'] = 'attachment; filename=users.csv'
    response.headers['Content-Type'] = 'text/csv'

    # Write csv header once ...
    response.stream.write CSV.generate_line(user_fields + personal_detail_fields + additional_fields)
  end

  def response_data
    # .. and generate a line for each user.
    User.each do |user|
      next if user.personal_detail.blank?
      response.stream.write CSV.generate_line(user_values(user) + personal_detail_values(user) + additional_values(user))
    end
  end
....

Expected behavior

Normally the code streams the CSV by sending the CSV line by line.

Actual behavior

After several lines the stream hangs in an unknown state. The production server is blocked and I have to restart the webserver.

To investigate the problem, I changed the code using this example from the API reference.

class MyController < ActionController::Base
  include ActionController::Live

  def stream
    response.headers['Content-Type'] = 'text/event-stream'
    100.times { |i|
      response.stream.write "hello world\n"
      puts i
      sleep 1
    }
  ensure
    response.stream.close
  end
end

I figured out that it breaks as soon as I invoke response.stream.write after about 10 writes. Without writing to the stream it do not break.

Exiting the webserver it shows a Deadlock.

$ rails s
=> Booting WEBrick
=> Rails 5.1.4 application starting in development on http://localhost:3000
=> Run `rails server -h` for more startup options
Error connecting to Redis on 127.0.0.1:6379 (Errno::ECONNREFUSED)
^[[A[2017-11-22 16:04:53] INFO  WEBrick 1.3.1
[2017-11-22 16:04:53] INFO  ruby 2.3.1 (2016-04-26) [x86_64-darwin16]
[2017-11-22 16:04:53] INFO  WEBrick::HTTPServer#start: pid=6561 port=3000
Started GET "/users.csv" for ::1 at 2017-11-22 16:05:01 +0100
Processing by UsersController#index as CSV
[ahoy] Visit excluded
0
1
2
3
4
5
6
7
8
9
^C[2017-11-22 16:05:51] INFO  going to shutdown ...
Exiting
Completed 401 Unauthorized in 49927ms

::1 - - [22/Nov/2017:16:05:01 CET] "GET /users.csv HTTP/1.1" 200 0
- -> /users.csv
/Users/markus/.rbenv/versions/2.3.1/lib/ruby/2.3.0/webrick/server.rb:215:in `join': No live threads left. Deadlock? (fatal)
    from /Users/markus/.rbenv/versions/2.3.1/lib/ruby/2.3.0/webrick/server.rb:215:in `block (2 levels) in start'
    from /Users/markus/.rbenv/versions/2.3.1/lib/ruby/2.3.0/webrick/server.rb:215:in `each'
    from /Users/markus/.rbenv/versions/2.3.1/lib/ruby/2.3.0/webrick/server.rb:215:in `block in start'
    from /Users/markus/.rbenv/versions/2.3.1/lib/ruby/2.3.0/webrick/server.rb:33:in `start'
    from /Users/markus/.rbenv/versions/2.3.1/lib/ruby/2.3.0/webrick/server.rb:164:in `start'
    from /Users/markus/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/rack-2.0.3/lib/rack/handler/webrick.rb:34:in `run'
    from /Users/markus/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/rack-2.0.3/lib/rack/server.rb:a297:in `start'
    from /Users/markus/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/railties-5.1.4/lib/rails/commands/server/server_command.rb:44:in `start'
    from /Users/markus/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/railties-5.1.4/lib/rails/commands/server/server_command.rb:135:in `block in perform'
    from /Users/markus/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/railties-5.1.4/lib/rails/commands/server/server_command.rb:130:in `tap'
    from /Users/markus/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/railties-5.1.4/lib/rails/commands/server/server_command.rb:130:in `perform'
    from /Users/markus/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/thor-0.20.0/lib/thor/command.rb:27:in `run'
    from /Users/markus/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/thor-0.20.0/lib/thor/invocation.rb:126:in `invoke_command'
    from /Users/markus/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/thor-0.20.0/lib/thor.rb:387:in `dispatch'
    from /Users/markus/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/railties-5.1.4/lib/rails/command/base.rb:63:in `perform'
    from /Users/markus/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/railties-5.1.4/lib/rails/command.rb:44:in `invoke'
    from /Users/markus/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/railties-5.1.4/lib/rails/commands.rb:16:in `<top (required)>'
    from bin/rails:8:in `require'
    from bin/rails:8:in `<main>'

Any Ideas, how to solve this issue? I also setup a new empty rails project and added the streaming as described below. Then it works without any issue, so I guess some interference with a gem?

System configuration

Rails version: Rails 5.1.4

Ruby version: ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-darwin16]

bundle show:

Gems included by the bundle:
  * actioncable (5.1.4)
  * actionmailer (5.1.4)
  * actionpack (5.1.4)
  * actionview (5.1.4)
  * activejob (5.1.4)
  * activemodel (5.1.4)
  * activerecord (5.1.4)
  * activesupport (5.1.4)
  * addressable (2.5.2)
  * ahoy_matey (1.6.0)
  * airbrussh (1.3.0)
  * arel (8.0.0)
  * autoprefixer-rails (7.1.6)
  * aws-partitions (1.34.0)
  * aws-sdk-core (3.7.0)
  * aws-sdk-ec2 (1.17.0)
  * aws-sigv4 (1.0.2)
  * bcrypt (3.1.11)
  * better_errors (2.4.0)
  * bindex (0.5.0)
  * binding_of_caller (0.7.3)
  * bootstrap-sass (3.3.7)
  * breadcrumbs_on_rails (3.0.1)
  * browser (2.5.2)
  * bson (4.2.2)
  * builder (3.2.3)
  * bundler (1.16.0)
  * byebug (9.1.0)
  * capistrano (3.10.0)
  * capistrano-bundler (1.1.4)
  * capistrano-passenger (0.2.0 6dc408e)
  * capistrano-rails (1.3.0)
  * capistrano-sidekiq (0.20.0 973c368)
  * capybara (2.15.4)
  * carmen (1.0.2)
  * carmen-rails (1.0.1)
  * chartkick (2.2.5)
  * childprocess (0.8.0)
  * cliver (0.3.2)
  * codemirror-rails (5.16.0)
  * coderay (1.1.2)
  * coffee-rails (4.2.2)
  * coffee-script (2.4.1)
  * coffee-script-source (1.12.2)
  * concurrent-ruby (1.0.5)
  * connection_pool (2.2.1)
  * crass (1.0.2)
  * css_parser (1.6.0)
  * debug_inspector (0.0.3)
  * devise (4.3.0)
  * diff-lcs (1.3)
  * docile (1.1.5)
  * dotenv (2.2.1)
  * dotenv-rails (2.2.1)
  * errbase (0.0.3)
  * erubi (1.7.0)
  * erubis (2.7.0)
  * et-orbi (1.0.8)
  * exception_notification (4.2.2)
  * execjs (2.7.0)
  * factory_bot (4.8.2)
  * faker (1.8.4)
  * faraday (0.12.2)
  * ffi (1.9.18)
  * font-awesome-sass (4.7.0)
  * geocoder (1.4.4)
  * globalid (0.4.1)
  * google_timezone (0.0.5)
  * haml (5.0.4)
  * haml-rails (1.0.0)
  * hashie (3.5.6)
  * helena (1.3.0)
  * helena_administration (0.3.4)
  * html2haml (2.2.0)
  * html5-rails (0.1.0)
  * i18n (0.9.1)
  * i18n-js (3.0.2)
  * jmespath (1.3.1)
  * jquery-rails (4.3.1)
  * json (1.8.6)
  * jwt (1.5.6)
  * kaminari-actionview (1.1.1)
  * kaminari-core (1.1.1)
  * kaminari-mongoid (1.0.1)
  * kgio (2.11.0)
  * launchy (2.4.3)
  * libv8 (3.16.14.19)
  * loofah (2.1.1)
  * mail (2.7.0)
  * mail_form (1.7.0)
  * method_source (0.9.0)
  * mime-types (3.1)
  * mime-types-data (3.2016.0521)
  * mini_mime (1.0.0)
  * mini_portile2 (2.3.0)
  * minitest (5.10.3)
  * mongo (2.4.3)
  * mongoid (6.2.1)
  * mongoid-rspec (4.0.0.pre.alpha1 f392d24)
  * mongoid-sadstory (0.0.2)
  * mongoid-simple-tags (0.1.3)
  * mongoid-tree (2.1.0)
  * mongoid_orderable (4.1.1)
  * multi_json (1.12.2)
  * multi_xml (0.6.0)
  * multipart-post (2.0.0)
  * mustermann (1.0.1)
  * net-scp (1.2.1)
  * net-ssh (4.2.0)
  * nio4r (2.1.0)
  * nokogiri (1.8.1)
  * oauth (0.5.3)
  * oauth2 (1.4.0)
  * omniauth (1.7.1)
  * omniauth-facebook (4.0.0)
  * omniauth-oauth (1.1.0)
  * omniauth-oauth2 (1.4.0)
  * omniauth-twitter (1.4.0)
  * orm_adapter (0.5.0)
  * poltergeist (1.16.0)
  * pry (0.11.2)
  * pry-byebug (3.5.0)
  * public_suffix (3.0.1)
  * rack (2.0.3)
  * rack-accept (0.4.5)
  * rack-attack (5.0.1)
  * rack-protection (2.0.0)
  * rack-test (0.7.0)
  * rails (5.1.4)
  * rails-boilerplate (0.1.9)
  * rails-dom-testing (2.0.3)
  * rails-html-sanitizer (1.0.3)
  * rails-i18n (5.0.4)
  * railties (5.1.4)
  * raindrops (0.19.0)
  * rake (12.2.1)
  * rb-fsevent (0.10.2)
  * rb-inotify (0.9.10)
  * redis (4.0.1)
  * ref (2.0.0)
  * referer-parser (0.3.0)
  * request_store (1.3.2)
  * responders (2.4.0)
  * roadie (3.2.2)
  * roadie-rails (1.2.1)
  * route_translator (5.5.0)
  * rspec (3.7.0)
  * rspec-collection_matchers (1.1.3)
  * rspec-core (3.7.0)
  * rspec-expectations (3.7.0)
  * rspec-mocks (3.7.0)
  * rspec-rails (3.7.1)
  * rspec-support (3.7.0)
  * ruby_parser (3.10.1)
  * rubyzip (1.2.1)
  * rufus-scheduler (3.4.2)
  * safely_block (0.2.0)
  * sass (3.5.3)
  * sass-listen (4.0.0)
  * sass-rails (5.0.6)
  * selenium-webdriver (3.7.0)
  * sexp_processor (4.10.0)
  * sidekiq (5.0.5)
  * sidekiq-cron (0.6.3)
  * simple_form (3.5.0)
  * simplecov (0.15.1)
  * simplecov-html (0.10.2)
  * sinatra (2.0.0)
  * slack-notifier (2.3.1)
  * slim (3.0.9)
  * slim-rails (3.1.3)
  * sprockets (3.7.1)
  * sprockets-rails (3.2.1)
  * sshkit (1.15.0)
  * temple (0.8.0)
  * therubyracer (0.12.3)
  * thor (0.20.0)
  * thread_safe (0.3.6)
  * tilt (2.0.8)
  * turnout (2.4.1)
  * tzinfo (1.2.4)
  * uglifier (3.2.0)
  * unicorn (5.3.1)
  * user_agent_parser (2.4.1)
  * uuidtools (2.1.5)
  * warden (1.2.7)
  * web-console (3.5.1)
  * websocket-driver (0.6.5)
  * websocket-extensions (0.1.3)
  * wicked_pdf (1.1.0)
  * xpath (2.1.0)
Kjir commented 6 years ago

I experience the same behaviour after upgrading from rails 4.2. I tried and I could also reproduce this with a freshly created rails app:

rails new stream_test
cd stream_test
bin/rails g controller Home index

Then the controller:

class HomeController < ApplicationController
  include ActionController::Live

  def index
    100.times { response.stream.write('a') }
    # If I uncomment the following, the test will just hang forever...
    # ensure
    #   response.stream.close
  end
end

And now bin/rails test will fail with fatal: No live threads left. Deadlock?. Is there any workaround for this?

I know that this also happens with rails 5.0

Rails version: 5.1.4 Ruby version: ruby 2.4.2p198 (2017-09-14 revision 59899) [x86_64-darwin16] Also tried with: Ruby version: ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-darwin16]

gurix commented 6 years ago

I can confirm this, it still a problem. I tested it also with the edge version of rails using gem 'rails', github: 'rails/rails'. In this case bin/rails test also has a deadlock but it just does not end. Very disappointing 😔

Kjir commented 6 years ago

I did some more investigation to try and pinpoint the source of this error. Rails will hang while pushing more data onto the buffer. The reason for that is pretty simple: the SizedQueue is initialized with a size of 10, which means that if the data is not consumed fast enough the buffer is full and it will just block.

In normal contexts, this means that the browser is downloading slower than we can process the data, so it is actually a good thing that we are waiting. As soon as the queue is freed, we append new data.

In a test context, however, we are requesting the whole body in one go and if the queue is filled up, there is no way to empty it, thus the test will deadlock.

gurix commented 6 years ago

This is consistent with my experiences, that fails after 2 header informations and 8 CSV-Lines pushed in the buffer. But I had not the time to investigate it deeper.

And just to mention, it is not only a testing problem. It broke once my production site because the export is done not on a regular base and the tests only covers about three lines. So I ran into this even with careful testing, but had to downgrade to rails 4.8.10.

gurix commented 6 years ago

The limitation to a SizedQueue of 10 was implemented in 2012 since Rails 4.0.0.beta1: https://github.com/rails/rails/commit/37a764b8e0af585b3475a54566816a2ffc52401b

So this alone cannot be the reason for this change of behaviour.

Kjir commented 6 years ago

Thanks for the inputs, whenever I have some time I will look further into it!

gurix commented 6 years ago

First I added some tests that demonstrates the effect when having i.e 100 Server Side Events: https://github.com/gurix/rails/commit/b620d46099a61b693160ba97bd396871b9d6f83b as expected everything fails as expected. I hope I can figure out to solve the issue. @tenderlove can you tell us why you added the limitation of the queue size SizedQueue? I don't think that this is the major problem since it was already introduced 6 years ago, but replacing it with a simple Queue.new solves the issue.

rcugut commented 6 years ago

I have the exact situation with an existing larger app, using rails 5.1.5 and ruby 2.5.0 (with postgresql, mongo and redis). The server "freezes" when I try to do response.stream.write more than 10 times in a row (during a CSV export).

However, I created a completely new Rails app, and copied just the controller in question... and it works! I can't make any sense of it right now. I'll do more digging to figure it out, because I think it may be related to some other gem we use in the existing app versus the greenfield app.

UPDATE

It's definitely a 3rd party gem! And in my case I found it to be gon.

I've put the demo rails app here: https://github.com/rcugut/xlsxtream_playground

Whenever the gon gem is present, things stop working after 10 response.stream.write.

rcugut commented 6 years ago

@gurix @tenderlove this originates from Rails, for any app that will (unintentionally) load ActionController::TestCase::Behavior

Any rails app that loads that file will break the method ActionController::Live#new_controller_thread by losing the new thread creation.

gurix commented 6 years ago

@rcugut I can confirm that there is some issue with ActionController::TestCase::Behavior. What I did was to fork the actual master branch of rails/rails, commented out the module Live... end override and used this rails version in my Gemfile gem 'rails', path: '../rails. If the override is commented out it does not end in a deadlock and downloads my files as intended. Reenabling the code by removing comments leads in a deadlock.

So thanks @rcugut for this input. In the case of gon, the issue is obvious. As I am not using this gem, I now have to find the responsible code for my case. I also guess that it is another gem.

rcugut commented 6 years ago

The proposed fix for the gon gem was to only include the ActionController::TestCase::Behavior only if RAILS_ENV is "test". Which I think is the rule of thumb with any Rails tests-related code to be applied in a well written gem: you only require test-related code (from Rails codebase) if you're actually running tests. This is the mindset I get after reading https://github.com/rails/rails/commit/a640da454fdb9cd8806a1b6bd98c2da93f1b53b9

gurix commented 6 years ago

Raising an exception including some dirty swearwords in ActionController::TestCase::Behavior turned out that the gem route_translator suffers from a similar issue as gon. So I fixed it in the same way in https://github.com/enriclluelles/route_translator/pull/183. I agree that test related code in production code should be separated. Querying the environment is one way, but we could think about going a step further by including it in the test code only. This could be done by monkey patching when loading tests only.

@rcugut Thanks for pointing out this issue. I owe you a beer at least.

@tenderlove Even if it is not good practice to include test-related code as it happens here, I think killing new_controller_thread as you did in https://github.com/rails/rails/commit/a640da454fdb9cd8806a1b6bd98c2da93f1b53b9 is also not the best approach. As I am not so deep into rails development I have not a solution at the moment. It is more intuitive.

EDIT: I should also mention that this bug affects every testing controllers including ActionController::Live using rspec if you send more then 10 responses. An example is the following test: https://github.com/gurix/pana-nonverbal-validation/blob/4cae5ec5f06a9a0b916d79c1698a6b4424bead65/spec/features/export/show_spec.rb#L8. This because rspec also includes ActionController::TestCase::Behavior by default.

rails-bot[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not been commented on for at least three months. The resources of the Rails team are limited, and so we are asking for your help. If you can still reproduce this error on the 5-2-stable branch or on master, please reply with all of the information you have about it in order to keep the issue open. Thank you for all your contributions.

sandrasi commented 5 years ago

It's quite an old and closed issue but I thought I mention my workaround for this problem when using RSpec (until some permanent solution is introduced). Someone may find it useful.

I created this little helper method:

def consume_streamed_response
  consumer_thread = nil
  allow(controller).to receive(:dispatch).and_wrap_original do |m, *args|
    consumer_thread = Thread.new { response.body }
    m.call(*args)
    consumer_thread.join
  end
  yield
  consumer_thread.value
end

It can be used as such:

it('consumes the streamed respose') do
  response_body = consume_streamed_response { get :streaming_endpoint }
  expect(response_body).to eq(...)
end

Normally the deadlock occurs because - as mentioned above - ActionController::Live is single-threaded in RSpec due to the redefinition of the new_controller_thread method in action_controller/test_case.rb: the SizedQueue fills up and it waits until the response gets consumed, but the response can't be processed because in a single-threaded environment it happens after the controller has finished with the request.

The method above starts reading response.body in a new thread, and the main thread (in which the spec is running) waits for it to finish (consumer_thread.join). The new thread is created just before the request is dispatched to the controller to make sure that the response getter returns the same LiveTestResponse instance the controller.dispatch is called with (creating a new response instance and dispatching the request to the controller happens in ActionController::TestCase::Behavior#process).

It worked for my use case but I haven't done any extensive testing so I can't guarantee that it works for everyone ¯\_(ツ)_/¯.

brauliobo commented 5 years ago

this issue is still happening with rails 4.2.11!

Aesthetikx commented 5 years ago

I opened this pull request for react-rails, but now I'm finding this is also a problem with draper from a change introduced 7 years ago. I'm wondering how many gems are inadvertently causing action_controller/test_case.rb to be required in non test environments? Perhaps there is a way to fix the root cause here?

delongGao commented 5 years ago

Just another update: this is still happening with the gem apipie-rails, and not just in test environment, but in production.

nickshatilo commented 4 years ago

This issue is still happening in 5.2.4.1 :(

nickshatilo commented 4 years ago

@rails-bot

laurenmacam commented 4 years ago

i did a 'cleanup' yesterday and by deleting these gems seemed to work: honeybadger airbrake rails-controller-testing ruby-growl dotenv-rails

so one (or more?) of them was causing the problem

npresco commented 4 years ago

I just ran into this while testing a controller that has ActionController::Live. A minitest solution for those who find this, I stubbed SizedQueue.new to use a larger queue, i.e.:

require 'minitest/autorun'
...

SizedQueue.stub :new, SizedQueue.new(1000) do
  get :streaming_path
end
schmijos commented 3 years ago

For RSpec I successfully used:

before do
  allow(SizedQueue).to receive(:new).and_return(SizedQueue.new(1000))
end