mrkn / pycall.rb

Calling Python functions from the Ruby language
MIT License
1.06k stars 75 forks source link

PyCall + Sidekiq causing segmentation fault on 2nd run #81

Closed archonic closed 5 years ago

archonic commented 5 years ago

I believe this may be different from other segmentation issues so I'm opening a new issue. I made a ruby wrapper gem for a python library called html-tree-diff. I can perform as many diffs as I want from the console, and I can preform one sidekiq job (per-restart). On the second sidekiq job, I consistently get a segmentation fault.

Here's the console output:

irb(main):038:0> DocumentChangesRelayWorker.new.perform(400, 8)
  Document Load (28.1ms)  SELECT  "documents".* FROM "documents" WHERE "documents"."id" = $1 LIMIT $2  [["id", 400], ["LIMIT", 1]]
[ActionCable] Broadcasting to documents:1_400: {:refresh=>"true"}
==== Calcuated changes in 16.88 seconds
=> nil
irb(main):039:0> DocumentChangesRelayWorker.new.perform(400, 7)
  Document Load (20.8ms)  SELECT  "documents".* FROM "documents" WHERE "documents"."id" = $1 LIMIT $2  [["id", 400], ["LIMIT", 1]]
[ActionCable] Broadcasting to documents:1_400: {:refresh=>"true"}
==== Calcuated changes in 16.78 seconds
=> nil
irb(main):040:0> DocumentChangesRelayWorker.new.perform(400, 6)
  Document Load (14.5ms)  SELECT  "documents".* FROM "documents" WHERE "documents"."id" = $1 LIMIT $2  [["id", 400], ["LIMIT", 1]]
[ActionCable] Broadcasting to documents:1_400: {:refresh=>"true"}
==== Calcuated changes in 16.67 seconds
=> nil
irb(main):041:0> DocumentChangesRelayWorker.perform_async(400, 5)
=> "fa5a4a2f066c3372f84d494c"
irb(main):042:0> DocumentChangesRelayWorker.perform_async(400, 4)
=> "32a4f4f6e950b3f829941ad6"
# Segmentation fault here

And here is the first part of segmentation fault trace. It was too large to paste in its entirety.

sidekiq_1        | 2019-02-25T05:03:34.089Z 1 TID-go4kyw7qx DocumentChangesRelayWorker JID-32a4f4f6e950b3f829941ad6 INFO: start
sidekiq_1        |   Document Load (19.5ms)  SELECT  "documents".* FROM "documents" WHERE "documents"."id" = $1 LIMIT $2  [["id", 400], ["LIMIT", 1]]
sidekiq_1        | /usr/local/bundle/bundler/gems/ruby_html_tree_differ-adc1365eb74c/lib/ruby_html_tree_differ.rb:12: [BUG] Segmentation fault at 0x0000000000000008
sidekiq_1        | ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
sidekiq_1        | 
sidekiq_1        | -- Control frame information -----------------------------------------------
sidekiq_1        | c:0038 p:---- s:0198 e:000197 CFUNC  :diff
sidekiq_1        | c:0037 p:0013 s:0191 e:000190 METHOD /usr/local/bundle/bundler/gems/ruby_html_tree_differ-adc1365eb74c/lib/ruby_html_tree_differ.rb:12
sidekiq_1        | c:0036 p:0142 s:0184 e:000183 METHOD /app/app/workers/document_changes_relay_worker.rb:24
sidekiq_1        | c:0035 p:0009 s:0172 e:000171 METHOD /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/processor.rb:185
sidekiq_1        | c:0034 p:0019 s:0166 e:000165 BLOCK  /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/processor.rb:167
sidekiq_1        | c:0033 p:0010 s:0163 e:000162 BLOCK  /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/middleware/chain.rb:128
sidekiq_1        | c:0032 p:0003 s:0160 e:000159 BLOCK  /usr/local/bundle/gems/apartment-sidekiq-1.2.0/lib/apartment/sidekiq/middleware/server.rb:5
sidekiq_1        | c:0031 p:0018 s:0157 e:000156 METHOD /usr/local/bundle/gems/apartment-2.2.0/lib/apartment/adapters/abstract_adapter.rb:85
sidekiq_1        | c:0030 p:0019 s:0151 e:000150 METHOD /usr/local/bundle/gems/apartment-sidekiq-1.2.0/lib/apartment/sidekiq/middleware/server.rb:4
sidekiq_1        | c:0029 p:0026 s:0144 e:000143 BLOCK  /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/middleware/chain.rb:130 [FINISH]
sidekiq_1        | c:0028 p:0021 s:0141 E:000c38 METHOD /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/middleware/chain.rb:133
sidekiq_1        | c:0027 p:0021 s:0134 E:001398 BLOCK  /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/processor.rb:166
sidekiq_1        | c:0026 p:0005 s:0130 e:000129 BLOCK  /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/processor.rb:137
sidekiq_1        | c:0025 p:0002 s:0127 e:000126 METHOD /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/job_retry.rb:108
sidekiq_1        | c:0024 p:0051 s:0117 e:000116 BLOCK  /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/processor.rb:136
sidekiq_1        | c:0023 p:0003 s:0112 e:000111 BLOCK  /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/rails.rb:42
sidekiq_1        | c:0022 p:0017 s:0109 e:000108 METHOD /usr/local/bundle/gems/activesupport-5.2.2/lib/active_support/execution_wrapper.rb:87
sidekiq_1        | c:0021 p:0007 s:0104 e:000103 BLOCK  /usr/local/bundle/gems/activesupport-5.2.2/lib/active_support/reloader.rb:73
sidekiq_1        | c:0020 p:0017 s:0101 e:000100 METHOD /usr/local/bundle/gems/activesupport-5.2.2/lib/active_support/execution_wrapper.rb:87
sidekiq_1        | c:0019 p:0008 s:0096 e:000095 METHOD /usr/local/bundle/gems/activesupport-5.2.2/lib/active_support/reloader.rb:72
sidekiq_1        | c:0018 p:0010 s:0092 e:000091 METHOD /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/rails.rb:41
sidekiq_1        | c:0017 p:0008 s:0088 e:000087 BLOCK  /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/processor.rb:132
sidekiq_1        | c:0016 p:0046 s:0085 e:000084 METHOD /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/processor.rb:243
sidekiq_1        | c:0015 p:0012 s:0078 e:000077 BLOCK  /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/processor.rb:127
sidekiq_1        | c:0014 p:0039 s:0075 e:000074 METHOD /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/job_logger.rb:8
sidekiq_1        | c:0013 p:0014 s:0068 e:000067 BLOCK  /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/processor.rb:126
sidekiq_1        | c:0012 p:0002 s:0065 e:000064 METHOD /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/job_retry.rb:73
sidekiq_1        | c:0011 p:0012 s:0056 e:000055 BLOCK  /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/processor.rb:125
sidekiq_1        | c:0010 p:0054 s:0053 e:000052 METHOD /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/logging.rb:48
sidekiq_1        | c:0009 p:0013 s:0048 e:000047 METHOD /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/logging.rb:42
sidekiq_1        | c:0008 p:0023 s:0042 e:000041 METHOD /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/processor.rb:124
sidekiq_1        | c:0007 p:0051 s:0035 E:001278 METHOD /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/processor.rb:165
sidekiq_1        | c:0006 p:0019 s:0024 e:000023 METHOD /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/processor.rb:83
sidekiq_1        | c:0005 p:0010 s:0020 e:000019 METHOD /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/processor.rb:71 [FINISH]
sidekiq_1        | c:0004 p:---- s:0015 e:000014 IFUNC 
sidekiq_1        | c:0003 p:0002 s:0012 e:000011 METHOD /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/util.rb:16
sidekiq_1        | c:0002 p:0028 s:0006 e:000005 BLOCK  /usr/local/bundle/gems/sidekiq-5.2.3/lib/sidekiq/util.rb:25 [FINISH]
sidekiq_1        | c:0001 p:---- s:0003 e:000002 (none) [FINISH]

My thoughts right now is that within the console context, the Ruby GC isn't attempting to deallocate the python object, whereas that is happening within the Sidekiq context, which is causing the segmentation fault in the same way as #71.

kammerer commented 5 years ago

Interestingly, I also use pycall in sidekiq context (to call GDAL). I do not observe segfaults and I don't need to restart sidekiq. I do observe other pycall-related errors, but I can rely on sidekiq's retry mechanism to handle them.

EDIT: I was to quick to comment. Sidekiq worked fine with pycall 1.0.3. With the latest version (1.2.1) it also segfaults for me.

archonic commented 5 years ago

@kammerer Good to know! I'll try out 1.0.3 today and see if I see the same thing. If that's the case it should be pretty easy to see what the breaking change was.

archonic commented 5 years ago

Sadly I still get a segmentation fault with 1.0.3, and it turned out the first time I was trying this out, I was running 1.0.0. Seems this particular bug has always been here.

@kammerer Do you think I could see some of your code that wraps pycall in sidekiq if you consistently avoid segfaults? Are you doing any manual garbage collection or performing pycall methods in a block that spawns a new thread?

kammerer commented 5 years ago

@archonic I am afraid the code is closed-source, I cannot share too much of it. But no, I am not doing anything special like manual garbage collection. I use a helper module to include pycall into PORO classes, but I don't suppose it makes any difference:

# lib/pygdal/base.rb

require 'pycall'                                                                                                                                                             
require 'pycall/import'                                                                                                                                                      

module PyGdal::Base                                                                                                                                                          
  def self.included(base)                                                                                                                                                    
    base.include PyCall::Import                                                                                                                                              
  end                                                                                                                                                                        

  def self.extended(base)                                                                                                                                                    
    base.extend PyCall::Import                                                                                                                                               
  end

  # ...
end
mrkn commented 5 years ago

I missed this issue. I'm sorry for late response.

@archonic Could you show the C backtrace and the code to reproduce this problem? I want to investigate this after RubyKaigi 2019.

archonic commented 5 years ago

I've been splitting my replies between this and https://github.com/mrkn/pycall.rb/issues/49, sorry about that.

I've reproduced the issue on 1.0.0, 1.0.3, but I just checked 1.2.1 and it's not segment faulting 😄🎉. I must not have rebuilt my docker images last time I tried checking 1.2.1. Sorry again! And thanks for the great gem.