instana / ruby-sensor

💎 Ruby Distributed Tracing & Metrics Sensor for Instana
https://www.instana.com/
MIT License

[Bug]: Issue with ::Instana.agent.after_fork in puma cluster mode, after_fork is blocking to start server #366

Closed ramasangita closed 6 months ago

ramasangita commented 6 months ago

Problem Description

In Puma cluster mode, adding ::Instana.agent.after_fork blocks Puma server startup. after_fork appears to be a blocking operation in the latest Instana version.

Minimal, Complete, Verifiable, Example

workers 1
preload_app!

on_worker_boot do |_|
  ::Instana.agent.after_fork if defined?(::Instana)
end

Add this to config/puma.rb and start the Puma server.

Gemfile.lock

GEM
  remote: https://rubygems.org/
  specs:
    actionview (7.1.3.2)
      activesupport (= 7.1.3.2)
      builder (~> 3.1)
      erubi (~> 1.11)
      rails-dom-testing (~> 2.2)
      rails-html-sanitizer (~> 1.6)
    activemodel (7.1.3.2)
      activesupport (= 7.1.3.2)
    activerecord (7.1.3.2)
      activemodel (= 7.1.3.2)
      activesupport (= 7.1.3.2)
      timeout (>= 0.4.0)
    activesupport (7.1.3.2)
      base64
      bigdecimal
      concurrent-ruby (~> 1.0, >= 1.0.2)
      connection_pool (>= 2.2.5)
      drb
      i18n (>= 1.6, < 2)
      minitest (>= 5.1)
      mutex_m
      tzinfo (~> 2.0)
    aes (0.5.1)
    amq-protocol (2.3.2)
    ansi (1.5.0)
    ast (2.4.2)
    base64 (0.2.0)
    bigdecimal (3.1.7)
    brpoplpush-redis_script (0.1.3)
      concurrent-ruby (~> 1.0, >= 1.0.5)
      redis (>= 1.0, < 6)
    builder (3.2.4)
    bunny (2.22.0)
      amq-protocol (~> 2.3, >= 2.3.1)
      sorted_set (~> 1, >= 1.0.2)
    byebug (11.1.3)
    chronic_duration (0.10.6)
      numerizer (~> 0.1.1)
    chunky_png (1.4.0)
    citrus (3.0.2)
    coderay (1.1.3)
    concurrent-ruby (1.2.3)
    connection_pool (2.4.1)
    crass (1.0.6)
    domain_name (0.6.20240107)
    drb (2.2.1)
    erubi (1.12.0)
    et-orbi (1.2.9)
      tzinfo
    faraday (2.9.0)
      faraday-net_http (>= 2.0, < 3.2)
    faraday-net_http (3.1.0)
      net-http
    ffi (1.16.3)
    fugit (1.10.1)
      et-orbi (~> 1, >= 1.2.7)
      raabro (~> 1.4)
    google-maps (3.0.7)
      hashie (~> 4.1, >= 4.1.0)
      httpclient (~> 2.7, >= 2.7.1)
      multi_json (>= 1.15)
      ruby-hmac (~> 0.4.0)
    hashie (4.1.0)
    http-accept (1.7.0)
    http-cookie (1.0.5)
      domain_name (~> 0.5)
    httpclient (2.8.3)
    i18n (1.14.4)
      concurrent-ruby (~> 1.0)
    iban-tools (1.2.1)
    instana (1.213.2)
      concurrent-ruby (>= 1.1)
      oj (>= 3.0.11)
      sys-proctable (>= 1.2.2)
    json (2.7.1)
    jwt (2.8.1)
      base64
    karafka (2.3.3)
      karafka-core (>= 2.3.0, < 2.4.0)
      waterdrop (>= 2.6.12, < 3.0.0)
      zeitwerk (~> 2.3)
    karafka-core (2.3.0)
      karafka-rdkafka (>= 0.14.8, < 0.15.0)
    karafka-rdkafka (0.14.10)
      ffi (~> 1.15)
      mini_portile2 (~> 2.6)
      rake (> 12)
    karafka-testing (2.3.1)
      karafka (>= 2.3.0, < 2.4.0)
      waterdrop (>= 2.6.12)
    karafka-web (0.8.2)
      erubi (~> 1.4)
      karafka (>= 2.3.0, < 2.4.0)
      karafka-core (>= 2.3.0, < 2.4.0)
      roda (~> 3.68, >= 3.69)
      tilt (~> 2.0)
    language_server-protocol (3.17.0.3)
    loofah (2.22.0)
      crass (~> 1.0.2)
      nokogiri (>= 1.12.0)
    matrix (0.4.2)
    method_source (1.0.0)
    mime-types (3.5.2)
      mime-types-data (~> 3.2015)
    mime-types-data (3.2024.0305)
    mini_portile2 (2.8.5)
    minitest (5.22.3)
    minitest-reporters (1.6.1)
      ansi
      builder
      minitest (>= 5.0)
      ruby-progressbar
    money (6.19.0)
      i18n (>= 0.6.4, <= 2)
    msgpack (1.7.2)
    multi_json (1.15.0)
    mustermann (3.0.0)
      ruby2_keywords (~> 0.0.1)
    mutex_m (0.2.0)
    mysql2 (0.5.5)
    net-http (0.4.1)
      uri
    netrc (0.11.0)
    nio4r (2.7.0)
    nokogiri (1.16.3-x86_64-darwin)
      racc (~> 1.4)
    numerizer (0.1.1)
    oj (3.16.3)
      bigdecimal (>= 3.0)
    oj_mimic_json (1.0.1)
    optimist (3.1.0)
    parallel (1.24.0)
    parser (3.3.0.5)
      ast (~> 2.4.1)
      racc
    pdf-core (0.10.0)
    prawn (2.5.0)
      matrix (~> 0.4)
      pdf-core (~> 0.10.0)
      ttfunk (~> 1.8)
    prawn-qrcode (0.5.2)
      prawn (>= 1)
      rqrcode (>= 1.0.0)
    prawn-table (0.2.2)
      prawn (>= 1.3.0, < 3.0.0)
    prometheus-client (4.2.2)
    pry (0.14.2)
      coderay (~> 1.1)
      method_source (~> 1.0)
    pry-nav (1.0.0)
      pry (>= 0.9.10, < 0.15)
    puma (6.4.2)
      nio4r (~> 2.0)
    raabro (1.4.0)
    racc (1.7.3)
    rack (2.2.8.1)
    rack-parser (0.7.0)
      rack
    rack-protection (3.2.0)
      base64 (>= 0.1.0)
      rack (~> 2.2, >= 2.2.4)
    rack-test (2.1.0)
      rack (>= 1.3)
    racksh (1.0.1)
      rack (>= 1.0)
      rack-test (>= 0.5)
    rails-dom-testing (2.2.0)
      activesupport (>= 5.0.0)
      minitest
      nokogiri (>= 1.6)
    rails-html-sanitizer (1.6.0)
      loofah (~> 2.21)
      nokogiri (~> 1.14)
    rainbow (3.1.1)
    rake (13.1.0)
    rbtrace (0.5.1)
      ffi (>= 1.0.6)
      msgpack (>= 0.4.3)
      optimist (>= 3.0.0)
    rbtree (0.4.6)
    redis (4.8.1)
    regexp_parser (2.9.0)
    rest-client (2.1.0)
      http-accept (>= 1.7.0, < 2.0)
      http-cookie (>= 1.0.2, < 2.0)
      mime-types (>= 1.16, < 4.0)
      netrc (~> 0.8)
    rexml (3.2.6)
    roda (3.78.0)
      rack
    roo (2.10.1)
      nokogiri (~> 1)
      rubyzip (>= 1.3.0, < 3.0.0)
    rqrcode (2.2.0)
      chunky_png (~> 1.0)
      rqrcode_core (~> 1.0)
    rqrcode_core (1.2.0)
    rsolr (2.5.0)
      builder (>= 2.1.2)
      faraday (>= 0.9, < 3, != 2.0.0)
    rubocop (1.62.1)
      json (~> 2.3)
      language_server-protocol (>= 3.17.0)
      parallel (~> 1.10)
      parser (>= 3.3.0.2)
      rainbow (>= 2.2.2, < 4.0)
      regexp_parser (>= 1.8, < 3.0)
      rexml (>= 3.2.5, < 4.0)
      rubocop-ast (>= 1.31.1, < 2.0)
      ruby-progressbar (~> 1.7)
      unicode-display_width (>= 2.4.0, < 3.0)
    rubocop-ast (1.31.2)
      parser (>= 3.3.0.4)
    ruby-hmac (0.4.0)
    ruby-progressbar (1.13.0)
    ruby2_keywords (0.0.5)
    rubyXL (3.4.25)
      nokogiri (>= 1.10.8)
      rubyzip (>= 1.3.0)
    rubyzip (2.3.2)
    rufus-scheduler (3.9.1)
      fugit (~> 1.1, >= 1.1.6)
    sequel (5.78.0)
      bigdecimal
    set (1.1.0)
    sidekiq (6.5.12)
      connection_pool (>= 2.2.5, < 3)
      rack (~> 2.0)
      redis (>= 4.5.0, < 5)
    sidekiq-status (3.0.3)
      chronic_duration
      sidekiq (>= 6.0, < 8)
    sidekiq-unique-jobs (7.1.33)
      brpoplpush-redis_script (> 0.1.1, <= 2.0.0)
      concurrent-ruby (~> 1.0, >= 1.0.5)
      redis (< 5.0)
      sidekiq (>= 5.0, < 7.0)
      thor (>= 0.20, < 3.0)
    sinatra (3.2.0)
      mustermann (~> 3.0)
      rack (~> 2.2, >= 2.2.4)
      rack-protection (= 3.2.0)
      tilt (~> 2.0)
    sinatra-contrib (3.2.0)
      multi_json (>= 0.0.2)
      mustermann (~> 3.0)
      rack-protection (= 3.2.0)
      sinatra (= 3.2.0)
      tilt (~> 2.0)
    snappy (0.4.0)
    sorted_set (1.0.3)
      rbtree
      set (~> 1.0)
    strip_attributes (1.13.0)
      activemodel (>= 3.0, < 8.0)
    sucker_punch (3.2.0)
      concurrent-ruby (~> 1.0)
    sys-proctable (1.3.0)
      ffi (~> 1.1)
    thor (1.3.1)
    tilt (2.1.0)
    timeout (0.4.1)
    trilogy (2.7.0)
    ttfunk (1.8.0)
      bigdecimal (~> 3.1)
    tzinfo (2.0.6)
      concurrent-ruby (~> 1.0)
    unicode-display_width (2.5.0)
    uri (0.13.0)
    waterdrop (2.6.14)
      karafka-core (>= 2.2.3, < 3.0.0)
      zeitwerk (~> 2.3)
    working_hours (1.4.1)
      activesupport (>= 3.2)
      tzinfo
    zeitwerk (2.6.13)

PLATFORMS
  x86_64-darwin-22
  x86_64-darwin-23

DEPENDENCIES
  actionview
  activerecord
  aes
  bunny
  byebug
  citrus
  faraday
  google-maps
  iban-tools
  instana
  jwt
  karafka
  karafka-testing
  karafka-web
  matrix
  minitest
  minitest-reporters
  money
  mysql2 (= 0.5.5)
  nokogiri
  oj
  oj_mimic_json
  prawn
  prawn-qrcode
  prawn-table
  prometheus-client
  pry
  pry-nav
  puma
  rack-parser
  racksh
  rake
  rbtrace
  redis
  rest-client
  roo
  rsolr
  rubocop
  rubyXL
  rufus-scheduler
  sequel
  sidekiq (~> 6.5.8)
  sidekiq-status
  sidekiq-unique-jobs
  sinatra
  sinatra-contrib
  snappy
  strip_attributes
  sucker_punch
  tilt (= 2.1.0)
  trilogy
  working_hours

RUBY VERSION
   ruby 3.3.0p0

BUNDLED WITH
   2.3.4

Ruby Version

3.3.0

ramasangita commented 6 months ago

on_worker_boot do |_|
  ::Instana.agent.after_fork
end

Here, instead of after_fork, can we use spawn_background_thread? I see that spawn_background_thread is asynchronous.

Ferenc- commented 6 months ago

@ramasangita The point of using a synchronous call for the announce is to ensure that the tracer captures as many events as possible from early in server startup. It is unclear why you need an asynchronous announce call (perhaps some readiness probes time out?). spawn_background_thread is usable, but it doesn't guarantee that all spans and metrics from startup are recorded, as the synchronous call does.

ramasangita commented 6 months ago

If I use after_fork, it blocks the server startup completely. Puma server startup times out and the application never starts. The version I am currently using (1.11.6) works fine; after_fork does not block startup in that version.

So I am wondering: since spawn_background_thread is used for non-cluster mode (https://github.com/instana/ruby-sensor/blob/master/lib/instana.rb#L18), can we not use the same for cluster mode as well?

Ferenc- commented 6 months ago

If I use after_fork, it is blocking the server startup completely.

Could you also provide some sample for this? Because I have tried this, and I have not seen any problem with the startup.

workers 3
preload_app!
on_worker_boot do |_|
  ::Instana.agent.after_fork if defined?(::Instana)
end

If you observe blocking, that is indicative of a network issue, and ultimately an asynchronous announce likely won't help you much. But as I said, feel free to use the asynchronous announce if it is OK for you that some events might not be recorded.
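
To illustrate the trade-off described above, here is a minimal sketch that uses a hypothetical SlowAgent class as a stand-in. It is not the Instana API; the class, its methods, and the 0.2 second delay are assumptions chosen only to show why a synchronous announce holds up the caller while a background thread returns immediately but has not announced yet.

```ruby
# Hypothetical stand-in for an agent whose announce call blocks while it
# waits for the host agent. This is NOT the real Instana API.
class SlowAgent
  attr_reader :announced

  def initialize(delay)
    @delay = delay
    @announced = false
  end

  # Synchronous style: the caller waits for the whole announce.
  def announce
    sleep @delay
    @announced = true
  end

  # Asynchronous style: returns a Thread at once; the announce
  # completes later on that thread.
  def announce_in_background
    Thread.new { announce }
  end
end

# Measures how long the given block takes, in seconds.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

sync_agent = SlowAgent.new(0.2)
sync_time = elapsed { sync_agent.announce }

async_agent = SlowAgent.new(0.2)
thread = nil
async_time = elapsed { thread = async_agent.announce_in_background }

# Right after the asynchronous call returns, the agent is not announced
# yet, so events in this window could be missed.
announced_right_after_boot = async_agent.announced
thread.join
```

In this sketch the synchronous caller is delayed by the full announce time, while the asynchronous caller continues almost immediately with an agent that is not yet announced. If the delay comes from an unreachable host agent, the background thread only hides the wait; it does not remove it.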

ramasangita commented 6 months ago

Found the issue: HostAgentLookup is timing out (sometimes), which triggers a loop waiting for a connection. I think the older version did this asynchronously, so I never noticed it.

So I am closing this and will check the timeouts on my end.
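
For anyone checking similar timeouts, one way to test whether the host agent is reachable before Puma starts is a plain TCP connect with a short timeout. This is only a sketch: agent_reachable? is a hypothetical helper, not part of the instana gem, and it says nothing about how HostAgentLookup itself works.

```ruby
require 'socket'

# Hypothetical helper, not part of the instana gem.
# Returns true if a TCP connection to host:port opens within `timeout`
# seconds, and false if it is refused, unreachable, or times out.
def agent_reachable?(host, port, timeout: 1)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError
  false
end
```

It could be called as agent_reachable?('127.0.0.1', 42699), assuming the host agent listens on 127.0.0.1 at port 42699; both values are assumptions here and should be replaced with whatever your environment actually uses. An intermittent false result would point to the same kind of network issue discussed above.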