Closed antonmorant closed 1 year ago
Hi @antonmorant, thanks for your detailed bug report and sleuthing thus far. I'm glad to hear downgrading your agent stops the error in your environment.
I'm taking a look at this issue and will get back to you soon with more details.
Hi @antonmorant, I followed your repro steps using an environment matching your list and couldn't reproduce the bug. Unfortunately, I think the problem may be more specific to your environment.
We use require 'fiber' to bring in the first-party Ruby Fiber library, as it isn't automatically made available in all Ruby versions and implementations. Our current hypothesis is that while the require is erroring out for you, you still actually have access to the Fiber library in your application.
To test that theory, please update the newrelic_rpm line in your Gemfile to the following:
gem 'newrelic_rpm', git: 'https://github.com/newrelic/newrelic-ruby-agent', branch: 'bugfix/fiber-load-error'
If that theory is wrong and your application really doesn't have access to the Fiber class, then we will see a different error to that effect.
Thank you for looking into this! I'm sorry for the long turnaround. I'll give this a shot and report back; you can expect to hear from me next week.
Hi @kaylareopelle, I have done a deploy with the newrelic gem pinned to bugfix/fiber-load-error as you requested, and I'm afraid the result was the same. I was going to paste the logs here, but I checked and the error message and stack trace are actually identical, line by line.
Hi @antonmorant! I work with @kaylareopelle and took a closer look at this one.
Here is what I currently understand:
- In your version of Ruby, the Fiber class is automatically available without the need for a require "fiber" call. So defined?(Fiber) will work for you out of the box without the require "fiber" call.
- The agent performs a require "fiber" for compatibility with other versions of Ruby that need to have that call performed in order to know about the Fiber class. For Rubies that don't need that require "fiber" call, the require call should simply return false (signifying that the library was already loaded) and not cause any problems.
- In your environment, an error is raised when the require "fiber" call is performed. We've never encountered this before. We spun up a Rails app with your exact tech stack versions and deployed to Heroku as you are doing, and we are able to call require "fiber" without issue.
- The bugfix branch wraps the require "fiber" call with a rescue and logs a new "Failed to require" log message (WARN level) to log/newrelic_agent.log. That's all it does. Here's a diff view of what the bugfix branch provides. If you happen to see that new log message in your log/newrelic_agent.log file, the message may include some helpful information.
- Your report that the error appeared when moving from an agent version that doesn't perform a require "fiber" call to the latest version of the agent that does perform that call led us to create the bugfix branch that rescues that require call. But upon inspecting your stack trace... I can't actually see any references to New Relic code. I see Bundler, RubyGems, Puma, Rails, and Sentry, but no mention of New Relic.
- The Rails streaming_template_renderer.rb file performs a require "fiber" call here on line 3. From looking at your stack trace, that line 3 is present and ultimately responsible for the error.
- So Rails itself performs a require "fiber", and for some as-yet-unexplained reason your app / tech stack is unable to perform that call without erroring out.
At this point, with the New Relic code itself not appearing in the stack trace, I don't have confidence that further changes made in a bugfix branch will actually be involved at the time of the error and have an impact on its behavior. I think the focus should be on determining why your app / tech stack errors out when encountering a require "fiber" call.
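The rescue-and-log behavior described for the bugfix branch can be sketched roughly as follows. This is an illustration only, not the agent's actual code: the safe_require helper, the in-memory logger, and the exact message format are assumptions. Only the "Failed to require" wording and the WARN level come from the description above.

```ruby
require "logger"
require "stringio"

# Hypothetical helper (not part of the New Relic agent): attempt a require
# and log a warning instead of letting a LoadError propagate.
# Returns require's own true/false result, or nil if the load failed.
def safe_require(library, logger)
  require library
rescue LoadError => e
  logger.warn("Failed to require '#{library}': #{e.class} - #{e.message}")
  nil
end

# An in-memory buffer stands in for log/newrelic_agent.log in this sketch.
log_output = StringIO.new
logger = Logger.new(log_output)

fiber_result = safe_require("fiber", logger)
missing_result = safe_require("no_such_library_for_demo", logger)

puts "fiber: #{fiber_result.inspect}"
puts "missing: #{missing_result.inspect}"
puts log_output.string
```

Note that a rescue like this only protects the one require call it wraps. A require "fiber" performed elsewhere, for example inside Rails, is unaffected by it.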
Are you able to obtain an interactive Irb session with your tech stack or perhaps a worker job that simply performs require "fiber" and nothing else? If the require call succeeds in one of those simple contexts but fails when your Rails app boots, then we at least have a finite list of differences to go by in troubleshooting the problem.
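A minimal isolation check along those lines might look like the sketch below. It is only a suggestion: the report keys are made up for illustration, and the idea of comparing a plain ruby run against a bundle exec ruby run is an assumption about how the differences could be narrowed down.

```ruby
# Minimal isolation check. Run once with plain `ruby` and once with
# `bundle exec ruby`, then compare the two outputs.

# Record whether the Fiber constant exists before any explicit require.
fiber_defined_before = !(defined?(Fiber)).nil?

require_outcome =
  begin
    require "fiber" # true: newly loaded, false: already loaded
  rescue LoadError => e
    "LoadError: #{e.message}"
  end

report = {
  ruby_version: RUBY_VERSION,
  ruby_engine: RUBY_ENGINE,
  fiber_defined_before_require: fiber_defined_before,
  require_outcome: require_outcome,
  # Whether any already-loaded feature has a file name starting with "fiber."
  fiber_in_loaded_features: $LOADED_FEATURES.any? { |f| File.basename(f).start_with?("fiber.") }
}

report.each { |key, value| puts "#{key}: #{value.inspect}" }
```

If require_outcome is true or false here but the app still raises LoadError at boot, the difference lies in something that only the full boot process sets up.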
If you can produce a reproduction that demonstrates the problem and share it with us via a GitHub gist or repository, we would be happy to try to help troubleshoot. Please do not share any proprietary business logic or sensitive data.
I hope this gives you some ideas on next steps to try. We're happy to answer any questions you may have or be a sounding board for any ideas you'd like to share.
Hi @fallwith,
Thanks for looking into this and offering all the extra info. Very educational!
Everything you said above checks out and/or matches my experience and understanding of the issue. In particular, let me zoom into this one bit:
> But upon inspecting your stack trace... I can't actually see any references to New Relic code. I see Bundler, RubyGems, Puma, Rails, and Sentry, but no mention of New Relic.
That is correct. The only data I have pointing to an issue with the newrelic gem is that 1) removing it from the Gemfile removes the error, and adding it back in resurfaces it again; and 2) switching versions on the newrelic gem also removes/resurfaces the error. I wonder if there is some undesired interaction where depending on the load order of libraries it results in an error or not.
Regarding your requests/suggestions:
> Are you able to obtain an interactive Irb session with your tech stack or perhaps a worker job that simply performs require "fiber" and nothing else? If the require call succeeds in one of those simple contexts but fails when your Rails app boots, then we at least have a finite list of differences to go by in troubleshooting the problem.
Yes, I can log into the Heroku dyno. I have tested both logging into a Rails console and opening a shell session and then starting irb. In both cases, the instruction require "fiber" succeeds without an error and returns false.
> If you can produce a reproduction that demonstrates the problem and share it with us via a GitHub gist or repository, we would be happy to try to help troubleshoot. Please do not share any proprietary business logic or sensitive data.
That's a great idea. It may take me a bit of time to put it together but I might be able to give it a shot.
> If you happen to see that new log message in your log/newrelic_agent.log file, the message may possibly include some helpful information.
I didn't think to look for that file when I was testing the bugfix branch, and because I have redeployed since then, the file is nowhere to be found on the server machine. But it's good to know.
Hi @antonmorant. I work with @kaylareopelle and @fallwith - we're all keeping tabs on this issue.
While the stack trace points to the error when the Rails code is reached, we agree there seems to be an unintended interaction between the newrelic and rails gems. We can't dig in much further without a reproduction (I've been unable to produce one yet), but I'd love to explore your theory about load order.
We have the configuration option to defer agent initialization until after rails initializers are run. Are you able to set the following as an environment variable using the same environment you're seeing the issue in?
NEW_RELIC_DEFER_RAILS_INITIALIZATION=true
We also want to confirm that when you tested with a Rails console and with irb, you were using the same problematic Gemfile that has v9 of our newrelic_rpm gem and Rails both present. Meaning, did you run bundle exec irb and bundle exec rails c, or otherwise perform manual require statements of the New Relic and Rails gems to test?
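One way to run such manual require statements in a controlled order is sketched below. The try_requires helper is hypothetical and the library lists are only examples. Standard-library names are used so the sketch runs anywhere; under bundle exec they could be swapped for the gems in question.

```ruby
# Hypothetical helper: require each library in the given order and record
# what happened, so that two orderings can be compared side by side.
def try_requires(libraries)
  libraries.map do |library|
    outcome =
      begin
        require(library) ? "loaded" : "already loaded"
      rescue LoadError => e
        "LoadError: #{e.message}"
      end
    [library, outcome]
  end
end

# Under `bundle exec`, the list could instead be something like
# %w[newrelic_rpm rails fiber] versus %w[fiber rails newrelic_rpm].
results = try_requires(%w[fiber set no_such_library_for_demo])
results.each { |library, outcome| puts "#{library}: #{outcome}" }
```

Each ordering needs its own fresh Ruby process, since a library that is already loaded will simply report "already loaded" on a second attempt.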
Hi @antonmorant, we haven't heard from you for a while so we're going to close this issue. Feel free to reopen when you have more information to share.
Description
I recently did a major upgrade on a Rails app that is hosted on Heroku and uses New Relic. This included upgrading the Heroku stack, Ruby version, Rails version, and many gem and code updates to get everything working as usual again (see environment details further down). When deploying, it updated to the latest version of the newrelic_rpm gem (there's no version requirement in the Gemfile). This triggered an error that crashes the app on load: LoadError: cannot load such file -- fiber.
The app has 2 processes, a web server (using puma) and a worker for background jobs (using sidekiq). Both processes crash on startup.
After debugging and trying different things, I finally linked the crash to the newrelic_rpm gem update. I noticed in the changelog that v9.0.0 added support for Fiber instrumentation, so I tried downgrading the gem. Pegging the version to < 9.0 installs version 8.16.0, which fixes the crash (or, rather, works around it).
Expected Behavior
Rails app loads without crashing using newrelic_rpm v9.2.2.
Troubleshooting or NR Diag results
Crash stack trace:
Steps to Reproduce
This issue might be specific to my app environment/config/code, so I'll just list what seems relevant noting that I have not attempted to reproduce this on a fresh new app:
newrelic_rpm gem with v9.2.2
Your Environment
Additional context
https://stackoverflow.com/questions/76201816/rails-deploy-fails-with-unable-to-load-application-loaderror-cannot-load-such/76201987