Open asgrim opened 2 years ago
Note: this is definitely an older version of scout-apm-php
since the scout-apm-laravel
code exists. Not sure which version is being used though.
Further info; changing the env var may be the cause if this, depending on how Heroku behaves when this happens; perhaps it kills the running process, but leaves the /tmp
path in tact?
Hmm, did some investigation this afternoon, I'm not sure changing the env var is the cause here; I tried this in Heroku and it did fully restart the node:
2022-03-02T14:51:58.871977+00:00 heroku[worker.1]: Stopping all processes with SIGTERM
2022-03-02T14:51:59.303098+00:00 heroku[worker.1]: Process exited with status 143
2022-03-02T14:51:59.443577+00:00 heroku[worker.1]: Starting process with command `php artisan queue:work`
2022-03-02T14:51:59.290285+00:00 app[web.1]: SIGTERM received, attempting graceful shutdown...
2022-03-02T14:51:59.291008+00:00 app[web.1]: Stopping httpd gracefully...
2022-03-02T14:51:58.895539+00:00 heroku[web.1]: Stopping all processes with SIGTERM
2022-03-02T14:52:00.121932+00:00 heroku[worker.1]: State changed from starting to up
2022-03-02T14:52:00.156345+00:00 heroku[web.1]: Starting process with command `vendor/bin/heroku-php-apache2 public/`
2022-03-02T14:52:00.350309+00:00 app[web.1]: Stopping php-fpm gracefully...
2022-03-02T14:52:00.471076+00:00 app[web.1]: Shutdown complete.
2022-03-02T14:52:00.662258+00:00 heroku[web.1]: Process exited with status 143
2022-03-02T14:51:57.638808+00:00 heroku[web.1]: Restarting
2022-03-02T14:51:57.679338+00:00 heroku[web.1]: State changed from up to starting
2022-03-02T14:51:57.713450+00:00 heroku[worker.1]: Restarting
2022-03-02T14:51:57.737295+00:00 heroku[worker.1]: State changed from up to starting
2022-03-02T14:52:01.078276+00:00 app[worker.1]: [2022-03-02 14:52:01] local.DEBUG: [Scout] Laravel Scout Agent is starting
2022-03-02T14:52:01.078289+00:00 app[worker.1]:
Perhaps a general "crash" of the core-agent may reproduce this, but I need to simulate the process going away. Will try that next.
Hmm, still unable to reproduce this. On my test Heroku instance I added an HTTP endpoint that kills the running core-agent
(on the web node, not an ephemeral CLI node!), but even with the core-agent
simply killed, leaving the /tmp
path in tact, the agent detects the manifest and sees that everything is fine, and launches the Core Agent:
[2022-03-08 09:55:06] local.DEBUG: [Scout] Laravel Scout Agent is starting
[2022-03-08 09:55:06] local.DEBUG: [Scout] Configuration: {"monitor":true,"name":"Laravel Blog Heroku","key":"<redacted>","log_level":"DEBUG","log_payload_content":false,"api_version":"1.0","ignore":[],"application_root":null,"scm_subdirectory":null,"revision_sha":null,"hostname":null,"disabled_instruments":null,"core_agent_log_level":"DEBUG","core_agent_log_file":null,"core_agent_config_file":null,"core_agent_socket_path":"tcp:\/\/127.0.0.1:6590","core_agent_dir":"\/tmp\/scout_apm_core","core_agent_full_name":"scout_apm_core-v1.4.0-x86_64-unknown-linux-musl","core_agent_download_url":"https:\/\/s3-us-west-1.amazonaws.com\/scout-public-downloads\/apm_core_agent\/release","core_agent_launch":true,"core_agent_download":true,"core_agent_version":"v1.4.0","core_agent_triple":"x86_64-unknown-linux-musl","core_agent_permissions":511,"uri_reporting":"path","uri_filtered_params":["access","access_token","api_key","apikey","auth","auth_token","card","certificate","credentials","crypt","key","mysql_pwd","otp","passwd","password","private","protected","salt","secret","ssn","stripetoken","token"],"errors_enabled":false,"errors_ignored_exceptions":[],"errors_host":"https:\/\/errors.scoutapm.com","errors_batch_size":5,"errors_filtered_params":["password","s3-key","scout_key"]}
[2022-03-08 09:55:06] local.INFO: [Scout] Scout Core Agent (app=Laravel Blog Heroku, ext=n/a) not connected yet, attempting to start
[2022-03-08 09:55:06] local.INFO: [Scout] Parsing Core Agent Manifest at "/tmp/scout_apm_core/scout_apm_core-v1.4.0-x86_64-unknown-linux-musl/manifest.json"
[2022-03-08 09:55:06] local.INFO: [Scout] Parsing Core Agent Manifest at "/tmp/scout_apm_core/scout_apm_core-v1.4.0-x86_64-unknown-linux-musl/manifest.json"
[2022-03-08 09:55:06] local.DEBUG: [Scout] exec is available
[2022-03-08 09:55:06] local.DEBUG: [Scout] Core Agent Launch in Progress
[2022-03-08 09:55:06] local.DEBUG: [Scout] Launching core agent with command: '/tmp/scout_apm_core/scout_apm_core-v1.4.0-x86_64-unknown-linux-musl/core-agent' 'start' '--daemonize' 'true' '--log-file' '/dev/null' '--log-level' 'DEBUG' '--tcp' '127.0.0.1:6590'
[2022-03-08 09:55:06] local.WARNING: [Scout] Failed to connect to socket on address "tcp://127.0.0.1:6590", previous message: socket_connect(): unable to connect [111]: Connection refused
[2022-03-08 09:55:06] local.DEBUG: [Scout] Handle MiddlewareInstrument
[2022-03-08 09:55:06] local.DEBUG: [Scout] Handle ActionInstrument
[2022-03-08 09:55:09] local.DEBUG: [Scout] Connected to connector whilst sending.
[2022-03-08 09:55:09] local.DEBUG: [Scout] Skipping metadata send, already sent
[2022-03-08 09:55:09] local.DEBUG: [Scout] Sending metrics from 3 collected spans.
[2022-03-08 09:55:09] local.DEBUG: [Scout] Sent whole payload successfully to core agent.
[2022-03-08 09:55:09] local.DEBUG: [Scout] SendRequestToScout succeeded
Note: the WARNING
about failed to connect is expected, as it takes a moment for core-agent
to launch.
Not sure how else to reproduce this issue currently.
Looks like there is a bug here where if the path already exists, the agent download process crashes. I would've thought we'd already handled existing directory, but investigate further - perhaps permissions means we can't overwrite or something?