notbrain closed this issue 9 years ago
Looking at the logs, something is definitely going awry. It seems that PM has lost track of pid 12763, which should be the master process (WID 0) for the 2 workers that actually exist.
There have been a couple of patch (and one minor) releases since strong-pm@4.2.0. Are you in a position to upgrade to the latest?
As for cleaning up files: if you really want to, you can manually reset everything with `sudo rm -rf /var/lib/strong-pm/*` (note the `/*` at the end; you don't want to delete `/var/lib/strong-pm` itself). It's probably best to run that after shutting down PM. When you restart it, it will be a completely fresh slate.
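To illustrate why the trailing `/*` matters, here is a small sketch run in a scratch directory (the paths are throwaway stand-ins, not the real PM base directory):

```shell
# Scratch-directory demo of `rm -rf DIR/*` vs `rm -rf DIR` (stand-in layout,
# not the real /var/lib/strong-pm). The glob deletes only the contents,
# so the base directory itself survives for PM to reuse on restart.
base=$(mktemp -d)
mkdir -p "$base/svc/1"
touch "$base/strong-pm.json"
rm -rf "$base"/*                 # contents gone, directory kept
[ -d "$base" ] && echo "base directory still exists"
ls -A "$base" | wc -l            # → 0 (empty)
```

Without the glob, `rm -rf "$base"` would remove the directory itself, and PM would have to be able to recreate it (with the right ownership) on the next start.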
If you are feeling particularly brave, you could "void the warranty", so to speak, and try modifying the `strong-pm.json` file manually (again, probably best to do this after shutting down PM).
Thanks, will upgrade, wipe everything, and start over. Will document `slc ctl create <svc>` and see if that works as expected and avoids dupe entries with the same svc name.
Whoa... looking at strong-pm.json, it has grown to 50 MB overnight and is constantly growing with what looks like a constant stream of log-dump command records, plus, every once in a while, all the errors associated with the fact that we haven't purchased a license for the extra SL stuff. Is this another logfile-type thing, or also indicative of something awry?
"4360": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-05T00:08:16.233Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":4360}",
"4361": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-05T00:08:17.265Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":4361}",
"4362": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-05T00:08:18.295Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":4362}",
"4363": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-05T00:08:19.343Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":4363}",
"4364": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-05T00:08:20.376Z\",\"result\":{\"log\":\"2015-06-05T00:08:19.626Z pid:7990 worker:0 WARN received SIGHUP, restarting workers\\n2015-06-05T00:08:19.641Z pid:7990 worker:0 ERROR supervisor worker id 1 (pid 8004) expected exit with 2\\n2015-06-05T00:08:19.724Z pid:7990 worker:0 INFO supervisor started worker 3 (pid 8664)\\n2015-06-05T00:08:19.725Z pid:7990 worker:0 INFO supervisor resized to 2\\n2015-06-05T00:08:19.975Z pid:8664 worker:3 INFO strong-agent v1.6.0 profiling app 'proteus' pid '8664'\\n2015-06-05T00:08:19.979Z pid:8664 worker:3 INFO strong-agent[8664] started profiling agent\\n\"},\"serviceInstanceId\":1,\"id\":4364}",
"4365": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-05T00:08:21.411Z\",\"result\":{\"log\":\"2015-06-05T00:08:20.838Z pid:8664 worker:3 ERROR strong-agent error: failed to instrument mysql\\n2015-06-05T00:08:21.269Z pid:8664 worker:3 INFO strong-agent not profiling, agent metrics requires a valid license.\\n2015-06-05T00:08:21.269Z pid:8664 worker:3 Please contact sales@strongloop.com for assistance.\\n2015-06-05T00:08:21.272Z pid:8664 worker:3 Web server listening at: http://0.0.0.0:3003/\\n\"},\"serviceInstanceId\":1,\"id\":4365}",
"4366": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-05T00:08:22.446Z\",\"result\":{\"log\":\"2015-06-05T00:08:21.736Z pid:7990 worker:0 ERROR supervisor worker id 2 (pid 8007) expected exit with 2\\n2015-06-05T00:08:21.813Z pid:7990 worker:0 INFO supervisor started worker 4 (pid 8679)\\n2015-06-05T00:08:21.813Z pid:7990 worker:0 INFO supervisor resized to 2\\n2015-06-05T00:08:22.057Z pid:8679 worker:4 INFO strong-agent v1.6.0 profiling app 'proteus' pid '8679'\\n2015-06-05T00:08:22.061Z pid:8679 worker:4 INFO strong-agent[8679] started profiling agent\\n\"},\"serviceInstanceId\":1,\"id\":4366}",
"4367": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-05T00:08:23.482Z\",\"result\":{\"log\":\"2015-06-05T00:08:22.907Z pid:8679 worker:4 ERROR strong-agent error: failed to instrument mysql\\n2015-06-05T00:08:23.336Z pid:8679 worker:4 INFO strong-agent not profiling, agent metrics requires a valid license.\\n2015-06-05T00:08:23.337Z pid:8679 worker:4 Please contact sales@strongloop.com for assistance.\\n2015-06-05T00:08:23.340Z pid:8679 worker:4 Web server listening at: http://0.0.0.0:3003/\\n\"},\"serviceInstanceId\":1,\"id\":4367}",
"4368": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-05T00:08:24.525Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":4368}",
"4369": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-05T00:08:25.551Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":4369}",
"4370": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-05T00:08:26.577Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":4370}",
"4371": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-05T00:08:27.603Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":4371}",
Was able to hard stop everything and stop strong-pm, but tailed strong-pm.json: at the end it shows it was trying to work with service IDs 2 and 6, when `slc ctl status` only shows 1 and 8:
...snip...
"260528": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-12T17:38:50.910Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":260528}",
"260529": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-12T17:38:54.520Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":260529}",
"260530": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-12T17:38:59.294Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":260530}",
"260531": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-12T17:39:02.901Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":260531}",
"260532": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-12T17:39:06.342Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":260532}",
"260533": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-12T17:39:09.580Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":260533}",
"260534": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-12T17:39:14.505Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":260534}",
"260535": "{\"request\":{\"cmd\":\"stop\",\"serviceId\":2,\"instanceId\":2},\"timestamp\":\"2015-06-18T21:55:12.207Z\",\"result\":{},\"serviceInstanceId\":2,\"id\":260535}",
"260536": "{\"request\":{\"cmd\":\"stop\",\"serviceId\":6,\"instanceId\":6},\"timestamp\":\"2015-06-19T01:23:25.563Z\",\"result\":{},\"serviceInstanceId\":6,\"id\":260536}",
"260537": "{\"request\":{\"cmd\":\"stop\",\"serviceId\":6,\"instanceId\":6},\"timestamp\":\"2015-06-19T01:25:31.521Z\",\"result\":{},\"serviceInstanceId\":6,\"id\":260537}",
"260538": "{\"request\":{\"cmd\":\"stop\",\"serviceId\":6,\"instanceId\":6},\"timestamp\":\"2015-06-19T01:27:49.133Z\",\"result\":{},\"serviceInstanceId\":6,\"id\":260538}",
"260539": "{\"request\":{\"cmd\":\"stop\",\"serviceId\":6,\"instanceId\":6},\"timestamp\":\"2015-06-19T01:33:10.614Z\",\"result\":{},\"serviceInstanceId\":6,\"id\":260539}",
"260540": "{\"request\":{\"cmd\":\"restart\",\"serviceId\":8,\"instanceId\":8},\"timestamp\":\"2015-06-19T01:44:27.085Z\",\"result\":{\"message\":\"re-starting...\"},\"serviceInstanceId\":8,\"id\":260540}",
"260541": "{\"request\":{\"cmd\":\"restart\",\"serviceId\":8,\"instanceId\":8},\"timestamp\":\"2015-06-22T22:25:39.518Z\",\"result\":{\"message\":\"re-starting...\"},\"serviceInstanceId\":8,\"id\":260541}",
"260542": "{\"request\":{\"cmd\":\"restart\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-22T22:27:08.649Z\",\"result\":{\"message\":\"re-starting...\"},\"serviceInstanceId\":1,\"id\":260542}",
"260543": "{\"request\":{\"cmd\":\"restart\",\"serviceId\":8,\"instanceId\":8},\"timestamp\":\"2015-06-22T22:28:26.205Z\",\"result\":{\"message\":\"re-starting...\"},\"serviceInstanceId\":8,\"id\":260543}",
"260544": "{\"request\":{\"cmd\":\"restart\",\"serviceId\":8,\"instanceId\":8},\"timestamp\":\"2015-06-22T22:51:43.332Z\",\"result\":{\"message\":\"re-starting...\"},\"serviceInstanceId\":8,\"id\":260544}",
"260545": "{\"request\":{\"cmd\":\"stop\",\"serviceId\":8,\"instanceId\":8},\"timestamp\":\"2015-06-23T17:12:53.740Z\",\"result\":{},\"serviceInstanceId\":8,\"id\":260545}",
"260546": "{\"request\":{\"cmd\":\"stop\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-23T17:13:26.745Z\",\"result\":{},\"serviceInstanceId\":1,\"id\":260546}"
}
}
}
@notbrain do you have `slc ctl log-dump -f` running somewhere?
@kraman @sam-github is the InstanceAction meant to be persisted? It seems like it probably shouldn't be.
I didn't have any `slc ctl log-dump -f` commands running just now. Last week I was attempting a bunch of `slc ctl` commands while the 2nd service's mongo connection was constantly erroring (waiting for `env-set` to snap into correctness), but most of those ops resulted in 90s of waiting and then the "socket hang up" error. Restarting svc ID 8 from that point was never able to recover and run normally. The last time I had `log-dump --follow` running was last week. Once I saw strong-pm.log had the same content, I stopped using `log-dump`.
Is there a recommended way to install the latest strong-pm standalone alongside strongloop? Since this is a build server, I need the full slc command suite to deploy/build.
The docs on upgrading strong-pm say to upgrade strong-pm by itself, but then say to use slc to install everything, which ends up creating an upstart file that points to the strongloop/node_modules/.../sl-pm.js dependency at the previous version, making the earlier upgrade command pointless?
Is it as simple as editing the strong-pm.conf file to use /usr/bin/sl-pm instead of the strongloop dependency?
slc-installed sl-pm:
exec /usr/bin/nodejs /usr/lib/node_modules/strongloop/node_modules/strong-pm/bin/sl-pm.js --listen 8701 --base /var/lib/strong-pm --driver direct
desired sl-pm?
exec /usr/bin/nodejs /usr/bin/sl-pm --listen 8701 --base /var/lib/strong-pm --driver direct
3 different PMs floating around (note the grep; ignore the tree characters):
$ npm ls -g | grep strong-pm
├─┬ strong-pm@4.3.1
│ ├─┬ strong-pm@3.2.0
├─┬ strong-pm@4.2.0
Yes, I think that change should work. If it doesn't, /usr/lib/node_modules/strong-pm/bin/sl-pm.js
would definitely work.
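A quick way to confirm which copy of strong-pm an init script would actually execute is to resolve the wrapper on the PATH back to its real file. A sketch, demonstrated with a throwaway symlink since `sl-pm` may not be installed on the machine running this:

```shell
# Resolve a command back to the file it really points at. Stand-in layout;
# on the server you would run: readlink -f "$(command -v sl-pm)"
bindir=$(mktemp -d)
mkdir -p "$bindir/lib/node_modules/strong-pm/bin"
printf '#!/usr/bin/env node\n' > "$bindir/lib/node_modules/strong-pm/bin/sl-pm.js"
ln -s "$bindir/lib/node_modules/strong-pm/bin/sl-pm.js" "$bindir/sl-pm"
readlink -f "$bindir/sl-pm"      # prints the real path under node_modules
```

Comparing that resolved path against the `exec` line in strong-pm.conf shows whether the upstart job and the PATH agree on which version runs.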
If you are installing all of strongloop anyway on that server, then there's likely no benefit from installing the standalone strong-pm. You can upgrade the strong-pm dependency with `npm install -g strongloop` (no need to uninstall first).
As an aside: for the most part, the `slc XXX` commands are just wrappers for an `sl-XXX` binary that is provided by one of the dependencies. For something like `slc build` (strong-build) or `slc deploy` (strong-deploy), installing only the one command can be a significant saving in disk space and bandwidth if those resources are constrained.
So far so good after I deleted everything inside /var/lib/strong-pm/ and updated to 4.3.1. I was also able to use `slc ctl create` and then `slc deploy -s svcname` without creating duplicate services.
The main concern left is that strong-pm.json is continuously growing for no understood reason, even when I do not have any `...log-dump --follow` running. Is this file's size managed by strong-pm? I continue to see strong-pm.json fill up with this stuff:
"1016": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":2,\"instanceId\":2},\"timestamp\":\"2015-06-24T01:00:56.045Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":2,\"id\":1016}",
"1017": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":2,\"instanceId\":2},\"timestamp\":\"2015-06-24T01:00:57.060Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":2,\"id\":1017}",
"1018": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":2,\"instanceId\":2},\"timestamp\":\"2015-06-24T01:00:58.075Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":2,\"id\":1018}",
"1019": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":2,\"instanceId\":2},\"timestamp\":\"2015-06-24T01:00:59.093Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":2,\"id\":1019}",
"1020": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":2,\"instanceId\":2},\"timestamp\":\"2015-06-24T01:01:00.108Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":2,\"id\":1020}",
"1021": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":2,\"instanceId\":2},\"timestamp\":\"2015-06-24T01:01:01.123Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":2,\"id\":1021}",
"1022": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":2,\"instanceId\":2},\"timestamp\":\"2015-06-24T01:01:02.139Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":2,\"id\":1022}",
"1023": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":2,\"instanceId\":2},\"timestamp\":\"2015-06-24T01:01:03.157Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":2,\"id\":1023}",
"1024": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":2,\"instanceId\":2},\"timestamp\":\"2015-06-24T01:01:04.171Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":2,\"id\":1024}",
"1025": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":2,\"instanceId\":2},\"timestamp\":\"2015-06-24T01:01:05.189Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":2,\"id\":1025}"
Encountered another strange process list after launching an app failed because of an unset env variable. Figured I'd post here to find out the recommended way to recover from such things. It makes me nervous that processes would just keep spawning, so I am still not using strong-pm in production yet.
I have hard stopped the app but still see this for status. What recourse do I have, beyond completely removing and re-adding the app, to recover and clear all these out? Also, if I do not clear out /var/lib/strong-pm/, will the new app use a higher service ID, with service ID 2 forever around and unusable? Can I just delete `/var/lib/strong-pm/svc/2` instead?
Service ID: 2
Service Name: proteus-staging
Environment variables:
Name Value
...snipped...
Instances:
Version Agent version Cluster size Driver metadata
4.3.1 1.6.1 0 N/A
Processes:
ID PID WID Listening Ports Tracking objects? CPU profiling? Tracing?
2.1.11179 11179 0
2.1.11183 11183 1
2.1.11187 11187 2
2.1.11196 11196 3
...
2.1.23276 23276 1942
2.1.23282 23282 1943
2.1.23288 23288 1944
If there's anything I would suggest to make this a bit less painful: allow a file of env vars to be used instead of requiring that they be set via `slc ctl env-set`. Is there a way I can do this? Where do the env vars get stored by strong-pm?
`slc ctl remove servicename` works to remove the service, but upon a subsequent `slc ctl create servicename`, the old service ID is no longer usable. Is this by design, or just how things ended up? Short of completely deleting `/var/lib/strong-pm/*`, is there a way to reuse a service ID once you've removed the app that was assigned to it?
strong-pm.json growing
It is growing by a line per process that has existed; we don't delete old process records ATM. Is your disk really short on space?
dead pids showing up in `sl-meshctl status`
This is a bug, but I can't reproduce it. Can you reproduce it all the time, or just sometimes?
edit: if you see it again, can you gist the strong-pm.json?
env-set painful
Have you considered doing `slc ctl -C http://production env-set $(cat my-env.txt)`?
This would be less painful than sshing into that remote host and hacking up a file somewhere to be my-env.txt, wouldn't it? If not, can you elaborate more on the painfulness?
The internal DB structures of PM aren't about to be exposed; it's designed to be remotely configurable.
reusability of record IDs
We could perhaps make it possible to deploy to a service ID, and force a new DB record to be created with a previously used record ID.
Can you consider the IDs to be opaque identifiers? Why do you want to reuse one? Is it because we (ab)use them in order to derive a unique listening port?
In an upcoming release, the service's port will be overridable from the default.
/to @kraman we may need to accelerate the GC of no-longer-useful records from the DB. That will involve making sure there are no outstanding references to them, though.
forgot to press the green button
The service ID is incremented every time a service is created, and that sequence number is tracked inside the "database" (currently a `$BASEDIR/strong-pm.json`, which backs a loopback memory datastore).
PM itself doesn't support any sort of environment file, but strong-supervisor supports a `.env` file (see dotenv) in the app's root. It would require you to either commit your `.env` file or at least bundle it with your deployment, which may or may not be acceptable to you.
Another option is setting multiple variables at once. If you have a `.env` file like the above (without comments), you could use it with `slc ctl env-set servicename $(cat .env)`. Still not quite what you're asking for, but it may be close enough depending on your environment.
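As a sketch of how the `$(cat ...)` trick works (the file name and values here are made up): the shell word-splits the file's contents into one `KEY=value` argument per line, which is the argument shape `env-set` expects. Note that this breaks if any value contains whitespace.

```shell
# Work in a scratch directory; hypothetical env file, one KEY=value per line.
cd "$(mktemp -d)"
cat > my-env.txt <<'EOF'
DB_HOST=db.internal
NODE_ENV=staging
EOF

# An unquoted $(cat ...) expands to separate KEY=value words, as it would in:
#   slc ctl env-set servicename $(cat my-env.txt)     (sketch, not run here)
echo $(cat my-env.txt)   # → DB_HOST=db.internal NODE_ENV=staging
```

This keeps credentials out of version control (the file lives only on the operator's machine), at the cost of the whitespace limitation noted above.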
@sam-github
Have you considered doing slc ctl -C http://production env-set $(cat my-env.txt)?
Thanks, will try this; I was just working on something similar, a script that would take the svcname and set the vars accordingly. @rmg yes, it is not acceptable for us to commit passwords etc., so the script route is fine; $(cat env) is also something to consider.
Can you consider the IDs to be opaque identifiers? Why do you want to reuse one?
Yes, and because OCD, and simply wondering where they are kept; so these are the IDs of the entries in strong-pm's private DB? That's fine by me, and being able to set them was an old issue for us, since now setting `npm_config_port` overrides the 3000+svcID thing. I just needed ports for a manual nginx setup.
Appreciate the help.
@notbrain are you good now? can we close this?
So far so good, will open a new one if anything shows up again.
Glad to hear that.
I had an existing demo.app.com purring along nicely, then worked on getting a separate instance up. I think my main issue is that my original `slc ctl create` and `slc ctl deploy` commands might have corrupted something. As an aside, I'm wondering how to find all the files that strong-pm installs and delete them so I can start over from scratch with both services. Is there a manual removal list somewhere in the docs? I am also running StriderCD via upstart on the same instance, which could potentially have some impact.
In any case, I'm seeing strange behavior on the 2nd service. It seems to be running fine, but the list of workers is different than the 1st service I installed. I went through service IDs 2-7 (caused by dupe services using `slc ctl create` before deploying) and the 2nd service is now on SID 8. How can I reset this to get it back to 2? No real need, just OCD and curiosity about where these indexes are stored (especially after I've removed the service completely; it seems like a service ID can't be reused once removed?). The 1st service shows up as expected, 2 workers listening on 3003.
The 2nd service on port 3002, on the other hand, showed 3 blank lines and then 4 workers; it's actually now 2 blank lines and 4 workers. Is this some sort of debug port collision? The only real PIDs are the last 2, but where did the first 4 process-list lines come from?
A tally of node processes shows the last two but the first 4 are ghost processes:
And the strong-pm.log for a `slc ctl restart proteus-staging` (2nd service):