Closed lsilvapvt closed 7 years ago
We have created an issue in Pivotal Tracker to manage this. Unfortunately, the Pivotal Tracker project is private so you may be unable to view the contents of the story.
The labels on this github issue will be updated when the story is started.
I think a combination of adding a Known Issues section to the README and updating wait-for-opsmgr to check the OpsMgr events file would be good.
cc @sadvani
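A minimal sketch of the kind of check suggested above, not the actual `wait-for-opsmgr` or `om` logic. Instead of reading the events file, it inspects the installations list and treats a "running" entry with no `finished_at` as stale once it is older than a threshold. The JSON shape, the field names other than `finished_at`, the timestamp format, the sample data, and the 24-hour cutoff are all assumptions for illustration.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical sample of an "installations" payload. Field names and the
# timestamp format are assumptions based on the description in this issue.
SAMPLE = """
{"installations": [
  {"id": 12, "status": "succeeded",
   "started_at": "2017-08-01T10:00:00.000Z",
   "finished_at": "2017-08-01T11:00:00.000Z"},
  {"id": 7, "status": "running",
   "started_at": "2017-05-20T09:30:00.000Z",
   "finished_at": null}
]}
"""


def parse_ts(value):
    """Parse an assumed ISO-8601 UTC timestamp such as 2017-05-20T09:30:00.000Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def classify_running(installations, now, max_age=timedelta(hours=24)):
    """Split unfinished 'running' entries into (active_ids, stale_ids).

    An entry counts as stale when it started more than max_age ago.
    The 24-hour default is an arbitrary illustrative threshold.
    """
    active, stale = [], []
    for inst in installations:
        if inst.get("status") != "running" or inst.get("finished_at"):
            continue  # finished or not running: not a blocker
        age = now - parse_ts(inst["started_at"])
        (stale if age > max_age else active).append(inst["id"])
    return active, stale


if __name__ == "__main__":
    now = datetime(2017, 8, 2, tzinfo=timezone.utc)
    active, stale = classify_running(json.loads(SAMPLE)["installations"], now)
    print("active:", active, "stale:", stale)
```

With a check like this, a pipeline task could block only on entries in the active list and log a warning for the stale ones, rather than failing on any entry whose status is "running".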
hi @lsilvapvt i believe we will be addressing this issue here: https://github.com/pivotal-cf/pcf-pipelines/pull/177. please let us know if you have any feedback by following this tracker story: https://www.pivotaltracker.com/story/show/150672203. i'll be closing this issue in favour of the aforementioned tracker issue. i'll add a link in that tracker story to this issue so we have a record of it. thanks.
This problem happened in two distinct PCF 1.9.2 environments of a customer that deployed pcf-pipelines (tested with v0.8, v0.11 and v0.13.2): the `Apply-Change` and `Wait-for-Opsmgr` tasks of the `Upgrade-Tile` pipeline both report that a task is already running in OpsMgr even though none was started in the OpsMgr UI. That return code prevents the pipeline from proceeding to the Apply-Changes phase of the upgrade, requiring the customer to use the OpsMgr UI to continue.

The root cause: We found that the OpsMgr API "installations" output contained a task a couple of months old that was still in "running" state, with no `finished_at` date (see example below). That entry caused the tasks mentioned above to incorrectly report that a task is already running (even the apply-changes command of the `om` tool v0.23 fails because of it), because their methods simply check for the existence of an entry with "running" state. According to the customer, what seems to have caused the situation was a reboot of the OpsMgr VM after the corresponding running action got stuck. Apparently OpsMgr left that entry unchanged in its `installs` table after the reboot and never updated it to `failed` state.

Potential solutions:

A) Update both the `wait-for-opsmgr` task.sh and the `om` tool to parse the recent OpsMgr events json file instead of just searching for an entry with "running" status; OR

B) Provide a Known-Issues readme in the pcf-pipelines package describing the issue above and the workaround below to fix those event entries in the OpsMgr DB:

1) Make a backup copy of OpsMgr settings (add link to docs)
2) SSH to the OpsMgr VM and become root (`sudo su -`)
3) Switch to postgres user (`sudo su postgres`)
4) Execute command `psql` (no password required)
5) Connect to the DB: