Dear developers,
At one point PODM reported that the computer systems were in the "InTest" state, and I was unable to compose nodes. I looked up the state in "pod-manager-user-guide-v2-1.pdf" and then ran the command below,
$ sudo /usr/bin/pod-manager-clean-database-on-next-startup
and then restarted the pod-manager service,
$ sudo systemctl restart pod-manager
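Rediscovery after a database clean may take some time, so a single query right after the restart can be misleading. Below is a minimal sketch of polling the PODM endpoint until it lists at least one system. The helper names `wait_for` and `podm_has_systems` are hypothetical, the address and credentials are the ones used in this report, and any retry count or delay is an arbitrary choice.

```shell
# Retry a command until it succeeds: wait_for ATTEMPTS DELAY_SECONDS CMD...
# Returns 0 on the first success, 1 if every attempt fails.
wait_for() {
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Succeeds once PODM reports a non-zero Members@odata.count for Systems.
# Address and credentials are taken from this report (assumption: unchanged).
podm_has_systems() {
  curl -sk -u admin:admin https://10.3.0.1:8443/redfish/v1/Systems \
    | grep -q '"Members@odata.count"[[:space:]]*:[[:space:]]*[1-9]'
}
```

For example, `wait_for 30 10 podm_has_systems || echo "still empty"` would check every 10 seconds for about five minutes.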
But now I cannot get any computer systems back,
$ curl -k -u admin:admin https://10.3.0.1:8443/redfish/v1/Systems
{
  "@odata.context" : "/redfish/v1/$metadata#Systems",
  "@odata.id" : "/redfish/v1/Systems",
  "@odata.type" : "#ComputerSystemCollection.ComputerSystemCollection",
  "Name" : "Computer System Collection",
  "Description" : "Computer System Collection",
  "Members@odata.count" : 0,
  "Members" : [ ]
}
/var/log/pod-manager/pod-manager-application.log gives some hints about this abnormal behavior,
...
WARN c.i.p.d.external.DiscoveryRunner - Connection error while getting data from ExternalService {UUID=4c4c4544-434d-1001-8000-d0946609a764, baseUri=http://10.3.2.248:80/redfish/v1, type=PSME, unreachableSince=2018-10-17T01:27:45.322} service - performing check on this service
2018-10-17 02:22:41,120 [EE-ManagedScheduledExecutorService-TasksExecutor-Thread-5] DEBUG c.i.p.d.e.ExternalServiceAvailabilityCheckerTask - Verifying service with UUID 4c4c4544-434d-1001-8000-d0946609a764
2018-10-17 02:22:41,783 [EE-ManagedScheduledExecutorService-TasksExecutor-Thread-5] DEBUG c.i.p.d.e.ExternalServiceAvailabilityCheckerTask - Service ExternalService {UUID=4c4c4544-434d-1001-8000-d0946609a764, baseUri=http://10.3.2.248:80/redfish/v1, type=PSME, unreachableSince=2018-10-17T01:27:45.322} still exists
...
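To see which services PODM considers unreachable without reading the whole log, the `ExternalService {...}` records can be reduced to one line per service. This is a minimal sketch that assumes the record layout shown in the excerpt above (UUID, baseUri, type, unreachableSince, in that order); the function name `unreachable_services` is hypothetical.

```shell
# List each distinct external service that the log records as unreachable.
# Reads log text on stdin; prints "UUID baseUri unreachableSince" per service.
# Assumes the field order seen in the log excerpt of this report.
unreachable_services() {
  grep -o 'ExternalService {[^}]*}' \
    | sed -n 's/.*UUID=\([^,]*\), baseUri=\([^,]*\), type=[^,]*, unreachableSince=\([^}]*\)}.*/\1 \2 \3/p' \
    | sort -u
}
```

It can be run as `unreachable_services < /var/log/pod-manager/pod-manager-application.log`.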
But the network and the PSME service are fine; I can connect and get the systems back when I query the PSME directly,
$ curl http://10.3.2.248:80/redfish/v1/Systems
{
  "@odata.context": "/redfish/v1/$metadata#Systems",
  "@odata.id": "/redfish/v1/Systems",
  "@odata.type": "#ComputerSystemCollection.ComputerSystemCollection",
  "Name": "Computer System Collection",
  "Members@odata.count": 5,
  "Members": [
    {
      "@odata.id": "/redfish/v1/Systems/Rack1-Block2-Sled2-Node1"
    },
    {
      "@odata.id": "/redfish/v1/Systems/Rack1-Block2-Sled4-Node1"
    },
    {
      "@odata.id": "/redfish/v1/Systems/Rack1-Block3-Sled1-Node1"
    },
    {
      "@odata.id": "/redfish/v1/Systems/Rack1-Block3-Sled2-Node1"
    },
    {
      "@odata.id": "/redfish/v1/Systems/Rack1-Block3-Sled3-Node1"
    }
  ]
}
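The mismatch between the two responses can be checked in one step by comparing their `Members@odata.count` values. This is a minimal sketch using only `curl` and `sed`; the function names `member_count` and `compare_systems` are hypothetical, and the addresses and credentials are the ones used in this report.

```shell
# Print the Members@odata.count value from a Redfish collection read on stdin.
member_count() {
  sed -n 's/.*"Members@odata.count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' \
    | head -n 1
}

# Compare what PODM reports with what the PSME reports directly.
# Returns 0 when both counts are equal, non-zero otherwise.
compare_systems() {
  podm=$(curl -sk -u admin:admin https://10.3.0.1:8443/redfish/v1/Systems | member_count)
  psme=$(curl -s http://10.3.2.248:80/redfish/v1/Systems | member_count)
  echo "PODM=${podm:-?} PSME=${psme:-?}"
  [ "$podm" = "$psme" ]
}
```

Calling `compare_systems` prints both counts on one line; with the responses shown in this report the PODM count is 0 and the PSME count is 5, so the function returns a non-zero status.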
I found a similar issue here: https://github.com/intel/intelRSD/issues/58, and it looks like this is related to the service UUID. How can I purge all of that data and have PODM poll everything again? Is there a configuration item I need to update to fix the issue? What is the root cause of this issue?
Thanks a lot for any input!
pod-manager-application.log