intel / intelRSD

Intel® Rack Scale Design Reference Software
http://intel.com/IntelRSD

PODM is not able to list the resources in the PODM APIs while PSME APIs are listing them #65

Closed: aneeeshp closed this issue 6 years ago

aneeeshp commented 6 years ago

When I run pod-manager in one VM and psme-rest-server and psme-compute-simulator in another VM, I get an error in PODM. The pod-manager-application.log says:

2018-07-04 14:40:36,105 [EE-ManagedScheduledExecutorService-TasksExecutor-Thread-3] ERROR c.i.p.d.external.DiscoveryRunner - Error while polling data from ExternalService {UUID=e3e9aed2-7f65-11e8-a7f0-d788fc73eca4, baseUri=https://10.3.0.246:8443/redfish/v1, type=PSME}

Possibly because of this, PODM is not able to list the resources in the PODM APIs. I can see the systems, managers and chassis information populated from the simulator in the PSME APIs; however, none of it is listed in the PODM APIs. There is no RMM running.

PODM is able to detect PSME and query its resources (I can see the URIs being accessed in the PSME debug output). It looks like there is an issue when processing those responses and updating the DB on the PODM side.
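To pull every DiscoveryRunner polling error (like the one quoted above) out of pod-manager-application.log and see which ExternalService is failing, a small parser can help. This is a minimal sketch: the regular expression is inferred from that single log line, so the exact log format is an assumption, and the function name discovery_errors is hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Pattern inferred from the one DiscoveryRunner error quoted in this issue;
# other PODM versions may format the line differently (assumption).
_DISCOVERY_ERROR = re.compile(
    r"ERROR\s+\S*DiscoveryRunner\s+-\s+Error while polling data from ExternalService\s+"
    r"\{UUID=(?P<uuid>[0-9a-fA-F-]+),\s*baseUri=(?P<base_uri>[^,}]+),\s*type=(?P<type>\w+)\}"
)


def discovery_errors(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return the UUID, baseUri and type of every DiscoveryRunner polling error."""
    found = []
    for line in lines:
        match = _DISCOVERY_ERROR.search(line)
        if match:
            found.append(match.groupdict())
    return found
```

Usage: `with open("pod-manager-application.log") as f: print(discovery_errors(f))`. Counting how often each baseUri appears shows whether the failure is tied to one PSME or affects every service PODM polls.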

I have attached the logs from PODM and the debug logs from psme-rest-server.

pod_manager_health_check_2018-07-04-14-45-38-212.tar.gz

psem-rest-server-logs.txt

RobertCMa commented 6 years ago

Hello @aneeeshp ,

To understand the situation, it helps to query the responses with wget or curl, as in the commands below:

1/ Send the command from the POD-Manager system to the PSME. Add the CA file path to the command and dump the result for more clues.

2/ Clean the database with the command: pod-manager-clean-database-immediately

Then restart the POD-Manager service and check again whether the response data is recorded correctly.

wget/curl commands:

sudo wget --no-check-certificate --certificate=/path/ca.file -qO- https://10.3.0.246:8443/redfish/v1 | python -m json.tool

sudo curl -vv -k -u admin:admin -H "Content-Type:application/json" -X GET https://127.0.0.1:8443/redfish/v1/
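To repeat the same GET from a script instead of by hand, the curl call can be sketched in Python with only the standard library. This is a minimal sketch under assumptions: the helper names (make_ssl_context, build_request, get_redfish) are hypothetical, `verify=False` mirrors curl's `-k`, and the Basic auth header mirrors `-u admin:admin`.

```python
import base64
import json
import ssl
import urllib.request
from typing import Any, Dict, Optional


def make_ssl_context(ca_file: Optional[str] = None, verify: bool = True) -> ssl.SSLContext:
    """Verify the server against ca_file, or skip all checks like curl -k."""
    if not verify:
        ctx = ssl.create_default_context()
        ctx.check_hostname = False  # must be turned off before CERT_NONE is allowed
        ctx.verify_mode = ssl.CERT_NONE
        return ctx
    return ssl.create_default_context(cafile=ca_file)


def build_request(url: str, user: Optional[str] = None,
                  password: Optional[str] = None) -> urllib.request.Request:
    """Create a GET request, adding HTTP Basic auth when a user is given."""
    req = urllib.request.Request(url, method="GET", headers={"Accept": "application/json"})
    if user is not None:
        token = base64.b64encode(f"{user}:{password or ''}".encode()).decode()
        req.add_header("Authorization", f"Basic {token}")
    return req


def get_redfish(url: str, user: Optional[str] = None, password: Optional[str] = None,
                ca_file: Optional[str] = None, verify: bool = True,
                timeout: float = 10.0) -> Dict[str, Any]:
    """Fetch a Redfish resource and decode its JSON body."""
    req = build_request(url, user, password)
    ctx = make_ssl_context(ca_file, verify)
    with urllib.request.urlopen(req, context=ctx, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `get_redfish("https://127.0.0.1:8443/redfish/v1/Systems/", "admin", "admin", verify=False)` queries the PODM Systems collection. If the PSME is configured to require a client certificate, the context would also need `load_cert_chain()`; that case is not covered here.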

aneeeshp commented 6 years ago

Thanks @RobertCMa. I tried the steps you suggested.

I cleared the DB and restarted PODM, but the issue still occurs.

I queried PSME from the PODM system using wget and can see the response. At https://10.3.0.246:8443/redfish/v1/Systems/1 I can see the system simulated by psme-compute-simulator. But when I query the PODM APIs, Systems shows 0 members (https://127.0.0.1:8443/redfish/v1/Systems/).
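The check described here (one System in PSME, 0 members in PODM) can be reduced to comparing member counts of the two Systems collections. A minimal sketch, assuming both responses follow the standard Redfish collection shape (a `Members` array of `{"@odata.id": ...}` objects plus `Members@odata.count`); the function names are hypothetical. Because PODM may assign its own IDs to the resources it aggregates, the sketch compares counts rather than URIs, and the gap is only meaningful when PODM is aggregating this single PSME.

```python
from typing import Any, Dict, List


def member_ids(collection: Dict[str, Any]) -> List[str]:
    """Return the @odata.id of every member in a Redfish collection."""
    return [member["@odata.id"] for member in collection.get("Members", [])]


def member_count(collection: Dict[str, Any]) -> int:
    """Count members, checking that Members@odata.count agrees with the array."""
    members = member_ids(collection)
    declared = collection.get("Members@odata.count", len(members))
    if declared != len(members):
        raise ValueError(f"Members@odata.count={declared} but {len(members)} members listed")
    return len(members)


def discovery_gap(psme_systems: Dict[str, Any], podm_systems: Dict[str, Any]) -> int:
    """Number of Systems the PSME exposes beyond what PODM lists (0 = none missing)."""
    return max(0, member_count(psme_systems) - member_count(podm_systems))
```

With the outputs described in this comment, the PSME collection has one member and the PODM collection none, so `discovery_gap` returns 1.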

I have attached the output from both APIs.

query-psme-from-podm.txt query-podm-from-podm.txt

I have also attached the PODM logs and the PSME debug output: pod_manager_health_check_2018-07-05-12-36-14-372.tar.gz psem-rest-server-logs.txt

RobertCMa commented 6 years ago

Hi @aneeeshp,

I found that PODM is using version 2.3 and PSME is using version 2.2. I tried this combination in my setup, and PODM had no problem presenting the System information. To clarify the situation, please check two items in your environment first.

1/ The FATAL entry in the PostgreSQL log makes me suspect an account access permission issue in the PostgreSQL service:
2018-07-04 12:26:56.374 IST [401] administrator@administrator LOG: provided user name (administrator) and authenticated user name (podm) do not match
2018-07-04 12:26:56.375 IST [401] administrator@administrator FATAL: Peer authentication failed for user "administrator"

2/ Set "ServerCertificateVerificationEnabled" to false in security.json. After the change, clean the database and restart the PODM service so that it starts from a clean state. For reference, security.json contains:
{
  "ClientKeystorePath": "/var/lib/pod-manager/client.jks",
  "ServerKeystorePath": "/var/lib/pod-manager/server.jks",
  "ServerCertificateVerificationEnabled": true
}
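The edit to security.json can be scripted so that the other keys stay untouched. A minimal sketch: the thread does not say where security.json lives, so the path is a parameter, and the function name is hypothetical. Note that it rewrites the file with two-space indentation.

```python
import json
from pathlib import Path


def disable_server_cert_verification(path: str) -> dict:
    """Set ServerCertificateVerificationEnabled to false, keeping all other keys."""
    config_file = Path(path)
    config = json.loads(config_file.read_text())
    config["ServerCertificateVerificationEnabled"] = False
    config_file.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

After running it, clean the database and restart the PODM service as described in this step.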

aneeeshp commented 6 years ago

Hi @RobertCMa, the authentication errors in the PostgreSQL log were logged when I manually tried to log in to the database with 'administrator' as the username and 'podm' as the password. I tried this because the PODM user guide mentions these as the default credentials for the DB, but it did not work.

The second step did not help either. The error still occurs when running psme-compute-simulator.

Then I tried running psme-nvme instead of psme-compute-simulator. I have an NVMe drive connected and exposed it through a kernel-mode NVMe-oF target. I ran psme-nvme and psme-rest-server, and this time the resources exposed by the PSME APIs were listed in the PODM APIs. The PODM API output is attached: podm-working-with-psme-nvme.txt

So the issue seems to be specific to psme-compute-simulator. Have you tried psme-compute-simulator in your setup?

RobertCMa commented 6 years ago

Hi @aneeeshp,

Since RSD 2.2, PSME can provide information about the resources exposed by SMBIOS, so the Deep Discovery features are disabled by default and are no longer tested or validated. The functionality remains in the code base for reference only.

Comparing "podm-working-with-psme-nvme.txt" and "query-psme-from-podm.txt", the System resources are of different types, which suggests they come from different compute agents. This is all the more confusing because Deep Discovery data should be gathered by LUI. Meanwhile, "psme-nvme" is the NVMe agent. To clarify further, we need to understand the configuration/setup of your environment.

1/ What is the intended configuration of this PSME system?

2/ With the PSME compute agent "psme-compute" (not the compute simulator) and the REST server, can the system expose its resources to PODM?

3/ Please run psme-healthcheck on the PSME system to gather the configuration information.

aneeeshp commented 6 years ago

Hi @RobertCMa,

My intention is to run PSME on an NVMe-oF target and expose the NVMf subsystems to PODM so that orchestration software such as OpenStack can see and access them. I ran psme-compute before actually setting up psme-nvme, only to verify the connectivity between PSME and PODM. I assumed psme-compute-simulator would work, and that if PODM detected the resources it exposed, the setup was confirmed to be working. My plan was to then move to psme-nvme.

Later I set up psme-nvme, and it is working as expected. I can run it on the NVMe-oF target and see the subsystem details in the PODM APIs.

Hence this issue can be closed. I have some other questions about psme-nvme and psme-nvme-discovery, which I will ask in a separate issue.

Thank you for your support.