intel / intelRSD

Intel® Rack Scale Design Reference Software
http://intel.com/IntelRSD

Installation help #31

Closed ffdarkpenguin closed 6 years ago

ffdarkpenguin commented 6 years ago

Hi guys, I'm trying to set up PSME to talk to our project, without success. I've managed to compile and start the rest-server and the compute-agent (that's all we are testing for now). In the compute-agent config file, in the "managers" section, we put the credentials for our Redfish server, which answers for our servers. Both servers (rest-server and compute-agent) start fine, and the compute-agent manages to register with the rest-server. The rest-server keeps polling something. The rest-server responds fine to a /redfish/v1 (service root) request. When I try to get managers (/redfish/v1/Managers) or systems (/redfish/v1/Systems), I get a reply with zero info: no managers, no systems found. While doing these requests I see nothing in the compute-agent logs. And since the service was started, not a single byte has been sent to our Redfish server. I can't find any way to debug or understand what's going on.

The available documentation just tells me how to compile it. The configuration part is quite problematic. For example, for the rest-server configuration you have a table of properties on page 31, a sample configuration starting on page 32, and the configuration schema starting on page 33. The kind of problem I refer to is: the properties "certs-directory", "thread-pool-size", and "client-cert-required" of the connectors section, which appear in the sample (page 32), do not appear in the table nor in the schema.

Another example is the configuration sample available at /application/config/psme-rest-server-configuration.json. This was the sample file I used to create my config. It does not have a connectors section at all (the schema requires it, and I see the server complaining at startup that it is missing), and it has a property URL that doesn't show up in the table nor in the schema. So I'm kind of lost here.

Trying to be more accurate about my problem:

What I expect: when I access /redfish/v1/Systems on the PSME server, I would see my available hardware in a Redfish response.
How I think this works:

{
    "server": {
        "network-interface-name": "enp0s3",
        "connectors": [
            {
                "use-ssl": true,
                "certs-directory": "/etc/psme/certs",
                "port": 8443,
                "thread-mode": "select",
                "thread-pool-size": 10,
                "client-cert-required": false
            },
            {
                "use-ssl": false,
                "port": 8888,
                "redirect-port": 8443,
                "thread-mode": "select"
            }
        ]
    },
    "registration": {
        "port": 8383,
        "minDelay": 3
    },
    "commands": {
        "generic": "Registration"
    },
    "rmm-present": false,
    "eventing" : {
        "enabled" : false,
        "address": "localhost",
        "port" : 5567,
        "poll-interval-sec" : 20
    },
    "rest-server" : {
        "storage-service-mode" : false
    },
    "service-uuid-file" : "/etc/psme/service_uuid.json",
    "subscription-config-file" : "/tmp/subscriptions",
    "logger" : {
        "app" : {
            "level" : "INFO",
            "timeformat" : "DATE_NS",
            "color" : true,
            "output" : true,
            "tagging" : true,
            "moredebug" : false,
            "streams" : [
                 {
                    "type" : "STDOUT"
                 }
            ]
        }
    }
}

and here is my compute-agent config file:

{
    "agent": {
        "vendor" : "Test",
        "capabilities" : [ "Compute" ]
    },
    "server": {
        "port": 7777
    },
    "registration": {
        "ipv4": "localhost",
        "port": 8383,
        "interval": 3
    },
    "managers": [
        {
            "slot" : 1,
            "switchPortIdentifier" : "sw0p37",
            "ipv4": "192.168.200.30",
            "username": "CBB6DE08B54A14F756B7A62BBCC51005",
            "password": "21BB36EAA74E097B",
            "port": 5000,
            "serialConsoleEnabled": true
        }
    ],
    "service-uuid-file" : "/var/opt/psme/compute-service-uuid.json",
    "logger" : {
        "agent" : {
            "level" : "INFO",
            "timeformat" : "DATE_NS",
            "color" : true,
            "output" : true,
            "tagging" : true,
            "moredebug" : false,
            "streams" : [
                {
                    "type" : "STDOUT"
                }
            ]
        }
    }
}

Here is something in the compute-agent logs that makes me wonder if there's something wrong. See the "DISCOVERY_MISSING" at the end:

2017-12-04 13:43:39.078658890 - INFO - compute-agent - MANUAL PSME BUILD; Built 18:44:11, 30-11-2017
2017-12-04 13:43:39.078836732 - DEBUG - configuration - Added file /etc/psme/compute-configuration.json
2017-12-04 13:43:39.078985790 - WARN - configuration - Cannot load default file
2017-12-04 13:43:39.079025838 - INFO - configuration - Load file
2017-12-04 13:43:39.079077302 - INFO - configuration - Loaded file /etc/psme/compute-configuration.json
2017-12-04 13:43:39.079117279 - INFO - configuration - Load internal defaults
2017-12-04 13:43:39.079185243 - INFO - compute-agent - JSON Schema load!
2017-12-04 13:43:39.089642978 - INFO - compute-agent - Running SDV PSME Compute Agent.
2017-12-04 13:43:39.089814363 - INFO - eventing - Starting EventDispatcher thread...
2017-12-04 13:43:39.091596527 - INFO - default - Service UUID: dce65a84-d8fa-11e7-af8d-c7ed297637db
2017-12-04 13:43:39.096350941 - INFO - registration - Agent has been registered to http://localhost:8383
2017-12-04 13:43:39.096484570 - INFO - eventing - Sending AMC notifications enabled.
2017-12-04 13:43:42.098815905 - INFO - state-machine - Starting State Machine thread...
2017-12-04 13:43:43.099323281 - INFO - state-machine - Starting State Machine Module thread...
2017-12-04 13:43:53.099958156 - INFO - agent - Module e5c37dc6-d909-11e7-9542-43ae2d42f28c status changed to ABSENT after event DISCOVERY_MISSING
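The log lines quoted above appear to follow a "timestamp - LEVEL - component - message" layout. As a minimal sketch for spotting module status changes such as the ABSENT / DISCOVERY_MISSING entry, the following Python filters lines of that shape. The regular expressions are assumptions inferred only from the lines quoted in this thread, not from any PSME logging specification, and the function name is hypothetical.

```python
import re

# Layout assumed from the quoted log lines:
# "<date> <time> - <LEVEL> - <component> - <message>"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)"
    r" - (?P<level>[A-Z]+) - (?P<component>[\w-]+) - (?P<message>.*)$"
)

# Message shape taken from the last quoted line; other messages are skipped.
STATUS_CHANGE = re.compile(
    r"Module (?P<module>[0-9a-f-]+) status changed to (?P<status>\w+)"
    r" after event (?P<event>\w+)"
)


def module_status_changes(lines):
    """Return (timestamp, module, status, event) tuples for status-change lines."""
    changes = []
    for line in lines:
        entry = LOG_LINE.match(line.strip())
        if entry is None:
            continue  # not a log line in the assumed layout
        change = STATUS_CHANGE.search(entry.group("message"))
        if change:
            changes.append((
                entry.group("timestamp"),
                change.group("module"),
                change.group("status"),
                change.group("event"),
            ))
    return changes


sample = [
    "2017-12-04 13:43:39.096350941 - INFO - registration - "
    "Agent has been registered to http://localhost:8383",
    "2017-12-04 13:43:53.099958156 - INFO - agent - "
    "Module e5c37dc6-d909-11e7-9542-43ae2d42f28c status changed to ABSENT "
    "after event DISCOVERY_MISSING",
]

for change in module_status_changes(sample):
    print(change)
```

Run against a captured agent log, this lists only the lines where a module changed state, which makes a failed discovery easier to spot among the startup messages.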

stgrzeszczak commented 6 years ago

Hi, @ffdarkpenguin.

  1. The psme-compute agent reads its data from BMCs over the IPMI protocol, not Redfish. The entries in the "managers" section of its configuration should contain the connection data pointing to the BMC of each managed sled. In your case, the agent wasn't able to connect, so it determined that the BMC is absent.

  2. In the Intel RSD solution, the Redfish protocol is used for communication between psme-rest-server and PODM (the northbound interface of PSME), and on the northbound interface of PODM.

  3. The psme-compute agent was able to connect to the psme-rest-server. The log line for this is "2017-12-04 13:43:39.096350941 - INFO - registration - Agent has been registered to http://localhost:8383", and there should be a matching entry in the psme-rest-server's logs: "INFO - registration - Agent <UUID> registered".

  4. If the connected agent discovers any resources, the psme-rest-server polls them and caches them. So, sending GET requests to the psme-rest-server does not automatically trigger communication with the psme-compute agent.

  5. Unfortunately, the configuration samples in /application/config/ are out of date and should be updated. Please use the sample in /application/configuration.json and the schema in /application/configuration_schema.json, and modify the config according to the PSME User Guide.

  6. When you get /redfish/v1/Managers, do you get an empty collection ("Members" : []), or do you get a ResourceAtUriUnauthorized error? In the first case, your certificates and "connectors" configuration section are properly set up. In the second case, we'll need to get back to you with more feedback, so please let us know if that is the case.
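The distinction in item 6 can be checked mechanically. The following is a minimal Python sketch that classifies a response body from /redfish/v1/Managers or /redfish/v1/Systems. It assumes the error body follows the general Redfish error shape (an "error" object with a "code" and an optional "@Message.ExtendedInfo" list of entries carrying "MessageId"); whether psme-rest-server formats its errors exactly this way is an assumption, and the function name is hypothetical.

```python
import json


def classify_collection_response(body_text):
    """Classify a Redfish collection response body.

    Returns "empty", "populated", "unauthorized", or "other".
    """
    body = json.loads(body_text)

    error = body.get("error")
    if isinstance(error, dict):
        # Gather the top-level code plus any extended-info message IDs.
        ids = [str(error.get("code", ""))]
        for info in error.get("@Message.ExtendedInfo", []):
            ids.append(str(info.get("MessageId", "")))
        if any("ResourceAtUriUnauthorized" in message_id for message_id in ids):
            return "unauthorized"
        return "other"

    members = body.get("Members")
    if isinstance(members, list):
        return "populated" if members else "empty"
    return "other"


# Illustrative bodies only; not captured from a real PSME instance.
empty_body = '{"Name": "Manager Collection", "Members": []}'
error_body = json.dumps({
    "error": {
        "code": "Base.1.0.GeneralError",
        "message": "See extended info.",
        "@Message.ExtendedInfo": [
            {"MessageId": "Base.1.0.ResourceAtUriUnauthorized"}
        ],
    }
})

print(classify_collection_response(empty_body))
print(classify_collection_response(error_body))
```

An "empty" result corresponds to the first case in item 6 (certificates and connectors are fine, but no resources were discovered), while "unauthorized" corresponds to the second.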

Regards, Stanislaw.

ffdarkpenguin commented 6 years ago

Hi Stanislaw, your answers cleared all my doubts. Regarding item 6 in your list, I'm getting an empty collection, but you already explained why: I serve the Redfish protocol and not IPMI. So it's pretty clear they will never talk to each other. Thanks a LOT!!!

Regarding the documentation, I'd like to make a few suggestions on the user guide that would be most valuable to users. Please describe how the system works and how the agents work. Stating that the agents only talk IPMI would have saved me a week of reading, compiling, and trying to build the system. Something easy to maintain, easy to do, and easy to understand would be a UML activity diagram of how all the components work together. Something like:

agents -IPMI-> hardware
agents -Redfish/Whatever-> PSME-REST
PSME-REST -Redfish-> PODM

An overview like the one I gave in the few lines above, in diagram format, would be great for users. An easy place to build that: http://plantuml.com/activity-diagram-beta

This is just a suggestion. My problems are all solved, because I need a tool that talks Redfish to my server; IPMI won't solve my problem. So again, thanks a lot!! Feel free to close this issue if you are not considering improving the docs.

Best regards,

Flávio

stgrzeszczak commented 6 years ago

Hi, Flávio!

I've just skimmed through the documents and it seems that you are right. The Intel RSD Architecture Specification contains management hierarchy illustrations which are too high level and barely mention IPMI (or Redfish, for that matter) by name. On the other hand the PSME User Guide is "too close" to the subject and does not contain such general infographics. This needs some thought.

Nevertheless, if you're interested in a software component which will consume your Redfish API, then maybe you should take a look at the PODM component of Intel RSD. In the RSD solution it communicates with multiple psme-rest-server instances over the Redfish API, aggregates and manages all resources, and exposes a Redfish interface of its own. Depending on what you're trying to do, it may be the component that you'd want to use.

Best regards, Stanislaw

ffdarkpenguin commented 6 years ago

Hi Stanislaw, my goal is to try to use IntelRSD with our tool. Our tool exposes enclosure hardware via the Redfish protocol. Please feel free to take a look at our project: https://github.com/HewlettPackard/oneview-redfish-toolkit Our tool talks to an HPE OneView server, gets the info, and returns it as Redfish data.

I was about to ask you about PODM in the previous message but deleted the question. That's because I read the PODM user guide and saw that it uses DHCP and SSDP to look for PSME, RMM, etc., so I thought it would only talk to your other tools. I could not find a place in the PODM user guide where I could tell PODM to include a given Redfish server by passing its IP/port/credentials. Is it possible? If so, I'd love to try it and see if your tool talks to ours. Thank you very much for the suggestion. It was great!

Best regards,

Flávio

tbykowsk commented 6 years ago

Hi @ffdarkpenguin,

Thank you for your valuable feedback!

As @stgrzeszczak has already mentioned, you may find useful diagrams in Chapter 4 of the Intel RSD Architecture Specification, or in Chapter 2 of the GAMI API Specification, but we will try to improve the documentation as you suggested.

PODM can attempt to connect to a service not detected by DHCP or SSDP. In this case, you should create a /tmp/services.list file on the machine where PODM is running. In this file, add a line with the address of the REST API and the type of service (psme for Compute service, rss for Storage service, rmm for RMM service, lui for Deep Discovery service). For example, for a Compute service the line should look like this:

https://10.3.0.123:8443/redfish/v1 psme

You may also refer to older issues where usage of this functionality was mentioned: https://github.com/01org/IntelRackScaleArchitecture/issues/7, https://github.com/01org/IntelRackScaleArchitecture/issues/8 .

Please be aware that /tmp/services.list does not support credentials, and PODM might not correctly parse data from a service which is not a PSME Rest Server.
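As a rough illustration of the file format described above, here is a minimal Python sketch that builds a services.list entry and appends it to a file. The single-space separator and one-entry-per-line layout are assumptions based only on the example line in this comment; the helper names are hypothetical and not part of PODM.

```python
# Service types listed in this comment for /tmp/services.list.
SERVICE_TYPES = {"psme", "rss", "rmm", "lui"}


def services_list_entry(url, service_type):
    """Build one "<address> <type>" line (without a trailing newline)."""
    if service_type not in SERVICE_TYPES:
        raise ValueError(f"unknown service type: {service_type!r}")
    if not url or any(ch.isspace() for ch in url):
        raise ValueError("address must be non-empty and contain no whitespace")
    return f"{url} {service_type}"


def append_service(path, url, service_type):
    """Append one entry to the services list file at `path`."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(services_list_entry(url, service_type) + "\n")


# Build the example entry from this comment without touching any file.
entry = services_list_entry("https://10.3.0.123:8443/redfish/v1", "psme")
print(entry)
```

Calling append_service("/tmp/services.list", ...) on the PODM host would add the entry to the file; as noted above, there is no place in this format for credentials.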

tbykowsk commented 6 years ago

I'm closing this issue due to inactivity, but feel free to open a new one if needed.