openbmc / phosphor-net-ipmid

Network IPMI server
Apache License 2.0
9 stars 11 forks source link

Add support for RMCP ASF Presence Ping/Pong messages #15

Closed Kostr closed 3 years ago

Kostr commented 3 years ago

On my AMD EthanolX board I have "Executing the IPMI message failed" messages that spam log two times a minute:

Feb 09 14:37:41 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:38:11 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:38:41 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:39:11 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:39:41 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:40:11 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:40:41 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:41:11 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:41:41 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:42:11 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:42:41 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:43:11 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:43:41 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:44:11 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:44:41 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:45:11 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:45:41 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:46:11 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:46:41 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:47:11 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:47:41 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:48:11 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:48:41 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:49:11 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:49:41 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:50:11 ethanolx netipmid[313]: Executing the IPMI message failed
Feb 09 14:50:41 ethanolx netipmid[313]: Executing the IPMI message failed

These messages come from this code: https://github.com/openbmc/phosphor-net-ipmid/blob/master/sd_event_loop.cpp

void EventLoop::handleRmcpPacket()
{
    try
    {
        auto channelPtr = std::make_shared<udpsocket::Channel>(udpSocket);

        // Initialize the Message Handler with the socket channel
        auto msgHandler = std::make_shared<message::Handler>(channelPtr, io);

        msgHandler->processIncoming();
    }
    catch (const std::exception& e)
    {
        log<level::ERR>("Executing the IPMI message failed",
                        entry("EXCEPTION=%s", e.what()));
    }
}

I've investigated journalctl and the exception is:

"EXCEPTION" : "RMCP Header is invalid",

The code that throws it (https://github.com/openbmc/phosphor-net-ipmid/blob/master/message_parsers.cpp):

std::tuple<std::shared_ptr<Message>, SessionHeader>
    unflatten(std::vector<uint8_t>& inPacket)
{
    // Check if the packet has atleast the size of the RMCP Header
    if (inPacket.size() < sizeof(BasicHeader_t))
    {
        throw std::runtime_error("RMCP Header missing");
    }

    auto rmcpHeaderPtr = reinterpret_cast<BasicHeader_t*>(inPacket.data());

    // Verify if the fields in the RMCP header conforms to the specification
    if ((rmcpHeaderPtr->version != RMCP_VERSION) ||
        (rmcpHeaderPtr->rmcpSeqNum != RMCP_SEQ) ||
        (rmcpHeaderPtr->classOfMsg != RMCP_MESSAGE_CLASS_IPMI))
    {
        throw std::runtime_error("RMCP Header is invalid");
    }
...

These constants equal to:

./message_parsers.hpp:constexpr size_t RMCP_VERSION = 6;
./message_parsers.hpp:constexpr size_t RMCP_SEQ = 0xFF;
./message_parsers.hpp:constexpr size_t RMCP_MESSAGE_CLASS_IPMI = 7;

And from my inspection when the problem arises:

rmcpHeaderPtr->version = 6
rmcpHeaderPtr->rmcpSeqNum = 0xff
rmcpHeaderPtr->classOfMsg = 6

According to the IPMI specification message class 6 means ASF: 2021-02-10_18-02-42

Also, according to the IPMI specification some ASF messages must be supported: 2021-02-10_18-47-45 2021-02-10_18-46-56

So it seems like these exceptions come from the lack of support for RMCP ASF Presence Ping/Pong messages

Ramkosh commented 3 years ago

Hi, we have the implementation for this locally as a patch, I might upstream it in a couple of days.

Kostr commented 3 years ago

@Ramkosh Hi, could you add me as a reviewer, when you do this? Also if you don't have time to upstream it and the patch is open source, you could just point to it, and I can try to upstream it myself.

Ramkosh commented 3 years ago

@Kostr Hi, sure, I will add you as a reviewer! Sorry for the delay, I'll get back to it on Saturday.

Ramkosh commented 3 years ago

@Kostr Hi, here's a link to the patch. Build failed due to the older clang-format on my system, but I need to run right now, so I'll update it tomorrow in the morning.

Kostr commented 3 years ago

@Ramkosh thanks! I've responded in gerrit to your patchset. Please, correct the formatting issues, so the patchset could be proposed to the community to be merged upstream. If you need some help with it, I could send you clang-formatted version of your patchset.

BTW it didn't solve my issue because your patchset covers only direct ASF ping/pong messages and in my case I was getting these error messages from the broadcast ASF ping messages.

Code in the Channel.read() function (socket_channel.hpp) fills pktinfoX structures with the info from the incoming packet and use it later in the outgoing packet. But we can't use it as it is if the IP address in the incoming packet was set to broadcast. So in my case I had to add some code in the end of Channel.read() function.

// Setup source address with Local address in case
// header destination address in the received packet was broadcast
if (pktinfo4) {
    if ((*pktinfo4).ipi_addr.s_addr == 0xffffffff) {
        (*pktinfo4).ipi_addr.s_addr = (*pktinfo4).ipi_spec_dst.s_addr;
        if (pktinfo6) {
            (*pktinfo6).ipi6_addr.s6_addr32[3] = (*pktinfo4).ipi_spec_dst.s_addr;
        }
    }
}

I don't really know if there is a better solution to this problem but it have solved my issue. Also it seems to be a more general problem that is not related directly to ASF Ping/Pong support. So I think your patchset is fine as it is for the first iteration of ASF Ping/Pong support, let's merge it upstream. And then I'll open discussion about broadcast messages and try to merge my patch upstream.