rtlabs-com / p-net

PROFINET device stack for embedded devices
http://www.rt-labs.com
Other
473 stars 193 forks source link

Failing Behavior - Scenario 8 in newest ART release 2.44.1.2 Multiple AR Establishment Test #567

Closed widavies closed 5 months ago

widavies commented 11 months ago

In Behavior - Scenario 8, ART sends a AR Connect request, waits for a AR success response, and then sends an identical AR connect request.

While the stack does raise an alarm, and essentially just ignores the duplicate AR Connect request, it does seem like the latest version of ART wants an explicit failure response like this:

image

This error is properly reconstructed within ART, however, the session is free'd too earlier that the response buffer is cleared before pf_cmrpc_send_once gets a chance to run.

Diving into the code:

1) Within pf_cmrpc_rm_connect_ind, we have some code that checks for a duplicate session:

/* Parse the Connect request - No support for ArSet (yet) */
if (pf_cmrpc_rm_connect_interpret_ind (p_sess, &req_pos, p_ar) == 0)
{
   if (pf_ar_find_by_uuid (net, &p_ar->ar_param.ar_uuid, &p_ar_2) == 0)
   {
      if (p_ar_2->p_sess != p_sess)
      {
         p_sess->kill_session = true;
      }
      pf_ar_release (net, p_ar);
      p_ar = p_ar_2;

      /* The ARUUID is already in use by an established AR. */
      LOG_ERROR (
         PF_RPC_LOG,
         "CMRPC(%d): Duplicate Connect request received\n",
         __LINE__);
      pf_set_error (
         &p_sess->rpc_result,
         PNET_ERROR_CODE_CONNECT,
         PNET_ERROR_DECODE_PNIO,
         PNET_ERROR_CODE_1_CMDEV,
         PNET_ERROR_CODE_2_CMDEV_STATE_CONFLICT);
   }

Later on in this file, if we detect a duplicate session (pr_ar_2 != NULL), we call pf_cmdev_cm_abort(net, p_ar_2)

/* Create Connect response */
LOG_DEBUG (
  PF_RPC_LOG,
  "CMRPC(%d): Create CONNECT response: ret = %d   error: 0x%02X 0x%02X "
  "0x%02X 0x%02X\n",
  __LINE__,
  ret,
  p_sess->rpc_result.pnio_status.error_code,
  p_sess->rpc_result.pnio_status.error_decode,
  p_sess->rpc_result.pnio_status.error_code_1,
  p_sess->rpc_result.pnio_status.error_code_2);
pf_cmrpc_rm_connect_rsp (p_sess, ret, p_ar, res_size, p_res, p_res_pos);

if (p_ar_2 != NULL) // RIGHT HERE
{
  (void)pf_cmdev_cm_abort (net, p_ar_2);
}
else if (
  (ret != 0) ||
  (p_sess->rpc_result.pnio_status.error_code != PNET_ERROR_CODE_NOERROR))
{

The issue with this, is that pf_cmdev_cm_abort eventually calls pf_cmrpc_cmdev_state_ind:

case PNET_EVENT_ABORT:
  /*
   * ABORT may be called from many places within the stack.
   * Simply setting p_sess->kill_session to true only works
   * for RPC requests, but we need to free the session also for
   * internal errors.
   * Do it explicitly here.
   */
  if (p_ar != NULL)
  {
     if (p_ar->p_sess != NULL)
     {
        if (p_ar->p_sess->release_in_progress == false)
        {
           pf_session_release (net, p_ar->p_sess); // PROBLEMATIC!!!!!
        }
     }
     else
     {
        LOG_ERROR (PF_RPC_LOG, "CMRPC(%d): Session is NULL\n", __LINE__);
     }

     /* Free other (CCONTROL) sessions and close all RPC sockets. */
     while (pf_session_locate_by_ar (net, p_ar, &p_sess) == 0)
     {
        pf_session_release (net, p_sess);
     }

     /* Finally free the AR */
     pf_ar_release (net, p_ar);
  }
  break;

pf_session_release frees the p_session_info_t, which subsequently clears out_buffer and related fields. Eventually, when pf_cmrpc_send_once is run, there is nothing to respond with, and the error message is not sent.

I have a temporary workaround, I'm not sure if this may have other issues (or maybe a memory leak), but replacing the if(p_ar_2 != NULL) block in pf_cmrpc_rm_connect_ind with the following fixes the test:

 if (p_ar_2 != NULL)
 {
  // Replacement
   p_sess->release_in_progress = true;
   (void)pf_cmdev_cm_abort (net, p_ar_2);
   p_sess->release_in_progress = false;

   p_sess->p_ar = NULL;
   p_sess->kill_session = true;
 }

Setting p_sess->release_in_progress is a hack, but it prevents pf_session_release from getting called right away, then it sets the p_sess->kill_session flag to true so that pf_session_release is still called after the response is sent.

Let me know if this issue is confirmed or if there is a better solution. Thanks.

pyhys commented 7 months ago

Hi! This is a known bug in the latest ART tester release. Please re-run the tests with the previous ART tester version.

widavies commented 6 months ago

@pyhys I just went through Profinet Conformance testing (and passed!) at the Profinet Conformance Center in Tennessee - I reported a handful of issues with ART, which got relayed to the development team on it, so hopefully we'll have a fixed version soon!

Thanks for taking a look

pyhys commented 5 months ago

Great! Please reopen if there are more issues.