ultimate-pa / ultimate

The Ultimate program analysis framework.
https://ultimate.informatik.uni-freiburg.de
190 stars 40 forks source link

Bug: Ultimate C parser crashes when there are Doxygen group comments in the C code #667

Open bahnwaerter opened 1 month ago

bahnwaerter commented 1 month ago

The current C parser crashes when there are Doxygen group comments in the C code. This issue can be reproduced with the latest version of Ultimate Automizer (0.2.4-dev-1ececfe with OpenJDK 11.0.23) using the following input:

//#SAFE

typedef long unsigned int __uint32_t;
typedef __uint32_t uint32_t;

/** \ingroup  CMSIS_Core_FunctionInterface
    \defgroup CMSIS_Core_RegAccFunctions CMSIS Core Register Access Functions
  @{
 */

/**
  \brief   Get Control Register
  \details Returns the content of the Control Register.
  \return               Control Register value
 */
static inline uint32_t __get_CONTROL(void)
{
  uint32_t result;

  __asm volatile ("MRS %0, control" : "=r" (result) );
  return(result);
}

/**
  \brief   Set Control Register
  \details Writes the given value to the Control Register.
  \param [in]    control  Control Register value to set
 */
static inline void __set_CONTROL(uint32_t control)
{
  __asm volatile ("MSR control, %0" : : "r" (control) : "memory");
}

/*@} end of CMSIS_Core_RegAccFunctions */

int main(void)
{
  uint32_t control = __get_CONTROL();
  assert(control >= 0x00000000);
  __set_CONTROL(control);
  return 0;
}

Considering this example input, the C parser crashes when trying to parse the end marker of a Doxygen group comment (/*@} end of CMSIS_Core_RegAccFunctions */ in line 34). The following error message is reported during a crash:

[...]
[2024-05-23 17:51:32,051 WARN  L194 CommentParser]: Syntax Error: #120[rbrace](0/1 - 0/2)
[...]
[2024-05-23 17:51:32,421 INFO  L174    ResultUtil]:  * Results from CDTParser:
[2024-05-23 17:51:32,421 INFO  L198    ResultUtil]:   - SyntaxErrorResult [Line: 34]: Incorrect Syntax
[2024-05-23 17:51:32,422 INFO  L202    ResultUtil]:     Syntax Error: #120[rbrace](0/1 - 0/2)

In this example, the C parser for the comments expects ACSL specifications and instead gets Doxygen group comments that violate the grammar of ACSL. Unfortunately, the parser cannot ignore the Doxygen group comments.

schuessf commented 1 month ago

We expect any comment starting with @ to be ACSL und thus try to parse the comment as such, see here. It would be possible to add exceptions there, for this example to work, we could just ignore { and }.

bahnwaerter commented 1 month ago

We expect any comment starting with @ to be ACSL und thus try to parse the comment as such, see here. It would be possible to add exceptions there, for this example to work, we could just ignore { and }.

That's a good idea, but we first have to check that the braces cannot appear in ACSL.

bahnwaerter commented 1 month ago

In fact, the braces can also occur in ACSL comments. Here is an example from the ACSL standard (see 2.2.7 Structures, Unions and Arrays in logic), where parentheses are used in logic expressions for initialization and array access.

//@ type point = struct { real x; real y; };
//@ type triangle = point[3];
//@ logic point origin = { .x = 0.0 , .y = 0.0 };
/*@ logic triangle t_iso = { [0] = origin,
  @                          [1] = { .y = 2.0 , .x = 0.0 }
  @                          [2] = { .x = 2.0 , .y = 0.0 }};
  @*/

Fortunately, the braces are never at the beginning of a comment. We can therefore ignore the braces from Doxygen group comments at the beginning of each comment without restricting the ACSL syntax.