sighingnow / libclang

(Unofficial) Release libclang (clang.cindex) on pypi.
https://pypi.org/project/libclang

Incorrect spelling for CALL_EXPR arguments when call comes from macro expansion #61

Closed dmcardle closed 1 year ago

dmcardle commented 1 year ago

Hi, folks!

As the title suggests, I'm getting an incorrect spelling for CALL_EXPR arguments, but only when the call site was created by expanding a macro.

Here's a Python test that shows what I'm talking about:

import clang.cindex
import tempfile
import unittest

class TestLibclang(unittest.TestCase):
    def test_repro_libclang_quirk(self):
        """Document the libclang quirk related to macro expansion. Issue
        lowRISC/opentitan#19438. When we use libclang to find a call site that
        was created by a macro expansion, the call site's arguments' tokens are
        incorrect. They seem to point into the non-preprocessed translation
        unit.
        """

        C_SRC_WITH_MACRO_EXPANSION = b"""
#define CONST_FOO 0
#define CALL_MAGIC(name) magic(CONST_##name)
void magic(int register_offset);
void entry_point(void) {
  CALL_MAGIC(FOO);
}
"""
        INCORRECT_ARG_TOKENS = [
            '0', '#', 'define', 'CALL_MAGIC', '(', 'name', ')', 'magic', '(',
            'CONST_', '##', 'name', ')', 'void', 'magic', '(', 'int',
            'register_offset', ')', ';', 'void', 'entry_point', '(', 'void',
            ')', '{', 'CALL_MAGIC', '(', 'FOO', ')'
        ]

        index = clang.cindex.Index.create()

        with tempfile.NamedTemporaryFile(suffix=".c") as tf:
            tf.write(C_SRC_WITH_MACRO_EXPANSION)
            tf.flush()

            translation_unit = index.parse(tf.name, args=[])

            [cursor] = [
                c for c in translation_unit.cursor.walk_preorder()
                if c.kind == clang.cindex.CursorKind.CALL_EXPR and
                c.displayname == 'magic'
            ]

            [arg] = list(cursor.get_arguments())

            tokens = [t.spelling for t in arg.get_tokens()]

            # These assertions may feel a little backwards because the purpose
            # of this test is to document the unwanted behavior.
            self.assertEqual(tokens, INCORRECT_ARG_TOKENS)
            self.assertNotEqual(tokens, ['CONST_FOO'])

if __name__ == "__main__":
    unittest.main()

sighingnow commented 1 year ago

get_tokens only returns tokens from the source as written, before preprocessing.

To traverse the AST, use cursor.get_children() (https://github.com/sighingnow/libclang/blob/master/python/clang/cindex.py#L2002) instead.
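
To make the point above concrete, here is a toy model of lexing a source range as plain text, with no macro expansion. It does not call libclang. The regex lexer and the chosen range are assumptions for illustration: the range runs from the `0` in the CONST_FOO definition to the end of the CALL_MAGIC(FOO) invocation, which is what the token list in the original test suggests the argument's extent covers.

```python
import re

C_SRC = """
#define CONST_FOO 0
#define CALL_MAGIC(name) magic(CONST_##name)
void magic(int register_offset);
void entry_point(void) {
  CALL_MAGIC(FOO);
}
"""

# Toy lexer: identifiers, integers, the `##` paste operator, then any other
# single punctuation character. Real C lexing is far richer than this.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|##|[^\s\w]")


def raw_tokens(source, start, end):
    """Lex source[start:end] as plain text, with no macro expansion."""
    return TOKEN_RE.findall(source[start:end])


# Assumed extent: from the `0` in CONST_FOO's definition to the closing
# parenthesis of the CALL_MAGIC(FOO) invocation.
start = C_SRC.index("0")
invocation = "CALL_MAGIC(FOO)"
end = C_SRC.index(invocation) + len(invocation)

tokens = raw_tokens(C_SRC, start, end)
```

Under that assumption, `tokens` has the same shape as INCORRECT_ARG_TOKENS: everything written between the two endpoints, including the `#define` line that sits between them.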

dmcardle commented 1 year ago

Thanks for the response, @sighingnow!

I don't think cursor.get_children() solves the problem (or maybe I'm misunderstanding).

I'm already traversing the AST and finding the desired CALL_EXPR, but I want to get the tokens after preprocessing. Something like clang::syntax::TokenBuffer::expandedTokens().
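
One possible workaround, sketched here under assumptions rather than taken from this thread, is to run the preprocessor first and hand the expanded text to libclang, so that the source being lexed no longer contains macros. The helper names `preprocess_command` and `preprocess` are hypothetical, and the sketch assumes a `clang` (or compatible) driver on PATH that accepts `-E` and `-P`.

```python
import shutil
import subprocess


def preprocess_command(path, extra_args=(), compiler="clang"):
    """Build a command that prints the macro-expanded source to stdout."""
    # -E stops after preprocessing; -P omits the `# <line> "<file>"` markers.
    return [compiler, "-E", "-P", *extra_args, path]


def preprocess(path, extra_args=(), compiler="clang"):
    """Return the preprocessed text of `path`, or None if no compiler is found."""
    if shutil.which(compiler) is None:
        return None
    result = subprocess.run(
        preprocess_command(path, extra_args, compiler),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

The returned text could then be passed to `index.parse(...)` through its `unsaved_files` parameter as a `(name, contents)` pair. The trade-off is that every source location then refers to the preprocessed text, not to the original file, and any `#include` contents are inlined.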

dmcardle commented 1 year ago

I added a little bit to the test above — I can't figure out how to wrangle get_children() to do something useful here.

            # Use `cursor.get_children()` as recommended in
            # <https://github.com/sighingnow/libclang/issues/61>.

            # The CALL_EXPR has two children, but their spelling does not
            # correspond to the call arguments.
            self.assertEqual([c.spelling for c in cursor.get_children()],
                             ['magic', ''])

            # The CALL_EXPR's one argument has no children.
            self.assertEqual(len(list(arg.get_children())), 0)
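
Since neither the tokens nor the children help here, a rough heuristic can at least flag arguments whose tokens should not be trusted. The token list in the first test suggests the argument's extent starts at the `0` in the macro definition, which lies outside `entry_point`. The sketch below checks for that. `extent_contains` and `tokens_look_trustworthy` are hypothetical helpers, `function_cursor` stands for whichever FUNCTION_DECL encloses the call, and both extents are assumed to be in the same file.

```python
def extent_contains(outer, inner):
    """Return True if `inner` lies entirely inside `outer`.

    Both arguments are expected to look like clang.cindex.SourceRange:
    `.start.offset` and `.end.offset` are offsets into the file.
    Assumes both ranges refer to the same file.
    """
    return (outer.start.offset <= inner.start.offset
            and inner.end.offset <= outer.end.offset)


def tokens_look_trustworthy(function_cursor, arg_cursor):
    """Heuristic: distrust get_tokens() when the argument escapes its function."""
    return extent_contains(function_cursor.extent, arg_cursor.extent)
```

This only detects the situation described in this issue. It does not recover the expanded tokens.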