immunant / c2rust

Migrate C code to Rust
https://c2rust.com/
Other
3.81k stars 220 forks source link

Emits code for extern function declaration, rather than the correct prototype #325

Open ryanavella opened 3 years ago

ryanavella commented 3 years ago

I support a large C codebase with a lot of function declarations, all of which are eventually followed by their correct prototypes. C2Rust appears to emit extern fn's for the declarations rather than the prototypes.

Minimal example input:

int putchar();      /* declaration */
int putchar(int c); /* prototype */

int main(void) {
    putchar('\n');
}

Output:

#![allow(dead_code, mutable_transmutes, non_camel_case_types, non_snake_case,
         non_upper_case_globals, unused_assignments, unused_mut)]
#![register_tool(c2rust)]
#![feature(main, register_tool)]
extern "C" {
    /* declaration */
    #[no_mangle]
    fn putchar() -> libc::c_int;
}
/* prototype */
unsafe fn main_0() -> libc::c_int { putchar('\n' as i32); return 0; }
#[main]
pub fn main() { unsafe { ::std::process::exit(main_0() as i32) } }

Expected:

#![allow(dead_code, mutable_transmutes, non_camel_case_types, non_snake_case,
         non_upper_case_globals, unused_assignments, unused_mut)]
#![register_tool(c2rust)]
#![feature(main, register_tool)]
extern "C" {
    /* declaration */
    /* prototype */
    #[no_mangle]
    fn putchar(y: libc::c_int) -> libc::c_int;
}
unsafe fn main_0() -> libc::c_int { putchar('\n' as i32); return 0; }
#[main]
pub fn main() { unsafe { ::std::process::exit(main_0() as i32) } }

My c2rust version is 0.15.1. Reproducible on c2rust.com as well

ryanavella commented 3 years ago

I'm a little out of my element here, but I think I've narrowed down the issue. If I dump the typed clang AST, I can see that it emits a Function for the declaration, but a NonCanonicalDecl for the actual function prototype.

...
CDeclId(
    30,
): Located {
    loc: Some(
        SrcSpan {
            fileid: 1,
            begin_line: 1,
            begin_column: 1,
            end_line: 1,
            end_column: 13,
        },
    ),
    kind: Function {
        is_global: true,
        is_inline: false,
        is_implicit: false,
        is_extern: false,
        is_inline_externally_visible: false,
        typ: CTypeId(
            31,
        ),
        name: "putchar",
        parameters: [],
        body: None,
        attrs: {},
    },
},
CDeclId(
    32,
): Located {
    loc: Some(
        SrcSpan {
            fileid: 1,
            begin_line: 2,
            begin_column: 1,
            end_line: 2,
            end_column: 18,
        },
    ),
    kind: NonCanonicalDecl {
        canonical_decl: CDeclId(
            30,
        ),
    },
},
...

Based on this, I suspect that the prototype is getting categorized incorrectly while building the AST, and then skipped over during the transpiling step

thedataking commented 3 years ago

I think you are on the right track. The problem is likely found in the code that visits the clang AST: https://github.com/immunant/c2rust/blob/master/c2rust-ast-exporter/src/AstExporter.cpp

Try setting some breakpoints on the code that deals with function Decls and run it on your reproducer. We're swamped with work but if you manage to come up with a simple patch, we'll review and merge it in (if you'd like).

ryanavella commented 3 years ago

@thedataking I'll see what I can do, thanks for the pointer!

ryanavella commented 3 years ago

I found a possible clue. This is the output of clang -cc1 -ast-dump test.c (so I've removed c2rust entirely out of the equation)

TranslationUnitDecl 0x39bd8e8 <<invalid sloc>> <invalid sloc>
|-TypedefDecl 0x39be180 <<invalid sloc>> <invalid sloc> implicit __int128_t '__int128'
| `-BuiltinType 0x39bde80 '__int128'
|-TypedefDecl 0x39be1e8 <<invalid sloc>> <invalid sloc> implicit __uint128_t 'unsigned __int128'
| `-BuiltinType 0x39bdea0 'unsigned __int128'
|-TypedefDecl 0x39be4a8 <<invalid sloc>> <invalid sloc> implicit __NSConstantString 'struct __NSConstantString_tag'
| `-RecordType 0x39be2c0 'struct __NSConstantString_tag'
|   `-Record 0x39be238 '__NSConstantString_tag'
|-TypedefDecl 0x39be540 <<invalid sloc>> <invalid sloc> implicit __builtin_ms_va_list 'char *'
| `-PointerType 0x39be500 'char *'
|   `-BuiltinType 0x39bd980 'char'
|-TypedefDecl 0x39f7a28 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list 'struct __va_list_tag [1]'
| `-ConstantArrayType 0x39f79d0 'struct __va_list_tag [1]' 1
|   `-RecordType 0x39f7850 'struct __va_list_tag'
|     `-Record 0x39f77c0 '__va_list_tag'
|-FunctionDecl 0x39f7ad0 <test.c:1:1, col:13> col:5 used putchar 'int ()'
|-FunctionDecl 0x39f7c98 prev 0x39f7ad0 <line:2:1, col:18> col:5 used putchar 'int (int)'
| `-ParmVarDecl 0x39f7bc8 <col:13, col:17> col:17 c 'int'
`-FunctionDecl 0x39f7e08 <line:4:1, line:6:1> line:4:5 main 'int (void)'
  `-CompoundStmt 0x39f7f90 <col:16, line:6:1>
    `-CallExpr 0x39f7f60 <line:5:2, col:14> 'int'
      |-ImplicitCastExpr 0x39f7f48 <col:2> 'int (*)(int)' <FunctionToPointerDecay>
      | `-DeclRefExpr 0x39f7ed8 <col:2> 'int (int)' Function 0x39f7c98 'putchar' 'int (int)'
      `-CharacterLiteral 0x39f7f00 <col:10> 'int' 10

Interestingly, it appears that Clang considers the first putchar to be the canonical declaration. This is very unintuitive to me, but maybe this is expected behavior?

I see two approaches worth considering:

  1. If we think this is unexpected behavior, then we could file a bug upstream.
  2. Otherwise, if this is expected behavior, then we should not omit the ImplicitCastExpr's from the Clang AST. These are needed for valid C to transpile into valid Rust.

For reference, option 2 would look something like this:

#![allow(dead_code, mutable_transmutes, non_camel_case_types, non_snake_case,
         non_upper_case_globals, unused_assignments, unused_mut)]
#![register_tool(c2rust)]
#![feature(main, register_tool)]
extern "C" {
    /* declaration */
    #[no_mangle]
    fn putchar() -> libc::c_int;
}
/* prototype */
unsafe fn main_0() -> libc::c_int { ::std::mem::transmute::<*const (), fn(libc::c_int)->libc::c_int>(putchar as *const ()) ('\n' as i32); return 0; }
#[main]
pub fn main() { unsafe { ::std::process::exit(main_0() as i32) } }