rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

DWARF information encodes discriminant values for tagged enums incorrectly #125147

Open sdnr1 opened 6 months ago

sdnr1 commented 6 months ago

Within the DW_TAG_structure_type for a tagged enum, the Rust compiler emits a DW_TAG_variant_part DIE with one DW_TAG_variant DIE per enum variant. According to the DWARF standard, the discriminant value of each variant (its DW_AT_discr_value attribute) should be LEB128-encoded. In practice, the Rust compiler does not use LEB128 for these values, so its output is not compliant with the DWARF standard.
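
For reference, here is a minimal sketch of the two LEB128 variants. It is written for illustration and is not taken from rustc or LLVM; the function names are hypothetical. As I read the DWARF 5 wording for variant entries, the value is signed LEB128 (SLEB128) when the discriminant's type is signed and unsigned LEB128 (ULEB128) when it is unsigned.

```rust
/// Encode an unsigned value as ULEB128: 7 bits per byte, low bits first,
/// continuation bit (0x80) set on every byte except the last.
fn encode_uleb128(mut value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte); // final byte: continuation bit clear
            return out;
        }
        out.push(byte | 0x80); // more bytes follow
    }
}

/// Encode a signed value as SLEB128 (two's complement, sign-extended).
fn encode_sleb128(mut value: i64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7; // arithmetic shift preserves the sign
        // Stop once the remaining bits are only the sign extension of
        // bit 6 of the byte about to be emitted.
        let sign_bit_set = byte & 0x40 != 0;
        if (value == 0 && !sign_bit_set) || (value == -1 && sign_bit_set) {
            out.push(byte);
            return out;
        }
        out.push(byte | 0x80);
    }
}

fn main() {
    // The three discriminants used in the examples of this issue.
    for d in [-2i64, -1, 0] {
        println!("{d:>3} -> SLEB128 {:02x?}", encode_sleb128(d));
    }
    println!("624485 -> ULEB128 {:02x?}", encode_uleb128(624485));
}
```

All three discriminants from the examples encode to a single byte each (0x7e, 0x7f, 0x00), whether the enum is #[repr(i32)] or #[repr(i64)].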

Furthermore, negative discriminant values appear to be handled inconsistently. The following two examples show the same tagged enum with negative discriminants, first with an i32 and then with an i64 discriminant type.


Example 1

The discriminant type is a 32-bit signed integer.

#[repr(i32)]
pub enum Foo {
    Unit = -2,
    TwoInts(u32, u32) = -1,
    ThreeShorts{x: u16, y: u16, z: u16} = 0,
}

DWARF:

0x00000078:     DW_TAG_structure_type
                  DW_AT_name [DW_FORM_strp]     ("Foo")
                  DW_AT_byte_size [DW_FORM_data1]       (0x0c)
                  DW_AT_alignment [DW_FORM_udata]       (4)

0x0000007f:       DW_TAG_variant_part
                    DW_AT_discr [DW_FORM_ref4]  (0x00000084)

0x00000084:         DW_TAG_member
                      DW_AT_type [DW_FORM_ref4] (0x00000114 "i32")
                      DW_AT_alignment [DW_FORM_udata]   (4)
                      DW_AT_data_member_location [DW_FORM_data1]        (0x00)
                      DW_AT_artificial [DW_FORM_flag_present]   (true)

0x0000008b:         DW_TAG_variant
                      DW_AT_discr_value [DW_FORM_data8] (0x00000000fffffffe)

0x00000094:           DW_TAG_member
                        DW_AT_name [DW_FORM_strp]       ("Unit")
                        DW_AT_type [DW_FORM_ref4]       (0x000000bd "t3::Foo::Unit")
                        DW_AT_alignment [DW_FORM_udata] (4)
                        DW_AT_data_member_location [DW_FORM_data1]      (0x00)

0x0000009f:           NULL

0x000000a0:         DW_TAG_variant
                      DW_AT_discr_value [DW_FORM_data8] (0x00000000ffffffff)

0x000000a2:           DW_TAG_member
                        DW_AT_name [DW_FORM_strp]       ("TwoInts")
                        DW_AT_type [DW_FORM_ref4]       (0x000000c4 "t3::Foo::TwoInts")
                        DW_AT_alignment [DW_FORM_udata] (4)
                        DW_AT_data_member_location [DW_FORM_data1]      (0x00)

0x000000ad:           NULL

0x000000ae:         DW_TAG_variant
                      DW_AT_discr_value [DW_FORM_data1] (0x00)

0x000000b0:           DW_TAG_member
                        DW_AT_name [DW_FORM_strp]       ("ThreeShorts")
                        DW_AT_type [DW_FORM_ref4]       (0x000000e2 "t3::Foo::ThreeShorts")
                        DW_AT_alignment [DW_FORM_udata] (4)
                        DW_AT_data_member_location [DW_FORM_data1]      (0x00)

0x000000bb:           NULL

0x000000bc:         NULL
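
The DW_FORM_data8 values above match what you get by zero-extending the 32-bit two's-complement pattern of each discriminant to 64 bits. That is an inference from the dump, not a confirmed account of what rustc or LLVM does internally. The sketch below (helper names are hypothetical) compares zero-extension with sign-extension:

```rust
/// Widen an i32 discriminant to 64 bits by zero-extending its bit pattern.
fn zero_extend(d: i32) -> u64 {
    d as u32 as u64
}

/// Widen an i32 discriminant to 64 bits by sign-extending it.
fn sign_extend(d: i32) -> u64 {
    d as i64 as u64
}

/// Recover the discriminant by truncating to the 32-bit discriminant type.
fn truncate_to_i32(raw: u64) -> i32 {
    raw as u32 as i32
}

fn main() {
    for d in [-2i32, -1, 0] {
        let raw = zero_extend(d);
        println!(
            "{d:>3}: zero-extended {raw:#018x} (as i64: {}), sign-extended {:#018x}, truncated back: {}",
            raw as i64,
            sign_extend(d),
            truncate_to_i32(raw)
        );
    }
}
```

Read as a 64-bit integer, 0x00000000fffffffe is 4294967294, not -2; a consumer only gets -2 back if it truncates to the 32-bit type of the DW_AT_discr member.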

Example 2

The discriminant type is a 64-bit signed integer.

#[repr(i64)]
pub enum Foo {
    Unit = -2,
    TwoInts(u32, u32) = -1,
    ThreeShorts{x: u16, y: u16, z: u16} = 0,
}

DWARF:

0x00000079:     DW_TAG_structure_type
                  DW_AT_name [DW_FORM_strp]     ("Foo")
                  DW_AT_byte_size [DW_FORM_data1]       (0x10)
                  DW_AT_alignment [DW_FORM_udata]       (8)

0x00000080:       DW_TAG_variant_part
                    DW_AT_discr [DW_FORM_ref4]  (0x00000085)

0x00000085:         DW_TAG_member
                      DW_AT_type [DW_FORM_ref4] (0x0000010e "i64")
                      DW_AT_alignment [DW_FORM_udata]   (8)
                      DW_AT_data_member_location [DW_FORM_data1]        (0x00)
                      DW_AT_artificial [DW_FORM_flag_present]   (true)

0x0000008c:         DW_TAG_variant
                      DW_AT_discr_value [DW_FORM_data1] (0xfe)

0x0000008e:           DW_TAG_member
                        DW_AT_name [DW_FORM_strp]       ("Unit")
                        DW_AT_type [DW_FORM_ref4]       (0x000000b7 "t3::Foo::Unit")
                        DW_AT_alignment [DW_FORM_udata] (8)
                        DW_AT_data_member_location [DW_FORM_data1]      (0x00)

0x00000099:           NULL

0x0000009a:         DW_TAG_variant
                      DW_AT_discr_value [DW_FORM_data1] (0xff)

0x0000009c:           DW_TAG_member
                        DW_AT_name [DW_FORM_strp]       ("TwoInts")
                        DW_AT_type [DW_FORM_ref4]       (0x000000be "t3::Foo::TwoInts")
                        DW_AT_alignment [DW_FORM_udata] (8)
                        DW_AT_data_member_location [DW_FORM_data1]      (0x00)

0x000000a7:           NULL

0x000000a8:         DW_TAG_variant
                      DW_AT_discr_value [DW_FORM_data1] (0x00)

0x000000aa:           DW_TAG_member
                        DW_AT_name [DW_FORM_strp]       ("ThreeShorts")
                        DW_AT_type [DW_FORM_ref4]       (0x000000dc "t3::Foo::ThreeShorts")
                        DW_AT_alignment [DW_FORM_udata] (8)
                        DW_AT_data_member_location [DW_FORM_data1]      (0x00)

0x000000b5:           NULL

0x000000b6:         NULL
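
DWARF 5 treats the DW_FORM_data1/2/4/8 forms as not inherently signed or unsigned and encourages producers to use DW_FORM_sdata or DW_FORM_udata instead, so a consumer reading these dumps has to choose an interpretation. A sketch, assuming a consumer that sign-extends from the form's width when the discriminant type is signed (read_data_n is a hypothetical helper):

```rust
/// Interpret the little-endian bytes of a DW_FORM_data1/2/4/8 value.
/// The form does not say whether the value is signed, so the caller
/// supplies that from the discriminant's type.
fn read_data_n(bytes: &[u8], signed: bool) -> i128 {
    assert!(matches!(bytes.len(), 1 | 2 | 4 | 8), "not a dataN width");
    let mut buf = [0u8; 8];
    buf[..bytes.len()].copy_from_slice(bytes);
    let raw = u64::from_le_bytes(buf);
    if signed {
        // Sign-extend from the form's width (8, 16, 32 or 64 bits).
        let shift = 64 - 8 * bytes.len() as u32;
        (((raw << shift) as i64) >> shift) as i128
    } else {
        raw as i128
    }
}

fn main() {
    // Example 2, variant "Unit": DW_AT_discr_value [DW_FORM_data1] (0xfe).
    println!("{}", read_data_n(&[0xfe], false)); // prints 254
    println!("{}", read_data_n(&[0xfe], true)); // prints -2
    // Example 1, variant "Unit": DW_FORM_data8 (0x00000000fffffffe).
    let ex1: [u8; 8] = 0xffff_fffe_u64.to_le_bytes();
    println!("{}", read_data_n(&ex1, true)); // prints 4294967294
}
```

Under this rule, Example 2's 0xfe comes out as -2 while Example 1's 0x00000000fffffffe comes out as 4294967294, so the two examples cannot both be read correctly by the same width-based logic.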

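Note that, for these signed discriminant types, the form of the DW_AT_discr_value attribute should not be DW_FORM_dataN; it should be DW_FORM_sdata. As the dumps show, the discriminant -2 is currently emitted as DW_FORM_data8 (0x00000000fffffffe) for #[repr(i32)] but as DW_FORM_data1 (0xfe) for #[repr(i64)].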
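
With DW_FORM_sdata, each discriminant would be a self-describing SLEB128 number, independent of the enum's repr. A consumer-side sketch (decode_sleb128 is a hypothetical helper, not the API of gimli, LLVM or gdb):

```rust
/// Decode one SLEB128 number, returning the value and the bytes consumed.
/// Returns None if the input ends before the terminating byte.
fn decode_sleb128(bytes: &[u8]) -> Option<(i64, usize)> {
    let mut result: i64 = 0;
    let mut shift = 0u32;
    for (i, &byte) in bytes.iter().enumerate() {
        if shift < 64 {
            // Accumulate the low 7 bits of each byte, least significant first.
            result |= i64::from(byte & 0x7f) << shift;
        }
        shift += 7;
        if byte & 0x80 == 0 {
            // Last byte: sign-extend if its bit 6 is set.
            if shift < 64 && byte & 0x40 != 0 {
                result |= -1i64 << shift;
            }
            return Some((result, i + 1));
        }
    }
    None
}

fn main() {
    // The DW_FORM_sdata payloads one would expect for -2, -1 and 0.
    for bytes in [[0x7eu8], [0x7f], [0x00]] {
        let (value, len) = decode_sleb128(&bytes).unwrap();
        println!("{bytes:02x?} -> {value} ({len} byte)");
    }
}
```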

Meta

rustc --version --verbose:

rustc 1.74.1 (a28077b28 2023-12-04)
binary: rustc
commit-hash: a28077b28a02b92985b3a3faecf92813155f1ea1
commit-date: 2023-12-04
host: x86_64-unknown-linux-gnu
release: 1.74.1
LLVM version: 17.0.4

khuey commented 4 months ago

That may be what the spec says, but it's not what rustc and gdb actually do. DWARF has specified DW_AT_discr_value since DWARF 2, but gdb didn't actually use it until @tromey implemented support for Rust enums a few years ago. Was there a reason the spec wasn't followed? It looks like when DW_AT_discr_list was added a bit later, it was written to use LEB128.

tromey commented 3 months ago

This part of the DWARF standard seems strange to me. It doesn't make sense to specify that the value for DW_AT_discr_value is leb128-encoded when there is also a DWARF form associated with the value. I suppose DWARF could require only certain forms to be used here, but I don't see why that would be good.
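
The point about the form can be made concrete: a consumer already dispatches on the attribute's form, and the form alone determines how the bytes are read. The sketch below uses hypothetical Form and Constant types (not any real library's API) and covers only the constant forms relevant here. The dataN forms yield raw bits whose signedness the form does not specify, so the open question for negative discriminants is how those bits are interpreted, not which bytes to read.

```rust
/// A subset of DWARF attribute forms relevant to DW_AT_discr_value
/// (illustrative only; real DWARF readers define their own types).
#[derive(Clone, Copy, Debug)]
enum Form {
    Data1,
    Data2,
    Data4,
    Data8,
    Sdata,
    Udata,
}

/// A decoded constant; dataN values stay as raw bits because the form
/// does not say whether they are signed.
#[derive(Debug, PartialEq)]
enum Constant {
    Raw(u64),
    Signed(i64),
    Unsigned(u64),
}

/// Read a constant attribute value, returning it and the bytes consumed.
fn read_constant(form: Form, bytes: &[u8]) -> Option<(Constant, usize)> {
    // Fixed-size little-endian forms.
    let fixed = |n: usize| -> Option<(Constant, usize)> {
        let src = bytes.get(..n)?;
        let mut buf = [0u8; 8];
        buf[..n].copy_from_slice(src);
        Some((Constant::Raw(u64::from_le_bytes(buf)), n))
    };
    match form {
        Form::Data1 => fixed(1),
        Form::Data2 => fixed(2),
        Form::Data4 => fixed(4),
        Form::Data8 => fixed(8),
        Form::Udata | Form::Sdata => {
            // Variable-length LEB128 forms.
            let mut value: u64 = 0;
            let mut shift = 0u32;
            for (i, &byte) in bytes.iter().enumerate() {
                if shift < 64 {
                    value |= u64::from(byte & 0x7f) << shift;
                }
                shift += 7;
                if byte & 0x80 == 0 {
                    let len = i + 1;
                    return Some(match form {
                        Form::Sdata if shift < 64 && byte & 0x40 != 0 => {
                            (Constant::Signed((value | (!0u64 << shift)) as i64), len)
                        }
                        Form::Sdata => (Constant::Signed(value as i64), len),
                        _ => (Constant::Unsigned(value), len),
                    });
                }
            }
            None
        }
    }
}

fn main() {
    // The discriminant -2 as seen in the two examples, and as sdata.
    println!("{:?}", read_constant(Form::Data1, &[0xfe]));
    println!("{:?}", read_constant(Form::Data8, &[0xfe, 0xff, 0xff, 0xff, 0, 0, 0, 0]));
    println!("{:?}", read_constant(Form::Sdata, &[0x7e]));
}
```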

I tend to think a DWARF bug report is in order here. I don't recall why I didn't do this back in 2018.

Requiring leb128 for DW_AT_discr_list does make sense OTOH, since those have to be emitted using some block form.
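
Inside a DW_AT_discr_list block there is no per-value form, so the spec has to fix the encoding: each descriptor is a DW_DSC_label (0x00) or DW_DSC_range (0x01) byte followed by one or two LEB128 numbers. A hypothetical encoder, assuming a signed discriminant type so the numbers are SLEB128:

```rust
/// DWARF discriminant descriptor codes.
const DW_DSC_LABEL: u8 = 0x00;
const DW_DSC_RANGE: u8 = 0x01;

/// One entry of a DW_AT_discr_list (hypothetical type for illustration).
enum Descriptor {
    Label(i64),
    Range(i64, i64),
}

/// Append `value` to `out` as SLEB128.
fn push_sleb128(out: &mut Vec<u8>, mut value: i64) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        let sign_bit_set = byte & 0x40 != 0;
        if (value == 0 && !sign_bit_set) || (value == -1 && sign_bit_set) {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Build the block contents for a signed discriminant type.
fn encode_discr_list(descriptors: &[Descriptor]) -> Vec<u8> {
    let mut out = Vec::new();
    for d in descriptors {
        match *d {
            Descriptor::Label(v) => {
                out.push(DW_DSC_LABEL);
                push_sleb128(&mut out, v);
            }
            Descriptor::Range(lo, hi) => {
                out.push(DW_DSC_RANGE);
                push_sleb128(&mut out, lo);
                push_sleb128(&mut out, hi);
            }
        }
    }
    out
}

fn main() {
    // A variant selected by discriminant -2, or by any value in 10..=20.
    let block = encode_discr_list(&[Descriptor::Label(-2), Descriptor::Range(10, 20)]);
    println!("{block:02x?}");
}
```

Because the block is an opaque byte sequence, a consumer can only walk it if the encoding of each number is defined up front, which is the asymmetry with DW_AT_discr_value.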