rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

debuginfo: How to (ideally) represent reference and pointer types in DWARF #37504

Open michaelwoerister opened 7 years ago

michaelwoerister commented 7 years ago

Currently, we represent thin references and pointers with DW_TAG_pointer_type DIEs, and fat pointers (slices and trait objects) as DW_TAG_structure_type DIEs with fields for the payload pointer and the metadata (the slice length or the vtable pointer). This is not ideal, and now that debuggers know about Rust, we can do better. The question is what exactly we want the representation of these kinds of types to look like.
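For reference, the two-field layout described above can be observed from safe Rust. A minimal sketch (the helper names are illustrative): a thin `&u32` occupies one pointer-sized word, while `&[u32]` and `&dyn Debug` occupy two, and a slice can be split into, and rebuilt from, its data pointer and length. The two-word size reflects what current rustc does, not a layout the language spells out.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Splits a slice reference into the two values that the current DWARF
/// output describes as struct members: the data pointer and the length.
fn slice_parts<T>(s: &[T]) -> (*const T, usize) {
    (s.as_ptr(), s.len())
}

/// Rebuilds a slice from those two values.
///
/// # Safety
/// `ptr` and `len` must come from a slice that is still alive for `'a`.
unsafe fn slice_from_parts<'a, T>(ptr: *const T, len: usize) -> &'a [T] {
    unsafe { std::slice::from_raw_parts(ptr, len) }
}

/// Pointer-sized words taken by a thin reference, a slice reference and a
/// trait-object reference.
fn reference_words() -> (usize, usize, usize) {
    let word = size_of::<usize>();
    (
        size_of::<&u32>() / word,
        size_of::<&[u32]>() / word,
        size_of::<&dyn Debug>() / word,
    )
}
```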

Some things seem pretty straightforward to me:

But beyond that, there are some decisions to be made:

(1) How do we represent mutability?

The C++ version of DWARF represents a pointer-to-const type like const char * with three separate type entries:

0:
DW_TAG_base_type
    DW_AT_name "char"
    ... 

1:
DW_TAG_const_type
    DW_AT_type: ref to <0>

2:
DW_TAG_pointer_type
    DW_AT_type: ref to <1>

I think this is a bit verbose and I'm not sure it is entirely appropriate for Rust. Do we really have const and mut types? That is, does Rust have the concept of a mut i32 at the type level, for example? I mean there are mutable and immutable slots/memory locations and we have "mutable" and "shared" references, but those two things seem kind of different to me.

As an alternative to using DW_TAG_const_type for representing mutability, we could re-use the DW_AT_mutable attribute that is already defined in DWARF. In C++ DWARF it is used for mutable fields. We could use it for reference type and local variable DIEs:

0: // char
DW_TAG_base_type
    DW_AT_name "char"
    ... 

1: // &mut char
DW_TAG_reference_type
    DW_AT_type: ref to <0>
    DW_AT_mutable: true

2: // &char
DW_TAG_reference_type
    DW_AT_type: ref to <0>
    DW_AT_mutable: false       // or just leave it off

3: 
DW_TAG_variable
    DW_AT_name: "foo"
    DW_AT_type: ref to <0>
    DW_AT_mutable: true
    ...

(2) How to represent fat pointers?

The pointer types in C/C++ DWARF don't have DW_TAG_member sub-DIEs, since they are always just values. Fat pointers in Rust are different: they have one field that is a pointer to the data, and another field that holds additional information, either the size of a slice or the pointer to a vtable. These need to be described somehow. I see a few options:

  1. A fat-pointer type is described by a DW_TAG_pointer_type or DW_TAG_reference_type DIE with two fields that are described by DW_TAG_member sub-DIEs, both having the DW_AT_artificial attribute. @tromey once suggested for slices that the field entries have no name and the debugger determines which is which by the type (the size is always an integer type, the data is always a pointer type). This could also be extended for trait objects, since the data pointer will always be a pointer to a trait and the vtable-pointer will always be something else.
  2. Treat trait objects and slices differently. Have a new DW_TAG_slice_type DIE that follows the encoding above and borrow some other attributes for trait objects: a DW_AT_vtable_elem_location attribute holds the offset of the vtable field within the fat-pointer value, and a DW_AT_object_pointer attribute does the same for the data pointer. This is distinctly not how these attributes are used in a C++ context but it would be a nice fit, I think.
  3. A mix of the above, with DW_AT_object_pointer indicating the data-pointer field
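To make option (1) concrete, here is a sketch of how a consumer might tell the two unnamed artificial members apart by type alone. `MemberType` and `FatPointerKind` are hypothetical, simplified stand-ins for resolved DIE types, not part of any DWARF library. Note that the heuristic has to give up when both members look alike.

```rust
/// Hypothetical, simplified view of an artificial member's resolved type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MemberType {
    Pointer { to_trait: bool },
    Integer,
}

/// Which member index holds the data pointer and which holds the metadata.
#[derive(Debug, PartialEq)]
enum FatPointerKind {
    Slice { data: usize, len: usize },
    TraitObject { data: usize, vtable: usize },
}

/// Option (1): the members are unnamed, so classify them by type only.
fn classify(members: [MemberType; 2]) -> Option<FatPointerKind> {
    match members {
        // Slice: one pointer (the data) and one integer (the length).
        [MemberType::Pointer { .. }, MemberType::Integer] => {
            Some(FatPointerKind::Slice { data: 0, len: 1 })
        }
        [MemberType::Integer, MemberType::Pointer { .. }] => {
            Some(FatPointerKind::Slice { data: 1, len: 0 })
        }
        // Trait object: the data pointer points to a trait, the vtable
        // pointer to something else.
        [MemberType::Pointer { to_trait: true }, MemberType::Pointer { to_trait: false }] => {
            Some(FatPointerKind::TraitObject { data: 0, vtable: 1 })
        }
        [MemberType::Pointer { to_trait: false }, MemberType::Pointer { to_trait: true }] => {
            Some(FatPointerKind::TraitObject { data: 1, vtable: 0 })
        }
        // Ambiguous or unexpected shapes cannot be decoded by type alone.
        _ => None,
    }
}
```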

Another question is: Should fat-pointers (and thin pointers too, maybe) have a DW_AT_byte_size attribute that specifies their size explicitly?

cc @tromey, @Manishearth See also https://github.com/rust-lang/rust/issues/33073

Manishearth commented 7 years ago

That is, does Rust have the concept of a mut i32 at the type level, for example?

No. mut is part of reference types, no more. I like your DW_AT_mutable solution, except we would only include that attribute on DW_TAG_reference_type DIEs (and possibly only when it is true). Otherwise we have the possibility of mut i32 being a type.
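The distinction can be checked with `std::any::TypeId`. A small sketch, with a hypothetical helper `type_id_of`: declaring a binding `mut` leaves its type `i32`, whereas `&i32` and `&mut i32` are different types.

```rust
use std::any::TypeId;

/// Returns the `TypeId` of the value behind the reference (hypothetical helper).
fn type_id_of<T: 'static>(_: &T) -> TypeId {
    TypeId::of::<T>()
}

/// `mut` on a binding makes the slot mutable but does not change its type.
fn binding_mutability_is_not_a_type() -> bool {
    let a: i32 = 1;
    let mut b: i32 = 2;
    b += a;
    type_id_of(&a) == type_id_of(&b)
}

/// `&T` and `&mut T`, by contrast, are distinct types.
fn reference_mutability_is_a_type() -> bool {
    TypeId::of::<&'static i32>() != TypeId::of::<&'static mut i32>()
}
```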

How to represent fat pointers?

I like both solution (1) and (2), no clear preference.

Beware that traits and slices aren't the only things that can be DSTd, you also have struct MyDST(u8, u8, u8, [u8]). This is similar to a slice DST but with different offsets. In such cases you also want to generate field names and other debuginfo for the DST itself (unlike slices and traits where we interpret them specially). I think we can handle this somehow by doing whatever C does for trailing T[] fields and generating debuginfo structs for those.
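A sketch of such a DST on stable Rust. It assumes a generic variant, `MyDst<T: ?Sized>`, of the `MyDST` above, because a value of the non-generic type is awkward to construct. The unsizing coercion from `&MyDst<[u8; 4]>` to `&MyDst<[u8]>` produces a fat pointer whose metadata is the length of the trailing slice.

```rust
use std::mem::size_of;

/// Generic stand-in for `struct MyDST(u8, u8, u8, [u8])`.
struct MyDst<T: ?Sized>(u8, u8, u8, T);

/// The tail length comes from the fat pointer's metadata, not from any
/// field stored inside the struct.
fn tail_len(r: &MyDst<[u8]>) -> usize {
    r.3.len()
}

/// Returns (pointer-sized words in `&MyDst<[u8]>`, tail length, first field).
fn my_dst_demo() -> (usize, usize, u8) {
    let sized: MyDst<[u8; 4]> = MyDst(1, 2, 3, [4, 5, 6, 7]);
    // Unsizing coercion: `&MyDst<[u8; 4]>` becomes `&MyDst<[u8]>`.
    let unsized_ref: &MyDst<[u8]> = &sized;
    (
        size_of::<&MyDst<[u8]>>() / size_of::<usize>(),
        tail_len(unsized_ref),
        unsized_ref.0,
    )
}
```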

tromey commented 7 years ago

As an alternative to using DW_TAG_const_type for representing mutability, we could re-use the DW_AT_mutable attribute that is already defined in DWARF.

This seems reasonable to me.

Manishearth commented 7 years ago

I think we can handle this somehow by doing whatever C does for trailing T[] fields and generating debuginfo structs for those.

C seems to use a TAG_array_type with a TAG_subrange_type that doesn't have an AT_count. Should work for us.

tromey commented 7 years ago

@tromey once suggested for slices that the field entries have no name and the debugger determines which is which by the type (the size is always an integer type, the data is always a pointer type).

For fat pointers I was thinking DW_AT_vtable_elem_location would be required. I'd rather the decoding be based on DWARF attributes. (If this concept can be extended to slices, so much the better; I was being inconsistent here.)

On the whole I'd rather the output be much more explicit. That is, instead of determining whether something is a slice by checking its name, introduce DW_TAG_slice_type. Similarly, have a separate tag to represent a trait object -- don't try to reuse whatever is being done for slices. I think it's fine if existing DWARF tags are repurposed in a rust-specific way; but I find it a bit ugly if the tags have their meaning overloaded and require the debugger to encode "excess" knowledge of rust to do its job.

Another question is: Should fat-pointers (and thin pointers too, maybe) have a DW_AT_byte_size attribute that specifies their size explicitly?

Thin pointers, probably not. Fat pointers, sure. Ideally it should be possible for gdb to create a trait object; though of course we're still a ways away from that.

tromey commented 7 years ago

Beware that traits and slices aren't the only things that can be DSTd

What does "DSTd" mean?

michaelwoerister commented 7 years ago

What does "DSTd" mean?

I assume "turned into a dynamically sized type".

tromey commented 7 years ago

I assume "turned into a dynamically sized type".

Aha, thanks. For dynamically sized types, we should just use the standard DWARF stuff. If the length is known then DW_AT_count can be a location expression (there are other similar ways as well).

nagisa commented 7 years ago

What are the semantics of DW_TAG_reference_type and how do they differ from the pointer_type?

tromey commented 7 years ago

What are the semantics of DW_TAG_reference_type and how do they differ from the pointer_type?

DWARF doesn't go into great detail here, but basically the answer is "just like C++". However, in DWARF it is also normal to reuse tags for different things depending on the CU's language; and for Rust I think the obvious answer is that a DW_TAG_pointer_type should be used for raw pointer types and DW_TAG_reference_type for ordinary (safe) references. So, a pointer might be null or invalid; but a reference will not be (with the usual caveat that debuggers can sometimes see uninitialized objects, memory can be trashed, etc).
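The raw-pointer/reference distinction has a visible consequence in Rust itself, which is one reason to keep the two tags apart. A minimal sketch: a `*const u32` can be null, while a `&u32` cannot, which is why the standard library guarantees that `Option<&u32>` is the same size as `&u32` (the null value serves as `None`).

```rust
use std::mem::size_of;
use std::ptr;

/// A raw pointer may legitimately be null.
fn raw_pointer_can_be_null() -> bool {
    let p: *const u32 = ptr::null();
    p.is_null()
}

/// A reference is never null, so `None` can reuse the null bit pattern.
fn reference_has_null_niche() -> bool {
    size_of::<Option<&u32>>() == size_of::<&u32>()
}
```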

michaelwoerister commented 7 years ago

Beware that traits and slices aren't the only things that can be DSTd, you also have struct MyDST(u8, u8, u8, [u8]).

That's a good point. In a way a regular slice is just a special case of a struct with a trailing [T]. Also (as opposed to C?) in Rust we always know the length of the trailing [T] because of the fat-pointer field.

So, we have this:

kind                             fat pointer?
&T (regular reference)           no
&[T] (regular slice)             yes
&Struct /w trailing [T]          yes
&Trait (regular trait object)    yes
&Struct /w trailing trait        yes

I think it would be nice if &[T] and &Struct /w trailing [T] would be represented the same way, and correspondingly also &Trait and &Struct /w trailing trait, since in both cases the former is just a special case of the latter (kind of).
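The table can be checked against the compiler with `size_of`, including the raw-pointer variants. A sketch, assuming a hypothetical `Tail<T: ?Sized>` struct to stand in for "Struct /w trailing [T]" and "Struct /w trailing trait"; the word counts are what current rustc produces, not a documented guarantee.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Hypothetical struct with a trailing, possibly unsized field.
struct Tail<T: ?Sized> {
    header: u32,
    tail: T,
}

/// Size of `P` in pointer-sized words.
fn words<P>() -> usize {
    size_of::<P>() / size_of::<usize>()
}

/// Each pointer kind from the table, plus its raw-pointer variant.
fn pointer_kind_table() -> [(&'static str, usize); 10] {
    [
        ("&T", words::<&u8>()),
        ("&[T]", words::<&[u8]>()),
        ("&Struct /w trailing [T]", words::<&Tail<[u8]>>()),
        ("&Trait", words::<&dyn Debug>()),
        ("&Struct /w trailing trait", words::<&Tail<dyn Debug>>()),
        ("*T", words::<*const u8>()),
        ("*[T]", words::<*const [u8]>()),
        ("*Struct /w trailing [T]", words::<*const Tail<[u8]>>()),
        ("*Trait", words::<*const dyn Debug>()),
        ("*Struct /w trailing trait", words::<*const Tail<dyn Debug>>()),
    ]
}
```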

Also note that there is a pointer variant of each of the above. If we had different tags for each of those, we would get quite a few tags:

DW_TAG_reference_type        // &T (regular reference)
DW_TAG_slice_type            // &[T] (regular slice)
DW_TAG_dst_reference_type    // &Struct /w trailing [T]
DW_TAG_trait_object          // &Trait (regular trait object)
DW_TAG_trait_object2         // &Struct /w trailing trait

DW_TAG_pointer_type          // *T (regular pointer)
DW_TAG_ptr_slice_type        // *[T] (regular slice)
DW_TAG_dst_pointer_type      // *Struct /w trailing [T]
DW_TAG_trait_object_ptr      // *Trait (regular trait object)
DW_TAG_trait_object_ptr2     // *Struct /w trailing trait

This seems a bit excessive to me.

We could also just have a DW_AT_RUST_PTR_KIND attribute with the possible values thin, slice, and trait, and attach it to either a DW_TAG_reference_type or a DW_TAG_pointer_type. The rest of the DIE's attributes depend on the kind. That would still be rather explicit; we'd just need one additional attribute.

michaelwoerister commented 7 years ago

To give an example of what the DW_AT_rust_ptr_kind variant could look like:

// &T (regular reference)
DW_TAG_reference_type
    DW_AT_rust_pointer_kind  thin
    DW_AT_mutable            true or false // defaults to false if not present
    DW_AT_type               <ref to type>

// &[T] (regular slice)
DW_TAG_reference_type
    DW_AT_rust_pointer_kind  slice
    DW_AT_mutable            true or false // defaults to false if not present
    DW_AT_type               <ref to type>
    DW_AT_object_pointer     <expr that yields address of first element>
    DW_AT_count              <expr that computes count>

// &Struct /w trailing [T]
DW_TAG_reference_type
    DW_AT_rust_pointer_kind  slice
    DW_AT_mutable            true or false // defaults to false if not present
    DW_AT_type               <ref to type>
    DW_AT_object_pointer     <expr that yields address of struct>
    DW_AT_count              <expr that computes count>

// &Trait (regular trait object)
DW_TAG_reference_type
    DW_AT_rust_pointer_kind     trait
    DW_AT_mutable               true or false // defaults to false if not present
    DW_AT_type                  <ref to type>
    DW_AT_object_pointer        <expr that yields address of object>
    DW_AT_vtable_elem_location  <expr that computes address of vtable>

// &Struct /w trailing trait
DW_TAG_reference_type
    DW_AT_rust_pointer_kind     trait
    DW_AT_mutable               true or false // defaults to false if not present
    DW_AT_type                  <ref to type>
    DW_AT_object_pointer        <expr that yields address of object>
    DW_AT_vtable_elem_location  <expr that computes address of vtable>

In this particular form, we would still have to look at the target type to find out if we have a regular slice or a struct/enum with a trailing [T] (the same goes for trait objects).
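A consumer-side sketch of the proposal above. The attribute name `DW_AT_rust_pointer_kind` and its values are hypothetical (nothing like it exists in DWARF or rustc), and `ProposedRefDie` is an illustrative struct, not a real DIE API. It checks that each kind carries the attributes the listing assigns to it, and applies the stated default for `DW_AT_mutable`.

```rust
/// Hypothetical values of the proposed `DW_AT_rust_pointer_kind` attribute.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RustPointerKind {
    Thin,
    Slice,
    Trait,
}

/// Which of the proposal's attributes are present on a reference DIE.
#[derive(Default)]
struct ProposedRefDie {
    kind: Option<RustPointerKind>,
    mutable: Option<bool>,
    has_type: bool,
    has_object_pointer: bool,
    has_count: bool,
    has_vtable_elem_location: bool,
}

impl ProposedRefDie {
    /// `DW_AT_mutable` defaults to false when absent.
    fn is_mutable(&self) -> bool {
        self.mutable.unwrap_or(false)
    }

    /// First attribute required for this kind that is missing, if any.
    fn missing_attribute(&self) -> Option<&'static str> {
        let kind = match self.kind {
            Some(k) => k,
            None => return Some("DW_AT_rust_pointer_kind"),
        };
        if !self.has_type {
            return Some("DW_AT_type");
        }
        match kind {
            RustPointerKind::Thin => None,
            RustPointerKind::Slice if !self.has_object_pointer => Some("DW_AT_object_pointer"),
            RustPointerKind::Slice if !self.has_count => Some("DW_AT_count"),
            RustPointerKind::Trait if !self.has_object_pointer => Some("DW_AT_object_pointer"),
            RustPointerKind::Trait if !self.has_vtable_elem_location => {
                Some("DW_AT_vtable_elem_location")
            }
            _ => None,
        }
    }
}
```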

cuviper commented 7 years ago

Keep in mind that every new syntax you invent means new things you have to teach tools. GDB and LLDB aren't the only debuginfo consumers out there. So I would strongly lean towards semantics that a C++ tool would understand, even if it seems less ideal to your own taste. At least think about how an uninformed tool might interpret your proposed scheme, compared to the status quo.

michaelwoerister commented 7 years ago

@cuviper Do you have examples of tools that use type information (as opposed to just line-tables)?

cuviper commented 7 years ago

Sure, SystemTap and Dyninst are two that I'm very familiar with.

michaelwoerister commented 7 years ago

One thing that might be problematic about using DWARF expressions to get the element count or vtable address is optimizations that can pick apart fat pointers, like SROA. For those it might be better to have plain member DIEs? Though I'm not exactly sure how a debugger would handle this: if the value of a variable is calculated via a number of DW_OP_bit_pieces, will the debugger reconstruct the value before evaluating an expression that takes the value as input?

cuviper commented 7 years ago

And my involvement on those means I could also work on adding new Rust semantics to them. I honestly haven't looked closely yet how well they interpret Rust's current DWARF output. I just hope more generally that tools could Just Work as much as possible. :)

cuviper commented 7 years ago

If the value of a variable is calculated via a number of DW_OP_bit_pieces, will the debugger reconstruct the value before evaluating an expression that takes the value as input?

Wouldn't it have to reconstruct it? I don't see what else would make any sense. (And if optimizations make some of this inaccessible, so be it. An -Og might be more conservative.)
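Roughly, "reconstructing the value" means concatenating the pieces into one contiguous buffer before any expression reads from it. A sketch under stated assumptions: `Piece` is a hypothetical stand-in for a location described by `DW_OP_piece` (whole bytes only, no bit pieces), and the slice is assumed to store its data pointer first and its length second, in native byte order.

```rust
/// Hypothetical stand-in for one piece of a split variable: the bytes that
/// some register or stack slot holds.
struct Piece<'a> {
    bytes: &'a [u8],
}

/// Concatenates the pieces, in order, into the contiguous value that a later
/// expression (for example, "read the length field") operates on.
fn reassemble(pieces: &[Piece<'_>]) -> Vec<u8> {
    pieces.iter().flat_map(|p| p.bytes.iter().copied()).collect()
}

/// Reads the length word of a reassembled `&[T]` value.
fn slice_len_from_value(value: &[u8]) -> usize {
    let w = std::mem::size_of::<usize>();
    let mut word = [0u8; std::mem::size_of::<usize>()];
    word.copy_from_slice(&value[w..2 * w]);
    usize::from_ne_bytes(word)
}
```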

michaelwoerister commented 7 years ago

@cuviper Yes, that's a good point. We'll want to strike a balance between not doing everything differently from everybody else and doing things in a way that is a good fit for Rust.

Regarding DW_OP_bit_piece, you're probably right. So using expressions wouldn't be much of a problem for optimizations. Using member DIEs would still be a good idea, I guess, because that's a very stable way of encoding things, very easy to make sense of for every tool.

tromey commented 7 years ago

So I would strongly lean towards semantics that a C++ tool would understand, even if it seems less ideal to your own taste

The counterpoint here is stuff like the existing representation of Rust enum types, which requires significant decoding in the debugger. In fact some new cases were just implemented this week. This is one reason I think it's better to just add new tags, along with helper attributes to describe things more precisely.

I do agree that reusing existing tags makes sense when possible.

Maybe this part of the discussion would be improved if it were more specific. For instance, how would you propose handling the cases under discussion here?

If the value of a variable is calculated via a number of DW_OP_bit_pieces, will the debugger reconstruct the value before evaluating an expression that takes the value as input?

Yes, gdb does this already. I implemented it (:-) when gcc added debuginfo for SRA.

Manishearth commented 7 years ago

The counterpoint here is stuff like the existing representation of Rust enum types, which requires significant decoding in the debugger.

Yeah, both ADTs and DSTs are Rust-specific types that don't have a C++ analogue, and pretending to be a C++ type for the sake of tooling will probably mean that the tools won't display the right thing anyway.

michaelwoerister commented 7 years ago

Yeah, both ADTs and DSTs are Rust-specific types that don't have a C++ analogue, and pretending to be a C++ type for the sake of tooling will probably mean that the tools won't display the right thing anyway.

I think we can achieve a lot of backwards compatibility (or easy portability) if we use the standard tags and attributes (like DW_TAG_member, DW_AT_byte_size, etc.) the way everyone else does.

Maybe this part of the discussion would be improved if it were more specific. For instance, how would you propose handling the cases under discussion here?

Can you elaborate on what you mean exactly?

cuviper commented 7 years ago

I think the request to be more specific was aimed at me. :) And... I'll have to think on it. But it sounds like we're all agreeing not to stray too far. It looks like the proposal for thin &T would already work just fine for a tool that knows C++ references, at least, so that's good. If all those fat pointers are currently opaque to tools, then finding a new meaningful representation is fine.

Should fat-pointers (and thin pointers too, maybe) have a DW_AT_byte_size attribute that specifies their size explicitly?

On this point in particular, I don't think thin pointers need it, as @tromey said. I think it would be very helpful for fat pointers though, if nothing else just to raise a flag to the tools that it's abnormal.

cuviper commented 7 years ago

@tromey I found a message on gdb-patches which describes an Ada "unconstrained array" fat pointer. It's not the same layout as a Rust slice, but I think the same concepts could apply. What do you think of that representation? https://sourceware.org/ml/gdb-patches/2014-08/msg00310.html

So a similar Rust &[T] would be something like:

DW_TAG_array_type
    DW_AT_mutable          true or false // defaults to false if not present
    DW_AT_type             <ref to type>
    DW_AT_data_location    <expr that yields address of first element>
  DW_TAG_subrange_type
      DW_AT_type           <ref to type>
      DW_AT_count          <expr that computes count>

I suspect this will look more familiar to tools that already know VLAs.

In any case, I think data_location is probably a better fit where object_pointer was proposed earlier.
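To see how `DW_AT_data_location` and `DW_AT_count` could be written as expressions for this layout, here is a toy stack-machine evaluator for a handful of operations. Assumptions: `Op` is a hypothetical subset of DWARF operations, memory is modeled as a map of 8-byte words, the object address is the address of the fat pointer itself (which is what `DW_OP_push_object_address` supplies), and the fat pointer stores the data pointer at offset 0 and the length at offset 8.

```rust
use std::collections::HashMap;

/// Hypothetical subset of DWARF expression operations.
#[derive(Clone, Copy)]
enum Op {
    PushObjectAddress, // like DW_OP_push_object_address
    Constu(u64),       // like DW_OP_constu
    Plus,              // like DW_OP_plus
    Deref,             // like DW_OP_deref (one 8-byte word here)
}

/// Evaluates `ops`; `memory` maps addresses to 8-byte words.
/// Returns `None` on stack underflow or an unmapped address.
fn eval(ops: &[Op], object_address: u64, memory: &HashMap<u64, u64>) -> Option<u64> {
    let mut stack: Vec<u64> = Vec::new();
    for op in ops {
        match *op {
            Op::PushObjectAddress => stack.push(object_address),
            Op::Constu(n) => stack.push(n),
            Op::Plus => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a.wrapping_add(b));
            }
            Op::Deref => {
                let addr = stack.pop()?;
                stack.push(*memory.get(&addr)?);
            }
        }
    }
    stack.pop()
}
```

With a fat pointer at `0x100`, `[PushObjectAddress, Deref]` yields the data pointer and `[PushObjectAddress, Constu(8), Plus, Deref]` yields the element count.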

tromey commented 7 years ago

Ada "unconstrained array"

For this particular representation, I think the issue is that there's no obvious way to dynamically construct an instance. However, that's a reasonable thing to want to do. In fact right now gdb does it, though by baking in some knowledge of the Rust ABI -- but avoiding this is one of my goals. (Another important goal being winding up with something we can document and attempt to get into DWARF 6.)

I've been giving this topic some thought tonight and I have a number of issues to raise, which in my mind generally point to the usefulness of adding new tags where needed; though naturally I value your insights.

This turned out a bit unsorted. Maybe this isn't an ideal forum for this sort of discussion.

I think the current approach could be described as "keep it close-ish to C++ and hope the tools are ok". I found this pretty inadequate for gdb, and I suspect for lldb in the end the only answer will be a more full port. There are just too many differences and they are accumulating.

michaelwoerister commented 7 years ago

I think it's also important to note that we won't be able to come up with an encoding that is just understood by existing tools. The current approach has the goal of not crashing existing tools while providing enough information for pretty printers to have some minimal functionality. I think we have reached the limits of this approach and we'll need to make breaking changes going forward anyway.

I think we should just choose clean encodings that don't do anything fancy. That should help existing tools to add support with minimal effort.

cuviper commented 7 years ago

I've been thinking more about this, and have come around to see it's not so horrible for Rust to invent new syntax (tags/attrs) for things that are truly unique. However, I think we should avoid overloading existing constructs in surprising ways. Namely, DW_TAG_reference_type is a good fit for &T thin references, and most existing tools should already do the right thing there. But I think the fat references should use a distinct tag, or even separate distinct tags for each, e.g. DW_TAG_RUST_slice and DW_TAG_RUST_trait_object.

(I don't know if the standard says anything about this, but I like having CAPS prefixes on non-standard extensions.)

RReverser commented 6 years ago

@cuviper It looks like DWARF information emitted by Rust still doesn't give any hints to distinguish between &, &mut, *const and *mut:

0x000000c1:     TAG_pointer_type [4]
                 AT_type( {0x000000ca} ( u32 ) )
                 AT_name( "*const u32" )

0x000000ca:     TAG_base_type [5]
                 AT_name( "u32" )
                 AT_encoding( DW_ATE_unsigned )
                 AT_byte_size( 0x04 )

0x000000d1:     TAG_pointer_type [4]
                 AT_type( {0x000000ca} ( u32 ) )
                 AT_name( "*mut u32" )

0x000000da:     TAG_pointer_type [4]
                 AT_type( {0x000000ca} ( u32 ) )
                 AT_name( "&u32" )

0x000000e3:     TAG_pointer_type [4]
                 AT_type( {0x000000ca} ( u32 ) )
                 AT_name( "&mut u32" )

Is this the only relevant issue / discussion or has there been some progress tracked elsewhere perhaps?

RReverser commented 6 years ago

I guess I can just use the prefix of AT_name to distinguish between them for now, but it seems quite hacky.

tromey commented 6 years ago

I guess I can just use the prefix of AT_name to distinguish between them for now, but it seems quite hacky.

That's what gdb does and what I plan to do in lldb, at least in the short run. Longer term I think we should use DWARF tags to differentiate, as discussed here.
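The name-prefix stopgap amounts to something like the following sketch; `PointerFlavor` and `flavor_from_name` are hypothetical names, not gdb's actual implementation. The main pitfall is ordering: every `&mut` name also starts with `&`.

```rust
/// Which Rust pointer-like type a `TAG_pointer_type` DIE stands for.
#[derive(Debug, PartialEq)]
enum PointerFlavor {
    SharedRef,
    MutRef,
    ConstPtr,
    MutPtr,
}

/// Recovers the flavor from the DIE's `AT_name`, e.g. "&mut u32".
fn flavor_from_name(name: &str) -> Option<PointerFlavor> {
    // Check "&mut " before "&", since every "&mut ..." name also starts with "&".
    if name.starts_with("&mut ") {
        Some(PointerFlavor::MutRef)
    } else if name.starts_with('&') {
        Some(PointerFlavor::SharedRef)
    } else if name.starts_with("*const ") {
        Some(PointerFlavor::ConstPtr)
    } else if name.starts_with("*mut ") {
        Some(PointerFlavor::MutPtr)
    } else {
        None
    }
}
```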

eddyb commented 4 years ago

@tromey wrote (https://github.com/rust-lang/rust/issues/37504#issuecomment-257434920):

Aha, thanks. For dynamically sized types, we should just use the standard DWARF stuff. If the length is known then DW_AT_count can be a location expression (there are other similar ways as well).

@cuviper wrote (https://github.com/rust-lang/rust/issues/37504#issuecomment-257716268):

I found a message on gdb-patches which describes an Ada "unconstrained array" fat pointer. It's not the same layout as a Rust slice, but I think the same concepts could apply.


Three years later, I was looking through the DWARF5 spec in case there's anything potentially useful, and came across this Fortran example (page 320, "Figure D.13"):

10$: DW_TAG_array_type
        DW_AT_type(reference to real)
        DW_AT_rank(expression=
            DW_OP_push_object_address
            DW_OP_lit<n> ! offset of rank in descriptor
            DW_OP_plus
            DW_OP_deref)
        DW_AT_data_location(expression=
            DW_OP_push_object_address
            DW_OP_lit<n> ! offset of data in descriptor
            DW_OP_plus
            DW_OP_deref)
11$:    DW_TAG_generic_subrange
            DW_AT_type(reference to integer)
            DW_AT_lower_bound(expression=
            ! Looks up the lower bound of dimension i.
            ! Operation ! Stack effect
            ! (implicit) ! i
                DW_OP_lit<n> ! i sizeof(dim)
                DW_OP_mul ! dim[i]
                DW_OP_lit<n> ! dim[i] offsetof(dim)
                DW_OP_plus ! dim[i]+offset
                DW_OP_push_object_address ! dim[i]+offsetof(dim) objptr
                DW_OP_plus ! objptr.dim[i]
                DW_OP_lit<n> ! objptr.dim[i] offsetof(lb)
                DW_OP_plus ! objptr.dim[i].lowerbound
                DW_OP_deref) ! *objptr.dim[i].lowerbound
            DW_AT_upper_bound(expression=
            ! Looks up the upper bound of dimension i.
                DW_OP_lit<n> ! sizeof(dim)
                DW_OP_mul
                DW_OP_lit<n> ! offsetof(dim)
                DW_OP_plus
                DW_OP_push_object_address
                DW_OP_plus
                DW_OP_lit<n> ! offset of upperbound in dim
                DW_OP_plus
                DW_OP_deref)
            DW_AT_byte_stride(expression=
                ! Looks up the byte stride of dimension i.
                ...
                ! (analogous to DW_AT_upper_bound)
            )

There is also an earlier example in Appendix D that might be simpler in terms of Fortran features it describes, but is longer so I'm not going to paste it here.

Overall, it looks like DWARF is designed to support fully dynamic multidimensional arrays and slices, which is more powerful than Rust needs.

Given the DWARF5 spec, its examples, and the comments from years ago in this thread, I believe we may have a path forward if we choose to go down that route.


The main problem I see, for handling slices like this (assuming LLVM and debuggers support the necessary features), is that DW_OP_push_object_address has to push the address of the wide pointer (&[T] or *[T]), not the value of the data pointer, meaning it doesn't compose with DWARF pointer/reference types.

And for &(A, B, [T]), there is no &[T] in memory, and I can't think of any nice way of propagating the slice length all the way down to it.

RReverser commented 4 years ago

This is a very old and long thread and it's been a while since I looked at details, but I'd like to point out that

That is, does Rust have the concept of a mut i32 at the type level, for example?

No. mut is part of reference types, no more.

is not entirely true, or at least, not any different from the situation in C / C++.

Aside from references, Rust also has mutable and immutable variables, parameters and so on, just like C / C++ does. So when one says that Rust doesn't have mut i32 at the type level, the same can be said about const uint32_t at the type level in C / C++, because in both cases they describe the actual slot, pointer or reference and not the value itself.

And yet, even though C / C++ has comparable type semantics, it already has an established DWARF representation for these different types - by using the earlier mentioned "constifying newtypes".

One thing that was brought up and still remains true is that for Rust such representation is potentially more wasteful, because immutable types in Rust are much more popular than in C / C++ due to the flipped defaults.

This might still be true, but on the other hand DWARF representation is fairly compact, and it would be worth measuring first whether introducing a new attribute really saves any noticeable amount of space compared to a separate type ref (which is essentially just a type tag + a reference to the inner type).
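A back-of-the-envelope version of that measurement, as a sketch under explicit assumptions: each `DW_TAG_const_type` wrapper is encoded as a ULEB128 abbreviation code plus one `DW_AT_type` reference in `DW_FORM_ref4` (4 bytes), one wrapper is emitted per distinct pointee type in a compilation unit, and the one-time `.debug_abbrev` entry is ignored. By comparison, an attribute encoded with `DW_FORM_flag_present` takes no bytes in `.debug_info` at all, though it needs its own abbreviation.

```rust
/// Number of bytes needed to encode `value` as unsigned LEB128.
fn uleb128_len(mut value: u64) -> u64 {
    let mut len = 1;
    while value >= 0x80 {
        value >>= 7;
        len += 1;
    }
    len
}

/// Size of a reference in `DW_FORM_ref4`, in bytes.
const REF4_SIZE: u64 = 4;

/// Estimated `.debug_info` bytes spent on `DW_TAG_const_type` wrapper DIEs,
/// assuming one wrapper per distinct pointee type, each made of its ULEB128
/// abbreviation code plus a single `DW_AT_type` reference.
fn const_type_wrapper_bytes(distinct_pointees: u64, abbrev_code: u64) -> u64 {
    distinct_pointees * (uleb128_len(abbrev_code) + REF4_SIZE)
}
```

For example, 1,000 distinct pointee types with a one-byte abbreviation code come to about 5,000 bytes of wrappers.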

For now, it would be great to unblock this issue and implement at least the suboptimal-but-already-supported-in-most-tools representation for immutable vs mutable references, and then we can iterate on it in future PRs.

pnkfelix commented 2 years ago

@rustbot label -C-tracking-issue