Open michaelwoerister opened 7 years ago
That is, does Rust have the concept of a mut i32 at the type level, for example?
No. mut is part of reference types, no more. I like your DW_AT_mutable solution, except we would only include that attribute for DW_TAG_reference_types (and possibly only when it is on). Otherwise we have the possibility of mut i32 being a type.
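To illustrate the point about mut living only in reference types, here is a minimal sketch. The helper type_of is a hypothetical name introduced for this example, and the strings returned by std::any::type_name are meant for diagnostics only, so the code compares them rather than relying on their exact spelling.

```rust
use std::any::type_name;

/// Hypothetical helper: returns the compiler's name for the type behind a reference.
fn type_of<T: ?Sized>(_: &T) -> &'static str {
    type_name::<T>()
}

fn main() {
    let a: i32 = 1;
    let mut b: i32 = 2;
    b += a;

    // `mut` on a binding describes the slot, not the type:
    // both bindings have the same type.
    assert_eq!(type_of(&a), type_of(&b));

    // For references, mutability is part of the type itself.
    assert_ne!(type_name::<&i32>(), type_name::<&mut i32>());

    println!("{} / {}", type_name::<&i32>(), type_name::<&mut i32>());
}
```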
How to represent fat pointers?
I like both solutions (1) and (2), no clear preference.
Beware that traits and slices aren't the only things that can be DSTd, you also have struct MyDST(u8, u8, u8, [u8]). This is similar to a slice DST but with different offsets. In such cases you also want to generate field names and other debuginfo for the DST itself (unlike slices and traits where we interpret them specially). I think we can handle this somehow by doing whatever C does for trailing T[] fields and generating debuginfo structs for those.
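A small sketch of such a custom DST may help here. It assumes a generic variant, MyDst<T: ?Sized>, so that a value can be built on stable Rust through an unsizing coercion; the MyDST(u8, u8, u8, [u8]) from the comment corresponds to MyDst<[u8]>. The two-word reference size checked below is what current compilers produce, not a documented layout guarantee.

```rust
use std::mem::{size_of, size_of_val};

// Generic over the last field so an array-backed value can be unsized
// into the `[u8]` form (an assumption made for this example).
struct MyDst<T: ?Sized>(u8, u8, u8, T);

fn main() {
    let sized: MyDst<[u8; 4]> = MyDst(1, 2, 3, [10, 20, 30, 40]);
    // Unsizing coercion: the reference becomes a fat pointer that carries
    // the length of the trailing slice.
    let dst: &MyDst<[u8]> = &sized;

    // The reference is two words wide: data pointer plus length.
    assert_eq!(size_of::<&MyDst<[u8]>>(), 2 * size_of::<usize>());
    // The named fields sit in front of the trailing slice.
    assert_eq!((dst.0, dst.1, dst.2), (1, 2, 3));
    assert_eq!(dst.3.len(), 4);
    // The total size is only known at run time: 3 header bytes + 4 elements.
    assert_eq!(size_of_val(dst), 7);
}
```

A debugger would need both the field layout of the struct and the length stored in the pointer to print such a value.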
As an alternative to using DW_TAG_const_type for representing mutability, we could re-use the DW_AT_mutable attribute that is already defined in DWARF.
This seems reasonable to me.
I think we can handle this somehow by doing whatever C does for trailing T[] fields and generating debuginfo structs for those.
C seems to use a TAG_array_type with a TAG_subrange_type that doesn't have an AT_count. Should work for us.
@tromey once suggested for slices that the field entries have no name and the debugger determines which is which by the type (the size is always an integer type, the data is always a pointer type).
For fat pointers I was thinking DW_AT_vtable_elem_location would be required. I'd rather the decoding be based on DWARF attributes. (If this concept can be extended to slices, so much the better; I was being inconsistent here.)
On the whole I'd rather the output be much more explicit. That is, instead of determining whether something is a slice by checking its name, introduce DW_TAG_slice_type. Similarly, have a separate tag to represent a trait object -- don't try to reuse whatever is being done for slices. I think it's fine if existing DWARF tags are repurposed in a Rust-specific way; but I find it a bit ugly if the tags have their meaning overloaded and require the debugger to encode "excess" knowledge of Rust to do its job.
Another question is: Should fat-pointers (and thin pointers too, maybe) have a DW_AT_byte_size attribute that specifies their size explicitly?
Thin pointers, probably not. Fat pointers, sure. Ideally it should be possible for gdb to create a trait object; though of course we're still a ways away from that.
Beware that traits and slices aren't the only things that can be DSTd
What does "DSTd" mean?
What does "DSTd" mean?
I assume "turned into a dynamically sized type".
I assume "turned into a dynamically sized type".
Aha, thanks. For dynamically sized types, we should just use the standard DWARF stuff. If the length is known then DW_AT_count can be a location expression (there are other similar ways as well).
What are the semantics of DW_TAG_reference_type and how do they differ from the pointer_type?
What are the semantics of DW_TAG_reference_type and how do they differ from the pointer_type?
DWARF doesn't go into great detail here, but basically the answer is "just like C++". However, in DWARF it is also normal to reuse tags for different things depending on the CU's language; and for Rust I think the obvious answer is that a DW_TAG_pointer_type should be used for raw pointer types and DW_TAG_reference_type for ordinary (safe) references. So, a pointer might be null or invalid; but a reference will not be (with the usual caveat that debuggers can sometimes see uninitialized objects, memory can be trashed, etc).
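The pointer/reference distinction described above is visible from the Rust side as well. The following sketch uses only standard library items; the variable names are illustrative.

```rust
use std::mem::size_of;
use std::ptr;

fn main() {
    // A raw pointer may be null (or dangling); creating one is safe,
    // only dereferencing it requires `unsafe`.
    let raw: *const u32 = ptr::null();
    assert!(raw.is_null());

    // A reference is never null, which is why `Option<&T>` can use the
    // null bit pattern for `None` and stay pointer-sized.
    assert_eq!(size_of::<Option<&u32>>(), size_of::<&u32>());
    assert_eq!(size_of::<&u32>(), size_of::<*const u32>());

    let value = 5u32;
    let reference: &u32 = &value;
    let as_raw: *const u32 = reference; // references coerce to raw pointers
    assert!(!as_raw.is_null());
}
```

Both kinds have the same in-memory shape, so the tag is the only place where a debugger could learn which validity assumptions apply.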
Beware that traits and slices aren't the only things that can be DSTd, you also have struct MyDST(u8, u8, u8, [u8]).
That's a good point. In a way a regular slice is just a special case of a struct with a trailing [T]. Also (as opposed to C?) in Rust we always know the length of the trailing [T] because of the fat-pointer field.
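As a concrete illustration of the length travelling with the pointer, this sketch splits a slice reference into its two components and rebuilds it. The function name into_parts is hypothetical; as_ptr, len and slice::from_raw_parts are standard library APIs.

```rust
use std::slice;

/// Splits a slice reference into the two values a fat pointer carries.
fn into_parts(s: &[u32]) -> (*const u32, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    let data = [1u32, 2, 3, 4];
    let (ptr, len) = into_parts(&data[1..]);
    assert_eq!(len, 3);

    // SAFETY: `ptr` and `len` were taken from a live sub-slice of `data`.
    let rebuilt: &[u32] = unsafe { slice::from_raw_parts(ptr, len) };
    assert_eq!(rebuilt, &data[1..]);
}
```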
So, we have this:
kind | fat |
---|---|
&T (regular reference) | no |
&[T] (regular slice) | yes |
&Struct /w trailing [T] | yes |
&Trait (regular trait object) | yes |
&Struct /w trailing trait | yes |
I think it would be nice if &[T] and &Struct /w trailing [T] would be represented the same way, and correspondingly also &Trait and &Struct /w trailing trait, since in both cases the former is just a special case of the latter (kind of).
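The thin/fat classification in the table can be checked with a short program. This is a sketch under the assumption that fat pointers are exactly two machine words, which holds on current compilers but is not a documented guarantee; Tail and is_fat are hypothetical names.

```rust
use std::fmt::Debug;
use std::mem::size_of;

// Hypothetical struct whose last field may be unsized.
#[allow(dead_code)]
struct Tail<T: ?Sized> {
    header: u8,
    tail: T,
}

/// True if the pointer type `P` occupies two machine words.
fn is_fat<P>() -> bool {
    size_of::<P>() == 2 * size_of::<usize>()
}

fn main() {
    // Thin: a single machine word.
    assert!(!is_fat::<&u32>());
    assert!(!is_fat::<*const u32>());

    // Fat: data pointer plus slice length.
    assert!(is_fat::<&[u32]>());
    assert!(is_fat::<&Tail<[u32]>>());
    assert!(is_fat::<*const [u32]>());

    // Fat: data pointer plus vtable pointer.
    assert!(is_fat::<&dyn Debug>());
    assert!(is_fat::<&Tail<dyn Debug>>());
    assert!(is_fat::<*const dyn Debug>());
}
```

The struct cases have the same pointer shape as the plain slice and trait object cases, which is the observation behind representing them the same way.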
Also note there is also a pointer variant for each of the above. If we have different tags for each of those, we would get quite a few tags:
DW_TAG_reference_type // &T (regular reference)
DW_TAG_slice_type // &[T] (regular slice)
DW_TAG_dst_reference_type // &Struct /w trailing [T]
DW_TAG_trait_object // &Trait (regular trait object)
DW_TAG_trait_object2 // &Struct /w trailing trait
DW_TAG_pointer_type // *T (regular pointer)
DW_TAG_ptr_slice_type // *[T] (regular slice)
DW_TAG_dst_pointer_type // *Struct /w trailing [T]
DW_TAG_trait_object_ptr // *Trait (regular trait object)
DW_TAG_trait_object_ptr2 // *Struct /w trailing trait
This seems a bit excessive to me.
We could also just have a DW_AT_RUST_PTR_KIND attribute with the possible values of thin, slice, trait and attach those to either a DW_TAG_reference_type or a DW_TAG_pointer_type. The rest of the DIE's fields depend on the kind. That would still be rather explicit, we'd just need one additional attribute.
To give an example of what the DW_AT_rust_ptr_kind variant could look like:
// &T (regular reference)
DW_TAG_reference_type
DW_AT_rust_pointer_kind thin
DW_AT_mutable true or false // defaults to false if not present
DW_AT_type <ref to type>
// &[T] (regular slice)
DW_TAG_reference_type
DW_AT_rust_pointer_kind slice
DW_AT_mutable true or false // defaults to false if not present
DW_AT_type <ref to type>
DW_AT_object_pointer <expr that yields address of first element>
DW_AT_count <expr that computes count>
// &Struct /w trailing [T]
DW_TAG_reference_type
DW_AT_rust_pointer_kind slice
DW_AT_mutable true or false // defaults to false if not present
DW_AT_type <ref to type>
DW_AT_object_pointer <expr that yields address of struct>
DW_AT_count <expr that computes count>
// &Trait (regular trait object)
DW_TAG_reference_type
DW_AT_rust_pointer_kind trait
DW_AT_mutable true or false // defaults to false if not present
DW_AT_type <ref to type>
DW_AT_object_pointer <expr that yields address of object>
DW_AT_vtable_elem_location <expr that computes address of vtable>
// &Struct /w trailing trait
DW_TAG_reference_type
DW_AT_rust_pointer_kind trait
DW_AT_mutable true or false // defaults to false if not present
DW_AT_type <ref to type>
DW_AT_object_pointer <expr that yields address of object>
DW_AT_vtable_elem_location <expr that computes address of vtable>
In this particular form, we would still have to look at the target type to find out if we have a regular slice or a struct/enum with a trailing [T] (the same goes for trait objects).
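From a consumer's point of view, the proposed scheme could be handled roughly as follows. This is only a sketch of the idea in the listing above: PtrKind, parse_kind and expected_attrs are hypothetical names, and the attribute lists simply mirror that listing (with the optional DW_AT_mutable left out), not any implemented format.

```rust
/// Hypothetical values of the proposed pointer-kind attribute.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PtrKind {
    Thin,
    Slice,
    Trait,
}

/// Maps the attribute's textual value to a kind.
fn parse_kind(value: &str) -> Option<PtrKind> {
    match value {
        "thin" => Some(PtrKind::Thin),
        "slice" => Some(PtrKind::Slice),
        "trait" => Some(PtrKind::Trait),
        _ => None,
    }
}

/// Attributes a consumer would look for once it knows the kind.
fn expected_attrs(kind: PtrKind) -> &'static [&'static str] {
    match kind {
        PtrKind::Thin => &["DW_AT_type"],
        PtrKind::Slice => &["DW_AT_type", "DW_AT_object_pointer", "DW_AT_count"],
        PtrKind::Trait => &[
            "DW_AT_type",
            "DW_AT_object_pointer",
            "DW_AT_vtable_elem_location",
        ],
    }
}

fn main() {
    for value in ["thin", "slice", "trait"] {
        let kind = parse_kind(value).expect("known kind");
        println!("{:?}: {:?}", kind, expected_attrs(kind));
    }
}
```

A consumer that does not know the attribute would still see an ordinary reference or pointer DIE, which is the trade-off discussed in the following comments.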
Keep in mind that every new syntax you invent means new things you have to teach tools. GDB and LLDB aren't the only debuginfo consumers out there. So I would strongly lean towards semantics that a C++ tool would understand, even if it seems less ideal to your own taste. At least think about how an uninformed tool might interpret your proposed scheme, compared to the status quo.
@cuviper Do you have examples of tools that use type information (as opposed to just line-tables)?
One thing that might be problematic about using DWARF expressions for getting element count/vtable address are optimizations that can pick apart fat pointers, like SROA. For those it might be better to have plain member DIEs? Though I'm not exactly sure how a debugger would handle this: If the value of a variable is calculated via a number of DW_OP_bit_pieces, will the debugger reconstruct the value before evaluating an expression that takes the value as input?
And my involvement on those means I could also work on adding new Rust semantics to them. I honestly haven't looked closely yet how well they interpret Rust's current DWARF output. I just hope more generally that tools could Just Work as much as possible. :)
If the value of a variable is calculated via a number of DW_OP_bit_pieces, will the debugger reconstruct the value before evaluating an expression that takes the value as input?
Wouldn't it have to reconstruct it? I don't see what else would make any sense.
(And if optimizations make some of this inaccessible, so be it. An -Og might be more conservative.)
@cuviper Yes, that's a good point. We'll want to strike a balance between not doing everything differently from everybody else and doing things in a way that is a good fit for Rust.
Regarding DW_OP_bit_piece, you're probably right. So using expressions wouldn't be much of a problem for optimizations. Using member DIEs would still be a good idea, I guess, because that's a very stable way of encoding things, very easy to make sense of for every tool.
So I would strongly lean towards semantics that a C++ tool would understand, even if it seems less ideal to your own taste
The counterpoint here is stuff like the existing representation of Rust enum types, which requires significant decoding in the debugger. In fact some new cases were just implemented this week. This is one reason I think it's better to just add new tags, along with helper attributes to describe things more precisely.
I do agree that reusing existing tags makes sense when possible.
Maybe this part of the discussion would be improved if it were more specific. For instance, how would you propose handling the cases under discussion here?
If the value of a variable is calculated via a number of DW_OP_bit_pieces, will the debugger reconstruct the value before evaluating an expression that takes the value as input?
Yes, gdb does this already. I implemented it (:-) when gcc added debuginfo for SRA.
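To make the "reconstruct first, then evaluate" idea concrete, here is a toy model. It is not how gdb implements this: Piece, reassemble and as_slice_parts are hypothetical names, pieces are byte-granular rather than bit-granular, and the layout (data pointer first, then length, 64-bit little-endian) is an assumption of the example.

```rust
/// One piece of a value that optimizations have split up; `bytes` holds
/// the piece's contents (toy model, not a real debugger type).
struct Piece {
    bytes: Vec<u8>,
}

/// Concatenates the pieces back into one contiguous value.
fn reassemble(pieces: &[Piece]) -> Vec<u8> {
    pieces
        .iter()
        .flat_map(|p| p.bytes.iter().copied())
        .collect()
}

/// Interprets a reassembled 16-byte value as (data pointer, length).
/// Assumes a 64-bit little-endian target with the data pointer first.
fn as_slice_parts(value: &[u8]) -> Option<(u64, u64)> {
    if value.len() != 16 {
        return None;
    }
    let mut ptr = [0u8; 8];
    let mut len = [0u8; 8];
    ptr.copy_from_slice(&value[..8]);
    len.copy_from_slice(&value[8..]);
    Some((u64::from_le_bytes(ptr), u64::from_le_bytes(len)))
}

fn main() {
    // The two halves of a split-up fat pointer, e.g. living in two registers.
    let pieces = [
        Piece { bytes: 0x7fff_0000u64.to_le_bytes().to_vec() },
        Piece { bytes: 3u64.to_le_bytes().to_vec() },
    ];
    let value = reassemble(&pieces);
    assert_eq!(as_slice_parts(&value), Some((0x7fff_0000, 3)));
}
```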
The counterpoint here is stuff like the existing representation of Rust enum types, which requires significant decoding in the debugger.
Yeah, both ADTs and DST are Rust-specific types that don't have a C++ analogue, and pretending to be a C++ type for the sake of tooling will probably mean that the tools won't display the right thing anyway.
Yeah, both ADTs and DST are Rust-specific types that don't have a C++ analogue, and pretending to be a C++ type for the sake of tooling will probably mean that the tools won't display the right thing anyway.
I think we can achieve a lot of backwards compatibility (or easy portability) if we use the standard tags and attributes (like DW_TAG_member, DW_AT_byte_size, etc) like everyone else does.
Maybe this part of the discussion would be improved if it were more specific. For instance, how would you propose handling the cases under discussion here?
Can you elaborate on what you mean exactly?
I think the request to be more specific was aimed at me. :) And... I'll have to think on it. But it sounds like we're all agreeing not to stray too far. It looks like the proposal for thin &T would already work just fine for a tool that knows C++ references, at least, so that's good. If all those fat pointers are currently opaque to tools, then finding a new meaningful representation is fine.
Should fat-pointers (and thin pointers too, maybe) have a DW_AT_byte_size attribute that specifies their size explicitly?
On this point in particular, I don't think thin pointers need it, as @tromey said. I think it would be very helpful for fat pointers though, if nothing else just to raise a flag to the tools that it's abnormal.
@tromey I found a message on gdb-patches which describes an ADA "unconstrained array" fat pointer. It's not the same layout as a Rust slice, but I think the same concepts could apply. What do you think of that representation? https://sourceware.org/ml/gdb-patches/2014-08/msg00310.html
So a similar Rust &[T] would be something like:
DW_TAG_array_type
DW_AT_mutable true or false // defaults to false if not present
DW_AT_type <ref to type>
DW_AT_data_location <expr that yields address of first element>
DW_TAG_subrange_type
DW_AT_type <ref to type>
DW_AT_count <expr that computes count>
I suspect this will look more familiar to tools that already know VLAs.
In any case, I think data_location is probably a better fit where object_pointer was proposed earlier.
ADA "unconstrained array"
For this particular representation, I think the issue is that there's no obvious way to dynamically construct an instance. However, that's a reasonable thing to want to do. In fact right now gdb does it, though by baking in some knowledge of the Rust ABI -- but avoiding this is one of my goals. (Another important goal being winding up with something we can document and attempt to get into DWARF 6.)
I've been giving this topic some thought tonight and I have a number of issues to raise, which in my mind generally point to the usefulness of adding new tags where needed; though naturally I value your insights.
This is a bit unsorted it turns out. Maybe this isn't an ideal forum for this sort of discussion.
- [...] __0, __1, etc -- but I think a (weird) Rust program could use these names in a struct.
- [...] case and mapping the new tag to the old construct. Though to be fair this "logic" is maybe wonky.
- [...] DW_AT_data_location or DW_AT_count. (My belief is that it's rare for consumers to implement all of DWARF; but rather it's more normal that they implement the subset of DWARF that the authors cared about or could find at the time of writing; which is sensible given both history and the reality of compilers.) But if SystemTap must be modified, why not modify it following a cleaner (according to me...) plan?
- [...] impl of types and of traits for types. IIRC the obvious DWARF-like approach here was found to make LLDB complain because it didn't expect methods to be attached to base types.

I think the current approach could be described as "keep it close-ish to C++ and hope the tools are ok". I found this pretty inadequate for gdb, and I suspect for lldb in the end the only answer will be a more full port. There are just too many differences and they are accumulating.
I think it's also important to note that we won't be able to come up with an encoding that is just understood by existing tools. The current approach has the goal of not crashing existing tools while providing enough information for pretty printers to have some minimal functionality. I think we have reached the limits of this approach and we'll need to make breaking changes going forward anyway.
I think we should just choose clean encodings that don't do anything fancy. That should help existing tools to add support with minimal effort.
I've been thinking more about this, and have come around to see it's not so horrible for Rust to invent new syntax (tags/attrs) for things that are truly unique. However, I think we should avoid overloading existing constructs in surprising ways. Namely, DW_TAG_reference_type is a good fit for &T thin references, and most existing tools should already do the right thing there. But I think the fat references should use a distinct tag, or even separate distinct tags for each, e.g. DW_TAG_RUST_slice and DW_TAG_RUST_trait_object.
(I don't know if the standard says anything about this, but I like having CAPS prefixes on non-standard extensions.)
@cuviper It looks like DWARF information emitted by Rust still doesn't give any hints to distinguish between &, &mut, *const and *mut:
0x000000c1: TAG_pointer_type [4]
AT_type( {0x000000ca} ( u32 ) )
AT_name( "*const u32" )
0x000000ca: TAG_base_type [5]
AT_name( "u32" )
AT_encoding( DW_ATE_unsigned )
AT_byte_size( 0x04 )
0x000000d1: TAG_pointer_type [4]
AT_type( {0x000000ca} ( u32 ) )
AT_name( "*mut u32" )
0x000000da: TAG_pointer_type [4]
AT_type( {0x000000ca} ( u32 ) )
AT_name( "&u32" )
0x000000e3: TAG_pointer_type [4]
AT_type( {0x000000ca} ( u32 ) )
AT_name( "&mut u32" )
Is this the only relevant issue / discussion or has there been some progress tracked elsewhere perhaps?
I guess I can just use the prefix of AT_name to distinguish between them for now, but it seems quite hacky.
I guess I can just use the prefix of AT_name to distinguish between them for now, but it seems quite hacky.
That's what gdb does and what I plan to do in lldb, at least in the short run. Longer term I think we should use DWARF tags to differentiate, as discussed here.
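The name-prefix workaround mentioned above could be sketched like this. It is not gdb's or lldb's actual code; PointerKind and classify are hypothetical names, and the input strings are the AT_name values from the dump earlier in the thread.

```rust
#[derive(Debug, PartialEq, Eq)]
enum PointerKind {
    SharedRef,
    MutRef,
    ConstPtr,
    MutPtr,
}

/// Guesses the pointer kind from the DIE's name.
fn classify(name: &str) -> Option<PointerKind> {
    // `&mut ` must be checked before `&`, which is a prefix of it.
    if name.starts_with("&mut ") {
        Some(PointerKind::MutRef)
    } else if name.starts_with('&') {
        Some(PointerKind::SharedRef)
    } else if name.starts_with("*const ") {
        Some(PointerKind::ConstPtr)
    } else if name.starts_with("*mut ") {
        Some(PointerKind::MutPtr)
    } else {
        None
    }
}

fn main() {
    assert_eq!(classify("*const u32"), Some(PointerKind::ConstPtr));
    assert_eq!(classify("*mut u32"), Some(PointerKind::MutPtr));
    assert_eq!(classify("&u32"), Some(PointerKind::SharedRef));
    assert_eq!(classify("&mut u32"), Some(PointerKind::MutRef));
    assert_eq!(classify("u32"), None);
}
```

The approach depends entirely on how the compiler spells type names, which is why a dedicated tag or attribute would be more robust.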
@tromey wrote (https://github.com/rust-lang/rust/issues/37504#issuecomment-257434920):
Aha, thanks. For dynamically sized types, we should just use the standard DWARF stuff. If the length is known then DW_AT_count can be a location expression (there are other similar ways as well).
@cuviper wrote (https://github.com/rust-lang/rust/issues/37504#issuecomment-257716268):
I found a message on gdb-patches which describes an ADA "unconstrained array" fat pointer. It's not the same layout as a Rust slice, but I think the same concepts could apply.
Three years later, I was looking through the DWARF5 spec in case there's anything potentially useful, and came across this Fortran example (page 320, "Figure D.13"):
10$: DW_TAG_array_type
DW_AT_type(reference to real)
DW_AT_rank(expression=
DW_OP_push_object_address
DW_OP_lit<n> ! offset of rank in descriptor
DW_OP_plus
DW_OP_deref)
DW_AT_data_location(expression=
DW_OP_push_object_address
DW_OP_lit<n> ! offset of data in descriptor
DW_OP_plus
DW_OP_deref)
11$: DW_TAG_generic_subrange
DW_AT_type(reference to integer)
DW_AT_lower_bound(expression=
! Looks up the lower bound of dimension i.
! Operation ! Stack effect
! (implicit) ! i
DW_OP_lit<n> ! i sizeof(dim)
DW_OP_mul ! dim[i]
DW_OP_lit<n> ! dim[i] offsetof(dim)
DW_OP_plus ! dim[i]+offset
DW_OP_push_object_address ! dim[i]+offsetof(dim) objptr
DW_OP_plus ! objptr.dim[i]
DW_OP_lit<n> ! objptr.dim[i] offsetof(lb)
DW_OP_plus ! objptr.dim[i].lowerbound
DW_OP_deref) ! *objptr.dim[i].lowerbound
DW_AT_upper_bound(expression=
! Looks up the upper bound of dimension i.
DW_OP_lit<n> ! sizeof(dim)
DW_OP_mul
DW_OP_lit<n> ! offsetof(dim)
DW_OP_plus
DW_OP_push_object_address
DW_OP_plus
DW_OP_lit<n> ! offset of upperbound in dim
DW_OP_plus
DW_OP_deref)
DW_AT_byte_stride(expression=
! Looks up the byte stride of dimension i.
...
! (analogous to DW_AT_upper_bound)
)
There is also an earlier example in Appendix D that might be simpler in terms of Fortran features it describes, but is longer so I'm not going to paste it here.
Overall, it looks like DWARF is designed to support fully dynamic multidimensional arrays and slices, which is more powerful than Rust needs.
Given the DWARF5 spec, its examples, and the comments from years ago in this thread, I believe we may have a path forward if we choose to go down that route.
The main problem I see, for handling slices like this (assuming LLVM and debuggers support the necessary features), is that DW_OP_push_object_address has to push the address of the wide pointer (&[T] or *[T]), not the value of the data pointer, meaning it doesn't compose with DWARF pointer/reference types.
And for &(A, B, [T]), there is no &[T] in memory, and I can't think of any nice way of propagating the slice length all the way down to it.
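The object-address problem can be shown with a toy evaluator for the four operations used in the Fortran example. Everything here is a simplified assumption: Op, Memory and eval are hypothetical names, memory is modelled as a map from address to 8-byte word, and the wide pointer is assumed to store the data pointer at offset 0 and the length at offset 8.

```rust
use std::collections::HashMap;

/// Subset of DWARF stack operations, modelled loosely after the
/// DW_OP_* names in the example above.
#[derive(Clone, Copy)]
enum Op {
    PushObjectAddress,
    Lit(u64),
    Plus,
    Deref,
}

/// Toy memory: maps an address to the 8-byte word stored there.
type Memory = HashMap<u64, u64>;

fn eval(ops: &[Op], object_address: u64, mem: &Memory) -> Option<u64> {
    let mut stack: Vec<u64> = Vec::new();
    for op in ops {
        match *op {
            Op::PushObjectAddress => stack.push(object_address),
            Op::Lit(n) => stack.push(n),
            Op::Plus => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a.wrapping_add(b));
            }
            Op::Deref => {
                let addr = stack.pop()?;
                stack.push(*mem.get(&addr)?);
            }
        }
    }
    stack.pop()
}

fn main() {
    // A wide pointer stored at 0x100: data pointer, then length.
    let mut mem = Memory::new();
    mem.insert(0x100, 0x2000);
    mem.insert(0x108, 5);

    let data_location = [Op::PushObjectAddress, Op::Lit(0), Op::Plus, Op::Deref];
    let count = [Op::PushObjectAddress, Op::Lit(8), Op::Plus, Op::Deref];

    // With the address of the wide pointer as the object address, both work.
    assert_eq!(eval(&data_location, 0x100, &mem), Some(0x2000));
    assert_eq!(eval(&count, 0x100, &mem), Some(5));

    // If a consumer had already followed the pointer and passed the data
    // address instead, the length would no longer be reachable.
    assert_eq!(eval(&count, 0x2000, &mem), None);
}
```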
This is a very old and long thread and it's been a while since I looked at details, but I'd like to point out that
That is, does Rust have the concept of a mut i32 at the type level, for example?
No. mut is part of reference types, no more.
is not entirely true, or at least, not any different from the situation in C / C++.
Aside from references, Rust also has mutable and immutable variables, parameters and so on, just like C / C++ does. So when one says that Rust doesn't have mut i32 at the type level, same can be said about const uint32_t at the type level in C / C++, because in both cases they describe the actual slot, pointer or a reference and not the value itself.
And yet, even though C / C++ has comparable type semantics, it already has an established DWARF representation for these different types - by using the earlier mentioned "constifying newtypes".
One thing that was brought up and still remains true is that for Rust such representation is potentially more wasteful, because immutable types in Rust are much more popular than in C / C++ due to the flipped defaults.
This might still be true, but on the other hand DWARF representation is fairly compact, and it would be worth measuring first whether introducing a new attribute really saves any noticeable amount of space compared to a separate type ref (which is essentially just a type tag + a reference to the inner type).
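As a rough way to reason about the two encodings before measuring real output, one can count type entries in a toy model. TypeDie and entry_count are hypothetical names, and the model counts entries only; it says nothing about encoded byte sizes, abbreviations, or sharing of entries between types.

```rust
/// Toy model of a chain of type DIEs (not a real DWARF library type).
#[allow(dead_code)]
enum TypeDie {
    Base(&'static str),
    /// C/C++ style wrapper entry pointing at another type.
    Const(Box<TypeDie>),
    /// Pointer or reference entry; `mutable` models an attribute on the DIE.
    Pointer {
        target: Box<TypeDie>,
        mutable: Option<bool>,
    },
}

/// Number of entries needed to describe the type.
fn entry_count(die: &TypeDie) -> usize {
    match die {
        TypeDie::Base(_) => 1,
        TypeDie::Const(inner) => 1 + entry_count(inner),
        TypeDie::Pointer { target, .. } => 1 + entry_count(target),
    }
}

fn main() {
    // Wrapper style, as for `const char *`: pointer -> const -> base.
    let wrapper_style = TypeDie::Pointer {
        target: Box::new(TypeDie::Const(Box::new(TypeDie::Base("char")))),
        mutable: None,
    };
    // Attribute style: pointer (marked immutable) -> base.
    let attribute_style = TypeDie::Pointer {
        target: Box::new(TypeDie::Base("char")),
        mutable: Some(false),
    };
    assert_eq!(entry_count(&wrapper_style), 3);
    assert_eq!(entry_count(&attribute_style), 2);
}
```

Since a wrapper entry can typically be shared by every pointer to the same pointee, the real difference is likely smaller than this per-type count suggests, which supports measuring before deciding.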
For now, it would be great to unblock this issue and implement at least the suboptimal-but-already-supported-in-most-tools representation for immutable vs mutable references, and then we can iterate on it in future PRs.
@rustbot label -C-tracking-issue
Currently, we represent thin references and pointers with DW_TAG_pointer_type DIEs and fat pointers (slices and trait objects) as DW_TAG_struct DIEs with fields representing payload and metadata pointers. This is not ideal and with debuggers knowing about Rust, we can do better. The question is, what exactly do we want the representation for these kinds of types to look like.

Some things seem pretty straightforward to me:
- [...] DW_TAG_reference_type DIEs.
- [...] DW_TAG_pointer_type DIEs.

But beyond that, there are some decisions to be made:

(1) How do we represent mutability?
The C++ version of DWARF represents a const pointer like const char * with three separate type entries: [...]

I think this is a bit verbose and I'm not sure it is entirely appropriate for Rust. Do we really have const and mut types? That is, does Rust have the concept of a mut i32 at the type level, for example? I mean there are mutable and immutable slots/memory locations and we have "mutable" and "shared" references, but those two things seem kind of different to me.

As an alternative to using DW_TAG_const_type for representing mutability, we could re-use the DW_AT_mutable attribute that is already defined in DWARF. In C++ DWARF it is used for mutable fields. We could use it for reference type and local variable DIEs: [...]

(2) How to represent fat pointers?
The pointer types in C/C++ DWARF don't have DW_TAG_member sub-DIEs, since they are always just values. Fat pointers in Rust are different: they have one field that is a pointer to the data, and another field that holds additional information, either the size of a slice or the pointer to a vtable. These need to be described somehow. I see a few options:
- [...] DW_TAG_pointer_type or DW_TAG_reference_type DIE with two fields that are described by DW_TAG_member sub-DIEs, both having the DW_AT_artificial attribute. @tromey once suggested for slices that the field entries have no name and the debugger determines which is which by the type (the size is always an integer type, the data is always a pointer type). This could also be extended for trait objects, since the data pointer will always be a pointer to a trait and the vtable-pointer will always be something else.
- [...] DW_TAG_slice_type DIE that follows the encoding above and borrow some other attributes for trait objects: a DW_AT_vtable_elem_location attribute holds the offset of the vtable field within the fat-pointer value, and a DW_AT_object_pointer attribute does the same for the data pointer. This is distinctly not how these attributes are used in a C++ context but it would be a nice fit, I think.
- [...] DW_AT_object_pointer indicating data pointer field

Another question is: Should fat-pointers (and thin pointers too, maybe) have a DW_AT_byte_size attribute that specifies their size explicitly?

cc @tromey, @Manishearth See also https://github.com/rust-lang/rust/issues/33073