microsoft / debug-adapter-protocol

Defines a common protocol for debug adapters.
https://microsoft.github.io/debug-adapter-protocol/

Proposal: Support for memory access and disassembly #42

Closed andrewcrawley closed 5 years ago

andrewcrawley commented 5 years ago

This proposal covers both disassembly and memory access, since they both depend on the same underlying primitive - a "memory reference".

Memory references

A memory reference is an opaque identifier that can be used (along with an optional byte offset) to identify a single location in memory. An adapter has the option of providing a memory reference on several protocol objects that represent things a frontend might want to display in a memory view or use as a starting point for disassembly:

export interface EvaluateResponse {
    // ...

    /**  Memory reference to an adapter-determined location appropriate for this result.  For pointer types, this is generally a reference to the memory address contained in the pointer. */
    memoryReference?: string;
}

export interface Variable {
    // ...

    /** Memory reference for the variable if the variable represents executable code, such as a function pointer. */
    memoryReference?: string;
}

export interface StackFrame {
    // ...

    /** Memory reference for the current instruction pointer in this frame. */
    instructionPointerReference?: string;
}

export interface GotoTarget {
    // ...

    /** Memory reference for the instruction pointer value represented by this target. */
    instructionPointerReference?: string;
}

Memory references are expressed as strings rather than numbers because the most common implementation will likely be to use the address directly, and a 64-bit memory address will not fit in a JSON number.
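
For illustration only (the values below are invented, and a client must treat the string as opaque), a memory reference returned from an evaluate request might simply carry an address that a JSON number could not represent exactly:

// Hypothetical evaluate response body (illustrative values only).
const evaluateResponseBody = {
    result: "0xffff8000deadbeef",
    variablesReference: 0,
    // An address-like string is one possible encoding; it could equally be a handle or "segment:offset".
    memoryReference: "0xffff8000deadbeef"
};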

Reading memory

Reading memory from a location represented by a memory reference and offset is accomplished via a new readMemory request:

export interface Capabilities {
    // ...

    /** The debug adapter supports the 'readMemory' request. */
    supportsReadMemoryRequest?: boolean;
}

/** Reads bytes from memory at the provided location. */
export interface ReadMemoryRequest extends Request {
    // command: 'readMemory'
    arguments: ReadMemoryArguments;
}

/** Arguments for 'readMemory' request. */
export interface ReadMemoryArguments {
    /** Memory reference to the base location from which data should be read. */
    memoryReference: string;

    /** Optional offset (in bytes) to be applied to the reference location before reading data.  Can be negative. */
    offset?: number;

    /** Number of bytes to read at the specified location and offset. */
    count: number;
}

/** Response to 'readMemory' request. */
export interface ReadMemoryResponse extends Response {
    body: {
        /** The address of the first byte of data returned.  Treated as a hex value if prefixed with '0x', or as a decimal value otherwise. */
        address: string;

        /** The number of unreadable bytes encountered after the last successfully read byte.  This can be used to determine the number of bytes that must be skipped before a subsequent 'readMemory' request will succeed. */
        unreadableBytes?: number;

        /** The bytes read from memory, encoded using base64. */
        data: string;
    }
}

Note that the read memory request should be considered to have succeeded even if none of the requested bytes are available. For instance, a read might extend into a page that is not mapped - in this case, the adapter would be expected to return the bytes it could read (if any), along with a value for unreadableBytes that can be used to skip the unmapped page.

Additionally, it is intended that a zero-byte read (that is, a readMemory request with a count of zero) could be used to determine the actual address identified by the memory reference and offset.
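
As a sketch of this behavior (all addresses, byte values, and page sizes below are invented), a read that runs into an unmapped page might look like this:

// Hypothetical readMemory exchange illustrating 'unreadableBytes' (illustrative values only).
const readMemoryArgs = {
    memoryReference: "0x7f0000000ff0",  // opaque reference supplied earlier by the adapter
    offset: 0,
    count: 64
};

// Assume the adapter could read only 16 bytes before hitting a 4 KiB unmapped page:
const readMemoryResponseBody = {
    address: "0x7f0000000ff0",
    data: "3q2+7wABAgMEBQYHCAkKCw==",  // 16 bytes, base64-encoded
    unreadableBytes: 4096               // bytes to skip before a subsequent read can succeed
};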

Disassembly

Disassembling code at a location represented by a memory reference is accomplished via a new disassemble request:

export interface Capabilities {
    // ...

    /** The debug adapter supports the 'disassemble' request. */
    supportsDisassembleRequest?: boolean;
}

/** Disassembles code stored at the provided location. */
export interface DisassembleRequest extends Request {
    // command: 'disassemble'
    arguments: DisassembleArguments;
}

/** Arguments for 'disassemble' request. */
export interface DisassembleArguments {
    /** Memory reference to the base location containing the instructions to disassemble. */
    memoryReference: string;

    /** Optional offset (in bytes) to be applied to the reference location before disassembling.  Can be negative. */
    offset?: number;

    /** Optional offset (in instructions) to be applied after the byte offset (if any) before disassembling.  Can be negative. */
    instructionOffset?: number;

    /** Number of instructions to disassemble starting at the specified location and offset.  An adapter must return exactly this number of instructions - any unavailable instructions should be replaced with an implementation-defined 'invalid instruction' value. */
    instructionCount: number;

    /** If true, the adapter should attempt to resolve memory addresses and other values to symbolic names. */
    resolveSymbols?: boolean;
}

/** Response to 'disassemble' request */
export interface DisassembleResponse extends Response {
    body: {
        /** The list of disassembled instructions. */
        instructions: DisassembledInstruction[];
    }
}

/** Represents a single disassembled instruction. */
export interface DisassembledInstruction {
    /** The address of the instruction.  Treated as a hex value if prefixed with '0x', or as a decimal value otherwise. */
    address: string;

    /** Raw bytes representing the instruction and its operands, in an implementation-defined format. */
    instructionBytes?: string;

    /** Text representing the instruction and its operands, in an implementation-defined format. */
    instruction: string;

    /** Name of the symbol that corresponds with the location of this instruction, if any. */
    symbol?: string;

    /** Source location that corresponds to this instruction, if any.  Should always be set (if available) on the first instruction returned, but can be omitted afterwards if this instruction maps to the same source file as the previous instruction. */
    location?: Source;

    /** The line within the source location that corresponds to this instruction, if any. */
    line?: number;

    /** The column within the line that corresponds to this instruction, if any. */
    column?: number;

    /** The end line of the range that corresponds to this instruction, if any. */
    endLine?: number;

    /** The end column of the range that corresponds to this instruction, if any. */
    endColumn?: number;
}

The location, line, column, endLine and endColumn members on DisassembledInstruction are intended to provide a mapping between a disassembled instruction and the source code it was generated from, if this information is available to the adapter.
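
As a hedged illustration of that mapping (addresses, bytes, symbol names, and file paths below are invented), a response might repeat the location only when the source file changes:

// Illustrative DisassembleResponse body: 'location' appears on the first
// instruction and is omitted afterwards because the source file is unchanged.
const disassembleResponseBody = {
    instructions: [
        {
            address: "0x0000000140001000",
            instructionBytes: "48 89 5C 24 08",
            instruction: "mov qword ptr [rsp+8], rbx",
            location: { name: "main.cpp", path: "C:\\src\\main.cpp" },
            line: 12,
            column: 1
        },
        {
            address: "0x0000000140001005",
            instructionBytes: "57",
            instruction: "push rdi",
            line: 12,   // same source file as the previous instruction, so 'location' is omitted
            column: 1
        }
    ]
};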

Formatting

A debugger UI that implements this proposal will likely want to display addresses differently for 32-bit vs 64-bit processes. This information would be conveyed via an additional property on the process event:

export interface ProcessEvent extends Event {
    // ...

    /** The size of a pointer or address for this process, in bits. */
    pointerLength?: number;
}
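
To make the intent concrete, a client might format user-visible addresses along these lines (a minimal sketch; the zero-padding policy is my assumption, not part of the proposal):

// Sketch: pad a user-visible address to the process pointer width.
// Assumes the address string parses as hex ('0x' prefix) or decimal,
// as specified for ReadMemoryResponse.address and DisassembledInstruction.address.
function formatAddress(address: string, pointerLength: number = 64): string {
    const value = BigInt(address);               // BigInt accepts both '0x...' hex and decimal strings
    const hexDigits = Math.ceil(pointerLength / 4);
    return "0x" + value.toString(16).padStart(hexDigits, "0");
}

// formatAddress("0x1000", 32) === "0x00001000"
// formatAddress("4096", 64)   === "0x0000000000001000"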

CC: @gregg-miskelly

andrewcrawley commented 5 years ago

I made a slight tweak to the proposal based on my experience implementing this in vsdbg:

haneefdm commented 5 years ago

@andrewcrawley Yes, memoryReference has to be a string. But I'm wondering whether the proposal should say what the string contents should be, for parsing/formatting purposes. For instance, a decimal number, a 0x-prefixed hex number, or either?

Nit: should ReadMemoryResponse.body.address be named ReadMemoryResponse.body.memoryReference, just to indicate it follows the same rules as other memoryReference values? Same thing with DisassembledInstruction.address. Or perhaps your intent was different?

haneefdm commented 5 years ago

@andrewcrawley I hope I am not offending you with my opinions. Your knowledge probably dwarfs mine

I was wondering if the DisassembleResponse would become too heavy for transport. I am imagining stupidly large functions (generally auto-generated) or entire objects (elf-sections/obj/exe/dll/so). One reason to show more than a function is to see instructions as well as const data packed around them.

Please see the -data-disassemble section at this link: https://sourceware.org/gdb/onlinedocs/gdb/GDB_002fMI-Data-Manipulation.html

I am not saying gdb is the gold standard, but it kinda makes sense, especially with the -a option. The debugger was paused; I want to get the whole context around that instruction with the DisassembleRequest API.

If we just follow the gdb MI interface, it is a lot less burdensome for the MIEngine (mostly pass it on), less burdensome on the client to figure out what kind of request to make. Let the clients deal with the rest -- presentation, debug-interface, etc.

What you are proposing is an interpreted and decoded super nice interface. What I was thinking was a mostly raw interface taking advantage of what the debugger already provides.

Once we feel comfortable with the proposals, I can prototype some stuff.

PavelSosin commented 5 years ago

This is affordable only if you have either an in-line interpreter or an incremental compiler. Only a few of those exist; this is a rare kind of server capability.

puremourning commented 5 years ago

As a client implementer, I would say this API proposal is very sensible.

The only bit that would be fiddly is this:

Should always be set (if available) on the first instruction returned, but can be omitted afterwards

This makes handling stateful. I think we should either:

haneefdm commented 5 years ago

My thinking was that all the location info should be in its own bag, and it is either there or not there, i.e. not required.

PavelSosin commented 5 years ago

The purpose of such protocols is decoupling the UI from the debugger service implementation, as LSP achieves. This proposal gives some hint of how a memory reference can be expressed by the UI. Disassembly is generally impossible for languages with a bytecode runtime and JIT compilers, like C#, Java and JavaScript, and they are 90% of the world today. Imagine a basic cloud use case: a debugger UI running in the browser as part of an IDE uses DAP to communicate with a debugger service running in a cloud container on the .NET runtime. The source code is loaded on demand from the user's workspace via a workspace REST API.

haneefdm commented 5 years ago

@PavelSosin I don't even understand your objections. This proposal does not affect those languages or their debuggers or even add any overhead to you or your UI.

puremourning commented 5 years ago

They are 90% of world today.

Even if that were true, those in the 10% build the platforms to run the 90% on. And they require good debuggers.

PavelSosin commented 5 years ago

Transferring a few kilobytes of disassembly results to the client side is not a serious boundary condition. The time can be estimated as the download of a text file of the same size from a good site deployed on good cloud infrastructure. Literally every developer today can watch YouTube and never uses a modem connection. Since the DAP protocol is built on top of JSON-RPC over WebSocket and is therefore non-blocking, 99% of the time will be spent by the disassembler, and not more than 1% of the UI latency will be spent on communication and rendering. We never had issues related to communication speed in our LSP adoption projects.

PavelSosin commented 5 years ago

To avoid a UI freeze, the response to a memory access or disassembly request should not contain the data itself, but only the primary result of the request. Upon receiving a positive response, the UI displays a control to expose the data, which is received in the next step: a getXXXData request. The data returned in response to that request fills the pre-opened control with the result. Notifications sent from the server to the client inform the debugger UI about progress and any issues detected during disassembly. If the disassembly request sent from the UI to the server contains data that prevents disassembly, the server responds with an error response and the UI displays an error message instead of opening the control. Another option is displaying the disassembly result in-line, but that would require integration of the debugger server with the editor back-end framework. The LSP code completion cycle is a good example of such a 2-step conversation.

PavelSosin commented 5 years ago

It looks like I share my conclusions with https://kichwacoders.com/2017/08/02/gdbs-mi-is-not-a-debug-protocol/

andrewcrawley commented 5 years ago

@haneefdm: Absolutely no offense taken, please continue providing feedback! : )

The protocol intentionally does not specify the format of a memoryReference - it's intended to be an opaque identifier, the format of which is entirely up to the debug adapter. It could be a raw address in hex or decimal, or it could be an index into some internal adapter data structure, or a handle value, or a "segment:offset" value, or anything else. If the adapter provides a memoryReference as part of an evaluate response, for instance, the only thing the UI can do with that reference is to pass it back, unchanged, as part of a readMemory or disassemble request. Nothing in the UI should ever be attempting to parse or assign meaning to them.

The use of address on the response types is also intentional - these are intended to be actual addresses displayed to the user alongside the contents of memory or the disassembled instructions. The UI may need to parse them for display, so the format is defined by my proposal.

Finally, the intended use of this API is to support a VS-like experience. I think a MI-style option of saying "disassemble this function" (as with the -a argument to -data-disassemble) is a good experience for a text-based UI where you want to see the entire context at once without having to enter a bunch of commands. In VS, however, you pick a location to start disassembly, and the debugger UI requests a few pages worth of instructions on both sides of this location to initially populate the UI. As the UI is scrolled in either direction, more instructions are requested as necessary, and if the optional location fields are populated, these instructions are even shown in-line with the source.
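
For example (values invented; 'frame' stands for a previously received StackFrame), the initial population of such a view around the current instruction pointer might be a single request centered on the frame's instructionPointerReference:

// Hypothetical request: fetch 50 instructions before and 50 after the current IP,
// using the proposed 'instructionOffset' (all values are illustrative).
declare const frame: { instructionPointerReference: string };

const disassembleArgs = {
    memoryReference: frame.instructionPointerReference,
    instructionOffset: -50,   // start 50 instructions before the referenced location
    instructionCount: 100,    // one "page" of context on each side
    resolveSymbols: true
};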

@puremourning: You can populate the location field on every DisassembledInstruction if you don't want to track state, it's just not strictly necessary. Source objects are kind of bulky (especially when they contain checksum data), so I want to minimize protocol overhead by allowing them to be omitted whenever possible.

haneefdm commented 5 years ago

@andrewcrawley Got it. And, thanks for such a detailed explanation. Yes VS was my reference for presentation. I did one of those lazy rendering/reading things before and that is what I had in mind anyway. Me, as the client of the DA would be careful, just wondered about other use cases.

puremourning commented 5 years ago

You can populate the location field on every DisassembledInstruction

I am writing the client, so I don’t get to choose what adapters do. I understand the source being bulky so I was sorta thinking we could bump it up a level. Still 1x per message, but not at the instruction level.

haneefdm commented 5 years ago

Protocol: Correct me if I am wrong, folks. This is just a specification change proposal formalizing how a request and response are supposed to be formatted/exchanged, with optional/mandatory content also specified. It is not a directive for a Debug Adapter (DA) or its client. The DA makes its choices, hopefully with its clients, and clients make their own choices. For C/C++, the DA for cppdbg/cppvsdbg is the MIEngine, and we have to go there to see what it's going to implement.

For a given instruction, I see only two required things: address and instruction. It makes sense that the instruction bytes are optional. I am thinking of variable-sized instructions: if the instruction bytes are optional, then you need to know the address of every instruction.

That brings me to the Source field/property, which itself is pretty heavy-duty. I wish there were a level of indirection here. If every unique Source item had a source ID, we could just reference the ID. We could do something like the following:

/** A number represents an id (an integer) for a previously described Source element. */
location?: Source | number;

Maybe not a worry? Already taken care of? If not, it might require a change to the existing Source spec. IDs could be optional, and it would be between the DA and the client to deal with them on a per-session basis - transient. Or is Source.sourceReference meant for this kind of thing?

Looking at the JSON, there might be compatibility issues as well, so we have to be careful, but it's doable.

Enough said. I need to research how source maps work with various debuggers.

PavelSosin commented 5 years ago

Hi Mohammed, you sent me a reference to a document that demonstrates a very cool gdb capability, but the problem is not gdb's capability. The problem is how a generic protocol can be constructed. There are several debugging servers, already existing or under development, that implement the DAP protocol; two of them run on your computer: the Chrome debugger and the Edge debugger. Both of these widespread debuggers provide some features that are not part of the standard protocol because they are usable only for JavaScript and TypeScript debugging, for example TypeScript source mapping.

The most widespread language that produces native executable code is Go. A Go debugging server exists and is used by Go developers, but Go allows neither referencing CPU registers nor disassembly. Who are the developers ready to pay for such a feature as an integral part of an IDE?

The focus of Microsoft, RedHat and SAP today is cloud-based development environments: the IDE is provided as PaaS by serverless cloud services, and the LSP and DAP protocols serve this goal. It means that developers don't own or manage the hardware on which the debugger service used in their IDE runs. In the best case they know that the debugger server runs in a Unix VM on some abstract hardware node. The only way to access the operational environment of this server is SSH, which in any case doesn't allow direct memory modification for security reasons. When an application requires special hardware like nVidia boards or a SAP HANA DB, the user can demand to run the server only on nodes labeled as having special hardware capabilities. Developers like me who develop applications that use HANA know nothing about the HANA appliance hardware architecture; all the work of creating the Docker container for HANA was done by a separate team that doesn't use a generic IDE or generic protocols (I suppose they use the HANA WebIDE development environment). For developing an nVidia Docker container a developer needs CUDA; the CUDA input language is C++, but there is nothing in common between CUDA and gdb.

In other words, it is very hard to find a use case that justifies such a very specific protocol extension, which looks like a redundant development-resources burden to achieve certification for a developed debugging server and IDE frontend. This pleasure may cost big money. BR Pavel

weinand commented 5 years ago

@PavelSosin it has already been said many times, but it seems that I have to say it again:

The proposals for the "memory access", "disassembly", and "register" oriented parts of the Debug Adapter Protocol are controlled by "Capabilities" and completely optional. A Debug Adapter is free to opt into these features, but it does not have to.

If the Go debugger doesn't allow referencing CPU registers or memory, no problem! By default no DAP-frontend will try to use these features because the corresponding Capabilities are disabled by default.

And BTW, we (the VS Code team) haven't even decided whether we want to support "memory access", "disassembly", and "register" in VS Code at all (for some of the reasons you already mentioned).

But other DAP clients (e.g. Eclipse or VIM) might want to use these features (because they might target a different audience). So your personal opinions about "Cloud based development environment" are true for some developers, but not for all (that's the reason why they are personal opinions). Personal opinions are perfectly fine, but they cannot be used as a guiding principle for open protocol design.

Since we will have to take a broader view in DAP design, I suggest that we do not continue this part of the discussion and focus more on the technical issues. By this we can easily save some "big money" too ;-)

PavelSosin commented 5 years ago

I feel we are closer to agreement. I think we should involve kichwacoders. They work in the same direction, have already made some practical steps, and see the limitations very well.

haneefdm commented 5 years ago

@PavelSosin Could you please refrain from re-posting entire posts? It is distracting and the thread becomes hard to follow. If you want to address me directly, then just say @haneefdm. Quoting a few lines is okay, but not entire posts! Not sure about the etiquette here. As I am typing this I know this post is annoying and distracting to everyone. I apologize.

You can also link to a particular post from your post instead of quoting it.

haneefdm commented 5 years ago

@andrewcrawley Maybe I am getting a better picture of disassembly. So, my assumed usage model could be:

Q1: Maybe this is already addressed but how to handle a NULL source. As in we got instrs, but no source info. Is there a way to say that this and future instructions will have no source context?

On both the DA and the client, it is kinda stateful (not my preference), but the state is per transaction (request/response). A particular DA can always send all the info all the time, as mentioned by @andrewcrawley.

That is super awesome if I am right in my assumptions. My worries about payload size go away.

The noise-to-signal ratio is too high on this thread. I hope you read this message, and sorry for contributing to the noise.

andrewcrawley commented 5 years ago

@haneefdm: Your assumptions match mine, except for part of your third point - if the location field is left out, we infer it to be same as in the previous instruction, but if the line / column info is left out, we do not infer anything. Both a location (either explicitly specified, or inferred from a previous instruction) and a line / column are necessary for source mapping, so you can handle the "unknown source" case by leaving out the line / column / location.
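
A client could resolve the effective source mapping with a small amount of per-response state, along these lines (a sketch of my reading of the rule above, not normative):

// Sketch: resolve the source location for each instruction in a single response.
// 'location' is inherited from the previous instruction when omitted;
// line/column are never inherited, so their absence means "no source mapping".
function resolveLocations(instructions: DisassembledInstruction[]): void {
    let currentSource: Source | undefined;
    for (const instr of instructions) {
        if (instr.location) {
            currentSource = instr.location;
        }
        if (currentSource !== undefined && instr.line !== undefined) {
            instr.location = currentSource;  // materialize the inherited location for this instruction
        }
    }
}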

andrewcrawley commented 5 years ago

Update: Added an is64Bit property to the process event. UIs could use this to determine the formatting of user-visible addresses.

puremourning commented 5 years ago

is64Bit seems to bake in an assumption about word length. Better to have a wordLength in bits?

andrewcrawley commented 5 years ago

@puremourning: Yeah, good idea, it would be nice to be able to format addresses correctly even if they're not 32 or 64 bits wide. I'll update the proposal accordingly, thanks! I'd like to avoid any confusion about whether the word size of a modern Intel CPU is 16, 32, or 64 bits, so I'll call the field "pointerLength".

haneefdm commented 5 years ago

maybe pointerLengthBits

PavelSosin commented 5 years ago

Do you intend to use absolute addresses? When I debug my application running in a VM, such a feature can be exploited only by hackers. In the real world all my memory references are symbolic, based either on a stack frame or on a variable pointer. An absolute address is not compatible with the Variables pane structure. For hardware debugging you don't need any protocol; you will never share the same UI. A hardware debugger has to have both parts on the same table, and hardware debuggers are provided internally by hardware manufacturers. For Intel or ARM developers, the number of bits is not a question. If I'm an IoT device developer, I use Java or Kotlin. Every proposal should solve a certain issue in a certain non-hypothetical use case.

PavelSosin commented 5 years ago

The lowest-level programming language, assembly, is unaware of pointer length. The core of a debugger UI is the same editor window containing the source of the debuggee. I think nobody has an explanation of how this source can be written with machine-word awareness; 32-bit Intel assembly is the same as 64-bit assembly. This is an absolutely pointless discussion.

weinand commented 5 years ago

@PavelSosin please do not assume the VS Code debugger UI in your comments (e.g. "Absolute address is not compatible with Variables pane structure."). DAP is independent from VS Code and works with all tools (even those that you are not aware of).

In addition people participating in these discussions have "real world experience" because they have developed debuggers for a long time. So they know what they are talking about.

You might disagree with them, that's fine. But statements like "This is absolutely pointless discussion." are not acceptable here.

I suggest that you read our Code of Conduct rules before you post your next comment.

weinand commented 5 years ago

@andrewcrawley here is my first batch of feedback:

Memory references:

I like the concept of an opaque "memoryReference", but I suggest to make it controllable by the client:

In the proposal the "memoryReference" is returned by four existing requests which are heavily used in every debug session. Assuming that there is some cost for generating, managing, and transmitting these references, we should make the inclusion of the "memoryReference" controllable by a client capability. So a client would announce in the initialize request that it is interested in "memoryReferences".
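
Concretely, such a client capability might be expressed like this (a sketch of the suggestion; the property name is my assumption, not the final schema):

export interface InitializeRequestArguments {
    // ...

    /** Client supports memory references, so the adapter should include them where available. */
    supportsMemoryReferences?: boolean;
}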

Reading memory:

This "Reading memory" proposal is interesting because it would be the first time that DAP is used to transfer raw binary data instead of formatted UI strings. This requires that the client becomes responsible for formatting binary data (which has not been necessary in any other area of DAP).

It is clear that I am a bit reluctant to add new concepts into the DAP. Did you consider other approaches to solve this problem? E.g. you could do the formatting in the DA and just return a text document (via DAP's "source" request) that a client could easily display.

Disassembly:

"Disassembly" is somewhat similar to "Memory" in that it is the client's responsibility to format some fairly detailed model (array of DisassembledInstruction) into some textual representation. So I have the same reservation as for the "Reading memory" proposal.

But before I can provide more feedback, I have additional questions:

haneefdm commented 5 years ago

For reading memory, interpreting the bytes is best left to the client, IMHO. It could be a specific structure that the client knows how to deal with (a TCP packet, say), or there are UI concerns: the number of columns, what happens when the window is resized, which scrollbars/actions I would use, what kind of UI element I want to use. I may do a byte-level display (hex + printable ASCII), or show other formats too: uint32, uint64, signed, unsigned, Unicode, etc. Or maybe this memory read is actually reading memory-mapped peripheral registers.

I think sending raw data at byte granularity is the best model, and the view can present it in many different ways. That is precisely what the request was ... read-memory. We don't know why that request was made. No problem with base64 encoding either. I would have left it as hex (padding issues), but base64 is a bit more efficient for transport.

About disassembly: in this case, the proposal places more burden on the DA, but then the client can display it any which way, as a table or as formatted text. It gets normalized to one form by the DA, regardless of the actual debugger. The disassembly window has other uses: it can also allow manipulation of breakpoints, for which it is nice to have the raw info decoded by the DA.

I am on both ends of this, DA and the client, so this makes it a bit hard. Plus I am also an end user.

haneefdm commented 5 years ago

About getting memory references - I want to think about the rationale for each, potential use cases, feasibility - just brainstorming

weinand commented 5 years ago

@haneefdm re "getting memory references": I was not questioning the need for having memory references in various responses. My point is: if a client like VS Code doesn't deal with memory references at all, there is no need for the DA to return them (and keep data structures for their management).

BTW, I'm not done with my feedback from above. I just had to head home and had to dump the thoughts somewhere...

haneefdm commented 5 years ago

@weinand no worries, I knew you were not done with your thoughts, you said 'more to come...'. Just wanted to give you and @andrewcrawley something to chew on. Take your time...

Your post got me thinking, though, and I asked myself how we justify the need and use cases for all four memoryReferences, regardless of VS Code or other clients, and the burden placed on the DAs. Since I couldn't, I posted my thoughts. They are all optional, so... there is that.

andrewcrawley commented 5 years ago

@weinand: I like the idea of having the host state in the initialize request that it wants memoryReference values. These are cheap to generate for vsdbg, but they may not be for other debuggers, so it would be good not to do it unnecessarily or add extra protocol overhead for hosts that don't care about it.

VS definitely needs the raw memory data, as our memory viewer allows the user to customize the formatting pretty extensively, and this is all done in the UI layer. @haneefdm already hit on this, but I suspect that most UIs will want to do some amount of formatting - for instance, most memory viewers or "hex dump" tools I've seen show hex on one side and the char equivalent on the other side. If a DA just provides this as pre-formatted text, the UI won't be able to line wrap it effectively if the window size changes (unless the DA limits itself to 16 bytes per line or some other tiny number that is unlikely to need wrapping, which I don't think is a very nice experience).

I had considered sending the data as hex-encoded bytes instead of base64, but I ended up using base64, since it's a more efficient encoding and it's easy to use from most languages (C++, C#, and JS, at least).
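
As an illustration of the client-side formatting being discussed, a hex/ASCII dump built from the base64 payload might look roughly like this (a sketch only; the 16-bytes-per-line width is a UI choice, not part of the protocol, and Node's Buffer is assumed for base64 decoding):

// Sketch: render a readMemory payload as a classic hex + ASCII dump.
function hexDump(base64Data: string, baseAddress: bigint, bytesPerLine = 16): string {
    const bytes = Buffer.from(base64Data, "base64");
    const lines: string[] = [];
    for (let i = 0; i < bytes.length; i += bytesPerLine) {
        const chunk = bytes.subarray(i, i + bytesPerLine);
        const hex = Array.from(chunk, b => b.toString(16).padStart(2, "0")).join(" ");
        const ascii = Array.from(chunk, b => (b >= 0x20 && b < 0x7f ? String.fromCharCode(b) : ".")).join("");
        lines.push(`0x${(baseAddress + BigInt(i)).toString(16)}  ${hex.padEnd(bytesPerLine * 3 - 1)}  ${ascii}`);
    }
    return lines.join("\n");
}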

weinand commented 5 years ago

@andrewcrawley I've added more to the "first batch of feedback" above. Sorry for making you jump back and forth through the comments...

andrewcrawley commented 5 years ago

@weinand: Don't worry about it : )

Disassembly is actually surprisingly complicated, as I've been learning over the past few weeks while implementing this proposal in vsdbg. It has to be handled by the DA because only the actual debugger knows the CPU architecture of the debugged process (which may be different from the architecture of the host in remote debugging cases), has access to the necessary info from debug symbols, has the ability to map source lines to a memory location in jitted code, etc.

I don't think a dynamic document model would work for disassembly. For one thing, the usual model is to start disassembling at some arbitrary address (current IP, function entry point, arbitrary source line via gotoTargets request, etc), and then as the user scrolls around, the UI requests additional instructions as necessary. The current dynamic document model in the protocol assumes that documents are either static or only change by replacing the entire document, so it would require a lot of modification to work with "lazy" content. In addition, the VS disassembly window is also configurable in terms of formatting, similar to the memory view window. A basic UI would just have to concatenate together the address and instruction fields from DisassembledInstruction, which doesn't seem onerous.

Setting BPs in disassembly is not in scope at the moment, but we'll probably want to do it eventually. I think the best option here would be to add an AddressBreakpoint (or something similar) that you set via an address - this is how VS handles it. Supporting BPs via dynamic documents (assuming the protocol related to dynamic documents was extended to support virtualized scrolling) would require the DA to maintain an internal mapping between "dynamic document line number" -> DisassembledInstruction, since the line numbers would be changing as more content was added, which seems like a lot of complexity and overhead compared to just using the address field already present in a DisassembledInstruction.
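
If that direction were taken, the shape might be something like the following (purely speculative, not part of this proposal; the interface name and properties are my assumptions):

/** Speculative sketch of an address-based breakpoint, as hinted at above. */
export interface AddressBreakpoint {
    /** Memory reference (plus optional offset) identifying the instruction to break on. */
    memoryReference: string;
    offset?: number;

    /** An optional expression for conditional breakpoints, mirroring SourceBreakpoint. */
    condition?: string;
}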

haneefdm commented 5 years ago

Yup, lazy reading/rendering for disassembly is how it needs to be done. I would do that even for memory windows. In fact, I would do that everywhere, if it were up to me: a super large stack, a super large array for a variable. In the case of stack/variables, I had a "Show Next 100" and/or "Show All" UI element with a hint of how much more there was, and let the user decide. For memory/disassembly, I did it based on scrolling. Those concepts were baked into my frontend and backend.

My past has no relevance here...

Even in the current DA MIEngine, don't we just request a limited # (like 1000) of frames of a stack?

weinand commented 5 years ago

@andrewcrawley thanks for helping me understand disassembly and addressing my concerns. Overall the proposal looks good to me. I will create a PR with the schema changes (including the client capability in the Initialize request).

gregg-miskelly commented 5 years ago

@weinand Andrew has a json schema version of all these proposals. So he can comment more when he gets to work, but it might be easier for him to submit a PR if you would like.

PavelSosin commented 5 years ago

Recently I learned that the disassembly feature is heavily tied to the assembler implementation. In the real world we have 2 assembler languages for Intel and yet another one for ARM, and they are really different programming languages. I propose to keep the reflection of this feature in the protocol at the highest level. For LSP we separated the universal LSP from other services used in the editor, like coloring and templating, which have separate APIs due to heavy language dependencies; these services share only the triggering command. Maybe it is more convenient for the DA implementor to use a more generic protocol, and for the developer to have a language-specific disassembler UI.

haneefdm commented 5 years ago

We should not confuse assembly languages with disassembly. There are two main assemblers for ARM (ARM and GNU) and for Intel (Microsoft MASM, GNU), but there are others. But that is input. Disassembly has two main flavors people are used to: AT&T and Intel. Some debuggers allow you to choose the flavor. Not sure if the DA should allow that, stick to just one flavor, or just do a pass-through.

I don't see disassembly output as a language. Mnemonics, macros, and labels are all normalized, but it may still look like the original assembler input.

Also helpful to read this link

PavelSosin commented 5 years ago

Memory and register access is not a simpler case, because the Intel architecture uses 4 different memory models and the ARM architecture provides yet another one, very different from Intel's. The only common thing is that all low-level debugger APIs implement an evaluate-expression command which accepts the expression as a string. The duty of the DA is to bridge between the UI and the low-level debugger. The EvaluateArguments interface used in the evaluate request already has scope and format attributes. I propose to extend EvaluateArguments with a single attribute, qualifier, defined in the following way: qualifier: string

The qualifier describes how the expression is interpreted by the server; it helps the debugger server interpret the expression and translate it into a low-level protocol call according to the context of the specific assembly language and target architecture. It is up to the server to translate this request into the low-level debugging protocol or API; if translation is not possible, the server responds with a negative response.

Values:

nil: if not specified or null, the expression is a regular context-language expression.
register: used to read a register value; the expression attribute value is assumed to be a register name.
memory: used to read memory content; the expression attribute value is a symbolic memory reference, which is translated by the server into the memory reference used by the low-level protocol according to the target architecture and target assembler language. For example, an Intel debugger can pass different forms of expressions (MASM expressions or C++ for the corresponding assembler languages), while an ARM debugger uses memory reference expressions according to the memory model of ARM processors. The grammatical validity of a memory expression is checked by the regular on-line validation mechanism provided by the hosted editor UI control for the corresponding language.

Remark 1: register and memory can refer either to physical registers and memory or to simulator registers and memory.
Remark 2: it is up to the low-level API and architecture to prevent access to memory outside the application's virtual memory.
Remark 3: if the context is hover, the qualifier is derived from the source token type; the hosting UI source editor control provides this information out of the box because the displayed source is already colored.
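
For what it's worth, the extension being described might be sketched like this (my reading of the comment above, not an endorsed or final design):

export interface EvaluateArguments {
    // ...

    /**
     * How the expression should be interpreted by the adapter (sketch of the suggestion above):
     * omitted/null - regular source-language expression
     * 'register'   - 'expression' is a register name
     * 'memory'     - 'expression' is a symbolic memory reference
     */
    qualifier?: string;
}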

weinand commented 5 years ago

@andrewcrawley @gregg-miskelly I've created PR https://github.com/microsoft/debug-adapter-protocol/pull/50 for the complete DAP change.

PavelSosin commented 5 years ago

I strongly oppose any proposal to project low-level debugger features onto the core debugging protocol. Low-level debuggers are functionally incomplete and inconsistent. Windows 10 has already killed the last "local" scenario via the introduction of its virtualization feature: everybody can enable hardware virtualization, open Hyper-V manager, create a few virtual machines, install a few Ubuntu ISO images and connect to these VMs using the built-in OpenSSH client or a Chrome extension. I suppose nobody has tested dbg in this scenario, and I'm not sure it can work. Also, the dbg semantics are not clear: memory of which VM? Maybe host OS memory? Registers of which core? Unlike the clear expression context in all high-level debuggers (VM and thread), the memory and register semantics of dbg are not clear to me. Evaluate expression has a universal meaning. Register and memory access operations are applicable only in an extreme scenario: low-level programming of a single-core machine without hardware virtualization. The dbg definition of register and memory operations lacks context, and manufacturing of such architectures has been terminated. I also strongly oppose using the laptop's resources at 25% of their capability after 100% of the price has been paid.