openMSX / openMSX

the MSX emulator that aims for perfection
http://openmsx.org
424 stars 94 forks source link

display context menu in symbol manager #1632

Open pvmm opened 1 month ago

pvmm commented 1 month ago
pvmm commented 3 weeks ago

I've given this some thought, and currently I only see these meaningful uses:

* The "address" column in the "disassembly" window. We can only replace the numeric address with a symbol if it matches the currently visible slot. But in particular I do NOT see how we could do the same in the "mnemonic" column. There are just too many cases where you can't decide whether or not to replace the symbol.

* When setting a breakpoint from within the symbol window (that functionality doesn't exist at the moment) we could immediately fill in the correct slot info.

Things that could be easier to do, but I don't think have ever been proposed:

Edit:

pvmm commented 3 weeks ago

I am concerned about how that slot information will be used. For example in the old debugger it was possible to annotate symbols with slots (and I think even with a "kind" of symbol (data or code)). But that info wasn't used at all! So we had a lot of code for no purpose. In the past I had already given this some thought, and I didn't see how that approach could be made to work well (and probably that's the main reason why nobody finished that code?).

We can use dynamic code analysis scripts in TCL to distinguish code from data, use special annotation in the symbol manager to display, for instance, "db <data>" instead of code in the disassembler widget. For instance:

db "Hello world"

instead of

ld c,b
ld h,l
ld l,h
ld l,h
ld l,a
jr nz,#c07e
ld l,a
ld (hl),d
ld l,h
ld h,h
m9710797 commented 3 weeks ago

Such a tool would be great. BUT the current disassembly view is not the right place for it:

I don't mind adding a separate viewer(*1) that works like you describe. But I think it will be much much complex than the current disassembly view. (And I think you underestimate it to be honest).

(*1) Or maybe it's better to have this as a separate off-line tool. Then it's OK to let the tool run for a few minutes to do a whole program analysis

pvmm commented 3 weeks ago

(*1) Or maybe it's better to have this as a separate off-line tool. Then it's OK to let the tool run for a few minutes to do a whole program analysis

That's the approach I was thinking: a script tool that goes in share/scripts and the developer can use and modify to their liking. But even if it is less useful for code in RAM that can modify itself, it should still prove useful for games and programs that run exclusively on ROM which are more common anyway. Even if it is not very precise, it could be used to kick-start an annotated version of the code (that can be saved as an omds file). But for this to work, the debugger needs some kind of memory annotation support. I would use the symbol manager for this.

m9710797 commented 3 weeks ago

OK. That wasn't clear (to me) from your original description.

But then I'd propose to go a step further.

In fact that's how many "traditional" debuggers work (e.g. gdb). They can operate in two modes:

The latter only works when there is extra debug information. This extra information "somehow"(1) contains a mapping from every PC-value to a source file and line-number (in some cases also column number). And with accurate debug info, you can skip over comments, and can map a range of PC-values to the same unexpanded macro. (1) Because this can be A LOT of data, there are various different ways to encode this information is a more compact format.

Most compilers can produce such debug information. I don't know if there are any MSX assemblers that can produce this info. Maybe we can get inspiration from some MSX C compilers, most likely they can already produce some kind of debug info.

The script you're talking about could then be the combination of:

It would be great if afterwards the user could still edit the source file (e.g. add comments, add or rename labels, or fix code/data mistakes). While preserving/updating the debug information.

In other words: these script(s) become a full reverse-engineering tool.

But I believe this should be a separate tool from openMSX. I also believe the source-code view and the disassembly view should be two completely separate viewers in the openMSX debugger. Just like in most (all?) other debuggers.

pvmm commented 2 weeks ago

Based on what Artrag reported about SDCC segments in NOI files, I fixed this PR to display only the segment a symbol belongs to: image

What if I created a first column on the left with a checkbox that allows the user to activate a breakpoint in that address/segment combination?

m9710797 commented 2 weeks ago

What if I created a first column on the left with a checkbox that allows the user to activate a breakpoint in that address/segment combination?

I'm not sure I like an extra column(*1) for that functionality. Maybe right-click context menu is better suited? We actually already had a feature request for this (though it doesn't mention the desired mechanism).

This patch adds a (16-bit) segment number. But there's no indication of whether this number is meaningful or not. E.g. if I load symbols for the MSX-BIOS, there are no segment numbers. And my preference would then be to not show a dummy segment number 0. And when setting a breakpoint on such a symbol, don't add a check for this meaningless segment.

Also this patch removed the slot information again. Isn't that also required to be able to set the breakpoint? We already discussed that the existing symbol-file formats don't carry slot-information, but this could e.g. be specified when loading the file. (But as I mentioned before, I would only allow zero or one specific slot, not a combination of multiple slots).

(*1) This is not my personal preference, but if this were to be implemented via an extra column, then I'd draw it with a small red circle, like in the disassembly view. But also if this was an extra column, then I'd expect it to always correctly show whether there's a breakpoint set on this symbol. But the question "Is there a breakpoint on this symbol?" does not have an easy answer. For example: what if I set a breakpoint on the numerical address 0x005f. Does that then automatically result in a breakpoint on the symbol CHMOD? And what if there are multiple symbols with the same numeric value? Does setting a breakpoint on one such symbol automatically set a breakpoint on all those symbols? If instead we only offer the option "Set a breakpoint on this symbol" in the context menu, then we don't necessarily need to indicate whether there already is such a breakpoint (thus avoiding the difficult question).

m9710797 commented 2 weeks ago

Thanks again for working on this. But I'd still like to ask you to explain:

I have the impression that you're starting with this last step, without having a clear idea yet what it is you're trying to achieve. (My reason for this impression is: you first add detailed slot stuff, including edit functionality, but then replace that with segment info, and no indication at all of how this info will be used).

pvmm commented 2 weeks ago

I'm not sure I like an extra column(*1) for that functionality. Maybe right-click context menu is better suited? We actually already had a feature request for this (though it doesn't mention the desired mechanism).

This patch adds a (16-bit) segment number. But there's no indication of whether this number is meaningful or not. E.g. if I load symbols for the MSX-BIOS, there are no segment numbers. And my preference would then be to not show a dummy segment number 0. And when setting a breakpoint on such a symbol, don't add a check for this meaningless segment.

I though about printing every name that is not in segment 0 in italic and remove the segment column. I think that is a good way to tell the user that the symbol is different without causing visual clutter. But the user could also configure this to be visible in the symbol table and it should be turned off by default.

Also this patch removed the slot information again. Isn't that also required to be able to set the breakpoint? We already discussed that the existing symbol-file formats don't carry slot-information, but this could e.g. be specified when loading the file. (But as I mentioned before, I would only allow zero or one specific slot, not a combination of multiple slots).

You wanted the segment column gone, but not the slot and subslot which are part of the same information? Yes, I removed it because it should be specified when loading the file. Or maybe it could stay in a hidden column that is invisible by default in the symbol table.

(*1) This is not my personal preference, but if this were to be implemented via an extra column, then I'd draw it with a small red circle, like in the disassembly view. But also if this was an extra column, then I'd expect it to always correctly show whether there's a breakpoint set on this symbol. But the question "Is there a breakpoint on this symbol?" does not have an easy answer. For example: what if I set a breakpoint on the numerical address 0x005f. Does that then automatically result in a breakpoint on the symbol CHMOD? And what if there are multiple symbols with the same numeric value? Does setting a breakpoint on one such symbol automatically set a breakpoint on all those symbols? If instead we only offer the option "Set a breakpoint on this symbol" in the context menu, then we don't necessarily need to indicate whether there already is such a breakpoint (thus avoiding the difficult question).

As long as the symbols are useful for debugging and inspecting code in different slots, subslots and segments I have no objection.

pvmm commented 2 weeks ago
  • First. What you're trying to achieve. Not in terms of the implementation, but from a user point of view. What functionality is missing today? My guess is that you want to make it easier to set breakpoints on precise memory locations (thus including slot/segment) via symbols? Correct? Is this the only goal or are there more?

Correct, but there is more. As a game developer writing programs for the MSX, there is no easy way to inspect the contents of a ROM cartridge, specially if it has banked ROM. The "Add hex editor" is confusing because it has too many options. If a inexperienced user inserts a "file.rom" they will not understand if they should select "main RAM", "file.rom", "file.rom romblocks" or even "slotted memory".

An easy way to address this is to create short cuts to the hex editor directly from the symbol table context menu. Even an experienced user would prefer to open a hex editor from a menu item than to navigate through layers of menus and move through pages and pages of the hex editor just to find a symbol in the symbol table. I would like to see this done for disasm widget too, but that can wait a little.

* Second. _How_ will this work from the _user point of view_. It's OK if this is still vague at this point. E.g. it could be stuff like "an extra column in the symbol manager with slot and segment information. And how the _user_ provide this extra information (is it somehow obtained from symbol file, does to user manually edit it, ...). This extra info is used to "..." (e.g. to set an extra condition when you create a breakpoint via the symbol manager).

The tools the developer uses to create programs (specially tools in active development) can provide most of the information about segments. They may not know the slot and subslot because this is hardware specific, but this dialog could be extended to also allow the developer to insert the corresponding symbol file and OpenMSX would already know what slot should be used:

image

* And only as the last step, explain your ideas for how this could actually be implemented in the code.

I have the impression that you're starting with this last step, without having a clear idea yet what it is you're trying to achieve. (My reason for this impression is: you first add detailed slot stuff, including edit functionality, but then replace that with segment info, and no indication at all of how this info will be used).

My first approach is to replicate what was in the debugger. But as you said, if it is not necessary to display slot, subslot and segment for each and every symbol in the symbol table, maybe it should be associated with the file from which the symbol was loaded, so that's what I had in mind.

m9710797 commented 2 weeks ago

... You wanted the segment column gone, but ...

To be clear. I don't want anything. I was mainly trying to guess what you were trying to achieve, and then based on my limited (and maybe partly wrong) understanding, suggest improvements.

So I'm happy that in your next post you're clarifying your goals.

pvmm commented 1 week ago

OpenMSX now search for a debuggable in memory with the same name (without the extension) of the symbol file and open that ROM file in the hex editor if it was found. image

pvmm commented 1 week ago

<wouterv> i read a bit of the code .. seems like the "open in hex editor" function is meant for symbols from roms. I think we should also make it work for other symbols. e.g. symbols for the standard MSX BIOS calls or system variables

It already works for both. If you load a symbol file called MSX BIOS with BASIC ROM.noi with the BIOS calls at the correct addresses, the context menu will open the hex editor at that location. The same goes for system variables if you put them in a file called memory.noi

m9710797 commented 1 week ago

It already works for both. If you load a symbol file called MSX BIOS with BASIC ROM.noi with the BIOS calls at the correct addresses, the context menu will open the hex editor at that location. The same goes for system variables if you put them in a file called memory.noi

Hmm, but that's very limiting. I doubt many people today already have symbol files with such names. And now that I think of it: typically in your program you'll have various lines like CHMOD equ #005f, so BIOS-symbols and game-symbols will often be mixed in the same symbol file.

If I read the code correctly, then I think you're using the symbol-value for the file-offset in the ROM file. That's wrong. E.g. a typical ROM mapper get mapped in the MSX address space starting from address 0x4000. So for example the entry-location for the game might be at address 0x4010, but that's at file-offset 0x10.

I don't see a solution to this problem. Not without adding extra information in the symbol file: we'd separately need the file-offset and the location in the MSX address space. And I don't see how one can be calculated from the other. Not even when you add the extra knowledge about the ROM start-address and mapper-segment-size.

Maybe you have a solution for this? But I don't. The only solution I see is to extend the symbol file to contain both values. But that's not a practical solution. To be usable it should be possible for the user to recompile his project and immediately start using the new ROM + symbol-file in openMSX (without any manual tuning or edits).

So personally I don't believe the current PR is the right approach. I also doubt you'd often want to go from a symbol to the corresponding location in the hex-editor. E.g. if you're developing a ROM and you have the source code, that source code is a much more valuable resource to lookup such stuff. And the few times that you really want to see it in a hex editor, it's not too much effort to manually calculate the file offset.

I think a much more frequent scenario is that you want to go from a symbol to the corresponding location in the disassembly view. And I think that's easily implemented by taking the (16 bits) symbol-value. (And I understand that to see the correct data, also the correct slot and segment needs to be selected).

Another scenario might be that you want to place a breakpoint on the symbol value. As a first approximation you could also just use the (16-bit) symbol value(*1). But I understand you might want to refine this with the correct slot-information (that info is currently missing in the PR). And the correct segment number.

At some point in this PR you assume that the upper-bits of the symbol-value are the segment number, but at other points you assume that the full symbol-value is the file-offset. Both cannot be true at the same time. Many symbol file formats only support 16-bit values. Some allow larger values, but I don't know what the meaning of those higher bits is. Can you point me to some documentation? In any case I think we should also support well the (very common?) case that segment information is missing (so don't treat it the same as-if it were segment=0).

--

(*1) Maybe we should simplify this PR (a lot), at least for an initial version. Maybe a simple first step could be: In the symbol manager, in the symbol table, a right-click context menu with two options:

pvmm commented 1 week ago

At some point in this PR you assume that the upper-bits of the symbol-value are the segment number, but at other points you assume that the full symbol-value is the file-offset.

No, I don't. As you can see in SymbolManager.hh, Symbol value is still uint16_t. So the symbol is still limited to 64kB address even if I read a 24-bit number. So I never use it as a file offset:

struct Symbol
{
        Symbol(std::string n, uint16_t v, uint16_t s)
                : name(std::move(n)), value(v), segment(s) {} // clang-15 workaround

        std::string name;
        uint16_t value;
        uint16_t segment;

        auto operator<=>(const Symbol&) const = default;
};

I just added segment, that is also uint16_t. There is some segment-to-offset to calculate, but I haven't written it yet because I am still figuring out the UI details.

Both cannot be true at the same time. Many symbol file formats only support 16-bit values. Some allow larger values, but I don't know what the meaning of those higher bits is.

I cannot answer for all the symbol file formats, but in the case of NoICE files (my priority), this is the meaning of symbol-value that is bigger than 16-bit values (page 67 of SDCC manual):

When linking your objects you need to tell the linker where to put your segments. To do this you use the following command line option to SDCC: -Wl-b BANK1=0x18000 (See 3.3.5). This sets the virtual start address of this segment. It sets the banknumber to 0x01 and maps the bank to 0x8000 and up. The linker will not check for overflows, again this is your responsibility.

In this case, segment (0x01) and bank size allow us to point the symbol to the right file offset (file offset = SEGMENT# * BANK_SIZE + ADDRESS % BANK_SIZE), while the address part (0x8000) is also used to decode the assembly properly with jumps and calls pointing to the right addresses.

In any case I think we should also support well the (very common?) case that segment information is missing (so don't treat it the same as-if it were segment=0).

In the case of NOI files, missing segment information will not allow the compiler (and hex2bin) to produce a valid ROM file. So if a NOI file that creates a banked ROM file has segments somewhere, symbols that lack segment information mean segment 0. The omds file format can be extended to add the missing information (base address and bank size of a collection of symbols).

pvmm commented 5 days ago

Maybe you have a solution for this? But I don't. The only solution I see is to extend the symbol file to contain both values. But that's not a practical solution. To be usable it should be possible for the user to recompile his project and immediately start using the new ROM + symbol-file in openMSX (without any manual tuning or edits).

This is possible if we save the reference to the .NOI file, detect and reload it when it changes. The previous information about slot#-subslot# and base address is reused because it is saved as session data outside the symbol file: image

pvmm commented 5 days ago

I also doubt you'd often want to go from a symbol to the corresponding location in the hex-editor. E.g. if you're developing a ROM and you have the source code, that source code is a much more valuable resource to lookup such stuff. And the few times that you really want to see it in a hex editor, it's not too much effort to manually calculate the file offset.

I disagree. Otherwise using the hex editor for a different slot/subslot has no reason for existing too, but it's already there in the "Add hex editor" submenu. And it's a short cut: there is no fast way of accessing different slot/subslot/address without selecting a ROM from a list of ROM files and scrolling through pages and pages of memory dump.