z00m128 / sjasmplus

Command-line cross-compiler of assembly language for Z80 CPU.
http://z00m128.github.io/sjasmplus/
BSD 3-Clause "New" or "Revised" License

Export of xmap (labels + comments + memory metainfo) for Xpeccy emulator #161

Open Volutar opened 2 years ago

Volutar commented 2 years ago

Hello! Xpeccy emulator supports labels and comments and memory metainfo (i.e. code/byte/word/ascii)

File format is not textual.

<"XMEMMAP "> // header
<"ramflags"><4 bytes = length><memory_map>
    <memory_map> flags x ram_size
        flags (& 0xf0): 0x00=default (code), 0x10=db, 0x20=dw, 0x30=dw (addr=label), 0x40=db (ascii text), 0x50=code
<"romflags"><4 bytes = length><memory_map>
    <memory_map> flags x rom_size
        [as above]
<"labels  "><4 bytes = length><text>
    <text>
        SSS:NNNNNNNN:label\n
        ...
<"comments"><4 bytes = length><text>
    <text>
        NNNNNNNN:comment\n
        ...

The file consists of a number of chunks, each with an 8-character header, a 4-byte chunk length, and the chunk data itself. For the 48K Spectrum memory model the RAM section should still consist of 8 pages (only pages 0, 2 and 5 are used; the others are empty). The ROM section is unnecessary for export (though it might be possible to make ROM images?). The 128K RAM layout is linear (0,1,2,3,4,5,6,7).

SSS is the memory block type, "RAM" or "ROM". The "\n" between labels/comments is in Unix format (0x0a), and the text is UTF-8 encoded. NNNNNNNN is the linear address.
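As a rough illustration of the chunk layout described above, here is a minimal Python sketch that writes and re-reads such a file. Several details are assumptions not stated in this thread: the 4-byte length is taken to be little-endian, NNNNNNNN is taken to be an 8-digit hexadecimal number, and the function names are made up for the example. The optional "romflags" chunk is left out.

```python
import struct

MAGIC = b"XMEMMAP "  # 8-byte file header from the layout above


def pack_chunk(name: str, payload: bytes) -> bytes:
    """Build one chunk: 8-char tag, 4-byte length, then the data."""
    tag = name.ljust(8).encode("ascii")
    if len(tag) != 8:
        raise ValueError("chunk tag must fit in 8 characters")
    # ASSUMPTION: little-endian length; the thread does not say.
    return tag + struct.pack("<I", len(payload)) + payload


def build_xmap(ramflags: bytes, labels, comments) -> bytes:
    """labels: (block_type, linear_addr, name); comments: (linear_addr, text)."""
    # ASSUMPTION: NNNNNNNN is written as 8 hexadecimal digits.
    label_text = "".join(f"{kind}:{addr:08X}:{name}\n" for kind, addr, name in labels)
    comment_text = "".join(f"{addr:08X}:{text}\n" for addr, text in comments)
    return (
        MAGIC
        + pack_chunk("ramflags", ramflags)
        + pack_chunk("labels", label_text.encode("utf-8"))
        + pack_chunk("comments", comment_text.encode("utf-8"))
    )


def parse_chunks(blob: bytes) -> dict:
    """Split a file image back into {tag: payload}."""
    if blob[:8] != MAGIC:
        raise ValueError("missing XMEMMAP header")
    chunks, pos = {}, 8
    while pos < len(blob):
        tag = blob[pos:pos + 8].decode("ascii")
        (length,) = struct.unpack_from("<I", blob, pos + 8)
        chunks[tag] = blob[pos + 12:pos + 12 + length]
        pos += 12 + length
    return chunks
```

For a 128K-style map the `ramflags` payload would be `8 * 0x4000` bytes, one flag byte per RAM byte; comparing the output against a file saved by the emulator would be the way to confirm or correct the assumptions.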

Emulator page: https://github.com/samstyle/Xpeccy

Latest Windows build (supporting xmap) with configs/roms/profiles: https://volutar.myds.me/Xpeccy0.6.20211230.zip

The command-line parameter to import an xmap is "--xmap <file>". It can also be loaded/saved via the debugger menu.

Labels can be added by typing them directly into the address column. A ";" as the first character makes the entry a comment. The memory type can be changed by selecting a range and choosing the type from the RMB context menu (View->).

Currently comments are single-line, but later they are meant to become multi-line. Each address can have only one comment and only one label (comment+label per address).

ped7g commented 2 years ago

hello, I see this is official docs translated from Russian, which helps a lot.

But before reading more into it, I would also prefer to have some "hard facts", i.e. an example asm file for sjasmplus and a working map file saved from the emulator with the expected values.

Also I see something incompatible with sjasmplus:

> Each address can have only one comment and only one label (comment+label per address).

and this is quite often broken by regular asm project, producing several labels per one address, so it depends if such map file would crash the emulator, or the first/last label would win, etc...

Also there're several map file types already produced by sjasmplus, with SLD files being meant as most rich info for any debugger, including source-file locations for every instruction, so I'm not ecstatic about adding another one (with no obvious advantage).

Checking the official Xpeccy documentation just now, I see it should be able to load the LABELSLIST map file, which IIRC does include the zx48 pages translation to 5,2,0 mapping, so this should be already usable with current sjasmplus.

Volutar commented 2 years ago

> the ZX48 pages in sjasmplus are by default 0,1,2,3, not rom,5,2,0, sjasmplus doesn't have rom page (unfortunately, I tried to add it, but it was getting too complicated). So only 128 target makes sense.

Bank numbers could be remapped during xmap export procedure if it's 48k.
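A minimal sketch of such a remapping, in Python. It assumes that sjasmplus ZX48 page N covers CPU addresses starting at N * 0x4000, that pages 1, 2, 3 correspond to 128K banks 5, 2, 0 as stated above, and that the xmap linear address is bank * 0x4000 + offset. The last point is an assumption based on the "linear" RAM layout; the names are made up for the example.

```python
PAGE_SIZE = 0x4000

# sjasmplus ZX48 page -> 128K RAM bank; page 0 covers the ROM area,
# so it has no RAM bank to map to.
ZX48_PAGE_TO_BANK = {1: 5, 2: 2, 3: 0}


def zx48_to_linear(cpu_addr: int):
    """Map a 48K CPU address to a linear RAM address, or None for the ROM area."""
    page, offset = divmod(cpu_addr, PAGE_SIZE)
    bank = ZX48_PAGE_TO_BANK.get(page)
    if bank is None:
        return None
    # ASSUMPTION: linear address = bank number * 16K + offset within the bank.
    return bank * PAGE_SIZE + offset
```

Under these assumptions, CPU address 0x4000 (start of screen memory) lands at linear 0x14000, the start of bank 5.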

> not all types are tracked by sjasmplus this way (DB is not aware if argument was only text or not, and not sure what is the difference between 0x00 and 0x50)

Actually there's no difference between 0x00 and 0x50 besides 0x00 is simply default, and 0x50 is marked during "runtime mapping" as actually executed parts of code.
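For reference, the flag values listed in the format description can be decoded with a small lookup. This is only a sketch: the names are invented, and the meaning of the low nibble is not described in this thread, so it is simply masked off.

```python
# High-nibble values taken from the format description in this thread.
FLAG_TYPES = {
    0x00: "code (default)",
    0x10: "db",
    0x20: "dw",
    0x30: "dw (address/label)",
    0x40: "db (ascii text)",
    0x50: "code (seen executed)",
}


def flag_type(flag: int) -> str:
    """Return a readable name for the type stored in the high nibble."""
    return FLAG_TYPES.get(flag & 0xF0, "unknown")
```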

> and this is quite often broken by regular asm project, producing several labels per one address, so it depends if such map file would crash the emulator, or the first/last label would win, etc...

If there are multiple labels per address, that's fine for the assembler: each label is converted to the same address. But there's no way to reverse this: the disassembler can't know which label is meant in any particular place. So yes, it is simply going to use the "last/first" one. Without crashing, of course.
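An exporter could also do this reduction itself before writing the labels chunk. A minimal sketch follows; which of the two policies the emulator actually applies is not stated here, so the `keep` parameter (a made-up name) leaves it open.

```python
def one_label_per_address(labels, keep="last"):
    """Collapse (address, name) pairs so that each address keeps one label.

    keep="last": a later definition overrides earlier ones.
    keep="first": the first definition wins.
    """
    chosen = {}
    for addr, name in labels:
        if keep == "last" or addr not in chosen:
            chosen[addr] = name
    return chosen
```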

> Also there're several map file types already produced by sjasmplus, with SLD files being meant as most rich info for any debugger, including source-file locations for every instruction, so I'm not ecstatic about adding another one (with no obvious advantage).

I didn't find any exportable format which has memory data "mapping", so code and data parts are treated differently. I found structures from cspect format useless - they are not about memory data treating. SLD is just about source code parallel references.

The xmap format is meant to store such data, so it eases reversing code back into source. It would be really helpful if the assembler exported data types, so debugging would be much more intuitive.

There was a question about 'a' versus 0x61 as an argument. It's not an issue, because the inner bytes of code instructions are always "code" (the format is not advanced enough to store nuances such as the immediate value type in CPU instructions, like dec/hex/char/signed). Data types are only about data sections: bytes/strings/words, DB/DW (it doesn't even decode floats). So textual data can only consist of ASCII (32-127). Any other values are treated as hexadecimal anyway.

Words and addresses differ in only one way: words are never converted into labels in the debugger, and are "grouped" into multiple values if there is more than one adjacent value, while an address is meant to be converted into a text label and is currently not grouped.

If a real-life example is really necessary, it is going to take some time. Though the emulator itself can be used to create an xmap file (as a reverse engineering tool, with comments and data sections).

Volutar commented 2 years ago

xmaptest.zip

This is a real-life example. In the emulator debugger it looks like this: [image]

ped7g commented 2 years ago

> I didn't find any exportable format which has memory data "mapping", so code and data parts are treated differently. I found structures from cspect format useless - they are not about memory data treating. SLD is just about source code parallel references.

I think you just don't understand all info available in SLD. Or maybe I'm missing what you mean by "data mapping". Also I'm not sure what you mean by code and data parts, as in machine code everything can be both code and data, in some tricky code and size-coding intros even intentionally working as both at the same time... :) So any such distinction is heuristic at best...

Anyway, thanks for the files, I will try to take a look at them and see what is going on and whether I understand it (and whether I can generate enough info during assembling).

Volutar commented 2 years ago

> Or maybe I'm missing what you mean by "data mapping".

I mean memory mapping. If we look at the .asm source, we clearly see where the assembler instructions are, where a string is, and where data is (as in the example). Of course there are relocatable things, overlays and such, but at compile time each generated byte has its own semantics, which follow simply from whether it came from db/dw/ds and the like, or from an assembler instruction.

ped7g commented 2 years ago

yes, that's all part of SLD data. (of course it doesn't cover cases when you produce instruction by DB opcode or when you use part of code also as data, but that trickery is not that common in normal code, so in 99% cases the source is simple enough)

Volutar commented 2 years ago

Maybe it's not in v1.0 of SLD, but currently it doesn't produce any information about exact data types (whether it's DB or DW). The SLD file from the example gives this information about the data:

sc.asm|41||0|5|30058|F|operd
sc.asm|41||0|5|30058|L|,operd,,+used
sc.asm|42||0|5|30058|T| <----- code
sc.asm|43||0|5|30060|T|
sc.asm|44||0|5|30063|T|
sc.asm|45||0|5|30066|T|
sc.asm|46||0|5|30069|T| <----- last code line
sc.asm|49||0|5|30072|F|text
sc.asm|49||0|5|30072|L|,text,,+used 
sc.asm|56||0|5|30118|F|table <------ absence of T means above is not the code. OK. But which type? Byte? Word? String?
sc.asm|56||0|5|30118|L|,table,,+used

Just labels, starting addresses and source code lines, and that's all. Nothing about whether the "data" is strings, bytes or word values. All that can be taken from it is whether something is code or not, and only by analyzing the absence of "T" afterwards. There is no direct information about whether it's data, nor about which addresses contain DB and which DW.

Sure, it is possible to parse the source code once more to extract all the necessary information, like whether it's DB or DW, but that is like partially writing another assembler (or maybe not partially, because it would need to parse all structures, all aliases for byte/db/dm/ds and so on, which is one step short of writing another assembler).
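To make the point concrete, here is a small Python sketch that reads the excerpt above (with the arrow annotations removed) and separates what SLD does tell us. The field positions are read off the excerpt itself: the sixth field is taken as the address and the seventh as the record type, and "F" records are used for label names. The function names are made up for the example.

```python
SLD_SAMPLE = """\
sc.asm|41||0|5|30058|F|operd
sc.asm|41||0|5|30058|L|,operd,,+used
sc.asm|42||0|5|30058|T|
sc.asm|43||0|5|30060|T|
sc.asm|44||0|5|30063|T|
sc.asm|45||0|5|30066|T|
sc.asm|46||0|5|30069|T|
sc.asm|49||0|5|30072|F|text
sc.asm|49||0|5|30072|L|,text,,+used
sc.asm|56||0|5|30118|F|table
sc.asm|56||0|5|30118|L|,table,,+used
"""


def split_sld(text: str):
    """Return (addresses that have a T record, {label name: address})."""
    code_addrs, labels = [], {}
    for line in text.splitlines():
        fields = line.split("|")
        if len(fields) < 8:
            continue  # not a record in the shape seen in the excerpt
        value, kind, data = int(fields[5]), fields[6], fields[7]
        if kind == "T":
            code_addrs.append(value)
        elif kind == "F":
            labels[data] = value
    return code_addrs, labels


def data_labels(code_addrs, labels):
    """Labels whose address has no T record: 'not code', with no type known."""
    code = set(code_addrs)
    return [name for name, addr in labels.items() if addr not in code]
```

On the sample this singles out `text` and `table` as "not code", which is as far as the records go: nothing in them says whether those regions hold bytes, words or strings.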

Volutar commented 2 years ago

Translated documentation for emulator (in English): https://docs.google.com/document/d/1vdFRntEj_dAUjEV_EsaZBP66kX4VIuxJPHR-gCQRnTc/

ped7g commented 2 years ago

you are right, SLD doesn't produce information at the "DB vs DW" level of detail. I will have to take a deeper look into the assembling code to see what is possible, but IIRC the DB 1 vs DB "abc" (number vs string) distinction will be very difficult to achieve.

Volutar commented 2 years ago

DB 0x61,0x62,0x63 vs DB "abc" is almost impossible to determine after compilation (only by statistical analysis of the values, which will seldom be correct for short data fragments), but while the source code is being processed during compilation, it's clear whether there's a "" or just a dec/hex value: by the absence or presence of ". Same with WORD vs ADDRESS (words are replaced with labels in the disassembler): a label is always an "address", while a direct value is always a word.
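The "presence of a quote" rule can be shown with a toy Python classifier for a DB operand list. This is not how sjasmplus parses operands; it is a deliberately simplified sketch that only understands double-quoted strings, ignores escape sequences and expressions, and uses a made-up function name.

```python
def classify_db_args(operands: str):
    """Split a DB operand list on commas and tag each item 'text' or 'byte'."""
    items, current, in_quotes = [], "", False
    for ch in operands:
        if ch == '"':
            in_quotes = not in_quotes  # toggle on every double quote
            current += ch
        elif ch == "," and not in_quotes:
            items.append(current.strip())  # comma outside quotes ends an item
            current = ""
        else:
            current += ch
    if current.strip():
        items.append(current.strip())
    # An item that starts with a double quote was written as text in the source.
    return [(item, "text" if item[:1] == '"' else "byte") for item in items]
```

With this rule `0x61,0x62,0x63` yields three "byte" items, while `"abc"` yields a single "text" item, even though both assemble to the same three bytes.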

Volutar commented 2 years ago

Is there any chance of this happening ... soon-ish?

If I understand correctly it will require simple data type flagging with some simple memory map array on 3rd pass.
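The kind of flag array meant here could look roughly like the following Python sketch. It says nothing about how sjasmplus is actually structured internally; the class, its method and the idea that each directive reports its address and length are all hypothetical. The flag values come from the format description earlier in this thread.

```python
FLAG_DB, FLAG_DW, FLAG_ADDR, FLAG_TEXT = 0x10, 0x20, 0x30, 0x40


class FlagMap:
    """One flag byte per memory byte; 0x00 means default (code)."""

    def __init__(self, size: int = 8 * 0x4000):
        self.flags = bytearray(size)

    def mark(self, addr: int, length: int, flag: int) -> None:
        """Tag `length` bytes starting at linear address `addr`."""
        if addr < 0 or length < 0 or addr + length > len(self.flags):
            raise ValueError("range outside of the memory map")
        self.flags[addr:addr + length] = bytes([flag]) * length
```

In this picture a hypothetical final-pass hook would call `mark(address, 3, FLAG_TEXT)` for `DB "abc"` and `mark(address, 2, FLAG_DW)` for `DW 1234`, and `bytes(flag_map.flags)` would become the "ramflags" payload.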

ped7g commented 2 years ago

I didn't look into it in last couple of months, so no idea.. next release will go probably out in 3-5 weeks (I'm waiting for some feedback on the Lua upgrade, to see if there's something else to do, beyond the decimal parsing which was reported right after 1.20.0 release), I may try to take a look again, if there is some low hanging fruit in this ticket and I can improve it a bit.

But affecting labels definition backward from directive/instruction is not so easy with current architecture. Although I just did it first time with the smart SMC labels, so there's some way how to do it, although that doesn't seem like good fit for larger-scale tagging.

I don't know, I remember this not being simple; I will need to take a deep look again at what is possible.