z00m128 / sjasmplus

Command-line cross-compiler of assembly language for Z80 CPU.
http://z00m128.github.io/sjasmplus/
BSD 3-Clause "New" or "Revised" License

Asking for relocation table support in order to enable developing for SymbOS. #99

Closed NYYR1KK1 closed 4 years ago

NYYR1KK1 commented 4 years ago

Hello,

I would like to use SjASMPlus to compile applications for SymbOS, but the problem is that the executable files on this OS need to include information about how to relocate the code in memory.

Currently the only assembler I know of that supports relocation tables is the one embedded in the WinAPE Amstrad emulator, but as a development environment it has other features that make me want to switch to a separate assembler if possible.

Here is an example of the output listing from WinAPE, which should be fairly self-explanatory about how this feature works there:

000003  0000  (1000)        org #1000
000004  1000                relocate_start
000006  1000  21 07 10          ld hl,test
000007  1003  11 08 10          ld de,test2
000008  1006  C9                ret
000010  1007  00            test    db 0
000011  1008  00            test2   db 0
000013  1009  02 00             dw relocate_count
000014  100B  04 00             dw relocate_size
000016  100D  01 10 04 10   relocate_table
000017  1011                relocate_end

The size/count can be calculated without custom support from the assembler, but the table itself is a bit of a problem to get around. Practically you need to assemble the source block twice at two different addresses, then use a custom program to compare the binary outputs and create a difference table that you can then include back into the project... As you may imagine, this is not a practical approach.
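The two-build workaround described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `build_relocation_table` is hypothetical, the two images are the code bytes from the listing above assembled by hand for `org #1000` and `org #1101`, and the second origin is chosen so that its low byte differs from the first (then the low byte of every relocated address changes and marks the start of the word).

```python
def build_relocation_table(image_a: bytes, image_b: bytes,
                           org_a: int, org_b: int) -> list[int]:
    """Return the addresses (relative to org_a) of 16-bit words that moved with the origin."""
    if len(image_a) != len(image_b):
        raise ValueError("both builds must have the same length")
    delta = (org_b - org_a) & 0xFFFF
    if delta & 0xFF == 0:
        raise ValueError("choose origins whose low bytes differ")
    table = []
    i = 0
    while i < len(image_a):
        if image_a[i] == image_b[i]:
            i += 1
            continue
        if i + 1 >= len(image_a):
            raise ValueError(f"offset {i:#06x}: difference in the last byte is not a word")
        # First differing byte is the low byte of an address word (little endian).
        word_a = image_a[i] | (image_a[i + 1] << 8)
        word_b = image_b[i] | (image_b[i + 1] << 8)
        if (word_b - word_a) & 0xFFFF != delta:
            raise ValueError(f"offset {i:#06x}: difference is not a plain 16-bit address")
        table.append(org_a + i)
        i += 2
    return table


# Code bytes of the example: ld hl,test / ld de,test2 / ret / db 0 / db 0
build_1000 = bytes.fromhex("21 07 10 11 08 10 C9 00 00")  # assembled at #1000
build_1101 = bytes.fromhex("21 08 11 11 09 11 C9 00 00")  # same source at #1101
table = build_relocation_table(build_1000, build_1101, 0x1000, 0x1101)
print([f"{entry:04X}" for entry in table])  # ['1001', '1004']
```

The result matches the `01 10 04 10` bytes that WinAPE emits for `relocate_table` in the listing.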

NYYR1KK1 commented 4 years ago

Ok, the codebox did not work quite how I expected, but please see this link instead: https://www.msx.org/forum/msx-talk/development/looking-for-working-assembler-to-compile-symbos-app

ped7g commented 4 years ago

hm... I'm still not sure what is the output. this looks incomplete? If this would be complete, how would the OS know where the relocation data starts? And what was the original (#1000) org? I guess the executable has some kind of header where #1000 and +9 (position of table in binary) values are stored, or something similar?

And how do you want the source to look? Could it be like relocate_end would be right after test2 and it would inject the two words + table into output?

And finally, how about expressions, what do you expect from ld h,high test or ld hl,test/2? I don't see any reasonable way how the latter can be used in relocatable code, so I guess any expression including label(s) would maybe emit warnings under relocation mode? The first one is something I tend to use in my code to address memory, but seems it's not compatible with the relocation table structure, so warning about it is probably still the way to go.

Then there are things like ld bc,testend-teststart ; load size of data -> this would produce warning and no relocation data (of course) ... hmm... I guess the warning may be too much. And the relocation data would be added only when single-label-only was used as expression, and the label was defined as regular label (not equ/defl/= one).

I'm not sure how difficult this would be, but it sounds to me like there is a chance, unless I run into something ugly in the labels implementation (it's not very nice).

But if you can provide full set: asm + lst + bin of some short example, it would help me to better understand what it does. Like example source:

org #1000
relocate_start
    ld hl,test
    ld de,(test2)
    ld bc,test2-test
    ret
    call test
    jp  test
    jr  test
test    db 0
test2   db 0
; ?? not sure how much of this has to be part of source ??
;   dw relocate_count
;   dw relocate_size
;relocate_table
; I would expect only this (to pair with relocate_start):
relocate_end

and by default sjasmplus expect directives ("relocate_..") after at least one whitespace, this would be no exception (there's option switch to allow directives at beginning of line, but I would rather recommend to adjust the source if possible).

NYYR1KK1 commented 4 years ago

hm... I'm still not sure what is the output. this looks incomplete?

Ok, yes I left out the first 3 lines as irrelevant.

If this would be complete, how would the OS know where the relocation data starts?

That is the "relocate_start"

And what was the original (#1000) org?

That is used as an offset in the relocation table... AFAIK in WinAPE you can also give this as a parameter for relocate_start if you wish. The manual about this feature is here: http://www.winape.net/help/assembler_directives.html

I guess the executable has some kind of header where #1000 and +9 (position of table in binary) values are stored, or something similar?

Yes, this example was not a SymbOS executable... Those executables have a lot of different blocks: first the application header (which includes desktop icons, descriptions, version requirements and such), then the application code area, the data area (for data that needs to be accessible by the screen manager process), the transfer area (i.e. stack, message buffer to talk with other processes/daemons/applications, storage for the status of radio buttons, selected tabs etc.) and then, as the very last block, the relocation table, as the code area load address depends on what other applications you have running at the same time.

From here you can find lots of SymbOS application sources and binaries: http://symbos.de/apps.htm ... just check that the download includes comment "(source codes included)"

And how do you want the source to look? Could it be like relocate_end would be right after test2 and it would inject the two words + table into output?

by default sjasmplus expect directives ("relocate_..") after at least one whitespace, this would be no exception (there's option switch to allow directives at beginning of line, but I would rather recommend to adjust the source if possible).

Sure, I don't think the syntax that WinAPE uses is great. so what ever fits best in to SjAsmPlus style of directives is fine with me.

asm + lst + bin of some short example, it would help me to better understand what it does. Like example source:

Here is the full compiler output after removing the comment from the front of "relocate_table"... You can see lines 1 & 2 twice in the output, but it is not a copy/paste error.

WinAPE Z80 Assembler V1.0.13

000001  0000  (1000)        org #1000
000002  1000                relocate_start
000001  0000  (1000)        org #1000
000002  1000                relocate_start
000003  1000  21 13 10          ld hl,test
000004  1003  ED 5B 14 10       ld de,(test2)
000005  1007  01 01 00          ld bc,test2-test
000006  100A  C9                ret
000007  100B  CD 13 10          call test
000008  100E  C3 13 10          jp  test
000009  1011  18 00             jr  test
000010  1013  00            test    db 0
000011  1014  00            test2   db 0
000012  1015                ; ?? not sure how much of this has to be part of source ??
000013  1015                ;   dw relocate_count
000014  1015                ;   dw relocate_size
000015  1015  01 10 05 10   relocate_table
        1019  0C 10 0F 10 
000016  101D                ; I would expect only this (to pair with relocate_start)
000016  101D                
000017  101D                relocate_end
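How a loader could consume such a table is not spelled out in the thread, so the following Python sketch rests on an assumption: each table entry is the address of a 16-bit little-endian word, and the loader adds the difference between the real load address and the original `org` to that word. The function name `relocate` and the load address `0x8000` are hypothetical; the code bytes and table entries are copied from the listing above.

```python
def relocate(image: bytes, table: list[int], org: int, load_address: int) -> bytes:
    """Patch every word listed in `table` so the code runs at `load_address`."""
    delta = (load_address - org) & 0xFFFF
    out = bytearray(image)
    for entry in table:
        offset = entry - org                      # table holds addresses, not file offsets
        word = out[offset] | (out[offset + 1] << 8)
        word = (word + delta) & 0xFFFF
        out[offset] = word & 0xFF                 # low byte first (Z80 is little endian)
        out[offset + 1] = word >> 8
    return bytes(out)


# Code bytes #1000..#1014 and the four table entries from the WinAPE listing.
code = bytes.fromhex("21 13 10 ED 5B 14 10 01 01 00 C9 CD 13 10 C3 13 10 18 00 00 00")
table = [0x1001, 0x1005, 0x100C, 0x100F]
moved = relocate(code, table, org=0x1000, load_address=0x8000)
print(moved.hex(" ").upper())
# 21 13 80 ED 5B 14 80 01 01 00 C9 CD 13 80 C3 13 80 18 00 00 00
```

Note that `ld bc,test2-test` and `jr test` are untouched: neither has an entry in the table, which agrees with the listing.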

And finally, how about expressions, what do you expect from ld h,high test or ld hl,test/2? ...

As those two extra lines give a hint, I think internally in the compiler the process is just:

NYYR1KK1 commented 4 years ago

Ah, and if you want to see those SymbOS applications running, you can try them out in your browser: https://webmsx.org/symbos

NYYR1KK1 commented 4 years ago

After some testing, I think the process is a bit more complex as the relocation table can be located also before or middle of code... So probably something like:

... Yes it can be optimized, but just to give the basic idea.

ped7g commented 4 years ago

I still don't understand what is the output. The full listing doesn't show any machine code bytes emitted (000001 0000 (1000) - I understand it as line 1, address 0000, and (1000) is a sort of comment of the ORG directive, but not output bytes).

And neither I see any mark in the binary output, where the relocation table is positioned (if I would process only the machine code bytes from that listing (21 13 10 ... 0F 10, I would have no idea where the "relocate_table" starts, the address $1015 (or offset in output $0015) is not part of the output. So I guess WinAPE does produce this plus some meta data extra, where the linker can figure out where the relocation data are in the binary (and their size).

About positioning table ahead of code, causing dynamic size ... it's OK in sjasmplus if there is ORG putting the code at explicit fixed address, but if the relocation table will cause the code to move, the 3-pass assembling will be seriously hampered by that, so in such case only table at end could be supported.

But overall I have the gut feeling that the relocation table data don't even belong to raw machine code output? Can you provide full output of WinAPE (all files created during build of that example)? I would expect it to store the code itself in different binary than the relocation table bytes, or at least create some meta-data file marking which parts of binary are what.

If you want to use sjasmplus for assembling, and WinAPE toolchain to build executables, I believe the sjasmplus should then produce all bin/meta data in similar way how their assembler does, so their linker can process them.

I'm really reluctant to read more about WinAPE or SymbOS or their executable format right now, too busy with other things, so if you can research what kind of output from sjasmplus would be usable and helpful, and describe that in detail, you will save me ton of time (and make lot more likely that this will be added to sjasmplus).

About the +1 offset double-assembling to detect bytes for relocation. Seems to me kinda too complicated, and it still doesn't address the ld h,high test case, which in worst case (test=$10FF for example -> $1100 in the +1, will trigger the detection because ld h,$10 vs ld h,$11, but relocation does affect only words, so the $10/$11 byte + one following will be hit) can produce wrong relocation data (the loader will then damage the first byte of following instruction). (this doesn't worry me too much, because clearly if you fully understand the relocation implementation and the assembler implementation, you can probably always come with some code construct breaking the whole thing intentionally, using relocation simply requires some cooperation from programmer using only sane techniques which are treated by relocation correctly - I don't see any chance to make this completely foolproof, some of the responsibility will be always at the coder side).
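The worst case described above can be reproduced with a few bytes. In this Python sketch the helper names `diff_offsets` and `patch_word` and the relocation distance `0x2000` are hypothetical; the opcodes are standard Z80 (`26 nn` is `ld h,nn`, `C9` is `ret`). It assumes a naive byte-wise diff of two builds and a loader that patches whole words.

```python
def diff_offsets(a: bytes, b: bytes) -> list[int]:
    """Offsets at which two equally long builds differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def patch_word(image: bytes, offset: int, delta: int) -> bytes:
    """Add `delta` to the little-endian word at `offset`, as a word-only loader would."""
    out = bytearray(image)
    word = ((out[offset] | (out[offset + 1] << 8)) + delta) & 0xFFFF
    out[offset] = word & 0xFF
    out[offset + 1] = word >> 8
    return bytes(out)


# "ld h,high test" followed by "ret", with test = $10FF in the first build
# and test = $1100 after moving everything by +1.
build_a = bytes([0x26, 0x10, 0xC9])   # ld h,$10 : ret
build_b = bytes([0x26, 0x11, 0xC9])   # ld h,$11 : ret

flagged = diff_offsets(build_a, build_b)
print(flagged)                         # [1]

# A word-only loader relocating by $2000 patches the bytes at offsets 1 and 2.
patched = patch_word(build_a, flagged[0], 0x2000)
print(patched.hex(" ").upper())        # 26 10 E9
```

The operand of `ld h` stays `$10` (it should have become `$30`), while the `ret` opcode `C9` is overwritten with `E9`, which is the damage to the following instruction mentioned in the comment.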

At this moment I would prefer to detect relocation case based on only-single-regular-label used as expression part of instruction, but I need some time to rethink this, if it does really match the purpose, or there exist some code construct which should be part of relocation, but doesn't use single-label expression (and also the other way, some case where single label is used, but shouldn't be part of relocation).

ped7g commented 4 years ago

hm... the cases like ld a,(var_table+4) will not get relocated, which is kinda valid approach/code (although not classy, one can use STRUCT in sjasmplus to assign each sub-variable full label and then have like ld a,(table.color5), avoiding any hardcoded "magic numbers" in source).

With your +1 offset way this case would be detected too. OK, I see some benefit to it, but that makes the whole thing a lot more complicated, so I'm not sure I will go that way. If I add this to sjasmplus, expect the first version to use single-label detection rather than double assembling, unless you already know, from checking the existing sources, that this approach would mean a lot of source fixing for you. (If you have some sources, and you don't mind, you can share them with me by mail: ped at 7gods☢org - I can then take a look at how the current syntax looks and what kind of code constructs lead to relocation - it's always better to have real-world examples than to hypothesise about it.)
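The single-label rule proposed in the last two comments can be sketched as a small classifier. This is a toy model of the rule as stated at this point in the thread, not sjasmplus's actual implementation: the function `needs_relocation`, the regular expression and the label set are all assumptions made for illustration.

```python
import re

# One label, optionally wrapped in parentheses for memory access, and nothing else.
LABEL_ONLY = re.compile(r"^\(?\s*([A-Za-z_.][A-Za-z0-9_.]*)\s*\)?$")
RELATIVE_JUMPS = {"jr", "djnz"}   # operand is a displacement, never relocated


def needs_relocation(mnemonic: str, expression: str, regular_labels: set[str]) -> bool:
    """True if the operand is exactly one regular (non-EQU) label in an absolute context."""
    if mnemonic.lower() in RELATIVE_JUMPS:
        return False
    match = LABEL_ONLY.match(expression.strip())
    return bool(match) and match.group(1) in regular_labels


labels = {"test", "test2", "var_table"}          # labels defined inside the block
print(needs_relocation("ld", "test", labels))         # True
print(needs_relocation("ld", "(test2)", labels))      # True
print(needs_relocation("ld", "test2-test", labels))   # False
print(needs_relocation("ld", "high test", labels))    # False
print(needs_relocation("ld", "(var_table+4)", labels))  # False
print(needs_relocation("jr", "test", labels))         # False
```

The `(var_table+4)` case shows the limitation discussed above: it is a valid relocatable address, but a strict single-label rule misses it, whereas the double-assembly diff would catch it.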

NYYR1KK1 commented 4 years ago

I still don't understand what is the output. The full listing doesn't show any machine code bytes emitted (000001 0000 (1000) - I understand it as line 1, address 0000, and (1000) is a sort of comment of the ORG directive, but not output bytes).

Correct

And neither I see any mark in the binary output, where the relocation table is positioned (if I would process only the machine code bytes from that listing (21 13 10 ... 0F 10, I would have no idea where the "relocate_table" starts, the address $1015 (or offset in output $0015) is not part of the output. So I guess WinAPE does produce this plus some meta data extra, where the linker can figure out where the relocation data are in the binary (and their size).

Yes, the relocation table in this example is output to file offset $0015-$001D. What I think you fail to see is that this is an example that you created... and as such, it is not functional code, rather just random bytes in a file that demonstrate how the assembler works.

The table by itself does not have any functionality (or label assigned) unless you add a label and write some code that uses the table... or someone else does. The table is just data to be used by the developer himself...

The first column represents the line number in the input text file that is compiled. Empty lines are skipped from the input file and that is why you may see the line numbers skipping some numbers. The second column is the address in the Z80 address space (which can be affected with ORG), the 3rd column represents the bytes that are written to the output file, except for this "(1000)" that is just located in quite a weird place, and finally the 4th column represents the code that is read from the input file.

About positioning table ahead of code, causing dynamic size ... it's OK in sjasmplus if there is ORG putting the code at explicit fixed address, but if the relocation table will cause the code to move, the 3-pass assembling will be seriously hampered by that, so in such case only table at end could be supported.

Ok, in SymbOS use case it is not a problem as it expects the relocation table to be the last block anyway.

But overall I have the gut feeling that the relocation table data don't even belong to raw machine code output?

Yes, this is the approach that assemblers tend to take and exactly what makes it difficult to develop any application on assembler (regardless of OS or machine) that needs self relocating code. Usually people in these cases start using JR instead of JP, but limiting development to commands and methods that use only relative addressing becomes very hard... especially when size of the assembled application becomes big.

Can you provide full output of WinAPE (all files created during build of that example)?

Sure... but I have to take a look at what kind of files WinAPE can generate... I must warn you that I'm not really any kind of expert at using this program. I know that by default the listing goes to a list box and it does not output ANY file (by default it outputs to emulated machine memory)... you need to add WRITE <filename> to the beginning of the source to even make it output the binary file...

I would expect it to store the code itself in different binary than the relocation table bytes, or at least create some meta-data file marking which parts of binary are what.

I have to see if I manage to make it output some debug data for you.

If you want to use sjasmplus for assembling, and WinAPE toolchain to build executables, I believe the sjasmplus should then produce all bin/meta data in similar way how their assembler does, so their linker can process them.

No, I want to get rid of WinAPE completely, uninstall it from my computer and start using SjAsmPlus instead for new projects. I don't expect that it should compile source files created for WinAPE... at least not as is... I like SjAsm syntax much better... I'm not Amstrad user, so WinAPE does not offer me anything I would like to keep. I'm just stuck with it as it is only assembler that generates these tables without hassle that makes your hair fall.

I'm really reluctant to read more about WinAPE or SymbOS or their executable format right now, too busy with other things, so if you can research what kind of output from sjasmplus would be usable and helpful, and describe that in detail, you will save me ton of time (and make lot more likely that this will be added to sjasmplus).

I will do my best... I'll download few of those examples I linked, compile them, add the .LST files to the archives and send to you.

About the +1 offset double-assembling to detect bytes for relocation. Seems to me kinda too complicated, and it still doesn't address the ld h,high test case, which in worst case (test=$10FF for example -> $1100 in the +1, will trigger the detection because ld h,$10 vs ld h,$11, but relocation does affect only words, so the $10/$11 byte + one following will be hit) can produce wrong relocation data (the loader will then damage the first byte of following instruction). (this doesn't worry me too much, because clearly if you fully understand the relocation implementation and the assembler implementation, you can probably always come with some code construct breaking the whole thing intentionally, using relocation simply requires some cooperation from programmer using only sane techniques which are treated by relocation correctly - I don't see any chance to make this completely foolproof, some of the responsibility will be always at the coder side).

Yes, naturally I understand this very well.

At this moment I would prefer to detect relocation case based on only-single-regular-label used as expression part of instruction, but I need some time to rethink this, if it does really match the purpose, or there exist some code construct which should be part of relocation, but doesn't use single-label expression (and also the other way, some case where single label is used, but shouldn't be part of relocation).

Well single label from outside of the relocation block naturally should not be included in the relocation table, but I would not worry too much about some XYZ EQU $+#8000 that fall outside as user of the feature needs to be able to adjust the coding style to fit.

ped7g commented 4 years ago

right now file containing this emerged on my local disk (as a result of running sjasmplus):

# file opened: Issue_99_example1.asm
 1    0000                  org #1000
 2    1000                  relocate_start
 3    1000 21 07 10             ld hl,test
 4    1003 11 08 10             ld de,test2
 5    1006 C9                   ret
 6    1007 00           test    db 0
 7    1008 00           test2   db 0
 8    1009 02 00                dw relocate_count
 9    100B 04 00                dw relocate_size
10    100D 01 10 04 10      relocate_table
11    1011                  relocate_end
12    1011
# file closed: Issue_99_example1.asm

Value    Label
------ - -----------------------------------------------------------
0x0002   relocate_count
0x0004   relocate_size
0x1007   test
0x1008   test2

There's still lot of work to finish this, but your initial example works.

The relocate_table directive will NOT support byte-mode because the byte form feels redundant to me (usable only for very small projects), and I have ton of work to do to support at least word form correctly.

And the relocate_table directive will NOT support the optional base_address argument like WinApe, because I'm not precisely sure how to calculate the result (to shift it in which direction and which bytes to modify), plus again feels redundant to me, the ORG ahead of the relocation block can be used to select any base_address? Or maybe I'm misunderstanding the feature. (if you are bored, you can try WinAPE with some simple source and examine the binary output, how does the base_address argument work, but from the quick look at your SymbOS apps examples, it feels to me as not needed, those apps don't use it either).

@NYYR1KK1 question: the SymbOS apps you did send me in email... what is the license of those sources? I would like to modify them to assemble with sjasmplus, and add them to tests/integration folder, so the real-world use case is part of the test suite (this helps a lot to catch any regression in future versions early).

But I prefer to have the tests under MIT or at least BSD license (there are few under their own custom license, but it causes further headache in case I would ever want to get z00m's sjasmplus to Debian repo for example, probably forcing me to remove them later to keep only free-enough stuff).

If you know authors of those apps, can you ask for permission to modify the syntax to work with sjasmplus and release them under MIT license in sjasmplus/tests/? Or if they can provide some other reasonably small but not trivial "real world" app for this. (I don't need trivial "hello world"-like thing, I'm pretty sure my native tests will end up lot more comprehensive than that, but having real app like the two you sent me, that's helpful).

NYYR1KK1 commented 4 years ago

On Tue, 14 Jul 2020 at 00:11, Peter Ped Helcmanovsky < notifications@github.com> wrote:

right now file containing this emerged on my local disk (as a result of running sjasmplus):

Great news! I'm currently actually making a small demo for SymbOS to attend next weekend demo party. Although I'm starting to get along with WinApe, it is not making me happy.

The relocate_table directive will NOT support byte-mode because the byte form feels redundant to me (usable only for very small projects), and I have ton of work to do to support at least word form correctly.

Fair enough... byte mode sounds pretty useless... It may be ok, if the table values would be relative to previous value on the table, but still it would be mostly just a possible source of trouble. Such kind of space saving is rarely needed.

And the relocate_table directive will NOT support the optional base_address argument like WinApe, because I'm not precisely sure how to calculate the result (to shift it in which direction and which bytes to modify), plus again feels redundant to me, the ORG ahead of the relocation block can be used to select any base_address? Or maybe I'm misunderstanding the feature. (if you are bored, you can try WinAPE with some simple source and examine the binary output, how does the base_address argument work, but from the quick look at your SymbOS apps examples, it feels to me as not needed, those apps don't use it either).

Yes I tried this and it seems the base_address is just subtracted from all the values inside the table... Pretty useless indeed, but should not be hard to implement either. Output from WinApe:

000004  0000  (3000)        org #3000
000006  3000                relocate_start
000002  0000                write "bin\test"
000004  0000  (3000)        org #3000
000006  3000                relocate_start
000008  3000  21 07 30          LD HL,test
000009  3003  11 08 30          ld de,test2
000010  3006  C9                ret
000011  3007  00            test    db 0
000012  3008  00            test2   db 0
000013  3009  02 00             dw relocate_count
000014  300B  04 00             dw relocate_size
000015  300D  01 20 04 20   relocate_table #1000
000016  3011                relocate_end
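The observed behaviour (the base address is subtracted from every table value) can be checked against the listing with a short Python sketch. The function name `encode_table` is hypothetical; the unadjusted entries `#3001` and `#3004` are the operand addresses of the two `ld` instructions in the listing.

```python
import struct


def encode_table(entries: list[int], base_address: int = 0) -> bytes:
    """Emit table entries as little-endian words after subtracting base_address."""
    adjusted = [(entry - base_address) & 0xFFFF for entry in entries]
    return struct.pack(f"<{len(adjusted)}H", *adjusted)


raw = encode_table([0x3001, 0x3004], base_address=0x1000)
print(raw.hex(" ").upper())  # 01 20 04 20
```

This reproduces the `01 20 04 20` bytes that WinAPE emitted for `relocate_table #1000`.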

@NYYR1KK1 question: the SymbOS apps you did send me in email... what is the license of those sources?

In SourceForge all SymbOS app sources are listed as Public Domain. If you have some doubts, you can contact the author: jmika (at) symbos (dot) de

~NYYRIKKI

ped7g commented 4 years ago

ok, I think this is now partially-complete-enough to experiment with

If you can build from the branch and review my tests/relocate/ stuff, provide some feedback, or even help to finish some of the remaining sub-task, it will help me a lot. :)

ped7g commented 4 years ago

and it's almost done...

The final missing step is to try to convert real SymbOS app sources to sjasmplus syntax and diff the resulting executable to make sure they are identical with WinAPE results... (that may still uncover some new bug)

But the ZX48 custom relocator + example does work.

ped7g commented 4 years ago

and it's not almost done... :D After discussions with Busy about the ZX48 example we figured out the DISP needs better support, as there are real world use cases when it may be useful in relocated code. So back to implementing and writing more tests... (before trying out to convert SymbOS app)

ped7g commented 4 years ago

@NYYR1KK1 the SymbOS "notepad" has been converted to sjasmplus syntax and added as integration test

you can check the files https://github.com/z00m128/sjasmplus/tree/SymbOS_relocate/tests/integration/SymbOS-notepad to see how much they differ.

Ignore the extensions/naming changes, those are done to make the test compatible with my test-runner script, but in real non-test project the file names can stay as they are in original.

Focus on the content changes, those are required for sjasmplus usage to build the SymbOS projects. It's mostly about moving instructions/directives from first column and labels starting always at first column, and renaming few directives (READ -> INCLUDE, WRITE -> OUTPUT) and that's all. In other ways the WinAPE seems to have compatible syntax with sjasmplus, and the same binary is produced (with my new relocation feature implemented).


I'm still not finished with this, from the read through the notepad sources it seems there's lot of manual work done in asm source to build the executable. So I will add one more variant of "notepad" sources, into examples folder, this time using sjasmplus extra features like STRUCT/etc... to make the app source more like how I would like it (so you and SymbOS contributors can check and decide if you like such extra changes or not, maybe you will like them too).

But this will take me few more netto-hours, so I'm not sure when this will be available, maybe tomorrow, maybe next weekend.

NYYR1KK1 commented 4 years ago

Just a kind request, can you also update the Windows binary? My attempts to compile C on Windows have always ended up in some catastrophic situation where Cygwin downloads gigabytes of crap and I still end up with a pile of errors I can't decipher... I really don't want to install that monster on my machine again.

If you want to test how your compiled Notepad binary works, it is very easy:

The Notepad.exe will appear to root directory of disk A and you can run it by double clicking it.

~NYYRIKKI

On Sun, 26 Jul 2020 at 20:40, Peter Ped Helcmanovsky < notifications@github.com> wrote:

Closed #99 https://github.com/z00m128/sjasmplus/issues/99 via #116 https://github.com/z00m128/sjasmplus/pull/116.


ped7g commented 4 years ago

@NYYR1KK1 The windows binary comes for each "release", so just wait couple of hours/days, I decided to spin the current master branch as "v1.16.0" and the preparations for release are already underway. (I can't provide temporary test builds, as I have no windows machine around and I'm also too lazy to cross compile)

Actually I'm sort of curious why windows people just don't download virtual-box or something similar and don't run linux in VM as their work OS, if they can't reinstall the whole PC. It would be IMO far simpler (although it would initially download couple of GB of stuff too). I can see how the half-patched solutions like cygwin end up being PITA, but you have to consider they are trying to provide working environment on windows OS, it's lot of extra work to support the limited OS, no wonder it doesn't feel always perfect.

(or you can read through the .cirrus.yml file and the batch files used for the VS build on a windows machine, if you are already using VS; that one needs CMake to generate project solution files and then VS with MSVC to build sjasmplus.exe)


About testing binaries... still too much work, I just compare them with the original ones, as long as they are identical in every byte, I know it works ok. :) It will become more interesting with people trying to use the feature without my assistance, as non-sjasmplus user may be confused by some early problems, but both "notepad" and "nslookup" were surprisingly easy to port, seems like the way how SymbOS authors write their assembly sources is like 90% compatible with sjasmplus without any change and big part of required changes is the whitespace management as sjasmplus is strict about labels starting at first column of line.

NYYR1KK1 commented 4 years ago

On Mon, 27 Jul 2020 at 10:05, Peter Ped Helcmanovsky < notifications@github.com> wrote:

@NYYR1KK1 The windows binary comes for each "release", so just wait couple of hours/days, I decided to spin the current master branch as "v1.16.0" and the preparations for release are already underway. (I can't provide temporary test builds, as I have no windows machine around and I'm also too lazy to cross compile)

No worries, it seems that Ped7g saved my day and provided the binaries. I will start developing an app and will report if I found something fishy (although I doubt that)

About testing binaries... still too much work, I just compare them with the original ones, as long as they are identical in every byte, I know it works ok. :)

Can't be... It is about 10 mouse clicks and you don't even need to download anything. :) But your point is valid. If the bytes are correct then the application is as well.

seems like the way how SymbOS authors write their assembly sources is like 90% compatible with sjasmplus without any change and big part of required changes is the whitespace management as sjasmplus is strict about labels starting at first column of line.

This sounds really good as naturally it is easier if you can use existing applications and templates as a starting point for new projects. Thank you for your efforts!

~NYYRIKKI

ped7g commented 4 years ago

No worries, it seems that Ped7g saved my day and provided the binaries. I will start developing an app and will report if I found something fishy (although I doubt that)

Not me. Windows builds are done by z00m128, as he has some windows machine where he can build the exe file. :) (and also few mac machines, so he is doing the final testing before releasing).

It has that extra advantage that at least two contributors are involved in every release, the extra pair of eyes really helps to catch some problems early, when one gets too tired or confused.

NYYR1KK1 commented 4 years ago

I practically just finished writing my own little compiler program for SymbOS. I made this compiler from start to end with this new SjAsmPlus and had no troubles whatsoever. Thanks again for your efforts!

ped7g commented 4 years ago

@NYYR1KK1 then maybe you may be interested also in #93 and my current progress on it, see if you can spot something familiar in this test: https://github.com/z00m128/sjasmplus/blob/master/tests/struct/multi_line_initializer.asm