Open andrewbird opened 3 years ago
I had some success with the linker script and was able to reduce the resident size, however in order to place the temporary code at the end of the output I needed to move my table data out from the heap into a new data segment. This meant that my on disk size grew significantly. The only way I can see around this is copy the transient code higher at install time, jump to it and then overwrite the original location with my data tables. Here's my latest attempt https://github.com/andrewbird/share/commit/c4271c2e2a90da25f748a94e629e862d858798f3
One thing I do notice though, there are lots of these that end up in the resident code __libi86_intr_call_0133
. They are only 3 bytes each, but when there are 377 of them they become quite significant.
Hello @andrewbird,
Sorry for the late reply! I will see in the meantime if I can add better support for TSRs to the gcc-ia16
toolchain and libi86
...
One thing I do notice though, there are lots of these that end up in the resident code
__libi86_intr_call_0133
.
Actually, under gcc-ia16
+ libi86
, you can try compiling your code with optimization (e.g. -Os
or -O2
) — this should help optimize away most of these __libi86_intr_call
... routines. These little routines are used to implement the calls to the 256 possible interrupts, for use with int86
etc. E.g.
__libi86_intr_call_0041:
int $0x21
ret
__libi86_intr_call_0042:
int $0x22
ret
If you enable optimization, and your interrupt numbers are compile-time constants, then <i86.h>
will use some magic incantations to link in only the needed int
invocations.
Thank you!
Hello @andrewbird,
From what I see in the memory map file, gcc-ia16 lays out the startup code, init and fini before the text of the program and consequently would still be included in the resident memory at TSR.
In case this is useful: the .init
and .fini
sections are customarily used to insert snippets (not entire routines) of code to call out to initialization routines before main
starts, and to termination routines after main
exits. In the default arrangement in gcc-ia16
though, the .init
section simply contains a call to a routine that goes through functions pointers in .ctors.*
sections. (The gccint
info page has a bit of discussion about this stuff.)
To mark an entire routine as a "startup" routine, the customary way — or at least the GCC-approved way — is to place it in a .text.startup
or .text.startup.*
section. In fact, by default GCC will automatically place main
and any __attribute__((constructor))
routines inside .text.startup
.
However, using a .non_resident_text
section name should work well too.
Thank you!
@tkchia yes that -Os
really helped! At load of share.com the symbols looked like this
dosdebug> usermap load-gnu ../fdos/share.git/share.map 02c3
dosdebug> usermap list
02c3:0120 _dos_freemem
02c3:0132 _dos_getvect
02c3:0144 _dos_setvect
02c3:0158 __libi86_ret_really_set_errno
02c3:0163 __libi86_int86_do
02c3:01be __libi86_intr_call_0041
02c3:01c1 __libi86_intr_call_0057
02c3:0235 _atoi_r
02c3:0262 _atol_r
02c3:027a __errno
02c3:02a3 memset
02c3:02c5 strchr
02c3:02e0 strlen
02c3:02fc _strtol_r
02c3:052f strtol
02c3:0547 write
02c3:055f _write_r
02c3:058b __call_exitprocs
02c3:06e3 _reclaim_reent
02c3:07ec _free_r
02c3:08b5 _malloc_r
02c3:09fd _sbrk_r
02c3:0a23 __udivsi3
02c3:0a34 __umodsi3
02c3:0a4a __ia16_ldivmodu
02c3:0a99 _sbrk
02c3:0ae5 _write
02c3:0bae __ia16_abort_impl
02c3:0c00 __DTOR_END__
02c3:0c02 _global_impure_ptr
02c3:0c04 _ctype_
02c3:1116 __dso_handle
02c3:111a _impure_ptr
02c3:114c __ctype_ptr__
02c3:1150 _heaplen
02c3:1166 _psp
02c3:116e _global_atexit
02c3:1170 __malloc_sbrk_start
02c3:1172 __malloc_free_list
02c3:11f6 table_pool
02c3:51f6 _start
02c3:5327 _exit
02c3:532c __msdos_crt_exit
02c3:536a init_tables
02c3:53c8 main
And the code size was
02b0:0000 0x0011 [SHARE - Environment]
02c2:0000 0x9d3c [SHARE] (END)
After running main
02c2:0000 0x01ae [SHARE]
Which compares very favourably with Turbo C at 0x0237. Obviously I still have a significant part of the interrupt handler commented out, have you any more thoughts about register access within the handler? Would replicating the Watcom method of access via union INTPACK r
be a viable method?
I had previously tried to mark _dos_freemem(), _dos_getvect() and _dos_setvect() with .non_resident_text section name as I know they are only called from installation code, but they still ended up in the .text section. Is there any way of marking their section from their caller's section (but it could get complicated if a function was called from both .text and .non_resident_text I guess)?
However, using a .non_resident_text section name should work well too.
Yes I'm making it up as I go along!
Thank you!
I pushed another version now that reduces both the on disk size and the resident memory to better than Turbo C. It does this by placing the non resident text over .bss and heap sections. At startup it copies that text up to just before 0xf000, then proceeds to the normal startup routines that initialize .bss and sets up the heap. Using this method I now only have one special section and malloc can now function as usual.
Type | Turbo C | Gcc ia16 |
---|---|---|
On disk size | 6300 | 5224 |
Resident memory size | 0x0240 | 0x01b5 |
Obviously the new startup code would be better off in assembly rather than fixed bytes and not being very sure about what registers I need to preserve I probably saved too much. https://github.com/andrewbird/share/commit/b21cd7dcea727f625adc3cd1402b55f5a097b6bf
Hello @andrewbird,
I had previously tried to mark _dos_freemem(), _dos_getvect() and _dos_setvect() with .non_resident_text section name as I know they are only called from installation code, but they still ended up in the .text section. Is there any way of marking their section from their caller's section (but it could get complicated if a function was called from both .text and .non_resident_text I guess)?
The short answer is, no unfortunately. Basically, the input sections where _dos_getvect (.)
etc. come from are taken from the modules in the library i.e. libi86.a
— and the output sections they end up in are determined by the linker script.
There is no easy way (for now) to automatically place _dos_getvect (.)
in a special transient section if it will only be called from transient code.
Thank you!
Yes that aligns with my various experiments. I guess for now it's best to recreate the simple ones locally with int86()
, but in any case I'm very happy with the resident and file sizes achieved, I just need to figure out a fairly clean way of accessing/modifying the registers from within the interrupt handler and to implement a chain interrupt function.
Thank you!
Interestingly switching to a local function for getvect()
based on int86x()
increased the resident size by 6 paragraphs! I see your implementation of _dos_getvect()
uses gcc inline assembler, so that would be the way to go if I wasn't trying to avoid sprinkling compiler specific assembly language all over the C source. So for now I'll stick with the i86 library code ending up in the resident portion, and it's good to know your library is lean!
Thank you!
So I'm still looking to shave some bytes off the share tsr. I'm looking at the map file and I see this
.rodata 0x000000000000125d 0x281 /usr/ia16-elf/lib/libc.a(lib_a-ctype_.o)
0x000000000000125d _ctype_
What is it, and can I exclude it somehow?
Thank you!
I see it's used by strtol(), which is presumably included to provide atol() which I do use. So I guess the answer is that I'm using it. Oh well!
Thank you!
Hello @andrewbird,
What is it, and can I exclude it somehow?
_ctype_
is the array of flags that is used to implement <ctype.h>
's isalpha (.)
, isdigit (.)
, etc. macros, for the default locale ("C"
). I think many C runtime libraries employ a similar method to implement <ctype.h>
. Thank you!
Hello @andrewbird,
I see it's used by strtol(), which is presumably included to provide atol() which I do use. So I guess the answer is that I'm using it.
Well, there are other options if you're looking for tiny size: you could write your own atol()
and use direct comparisons for digits 0-9, rather than the isdigit()
macros which drag in the table defined in <ctype.h>
. With a small, efficient implementation of atol
like the one below included in your source, this should end up being quite a bit smaller that what you have, provided it meets with your program requirements. This would prevent the standard library version from being pulled in to your program.
long atol(char *s) /* simple, somewhat hacked version of atoi that returns long for smaller size*/
{
int n = 0; /* declare long if requires decoding of numbers > 64K*/
while (*s >= '0' && *s <= '9')
n = n*10 + *s++ - '0';
return (long)n;
}
Thank you!
Hello @ghaerr Thanks for the advice, I'd just written but not yet tested this
+/* Naive implementation of atol(), only decimal digits allowed, no signs */
+long int atol(const char *s) NON_RESIDENT;
+long int atol(const char *s) {
+ long int val;
+ const char *p;
+
+ for (val = 0, p = s; *p; p++) {
+ if (*p == ' ')
+ continue;
+ if (*p < '0' || *p > '9')
+ break;
+ val *= 10;
+ val += *p - '0';
+ }
+
+ return val;
+}
It saved 1.2k on the file size, plus allowed me to steer it into my new non resident text section which means I save around the same on the resident size.
Thank you again!
Hello @tkchia , @ghaerr , I just wanted to say thank you for all your help to me on this gnu linker voyage. My PR has now been merged into FDOS/share so it's certainly been a worthwhile effort, and I've learnt a little bit on the way. A most interesting experience for me.
Thank you!
Hi @tkchia, I'm just trying out your new TSR support that just landed in the PPA. I'm having a little difficulty getting it to link fully, see https://github.com/andrewbird/share/commit/062910471e291d1e130a779c29d1364007c45982
So here's the build attempt and output message
ia16-elf-gcc -Wall -fpack-struct -mcmodel=tiny -mtsr -c share.c -o share.obj -Os
nasm -f elf gcc_help.asm -o gcc_help.obj
ia16-elf-ld share.obj gcc_help.obj -o share.com -L/usr/ia16-elf/lib -li86 --script=dtr-mts.ld -Map=share.map
ia16-elf-ld: cannot find -l:crtbegin.o
ia16-elf-ld: cannot find -l:crtend.o
make: *** [Makefile:2: share.com] Error 1
So some questions:
dtr-mts.ld
seemed to be the one that produced the least errors, was that a good choice?crt*
objects?Thank you!
After a little look at my previous linker script is seems I need to specify the -L path to gcc
diff --git a/build.sh b/build.sh
index 04cd88c..eaad3d0 100755
--- a/build.sh
+++ b/build.sh
@@ -5,7 +5,7 @@ if [ x"${COMPILER}" = "xgcc" ] ; then
export COPT="-Wall -fpack-struct -mcmodel=tiny -mtsr -c share.c -o share.obj -Os"
export XOBJS="gcc_help.obj"
export LD="ia16-elf-ld"
- export LOPT="share.obj ${XOBJS} -o share.com -L/usr/ia16-elf/lib -li86 --script=dtr-mts.ld -Map=share.map"
+ export LOPT="share.obj ${XOBJS} -o share.com -L/usr/lib/x86_64-linux-gnu/gcc/ia16-elf/6.3.0 -L/usr/ia16-elf/lib -li86 --script=dtr-mtsl.ld -Map=share.map"
make
So as the link now completes, only one question remains, is it expected to need to specify the -L paths to lib86 and gcc?
A little comparison between the custom linker script & loading code and your new TSR support. Type | Custom | New TSR support |
---|---|---|
On disk size (bytes) | 5416 | 5424 |
Resident size (paragraphs) | 0x01c0 | 0x0205 |
I'm a little surprised by the resident size growth?
Using ia16-elf-gcc to do the link stage means I don't have to specify the paths or the script
ia16-elf-gcc -Wall -fpack-struct -mcmodel=tiny -mtsr -c share.c -o share.obj -Os
nasm -f elf gcc_help.asm -o gcc_help.obj
ia16-elf-gcc -mtsr -Wl,-Map=share.map share.obj gcc_help.obj -o share.com -li86
Hello @andrewbird,
ia16-elf-gcc -mtsr -Wl,-Map=share.map share.obj gcc_help.obj -o share.com -li86
Yes, this is the way.
You probably now need to put the transient code in .text.startup
, if you have not done so already. I.e.
#define NON_RESIDENT __attribute__((section(".text.startup")))
I hope to further tweak the gcc-ia16
toolchain — maybe add new function and variable attributes — so that there is a documented and flexible way to specify variables and functions as being transient. Meanwhile, the above should work.
Thank you!
Hello @tkchia,
#define NON_RESIDENT __attribute__((section(".text.startup")))
Yes I had already done this and it seemed to be steering those functions into the non-resident area just fine. I've attached the map file so you can see, but it seems there must be something else swelling the resident sections. share.zip
So since you now use .text.startup as the section to discard in the tsr script, does that mean you can mark certain library functions that you know are startup / end code like crtbegin, crtend, exit, __call_exitprocs as section .text.startup and in the normal linker script they will be linked into the .text section by wildcard rule .text.*?
I hope to further tweak the gcc-ia16 toolchain
Sure thing, I'll test as they arrive into the PPA
Thank you!
Hello @tkchia , Am I correct in saying that malloc() allocations start at heap_end_minimum? If so that may be the problem, as the share loader figures out where the end of what's supposed to be resident is by mallocing a final byte after all other allocations are done and then measuring the distance from the PSP to it. On the custom ld script script the heap_end_minimum is 0x12a0, whereas with the new tsr support it's 0x16f0. Other than that the .text, .data, bss section sizes look very similar to those produced by the custom script.
Thank you!
Hello @andrewbird,
Ah — I think I see the problem. Your custom script placed the heap start right after the resident portion, while my script was placing it at the end of the transient portion. (I had considered treating the heap as part of the resident portion, but this is tricky to get right in the general case.)
Let me try to come up with a good way for the C runtime to report the end of the resident portion. I will let you know.
Thank you!
Hello @tkchia ,
while my script was placing it at the end of the transient portion.
But doesn't that mean if we want to release the transient portion we can never use malloc'd memory afterwards? In the share program malloc'd memory is used to allocate the share and lock tables that will be used by the resident code. Their sizes are not known at compile time as they are determined by command line parameters.
Let me try to come up with a good way for the C runtime to report the end of the resident portion.
I'm not sure if that's too helpful on its own, but if we could redefine the location of the heap at runtime it could be?
I hope I'm not asking too many silly questions, this level of control on placing memory in a C program is both new and interesting to me.
Thank you!
Can this work?
. | high |
---|---|
data.startup | |
text.startup | |
data | |
text | |
. | low |
. | high |
---|---|
bss.startup | |
data.startup | |
text.startup | |
stack.startup | |
. | |
heap | |
bss | |
data | |
text | |
. | low |
. | high |
---|---|
stack | |
. | |
heap | |
bss | |
data | |
text | |
. | low |
Hello @andrewbird,
In the share program malloc'd memory is used to allocate the share and lock tables that will be used by the resident code. Their sizes are not known at compile time as they are determined by command line parameters.
Hmm — this is a problem. 😐
One problem with moving .text.startup
etc. above the heap, is that the maximum heap size, whether it is 32 KiB, or 48 KiB, or 60 KiB, etc. will probably need to be hard-coded in advance, so that we know where exactly to position .text.startup
etc.
It will be nice if we can automatically position the transient code and data according to their size — if the transient portion is large we probably want it to start lower in memory. But it seems the GNU linker scripts do not allow one to position sections "backwards" in memory — e.g. I cannot easily say "allocate a .bss.startup
output section in such a way that the section ends at offset 0xff00
or thereabouts".
Thank you!
Hello @andrewbird and @tkchia,
(I had considered treating the heap as part of the resident portion, but this is tricky to get right in the general case.)
May I ask why is this tricky, if the heap is of fixed size? A fixed heap directly after the resident portion, then followed by the transient .text.start/.data.start sections should solve this problem and eliminate the requirement for a custom linker script, right?
Let me try to come up with a good way for the C runtime to report the end of the resident portion.
Wouldn't malloc(1)
, after all other allocations are completed, return the end of the resident data section, which, along with some manipulation with DS and CS values, allow the PSP offset to be known?
Thank you!
Hello @ghaerr, hello @andrewbird,
I am experimenting with some changes to gcc-ia16
and newlib-ia16
that will allow the heap to appear before the transient code and data, but in a more rigorous way (https://github.com/tkchia/newlib-ia16/commit/5c10cbfe9baffadeb31a6a81f09de4f40067b54d etc.). Thank you!
But it seems the GNU linker scripts do not allow one to position sections "backwards" in memory — e.g. I cannot easily say "allocate a .bss.startup output section in such a way that the section ends at offset 0xff00 or thereabouts".
I'm not sure you need to worry about that as such since it would be NOLOAD, hence not allocated, and the AT final address is computable.
So wouldn't this work, it's similar to my gcc_help.ld but extended to handle the extra .data.startup and .bss.startup?
section .text {
...
}
section .data {
...
edata - .
}
section .bss NOLOAD {
/* NOLOAD means it can overlap with text.startup and maybe .data.startup */
}
/* this is where heap ends up, its location and size are not important */
section .text.startup AT (0xff00 - sizeof(.bss.startup) - sizeof(.data.startup) - sizeof(.text.startup)) {
/* load address starts at .edata */
...
}
section .data.startup AT (0xff00 - sizeof(.bss.startup) - sizeof(.data.startup)) {
...
}
section .bss.startup (NOLOAD) AT (0xff00 - sizeof(.bss.startup)) {
/* again it's not allocated, only cleared */
}
section .stack.startup (NOLOAD) AT (0xff00) {
/* again it's not allocated, and only shown here for reference 0xff00 -> 0xffff */
}
Or did I miss something obvious?
I just tried this and it seems to give me what I expect (although SIZEOF didn't seem to work, end of section - start of section did)
.text.startup (0xfd00 - __xx_bss_startup_length - __xx_data_startup_length - __xx_text_startup_length) :
AT (__sbss_keep) {
...
}
__xx_text_startup_length = (__etext_startup - __stext_startup);
.data.startup (0xfd00 - __xx_bss_startup_length - __xx_data_startup_length) :
AT (__sbss_keep + SIZEOF (.text.startup)) {
...
}
__xx_data_startup_length = (__edata_startup - __sdata_startup);
.bss.startup (0xfd00 - __xx_bss_startup_length) (NOLOAD) : {
...
}
__xx_bss_startup_length = (__ebss_startup - __sbss_startup);
And the .MAP file looked like this
.bss 0x00000000000011c4 0xb8
0x00000000000011c4 __sbss_keep = .
.text.startup 0x000000000000f88a 0x461 load address 0x00000000000011c4
0x00000000000011c4 __stext_startup_load = LOADADDR (.text.startup)
.data.startup 0x000000000000fceb 0xd load address 0x0000000000001625
0x0000000000001625 __sdata_startup_load = LOADADDR (.data.startup)
.bss.startup 0x000000000000fcf7 0x9
0x000000000000fcf7 __sbss_startup = .
Thank you!
Hello @andrewbird,
This is very cool, thanks! I will update the Newlib package in the PPA to use this.
Hello @andrewbird, hello @ghaerr,
By the way...
Wouldn't
malloc(1)
, after all other allocations are completed, return the end of the resident data section, which, along with some manipulation with DS and CS values, allow the PSP offset to be known?
Actually sbrk (0)
might be simpler and work better. If there are unallocated "holes" in the heap — perhaps left over by earlier <stdio.h>
operations etc. — then malloc (1)
may allocate from these holes, and will not return the true end of allocated heap memory. So I think sbrk (0)
would be better.
Thank you!
Hello @tkchia, That's an interesting observation about malloc(1) and holes. So sbrk(0) might be a way of reducing code size slightly as well, if it doesn't cost too much to include another library call into the resident text.
I just got the PPA update so I'll check out your latest work.
Thank you!
Hello @tkchia and @andrewbird,
If there are unallocated "holes" in the heap — perhaps left over by earlier
operations etc. — then malloc (1) may allocate from these holes
Thank you. I hadn't considered the possibility of stdio's allocation of buffers for stdout/stdin in this case. Depending on the implementation, many will dynamically allocate upon first use of I/O, although some have pre-allocated static buffers to alleviate this. Yet others allow static FILE definitions to override dynamic allocation. In any case, sbrk(0)
is superior.
So sbrk(0) might be a way of reducing code size slightly as well
If your resident program only requires allocating memory, never to be freed, using sbrk
could save dragging in the malloc
code.
However, many libc routines use malloc
, including printf
as described above, depending on the particular libc implementation.
For instance, in the ELKS operating system the init
program is always resident, so the following code included in init.c works to reduce the stdin buffer from 1024 to 1 byte, and the stdout and stderr buffer sizes from 1024 to 64 bytes, and have them pre-allocated. In this case, malloc
is still dragged in, but not called. Eliminating printf
entirely would eliminate the need for this, but wasn't an good option.
static unsigned char bufin[1];
FILE stdin[1] =
{
{
bufin,
bufin,
bufin,
bufin,
bufin + sizeof(bufin),
0,
_IOFBF | __MODE_READ | __MODE_IOTRAN
}
};
static unsigned char bufout[64];
FILE stdout[1] =
{
{
bufout,
bufout,
bufout,
bufout,
bufout + sizeof(bufout),
1,
_IOFBF | __MODE_WRITE | __MODE_IOTRAN
}
};
static unsigned char buferr[64];
FILE stderr[1] =
{
{
buferr,
buferr,
buferr,
buferr,
buferr + sizeof(buferr),
2,
_IONBF | __MODE_WRITE | __MODE_IOTRAN
}
};
Thank you!
Hello @tkchia, @ghaerr,
It's looking really good now https://github.com/andrewbird/share/tree/new-ld-tsr-support-02
Type | Turbo C | Custom | New TSR support |
---|---|---|---|
On disk size (bytes) | 6416 | 5416 | 5400 |
Resident size (paragraphs) | 0x246 | 0x01c0 | 0x1aa |
So as you might tell, I'm very happy with the results. Here's the .MAP file share.zip, looking down the list of allocations in .text, .data and .bss the abort stuff stands out. Would that be a candidate to be marked as part of the *.exit sections?
Thank you!
Hello @andrewbird,
I looked at the share.c source and now see what's going on. Since printf
is already removed, and malloc
called only for the file and lock tables, it seems using sbrk
for those allocations would allow for malloc
(actually nano-malloc and nano-free) to disappear, which should save 0x24+0x148+0xc9 bytes according to your mapfile. This could be done for the TurboC case as well, and could eliminate the #ifdefs and unneeded free
call.
Thank you!
Hello @ghaerr,
I'm not familiar with sbrk() (I never heard of it before @tkchia mentioned it), but I did a quick try with it for gcc only, see https://github.com/andrewbird/share/tree/new-ld-tsr-support-02. I may have done something stupid as I still see free/malloc used in the .MAP. The filesize was reduced (5338) but the resident size change was minimal (0x1a7).
Thank you!
Hello @andrewbird,
I may have done something stupid as I still see free/malloc used in the .MAP.
I checked the .map
file — it turns out that there are some infelicities in the Newlib library that is causing _free_r
etc. to be needlessly roped in. I will try to fix these; you should not need to do anything at your end.
(If you are curious: the top part of the .map
file gives some indication of which particular library modules were being included into the final .exe
to resolve which particular symbols. According to it,
_free_r
function was requested by the lib_a-reent.o
module (reent.c
)lib_a-reent.o
in turn got roped in because it defined an errno
symbol, which _write_r
needs (this errno
is actually not the user-visible errno
, but rather an internal variable used by the system-dependent layer).So I guess one way to avoid linking in the malloc
stuff, would be to modify Newlib to separate out the internal errno
into a separate module. Alternatively, I might be able to get rid of the internal errno
variable altogether, so that there is always only a thread-local errno
to deal with.)
Thank you!
Hello @tkchia,
Nice noticing how the system-dependent errno
dragged in lib_a-reent.o
, causing free_r
to get roped in. I notice that sbrk_r
also uses errno
, so it's not just write_r
bringing it in.
What I haven't figured out yet is why malloc
is getting dragged in. The nano-mallocr.c source is compiled into separate modules (malloc/free/realloc, etc), and the reent cleanup code doesn't call it nor does free
... where's it coming from?
Thank you!
Hello @ghaerr,
The nano-mallocr.c source is compiled into separate modules (malloc/free/realloc, etc), and the reent cleanup code doesn't call it nor does
free
... where's it coming from?
Apparently it got linked in because _free_r
was referencing __malloc_free_list
, which was defined in the same module as _malloc_r
(argh). Thank you!
Hello @tkchia, hello @ghaerr,
I checked the .map file — it turns out that there are some infelicities in the Newlib library that is causing _free_r etc. to be needlessly roped in. I will try to fix these; you should not need to do anything at your end.
Well you both seem to have a good grip on what's going on, I do appreciate the effort you are putting into my little quest.
I'll look out for the updated PPA package, and post my results shortly afterwards.
Thank you!
Hello @tkchia, i just got your latest update from the PPA, it's looking really good!
Type | Turbo C | Custom | Latest TSR support |
---|---|---|---|
On disk size (bytes) | 6416 | 5416 | 3934 |
Resident size (paragraphs) | 0x246 | 0x1c0 | 0x14a |
So as you can see the filesize decreased significantly and the resident size reduced by almost 4k bytes over the Turbo C compiled version! As I already mentioned, I'm very happy with the results. Do you think there's any more scope for improvement? I've attached the new .map file, but the only thing that looks out of place now in the .text section is __call_exitprocs() etc.
Thank you!
Hello @andrewbird and @tkchia,
I'm very happy with the results. Do you think there's any more scope for improvement?
Looking at the most recent .map file, I'd say that, thanks to @tkchia's work untangling parts of newlib from itself and your work with the linker script and rewriting share.c using sbrk
, that there's not much more reasonably obtainable gains to be had. The exit-processing is usually pretty tied into proper libc and process termination handling which share.com sometimes needs, and there's only around 300 bytes in that code.
I've enjoyed learning more about newlib from this thread, and its nice to see gcc-ia16 and its C library outdoing TurboC by quite a bit :)
Thank you!
Hello @ghaerr and @tkchia,
The exit-processing is usually pretty tied into proper libc and process termination handling which share.com sometimes needs, and there's only around 300 bytes in that code.
Yes, I had assumed that we probably need that before _dos_keep()
gets called, but if it gets run somehow afterwards I'm not sure what's would happen except that it's probably very bad. For that reason I was wondering if it could move its section from .text
to .text.exit
and in that way it would be present for any legitimate calls, but be non-resident after _dos_keep()
? That being said, I agree that releasing another 300 bytes is probably not worth the effort and what you've both done here has far exceeded my aspirations!
I did just try adding the following to the linker script
NOCROSSREFS_TO(".text.startup", ".text")
NOCROSSREFS_TO(".data.startup", ".text")
NOCROSSREFS_TO(".bss.startup", ".text")
And I got one error, but I'm not sure if it's anything to worry about?
ia16-elf-gcc -mtsr -Wall -fpack-struct -mcmodel=tiny -c share.c -o share.obj -Os
nasm -f elf gcc_help.asm -o gcc_help.obj
ia16-elf-gcc -mtsr -o share.com share.obj gcc_help.obj -Wl,-Map=share.map -li86
/usr/lib/x86_64-linux-gnu/gcc/ia16-elf/6.3.0/../../../../../ia16-elf/bin/ld: /usr/lib/x86_64-linux-gnu/gcc/ia16-elf/6.3.0/crtend.o: in function `__do_global_ctors_aux':
(.text+0x1): prohibited cross reference from .text to `.ctors' in .data.startup
/usr/lib/x86_64-linux-gnu/gcc/ia16-elf/6.3.0/../../../../../ia16-elf/bin/ld: (.text+0xa): prohibited cross reference from .text to `.ctors' in .data.startup
collect2: error: ld returned 1 exit status
make: *** [Makefile:2: share.com] Error 1
I've enjoyed learning more about newlib from this thread, and its nice to see gcc-ia16 and its C library outdoing TurboC by quite a bit :)
These are my sentiments exactly, a most enjoyable experience! I'll see about tidying up my branch and raising a PR to FD/share as it is now.
Thank you!
Hello @andrewbird,
I've attached the new .map file, but the only thing that looks out of place now in the .text section is __call_exitprocs() etc.
Yes, I hope to find a neat way to move __call_exitprocs ()
, abort ()
, etc. to .text.exit
, at some point in time. Thank you!
Hello @tkchia,
Yes, I hope to find a neat way to move __call_exitprocs (), abort (), etc. to .text.exit, at some point in time.
That's very good news. I'm already very happy with the result.
Thank you!
Hello @tkchia, I'm thinking about adding localisation support to FreeDOS share. I want to put the kitten library into the *.startup sections. I have it working, but is there a better way than this?
kitten.obj: ../kitten/kitten.c
+ifeq "$(COMPILER)" "gcc"
+ $(CC) $(_CFLAGS)$@.1 $^
+ objcopy --rename-section .text=.text.startup \
+ --rename-section .data=.data.startup \
+ --rename-section .rodata=.rodata.startup \
+ --rename-section .bss=.bss.startup $@.1 $@
+else
$(CC) $(_CFLAGS)$@ $^
+endif
Thank you!
Hello @tkchia, It seems that my section renaming, although steering things well to the transient portion, messes up file global data. I'm not sure if it's the initialisation (zero) of the data or when it's accessed that's failing, If I add the proper attribute then the problem goes away.
The problem definition is this
-static nl_catd _kitten_catalog = 0; /* _kitten_catalog descriptor or 0 */
+static nl_catd _kitten_catalog __attribute__((section(".data.startup"))) = 0; /* _kitten_catalog descriptor or 0 */
I guess the objcopy hack is just that.
Thank you!
Hello @tkchia, I'm still trying to track this problem down with my use of section renaming. I made a short test case t1.zip, I wonder if you would be so kind as to just have a quick look at the map file and see if it's what you expect to see, as the whole load address thing somewhat confuses me? Of course the test case works fine, but in the real program the variable that's in the bss.startup has a non zero value at startup.
Thank you!
Hello @andrewbird,
Hmm... this is weird. The length of the transient BSS section (in __lbss_nokeep
) is calculated as 0. This means that for some reason the mybss
variable is not allocated space — or it is, but somehow it is made to overlap with something else.
I am not sure why the linker is doing this, and in any case this is probably a linker script (or linker) bug.
Thank you!
Hello @andrewbird,
I wonder if you would be so kind as to just have a quick look at the map file and see if it's what you expect to see, as the whole load address thing somewhat confuses me?
Hmm... this is weird. The length of the transient BSS section (in __lbss_nokeep) is calculated as 0.
This particular issue should be fixed now (https://github.com/tkchia/newlib-ia16/commit/03a53fa3090b98941a5d2cf88c77e8ce4110cbef).
Anyway, for flat .com
files, load addresses are used for their "original" purpose — to say that a section is placed at program startup at a particular address, but will have to be moved to another address before its contents are actually used.
Thank you!
I'm looking at getting gcc-ia16 to cross compile the share.com TSR for FreeDOS. Currently the only compiler capable of building it is Turbo C 2.01. I've had some initial success with GCC using a local keep() function, see https://github.com/andrewbird/share/commit/b94094af4b5bed6efbafb6ee10a38106f7cadfb8 however the original code does some tricks to throw away the initialisation code and shrink the resident memory size. I'm wondering if I could achieve something similar or better using a custom linker script? From what I see in the memory map file, gcc-ia16 lays out the startup code, init and fini before the text of the program and consequently would still be included in the resident memory at TSR. I think that ideally I'd want something like this for tiny model .COM
I've never had cause to write a linker script before, so I am a complete novice. Is this sort of control possible? Is there a better solution?
Also I'm using this command line, how would I replicate that as separate compile / link ops?