michael105 / minilib

A C standard system library with a focus on size: header-only, "single-file", intended for static linking. 187 bytes for "Hello World" (regular ELF), compiled with the standard gcc toolchain.

Log #10

Open michael105 opened 5 years ago

michael105 commented 5 years ago

I think I'm going to log here from time to time about the development. Just now, I'm a bit upset. This might be funny to most of you, but I'm really concerned about bloat. A few bytes might not seem like much.

But .. I added a global structure. Suddenly even the tiniest "hello world" bloated up to 4k. And you can feel the difference: it simply loads a few microseconds longer when executing. It's noticeable. And if you think a bit bigger and multiply this by a few billion executions on a server, it would also count in hard cash.

So. Atm I'm going to think about it. There has to be a compromise. But the last compromise, the global structure initialized at start - I believe it doesn't need to bloat so much.

Maybe something like a lazy initializer would be better. Sort of mallocing the global structure and buffer only when needed.
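The lazy-initializer idea could look roughly like this: keep the structure zero-initialized so it lands in .bss (zero bytes in the ELF file) and run the setup code only on first access. A minimal sketch; the names and layout here are made up for illustration, not the actual minilib structure.

```c
#include <stddef.h>

#define GLOBAL_BUF_SIZE 4096

/* hypothetical global state of the lib */
struct mlgl {
    char buf[GLOBAL_BUF_SIZE];
    int  initialized;
};

/* zero-initialized: lives in .bss, costs nothing in the file on disk */
static struct mlgl mlgl_store;

/* every user goes through the getter; the first call pays the setup cost */
static struct mlgl *mlgl_get(void) {
    if (!mlgl_store.initialized) {
        mlgl_store.initialized = 1;
        /* ... fill in non-zero defaults here ... */
    }
    return &mlgl_store;
}
```

The key point is that only explicitly initialized globals go into .data and enlarge the binary; a .bss object plus a one-time check stays free on disk.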

michael105 commented 5 years ago

It's also a question of the increasing complexity. Although it's a miniature lib, adding something here already automatically changes something there... You know the game. I never thought as much about simplicity as I do now. It's just an unusual programming style in today's world of Java and whatever other object-oriented ways. You just don't think this way, normally. Even when system programming. Even at assembler level. So - it's also much fun implementing this minilib, just because it's unusual. And there is much abstraction to do as well - only in a strange way.

michael105 commented 5 years ago

Ok. Current state (amd64, linux): hello-world: 185B. That's OK again. It had been down to 150 bytes before, but at the moment those 35 bytes are not the highest priority.

A sort of malloc is implemented.
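For reference, the simplest form such a "sort of malloc" can take is a bump allocator over a static arena. This is only a sketch of the general technique under that assumption, not the actual minilib implementation, which may well differ:

```c
#include <stddef.h>

#define ARENA_SIZE 65536

/* static arena: zero-initialized, so it lives in .bss (free on disk);
   16-byte aligned so returned pointers satisfy usual alignment needs */
static unsigned char arena[ARENA_SIZE] __attribute__((aligned(16)));
static size_t arena_used;

/* bump allocator: no free(), no per-block metadata, just a cursor */
static void *mini_malloc(size_t n) {
    n = (n + 15) & ~(size_t)15;        /* round request up to 16 bytes */
    if (arena_used + n > ARENA_SIZE)
        return NULL;                   /* arena exhausted */
    void *p = arena + arena_used;
    arena_used += n;
    return p;
}
```

A bump allocator compiles to a handful of instructions, which fits the size goal; the price is that memory is only reclaimed all at once, if ever.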

michael105 commented 5 years ago

Darn! I really do need a decent disassembler. elftools and binutils won't even read the binaries. (I guess because I'm stripping all section headers; but what do we need section headers for, when there's only the .text section..)

michael105 commented 5 years ago

Ah. ;) There's an online disassembler ( https://onlinedisassembler.com/odaweb/ ). Got my hexdump: hexdump hello-include | perl -pe 's/^\S*//' Output given below..

michael105 commented 5 years ago

hexdump hello-include | perl -pe 's/^\S*//'

457f 464c 0102 0001 0000 0000 0000 0000
0002 003e 0001 0000 8090 0804 0000 0000
0040 0000 0000 0000 0000 0000 0000 0000
0000 0000 0040 0038 0001 0000 0000 0000
0001 0000 0005 0000 0000 0000 0000 0000
8000 0804 0000 0000 8000 0804 0000 0000
00b9 0000 0000 0000 00b9 0000 0000 0000
0001 0000 0000 0000 01b8 0000 4800 358d
0027 0000 0dba 0000 8900 0fc7 3105 c3c0
485f e689 8d48 fe54 e808 ffda ffff 8948
48c7 c0c7 003c 0000 050f 48c3 6c65 6f6c
7720 726f 646c 0a21 0000

michael105 commented 5 years ago

Finally. Compiled with other options, objdump did its job: objdump -D hello-include

hello-include: file format elf64-x86-64

Disassembly of section .text:

0000000008048078 <.text>:
 8048078: b8 01 00 00 00        mov    $0x1,%eax
 804807d: 48 8d 35 27 00 00 00  lea    0x27(%rip),%rsi        # 0x80480ab
 8048084: ba 0d 00 00 00        mov    $0xd,%edx
 8048089: 89 c7                 mov    %eax,%edi
 804808b: 0f 05                 syscall
 804808d: 31 c0                 xor    %eax,%eax
 804808f: c3                    retq

 8048090: 5f                    pop    %rdi
 8048091: 48 89 e6              mov    %rsp,%rsi
 8048094: 48 8d 54 fe 08        lea    0x8(%rsi,%rdi,8),%rdx
 8048099: e8 da ff ff ff        callq  0x8048078
 804809e: 48 89 c7              mov    %rax,%rdi
 80480a1: 48 c7 c0 3c 00 00 00  mov    $0x3c,%rax
 80480a8: 0f 05                 syscall
 80480aa: c3                    retq

 80480ab: 48                    rex.W
 80480ac: 65 6c                 gs insb (%dx),%es:(%rdi)
 80480ae: 6c                    insb   (%dx),%es:(%rdi)
 80480af: 6f                    outsl  %ds:(%rsi),(%dx)
 80480b0: 20 77 6f              and    %dh,0x6f(%rdi)
 80480b3: 72 6c                 jb     0x8048121
 80480b5: 64 21 0a              and    %ecx,%fs:(%rdx)

michael105 commented 5 years ago

Now I'm really wondering why the heck ld places the entry point behind the main function. That's .. hm. (main starts in the listing at.. oh gosh. Anyway, it's at the top - 8048078.) The entry point is at 8048090. So we call main from 8048099, just to return from 804808f to 804809e. This is.. ok. Effectively we do a call from _start to main. I haven't figured out yet how to avoid this call. Just putting a _start before main, and an _end after it, would be faster and again save a few bytes.

michael105 commented 5 years ago

An _end function behind main would not only be faster and save a few bytes, it would be safer. (No crash when someone does too much fiddling within main; just return to the OS with a perhaps unusual value.) While writing: I just need to push the address of _end onto the stack right before entering main. No call to main. So the ret within main will do the jump to _end. Ideally just one byte, if ret is called at the end of main.

michael105 commented 5 years ago

And again, while thinking about it: I'm going to try a jmp to main at the end of _start. Hopefully the compiler will optimize it right.

michael105 commented 5 years ago

Nice reading: https://en.wikibooks.org/wiki/X86_Disassembly

michael105 commented 5 years ago

That's typical. The linked wikibook - most things are quite basic. The things I'd need to know aren't explained.

michael105 commented 5 years ago

Anyways, nice sentence there: Computer science professors tell their students to avoid jumps and goto instructions, to avoid the proverbial "spaghetti code." Unfortunately, assembly only has jump instructions to control program flow. grin. Obviously, that's right. And, as I always said: a good programming language shouldn't restrict you. I don't get why most languages have banned the good old goto. If the code I produce is bad, it's about the way I think. A programming language is nothing more than a sort of education for your thoughts. And a restrictive education is not the best education.

michael105 commented 5 years ago

..Without any problem we could write object-oriented programs in assembly. Or functional ones. It's the other way around that's hard: writing low-level code from a high-level functional language, because of its restrictions. I personally really do love Perl because of this. Simply because there aren't any imposed restrictions. (Not before 5.10)

michael105 commented 5 years ago

And the biggest Perl project I wrote had around 50,000 LOC. Nicely, there wasn't a real performance penalty. Just the startup took a few seconds, since the scripting expressions have to get compiled first. After that, I benchmarked a few critical routines - rewriting them in C most times didn't give a real performance gain. Rewriting in assembly was even sort of critical: sometimes there was a performance gain of around a factor of two, but when not really careful, often there was a serious penalty. My conclusion was: just keep writing in Perl. It's very seldom that a factor of two gives you a real performance gain, even in a high-performance environment. The problems are, in nearly all cases, somewhere else.

michael105 commented 5 years ago

Ok. Now - back to coding. I guess I really have an important milestone nearly accomplished - having a good base, and things sorted out. It might soon be possible to use minilib as a drop-in replacement for most tools. Just by changing the gcc compiler switches. No code change needed.

Oh, and for now I should go shopping. Need something to eat, and the malls here are closing within an hour. Maybe I should even buy some drinks; I somehow have the feeling I can celebrate the development (in its positive sense) of minilib. Why not praise myself ;)

michael105 commented 5 years ago

And although I have other important projects, I really do want to accomplish milestone 0.1 - a solid basic structure. Getting the whole thing to compile on further architectures and adding more ANSI C functions is not so complicated then, but could mark 0.2. 0.3 would then be a complete ANSI C set. Just to point out a roadmap.

rofl0r commented 5 years ago

(you might be interested in https://github.com/arsv/minibase which offers a built-in "libc" using similar approach)

michael105 commented 5 years ago

(you might be interested in https://github.com/arsv/minibase which offers a built-in "libc" using similar approach)

Thanks a lot, that really is a good hint. Somehow I didn't find it myself. ;) Although it even has a similar name.

I have to take a close look. It seems to me minibase has a different target, but that's only a feeling yet. I have to think about the similarities and differences. One difference might be my approach of a header-only library. (More exactly, that possibility via compile switch.) Also, the licenses differ. Which, perhaps, is not only a philosophical question.

It's a great reference anyways. And there seem to be further similarities; a sort of basic Linux system is something I'm also planning. Although that I'm going to clearly separate from minilib.

Likewise, this seems to me a philosophical question: which way is better, the monolithic or the micro(lithic) approach?

I guess we somehow haven't realized yet what modern information technology could change. And although one might think the monolithic approach is more stable, this doesn't seem to hold.

@rofl0r , may I ask, why did you point out the ability of hardcore-utils to be built standalone? This seems to me right and important, but I'm not able to pinpoint the reason exactly. There seem to be many reasons which speak for both sides - monolithic and micro approach. As I pointed out in the readme, security seems to me a reason for a micro approach. As well as simplicity. But there are still compromises to be found, so it's sort of hard to define. Other reasons are less complexity (good) and more stability. (Not everything is broken when something breaks - only the affected micro part.) But.. it's hard to sort this out. Possibly because it's hard to define where the border between a monolithic and a microlithic approach exactly is.

Anyways, thanks for the good hint. Best wishes, Michael

rofl0r commented 5 years ago

@rofl0r , may I ask, why did you point out the ability of hardcore-utils to be built standalone?

imo, it's way more convenient for both building and debugging: gcc foo.c -o foo et voila. i like busybox' approach too, but it's quite hard to get a debuggable build and find a proper entrypoint for debugging (though i guess in all fairness this could be counted as a quirk of the build system). the highly integrated approach of busybox also makes it relatively hard to study the source code, and the wall of ifdefs for minimal size/option tweaks makes it even worse. having the whole set of unix tools in a single ~800KB'ish executable is a really nice property, but otoh i don't really care whether my join program is a 30 KB executable vs adding only 10 KB to busybox, when i need to install 100+MB of libs and binaries for a webbrowser.

michael105 commented 5 years ago

@rofl0r , may I ask, why did you point out the ability of hardcore-utils to be built standalone?

imo, it's way more convenient for both building and debugging: gcc foo.c -o foo et voila.

That's sort of two-edged. The build itself might be more convenient with a single monolithic source file - IF everything works out. Debugging,..

i like busybox' approach too, but it's quite hard to get a debuggable build and find a proper entrypoint for debugging (though i guess in all fairness this could be counted as a quirk of the build system). the highly integrated approach of busybox also makes it relatively hard to study the source code, and the wall of ifdefs for minimal size/option tweaks makes it even worse.

I believe that's one of the important points. Just today I thought about the Linux kernel - although the sources are there, on my hard disk, in no way could I read through them. Meaning, the argument of more security through open source is more or less hypothetical. Even if I did read through them, what hides behind this or that macro is something I can't look up in every case. Oh, and I'm just getting reminded of Perl JAPHs... To be fair, afaik within kernel development they separated the different responsibilities quite clearly. So there, again, is some sort of micro development.

Possibly that's the real point - decreased complexity through clear targets for each single tool. This, on the other hand, increases the chances that others are able to contribute. And it's easier to understand, even if it's your own source code. Who can remember what he did ten years before, without reading the sources? I'm eager to see whether my approach works out: having things separated, but combining the sources into one single source file.

having the whole set of unix tools in a single ~800KB'ish executable is a really nice property, but otoh i don't really care whether my join program is a 30 KB executable vs adding only 10 KB to busybox, when i need to install 100+MB of libs and binaries for a webbrowser.

Strangely, statically compiled binaries somehow seem to be more responsive. Shorter loading and execution times. Although that's more of a feeling. I'm not so firm here, but I guess it might have something to do with context switches. When the libraries are loaded somewhere in the big RAM of nowadays, and the program into quite another part - better to have the whole executable loaded into the processor's cache than having to load this or that part of several libraries from RAM. Which might lead to further penalties, like spoiled branch prediction and so on.

Anyways, thinking about that again: this seems to be a trade-off between abstraction - like an option parser, which might be needed by most tools, so it's implemented once - and the single tools.

It's a bit like I know there's a good solution - I even believe I have one - but I'm not able to pinpoint it.

Anyways, you are completely right about compiling and debugging. If busybox doesn't compile, most likely I'm going to try something else; just too much work getting through the sources. If one single tool with a single source file makes trouble, most probably I'm going to have a look into its sources.

But it still seems to me there's one important point missing. Possibly some sort of conjunction of all the separated parts.

rofl0r commented 5 years ago

Strangely, having statically compiled binaries somehow seem to be more responsive. Shorter loading and execution times.

theoretically static linked binaries should be slightly faster for 2 reasons: 1) no delay on startup due to the dynamic linker having to patch jump addresses for the linked library routines 2) no overhead due to -fPIC. this can make especially register-starved platforms like i386 a good bit faster, because iirc gcc usually keeps the plt address in a register, however this can now largely be mitigated with -fno-plt (see http://ewontfix.com/18/ for more details)

michael105 commented 5 years ago

Strangely, having statically compiled binaries somehow seem to be more responsive. Shorter loading and execution times.

theoretically static linked binaries should be slightly faster for 2 reasons:

1. no delay on startup due to the dynamic linker having to patch jump addresses for the linked library routines

2. no overhead due to -fPIC. this can make especially register-starved platforms like i386 a good bit faster, because iirc gcc usually keeps the plt address in a register, however this can now largely be mitigated with -fno-plt (see http://ewontfix.com/18/ for more details)

Well, it's not only theoretical. Myself included, I'd say it's sometimes hard to keep an eye on complexity. I remember my assembly experiments, when I tried to improve some basic functions. To my annoyance, sometimes it was really hard to beat the code generated by gcc. Very often, the results were counterintuitive.

I guess your first point, combined with the resulting cache misses, can become a bigger problem than one might think.

Your second point, again, is a good hint. Although it's obvious in hindsight, I hadn't been aware of it.

:laughing: Again, this complexity. Like what I did today (tonight).. I'm not really sure what it was. But suddenly the "extremely tiny" editor compiled to a bloated something: instead of 15k, 2MB. (!) There still is something wrong; it had been down to 8k before. Just now I'm thinking I should give it a break; instead of implementing this or that, maybe it's time to think about how to get a grip on the complexity.

The "test" system is a good first step. But obviously not good enough. And not complete at all.

First I'm going to check for position-independent code. Afaik gcc doesn't create position-independent code when compiling statically. But you never know. Thanks again for the hint.

rofl0r commented 5 years ago

Like what I did today (tonight) .. I'm not really sure what it was.

well, this should not happen when you use git which you do. a git diff can always tell you what's been changed since the last checked-in (and thus probably "known good") version. before git came along, it was really hard to remember everything that has changed recently, when a regression happened...

michael105 commented 5 years ago

Yes, that's exactly what I did. ;) Now I still don't know what exactly was able to bloat a tiny poor executable of 12k into 2MB. I'm pretty sure it's been the linker. :) Someone has to be blamed. But I haven't been able to pinpoint the problem. And since I have to clean up the whole minilib anyway, I'd better do that first.

michael105 commented 5 years ago

Obviously, sometimes I don't see the obvious. Header-only implementations, marked with "always inline", should be static. Else there is going to be trouble. Dunno why I ripped out the static some days ago.
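A minimal illustration of that point, with a made-up function (not from the minilib sources): without static, every translation unit that includes the header emits or references an external symbol, giving multiple-definition errors or out-of-line copies when linking several object files. With static, each copy stays local to its translation unit and unused ones are simply discarded.

```c
/* would live in a header, included from many .c files */

/* BAD (sketch): a non-static definition in a header is an external
   symbol in every translation unit including it - link-time trouble. */
/* int ml_isdigit(int c) { ... } */

/* GOOD: static + always_inline - the body is folded into each call
   site, no external symbol, no multiple-definition errors */
static inline __attribute__((always_inline))
int ml_isdigit(int c) {
    /* single unsigned comparison instead of two signed ones */
    return (unsigned)(c - '0') <= 9;
}
```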

The bloating seems related to the stack. Somehow the global struct forces the stack into a separate program header. That adds around 50 to 60 bytes. Quite expensive for just one variable.

I guess I leave it this way for now, anyways.

michael105 commented 5 years ago

.. seems the extra program header for the stack is only needed, when linking several object files. :thinking:

michael105 commented 5 years ago

:rage:

michael105 commented 5 years ago

Further reading reveals: eventually it's possible to drop the .text section and instead write all execution instructions into the stack. Also, my point about security seems to be valid. Just found this site: https://blog.fbkcs.ru/en/elf-in-memory-execution/ - There are attacks described there which might be interesting especially on Android. I can only guess - but somehow I'm sure it is not only a theoretical possibility that someone infects e.g. an Android system.

I can't even be completely sure the sudden bloats I'm experiencing sometimes aren't related to an infection here. (Normally I wouldn't notice, but since I'm counting every single byte, ..) I'm working with a quite clean and fresh Arch amd64 installation here. But I'm also browsing the net with the same system. I guess there is a 5 percent chance that there is an infection. But I cannot say for sure. The binaries I uploaded should be ok, since I really count every single byte. Besides, it's not so hard to disassemble a 200 byte file and check it exactly. Which is what I'm going to do today with the bloated executables.

I'm also getting back to another idea of mine: having a core system where everything is statically linked. I already linked e.g. the shell (zsh) statically. It's quite a bit more responsive. But possibly I should also link gcc and so on statically. It's also about minimizing the possible problems, even when they are more or less hypothetical.

And again, this seems to me a huge advantage of minilib compared to glibc or even musl: not only is it easy to read and understand the source of this minilib, it also is possible to disassemble the generated binaries.

michael105 commented 5 years ago

.. Disassembling other binaries obviously is possible as well. But who can say what's hidden in, e.g., a "hello world" which shows up with 500k - versus a 150 byte executable, where the biggest part is simply the ELF header. One should also keep in mind that sometimes the disassemblers miss things. Like shifted bytes, or executable instructions within binary "data".

michael105 commented 5 years ago

I'm still engaged with restructuring. I've written a small parser to create the compat headers. First, it works. Second, it's proving useful. But I made a fundamental design flaw. The header files are generated from templates, and the target files are overwritten. This seemed right in this special use case. But since this works out so well, I'd like to use the parser for further jobs. And there's the problem that you can't modify the created header files directly. Instead you have to modify one of the templates and rerun "make header". Which is annoying. So I'm heavily tempted to write a small interpreter for this job.

michael105 commented 5 years ago

The small interpreter for the header files is possibly the best solution. On one side, I don't like adding further complexity with tools to a project. On the other side - boy, did I have hard times with the C preprocessor. And I'm unhappy with some of the solutions I found, e.g. the DEF_syscall(write,a1,a2) macros. You simply cannot name the parameters - which is silly. But even this solution was hard to find. Possibly a compromise would be best: an interpreter creating and modifying the header files, plus some macros and so on, which expand at compile time.
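For illustration, syscall-as-macro wrappers along these lines are common in tiny libcs: the whole call expands inline, so unused syscalls cost nothing in the binary. This is a hedged sketch for x86-64 Linux (GCC statement expressions and extended asm); the macro names and shapes are mine, not the actual DEF_syscall macros from the sources.

```c
/* x86-64 Linux syscall numbers used below */
#define ML_SYS_write   1
#define ML_SYS_getpid 39

/* zero-argument syscall: number in rax, result in rax;
   the kernel clobbers rcx and r11 */
#define sys_call0(nr) ({                                 \
    long _ret;                                           \
    __asm__ volatile ("syscall"                          \
        : "=a"(_ret) : "a"(nr)                           \
        : "rcx", "r11", "memory");                       \
    _ret; })

/* three-argument syscall: args in rdi, rsi, rdx - note that here
   the parameters CAN carry descriptive names, one of the things
   that is awkward with token-pasting DEF_-style macros */
#define sys_call3(nr, a1, a2, a3) ({                     \
    long _ret;                                           \
    register long _r1 __asm__("rdi") = (long)(a1);       \
    register long _r2 __asm__("rsi") = (long)(a2);       \
    register long _r3 __asm__("rdx") = (long)(a3);       \
    __asm__ volatile ("syscall"                          \
        : "=a"(_ret)                                     \
        : "a"(nr), "r"(_r1), "r"(_r2), "r"(_r3)          \
        : "rcx", "r11", "memory");                       \
    _ret; })

#define sys_getpid()          sys_call0(ML_SYS_getpid)
#define sys_write(fd, buf, n) sys_call3(ML_SYS_write, fd, buf, n)
```

Since everything is a macro, a binary that never uses sys_getpid pays zero bytes for it; that matches the "no bloat at all" property mentioned later in this thread.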

michael105 commented 5 years ago

Still restructuring .. rethinking. Just stumbled upon "Brainfuck". An interpreted language which consists of exactly 8 commands.

Which led me to: what's the difference between a functional and an object-oriented language? None. The languages just give you some tools which make it easier to write in this or that style. Without problems I could define structures in C containing function and data pointers, therefore being objects; abstraction and so on might even be easier to develop than in C++.
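A quick sketch of that point: a struct carrying data plus a pointer to a table of function pointers behaves exactly like an object with a vtable. The names here are illustrative, not from any real codebase.

```c
struct shape;

/* the "vtable": one function pointer per method */
struct shape_ops {
    int (*area)(const struct shape *self);
};

/* the "object": vtable pointer plus data members */
struct shape {
    const struct shape_ops *ops;
    int w, h;
};

/* a concrete "class" implementing the method */
static int rect_area(const struct shape *self) {
    return self->w * self->h;
}
static const struct shape_ops rect_ops = { rect_area };

/* dynamic dispatch: the caller doesn't know the concrete type */
static int shape_area(const struct shape *s) {
    return s->ops->area(s);
}
```

This is essentially what a C++ compiler generates under the hood for virtual calls; doing it by hand just makes the cost explicit.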

So, in the matter of my current thinking about minilib: what's the difference between a preprocessor and an interpreter? None. ;) Only, the preprocessor is quite restricted.

So I guess I choose the interpreter alternative.

michael105 commented 5 years ago

Meanwhile .. I guess I should tag the last working version of minilib. The tmp branch is really temporary, and won't even compile; it's just my backup solution. And since I'm still restructuring, and I'm used to sorting things out as basically as possible.. better tag it more explicitly. Naming things is good..

michael105 commented 5 years ago

.. ;) So. It's nearly done. The "devel" branch now shows up with a structure I'd regard as good to go for the next while. Again, the restructuring led to some regressions. But if I didn't overlook some major mistakes, this time it's a matter of sorting out the problems with the interdependencies for the last time. Then the build "system" I wrote should be able to take over.

Interestingly, regarding the way I'm combining everything into a single header file: today I saw nearly the same implementation elsewhere. Since my implementation now works, well, I'll keep it.

The way the config header is created - I've got the feeling I should start a whole new project for it. Atm it's embedded too closely within minilib. But I already wrote the config generation quite generically.

And I guess I did some things right. Having one file where you define what should be built and what not, and set some config options - and having this file then parsed by sh (or bash), thereby checking for syntax errors, typos, and so on. The implementation is trivial, but works out wonderfully.

Yes, I guess I should definitely start another project for that. It also wouldn't be a problem to write some sort of graphical interface.

Anyways, I'm (again) fixing these damned regressions in minilib now, hopefully for the last time. And I have to add some further things to the "buildsystem": inline documentation and automatic test builds, to name a few.

But that's now a matter of routine; the underlying structures are already there.

Yeah, I'm a little bit satisfied, even proud. Having done completely different things over the last years, but still being able to get into the flow, and to produce and even finish things.

Ok. Now I should finish the devel branch, and then merge it into main.

Have a good time, Michael (misc)

michael105 commented 5 years ago

And yes, I'm going to continue with minilib. I'd really like to have a nearly complete ANSI-C, and a subset of POSIX-C.

There are things I'm porting, like the suckless linux tools or a "microperl", I grabbed from the sources.

With the suckless Linux tools one can see this minilib really is useful. Overall, it might shrink all the binaries down to, say, 10 percent of the size compared to a compilation with musl.

And that's the point where one can see that my fiddling with sometimes even one or two bytes adds up.

michael105 commented 5 years ago

Since the restructuring is mostly done, I'm puzzling a bit further, with the suckless tools for now. It seems there's not really a generic and practical way to squeeze the last bytes out without changing the sources. There's a problem with the stack: some binaries simply don't need one, but they need an in-/output buffer. But how to tell the linker/compiler this in a generic way? And writing a separate config file for each binary is also .. Then there would be the possibility of making the stack executable. (Like it's already the problem with many even security-relevant programs. The X server, e.g., as I read somewhere.)

But although the second program header and the code induce around 60-70 bytes - which is quite a lot, when the binary is only 150 bytes without the stack header, and 215 bytes with it - endangering security might not be sensible. Even when half of the Linux binaries ignore the problem of the executable stack. So.. most probably the best bet is to ignore the bloat for now. Hopefully a good way shows up. Or someone of you could tell me? Overall, I'm still not so deep into these stack-related linker problems.

... while writing .. I guess I found a solution: just also keep linker and gcc options within the config file. Obviously, that's at least a good and generic way, and opens up more possibilities. Anyways, these stack-related bloating troubles do need a solution.

michael105 commented 5 years ago

Thinking about simply adding all those syscalls, prefixed with sys_. Since I implemented the syscalls as macros, this wouldn't add bloat at all. But it could be handy.

michael105 commented 5 years ago

And reading these lines above again ... Atm, hello world is about 215 bytes. Which is okay, but somehow I'm really wondering who stuffs all those extra bytes into my binaries. It's sometimes like: hard to say wtf again added 20 bytes, sometimes suddenly 2kB. I'd guess the linker is to blame. Or me, since I'm the one telling it what to do. Ohoh. Now I'm thinking about how complicated an ELF linker would be to write. Ok, I'm already shrinking the binaries with the "elfshrinker"... but, to be honest, I didn't get the whole process. I just took a version for 32bit and modified it, so now it works also with 64bit.

The linker. I should in any case read some texts about linking. As I said before, I suspect the sudden bloats might have something to do with the stack, and possibly also with alignment. On the other hand, who says this GNU ld dinosaur always does it right. Especially since I already fiddled quite a bit with the linker scripts, to squeeze some bytes here, then there..

Just now I'm thinking about whether to write the fread / fwrite functions before going to bed. Most of the implemented functions can be seen in the branch "devel", in the file minilib.conf. Sort of soothing, seeing something grow. Even though I did some heavy restructuring work the last days. There's testing needed, and there are one or two points I'd still like to change. But overall, I believe it's good progress. Best wishes, misc

michael105 commented 5 years ago

Have to push it a bit. My other life needs me. Atm I'm still working these damn bloats out. A little bit of reading lets me guess it's about the data segment. I really should RTFM sometimes... Although in this case it's the famous book "Linkers and Loaders", so I should have RTFLaL.. Very well written, by the way. I'm going to merge all changes from devel into master these days. Concerning the documentation in master, some parts slipped through from devel. Besides, I changed the build and config framework. So, if you are interested in the minilib, better check out "devel".

michael105 commented 5 years ago

Reading further. Possibly this is the point to make another fundamental design decision. It would (possibly) be an advantage to store the global buffer, as well as the pseudo-allocated RAM, within the stack. This could also be more performant. On the other hand, there's the well-known stack overflow. Obviously, an overflow can happen in every section. But doing harm might be much easier when you can write to the stack. Placing the buffer at the end of a section might be quite an advantage in security, although possibly a disadvantage in performance.

Then I still see the possibility of making the .text section read/write, and placing the buffer at its end. This could again spare a few bytes. But the security .. It's the question whether an attack writing directly to the .text section is only theoretical, because if there's a buffer overflow happening, there are other methods to gain control.?..

Somehow I've got the feeling placing the buffer at the end of the data section might be the right way. Or into the .bss section. This time I'm propagating a little bit of bloat in favor of security. Although the "bloat" is, well, the added section header, the parsing at load time, and the addressing overhead. I don't know enough to be able to say which processor and data bus optimizations there are. It's thinkable the optimizations only work when choosing the intuitively "wrong" way. Like I experienced, when playing with assembler optimizations, several times quite unexpected benchmark results.

michael105 commented 5 years ago

Having read more.. I'm wondering: why are the return addresses located on the same stack as the data? .. That's simply.. stupid. Why does the stack grow downwards at all? .. Thinking about it, the historical explanation is obvious. But since nowadays the whole memory is virtualized, this really is stupid. In every respect: security, performance, even simplicity.

michael105 commented 5 years ago

I have to read a bit about the optimization of cache lines. There still is the point that, since nowadays the instructions aren't computed "serially" anymore - since there's branch prediction, cache prefetch, pipelining and what else - it's hard to foresee the results of changes at the OS level I'm working at. But possibly I really got some important points, which would be really useful in the matter of security, without being too much of a performance penalty.

michael105 commented 5 years ago

Again, it seems to me I found a quite important point in comment 42. Having researched a bit: as very often, the idea of separating call stack and data stack isn't new. There is the Harvard architecture, e.g. But, as I read, normally the problem is about the already existing programs and libraries, which can't be changed. So, this is a real advantage of minilib. There is at least the possibility to change this problematic behaviour, since the lib is located at the foundation.

Atm I'm thinking of (optional) macros for calling and returning from a function. This could even be implemented transparently. The macros maintain their own call stack, but use the stack as usual for data allocation.

This obviously has a price; I'm going to do some benchmarking. But it could benefit security so much - and it's also not sure how much the performance suffers, and it could even be just an option - I guess I'm going to implement this.
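A user-space sketch of the idea, using GCC computed gotos: the "return addresses" live in a separate static array, so an overrun of a buffer on the ordinary data stack cannot redirect these jumps. The names and the macro shape are mine, purely illustrative of the technique, not minilib code.

```c
#define ML_STACK_DEPTH 64

/* the separate call stack: return targets live here, not on the
   data stack, out of reach of stack buffer overruns */
static void *ml_callstack[ML_STACK_DEPTH];
static int   ml_calltop;

/* "call": save the continuation label, jump to the routine's label */
#define ML_CALL(fn, retlabel)                        \
    do { ml_callstack[ml_calltop++] = &&retlabel;    \
         goto fn;                                    \
    retlabel: ; } while (0)

/* "return": jump back via the separate call stack */
#define ML_RET() goto *ml_callstack[--ml_calltop]

/* demo: the main flow "calls" sub twice; trace records the order */
static int ml_trace[8];
static int ml_ti;

static int ml_demo(void) {
    ml_ti = 0;
    ml_trace[ml_ti++] = 1;
    ML_CALL(sub, back1);      /* sub runs, then control resumes here */
    ml_trace[ml_ti++] = 3;
    ML_CALL(sub, back2);
    ml_trace[ml_ti++] = 5;
    return ml_ti;
sub:
    ml_trace[ml_ti] = ml_ti + 1;   /* records 2 on first call, 4 on second */
    ml_ti++;
    ML_RET();
    return -1;                     /* not reached */
}
```

The obvious limitation of this toy version is that "functions" are labels within one C function; a real implementation at the libc level would wrap actual calls, which is exactly where the benchmarking mentioned above comes in.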

I'm going to copy a bit of a discussion about call stack and data stack below.

michael105 commented 5 years ago

Source: security.stackexchange.com/questions/96247/is-using-separate-stacks-for-return-addresses-and-function-arguments-a-viable-se

(quoting)


"As far as I know, many exploits rely on overwriting return address of the function they try to exploit. They do it by buffer overruns. But what if the compiler set up two separate stacks far from each other in the address space, and used one of them (maybe the system one, like esp/rsp-driven one on x86* systems) for return addresses and the other to pass function arguments? In this case any buffer overruns will overwrite some locals, maybe including those of caller functions, but still will leave the return addresses intact." [...]

(One of the answers) "The problem with most operating systems is that they follow a specific "calling convention." This convention requires putting function parameters on the stack, being some derivative of the C-style calling convention. You must use this convention for ABI (Application Binary Interface) compatibility with that OS. So, without OS support, you could only use this feature for calls made within your application.

This would complicate compilers quite a bit and probably require a fair amount of work. In short, you could protect your own programs if you had a compiler that supported this calling convention, but you'd still be at the mercy of the OS whenever you had to do things like reading/writing a file, etc. A buffer overrun in a DLL, for example, can't be fixed by you changing your calling convention.

Secondly, until recently, with the advent of virtualization, it really wasn't feasible to set up a separate area like this, because segmentation was expensive, and memory virtualization even more so. Today, this would practically be a non-issue, but since we have to deal with historical software (e.g. stuff written ten years ago that still require the conventional calling methods), the OS would then be forced to support both models for some indefinite period of time.

If a new OS with no compatibility concerns were written, it could certainly do this, but it probably won't happen, because there are more viable methods. Microsoft's own Singularity OS is completely immune to buffer overruns (according to them), because the OS statically validates that each program cannot possibly misbehave. Interestingly, this OS uses no "memory protection" as used by Windows, Linux, Mac OS, etc. The programs are validated for correct behavior before they run, not as they run. Of course, if a virus were capable for this system, it would have unlimited system control because of the lack of protection at the hardware level." [...]


citation ends.

So. Grin. As always, Microsoft doing strange things. And, it seems to me, my points are valid.

michael105 commented 5 years ago

Also, one of the points of the cited answer simply isn't right (if I don't overlook something important..). The calling convention of the OS isn't the problem. The main problem is the already executed function calls, since they stored their return addresses on the stack, above the local buffers - where an overflow writing upwards can overwrite them.

michael105 commented 5 years ago

Oh. Maybe I read the answer somehow wrong - the author didn't mean the OS' ABI, he addresses the libc ABI. Which I already implemented within minilib, obviously. So, I'm going to give it a try.

michael105 commented 5 years ago

WTF. lol. Contradictory error messages from one and the same compile run. Fail!!! :rofl: What is gcc trying to say? Do not mess with me?? That's really outstandingly awkward. Hey, you use ret, but didn't define it, because, hey, why are you defining ret?? Stupid thing. But hey, what's inline assembly for..

funcs2.c: In function 'func2':
funcs2.c:18:3: error: label 'ret' used but not defined
   goto ret;
   ^~~~
funcs2.c: In function 'main':
funcs2.c:38:3: warning: label 'ret' defined but not used [-Wunused-label]
 ret:
 ^~~
funcs2.c:35:3: error: label 'func2' used but not defined
   goto func2;
   ^~~~
Error. Failed command: gcc

michael105 commented 5 years ago

I admit, I'm really bending the limits of C right now. Though I cannot feel guilty. It's one of the reasons I love Perl: there simply aren't any limitations by design...

michael105 commented 5 years ago

:rofl:

More interesting outcomes.. The output of the code below is.. unexpected. Though, obviously, correct. (I compiled this with the wrapper script mini-gcc, so: mini-gcc func2.c, from the branch devel...)


#define mini_start
#define mini_puts
#define mini_itodec
#define mini_printf
#define mini_buf 1024
#define INCLUDESRC

#include "minilib/minilib.h"

long *ret;

void thefunc(int i){
   puts("func3");
   printf("i: %d\n",i);

   // "return" by jumping straight to the label in main,
   // leaving the pushed return address on the stack
   asm volatile("jmp rethere");
  }

int main(int argc, char *argv[] ){
  puts ("Hello..");

  thefunc(23);  // regular call; "returns" via the jmp above

  asm volatile ( "jmp thefunc;\n" );  // enter thefunc without a call, no argument set up
  puts ("XXXX Noo.");                 // never reached
  asm volatile( "rethere:" );

  puts("ret after rethere");
}

this gives:

Hello..
func3
i: 23
ret after rethere
func3
i: 1
ret after rethere


So, essentially, thefunc "returns" by jumping straight to the label rethere. When main's ret then executes, it doesn't exit main: since the return address of thefunc was never popped, ret pops exactly that address, which points right after the call - so thefunc is entered a second time (with a garbage argument), jumps to rethere again, and only the next ret actually leaves main. This chapter of "How to confuse yourself" is starting to get fun.

michael105 commented 5 years ago

Having released an only trivially tested extraction of files, consisting mainly of a bundled minilib.h header and the script mini-gcc, I'm going to sleep now. Have to do more thorough testing tomorrow. Won't tag this before.