michael105 / minilib

A c standard system library with a focus on size, headeronly, "singlefile", intended for static linking. 187 Bytes for "Hello World"(regular elf), compiled with the standard gcc toolchain.
Other
37 stars 1 forks source link
c clang embedded-c gcc libc minilib osx security static-library system-programming tiny

Index: README (this file) | link:doc/reference/readme.asc[Reference] | link:doc/minilib-reference.asc[Short Reference] | link:doc/minimake.rst[minimake] | link:doc/make.rst[makefile] | link:doc/about.asc[about] | link:doc/mlfunctions-shortref.asc[functions] | link:doc/minilib.md[functions shortref]

CodeQL(master branch): image:https://github.com/michael105/minilib/actions/workflows/codeql-build.yml/badge.svg?branch=master[Build]

image:https://github.com/michael105/minilib/actions/workflows/codeql-selftests.yml/badge.svg?branch=master[Selftests]


README

:hardbreaks:

TOC: xref:Summary[Summary] | xref:Security[Security] | xref:thread-safety[Thread safety] | xref:note---branches[Branches] | xref:about[About] | xref:build-dependencies[Dependencies] | xref:usage[Usage] | xref:bugs[Bugs] | xref:see-also[See also] | xref:notes[Notes]

minilib


 A c system library with a focus on size, 
headeronly, "singlefile", intended for static linking. 
187 Bytes for "Hello World", 
regular elf binary compiled with the standard gcc toolchain. 

Currently the latest bundled development version is stored at 
https://github.com/michael105/minilib/tree/download

====
 Copyright (c) 2012-2021 Michael (Misc) Myer, 
misc.myer@zoho.com / www.github.com/michael105
BSD-Style license, with attribution.
Static linking is permitted.
Commercial use is permitted.
Some sourcefiles, not part of milib itself, 
in the folder contrib do have other licensing terms.
"ports" contains other software, not written by me,
but ported to compile with minilib.
Please look in question into the sources. 
====

Summary

Minilib aims to be an incomplete libc replacement, single header only, as simple and small as possible, KISS, but supplying ANSI-C89, the most important tools of libc, and some POSIX/C11 addons (Which don't add bloat and are optional).

The build can be customized by define switches, compiled are only the explicitely enabled sources.

The main targets are simplicity and small memory footprint. Security is in mind as well. Having a small codebase might be of most importance in this context; the sources can be read through and checked. (As opposed to, e.g., glibc, where reading the sources is highly theoretical.)

In order to have all compiled sources combined and dumped to the terminal, type either SHOWSOURCES=1 make -f Makefile.minilib CONF=yourconfig.conf, or minimake --config yourconfig.conf --showsources.

Albite the header file is parsed and compiled for every source file, this takes only fractions of a second at my development system (amd Turion 2Ghz), at the same time giving full advantage of the possible linktime optimizations.


Security


Please be aware that the ANSI and POSIX standards are inherently
insecure. (e.g. scanf, gets, operations on memory in general, ..)
There is the possibility to use gets and other problematic functions with
guarded memory mappings (link:doc/reference/memory.asc#map_protected[map_protected]).

However, minilib is a system library, so it would imho be neither useful nor
wise trying to fix all 'security problems' by patching problematic functions, andsoon.
You've to take care yourself for handling e.g. user input in a secure manner.

For example, using gets for user input, worst case with a variable located
at the stack, is a bad idea. (Buffer overrun, and all sort of stack manipulations are possible).

Using gets for interprocess communication with pipes would be ok,
when the input can be guaranteed to stay within the specifications.
Same with, e.g., configuration files, which obviously shouldn't 
be possible to manipulate.

It cannot be within the targets of a system library trying to prevent developers
from doing anything. This will neccessarily be some sort of patchwork,
but you simply cannot patch all holes. Same time eating up resources, 
and limiting the freedom of the developers severely.

(However, I am about to put compiler warnings into minilib for problematic functions
like gets, and a switch to en- and disable the warnings)

And there is the possibilty to put the minilib internal buffer, and the globals,
on (after) the stack with the switch globals_on_stack.
Albite within, e.g., printf, there are tests for buffer overruns,
in some special cases, an overrun could still happen.
To prevent the possiblity of overwriting environmental variables, I'd suggest
reading and copying all environmental variables first, 
before processing any user input.

Anyways, it might be the best practice to avoid buffer overruns. 
By always checking, for example, user input, 
avoiding inherent insecure functions like sprintf, gets, or using these
only in guarded memory areas (map_protected).

Again, when in doubt, showing ALL compiled sources is possible with 
`minimake --config yourconfig.conf --showsources`.
`minimake --config yourconfig.conf -S` will save the generated assembly into 
'sourcefile.s', which still is readable.

Having readable source is IMHO the biggest advantage, especially for security related code.
For the same reason I started to put checksums into the generated configuration files;
to ensure, that some version (which might be already checked, not only by the
security scan of 'codeql', but as well by other hackers) stays the same.

I once tried to debug an internal glibc problem. It's been a nightmare.
And to be honest, there's not only the problem, that there could be something hidden
within the library, it is also known that sometimes flaws only show up
with e.g. a combination of gcc and glibc. Worse, there have been flaws
by code, which should ensure security - just to bring in new vulnerabilities
by the added complexity. (e.g. the stack protector/canaries)

Thread safety

Due to its structure, minilib is 'inherent' threadsafe. (have to document the different possiblities of using threads with minilib) As long you do not define globals, (which are only needed for iostream functions, printf, vsprintf, malloc, mallocbrk), all functions are (obviously) threadsafe.

When using globals, or the global buf, printf, vsprintf, malloc and mallocbrk are not thread safe, as well as operations on streams. (Might be obvious, as well)

There is the possibility to create a new thread with the flag CLONE_VM, what will make all functions "threadsafe". Except operations on streams. Memory is copy on write with CLONE_VM, there is more documentation at the according linux manpage about clone.


NOTE - Branches


____
minilib development is separated into several branches.

Within the branch download the assembled ("compiled") 'minilib.h',
a Makefile ('Makefile.minilib') for building with 'minilib.h',
and a script, with 'minilib.h' embedded, 'minimake' are stored.

When you'd like to build minilib itself, you'll 
need either the branch master or devel.

Within devel-HEAD there are also some ports to minilib, and 'minicore' stored,
these are however work in progress and mainly there for testing minilib.
Not all things stored in devel-HEAD are supposed to build correctly, 
or work the way intended; especially the "ports" are work in progress.

Linking and compiling with minilib is the same process with all branches.
____   

 About

Minilib evolved out of some assembler experiments, where I realized how tiny executables can be, if not linked with glibc. With some more fiddling I got a C "Hello World!" binary with 151 bytes, compiled and linked with the standard gcc toolchain. This is more than 3,000 times less than a hello world program linked statically with glibc, which shows up with 504kB on my system. (5.4kB linked dynamic)

(On OSX there are some limits, files with less than 4k aren't executed, so that's hello world's size there..)

minilib saves a lot of startup overhead, no linker needed to bind shared libraries, no lookup tables, less memory fraction; also (small) binaries might often fit into a single harddisk block; opposed to a binary and several shared libraries.

This gives an overall more responsive system.

Just for fun, but this minilib also beats klibc, dietlib and musl in the matter of hello world's size..


(Linux 32bit:) musl: 7.6k dietlibc: 672 klibc: 367 minilib: 151

(On Linux 64bit I didn't check the other libraries, but the bigger registers cost a few bytes: hello world counts here 185 Bytes.)

Being in syscalls, linker scripts and gcc optimization hell, I realized, that after I got over the fundamental problems adding more standard c routines would be cheap. So I started to port some more little c tools to minilib, adding more functionality to minilib as well.

Since I hate todays usual overhead in modern libraries, resulting in the unbelievable size of hundreds of Megabytes for simple applications as well as minutes for just starting them up, I also tried to optimize functions like printf where sensible.

The focus is on size, however. You are able to customize the compilation of the library and define only the functions you need; if you are in need of, e.g., a more speed optimized strcmp, just don't define the according switch and rule your own.

Albite the header file is parsed and compiled for every source file, this takes only fractions of a second at my development system. (amd Turion 2Ghz)


~/prog/minilib $ time ./minimake --ldscript onlytext -o hello hello.c ./minimake -o hello hello.c 0.06s user 0.05s system 92% cpu 0.116 total ~/prog/minilib $ ./tools/shrinkelf hello stripping: hello ~/prog/minilib $ ls -l hello -rwxr-xr-x 1 micha micha 185 Aug 15 16:16 hello

This whole project exaggerated out of fun and passion.

Build dependencies


* Needs:
  gcc / clang
  gcc 9.3 works, 10.x seems to sometimes build segfaulting binaries
     (I developed yet mainly with gcc 9.3)
  clang 11 seems to work. 12 untested.

* Building with Makefile.minilib:
  sed, gzip, curl or wget for automatically downloading the gzipped minilib

* When using minimake:
  bash, sed, gzip, sort

* Rebuild minilib:
  make, perl

* generate documentation:
  perl, asciidoc, rst2html5, markdown

Usage

Atm, it might be easiest to have a look into the directory examples, and the Makefile as well as the sources of the different versions of "hello world" there, as well as "QUICKSTART".

There are several possibilities to use minilib.

Easiest route: use the supplied script minimake, with the argument --autoconf.

minimake --autoconf -o hello hello.c

(or, for Makefiles utilizing the CC environmental setting:))

export CC=minimake --autoconf

This is save only for projects, consisting of single source files.

minimake will create a minilib config within 'minilib.conf', parse this config and pipe the resulting define flags to gcc.

The source file is parsed and every function used enabled in the minilib config. However, e.g., conversions used within fprintf aren't recognized, and therefore not going to be compiled. (itodec, atodec, for example)

So, in most cases it might be better to create a config file, edit the config and then use this config file:

Generate a skeleton file, and try to recognize used functions: minimake --genconf hello.mconf hello.c

Edit the config: vi hello.mconf (the config file is self documented)

Finally use the config for the compilation: minimake --config hello.mconf -o hello hello.c

In both cases, it is not neccessarily needed to modify the source files. "minilib.h", and the config switches are piped to gcc; the compile and linker flags are passed from minimake. The standard libc headers are prevented from being included by defining the according headerguards.

When compiling without --autoconf or --config, you have to include minilib.h in all sourcefiles, needing definitions from minilib.

You are able to define functions you need yourself, before you include minilib.h. Example: #define mini_write (Or) use the config framework, implemented within the script minimake.

The config framework has the big advantage of raising syntax errors for e.g. mistypes, so instead of having to look for the mistype, it's shown up even before the compile.

In one of your sourcefiles you also have to define "#define mini_INCLUDESRC", before the "include "minilib.h" directive.

Either use the supplied Makefile.template by copying it; or set the environmental variable CC 'export "CC=[path to minilib]/minimake [--config configfile.conf]"'

There is further documentation in the chapter 'configuration' of the reference. link:doc/reference/index.asc[Reference]

Differences to posix/ansi


Please have look into link:DIFFERENCES.md(DIFFERENCES).

 State

Currently only Linux 64bit is supported, and gcc v9.3.

Other versions of gcc result sometimes in segfaulting binaries. Bugreports and patches, preferred as github issues, are very welcome.

clang 10/11 works mostly, but several combinations of functions and optimization flags give trouble. Please see the section BUGS.



Minilib aims to be an incomplete libc replacement, as simple and small as possible, KISS!!!, but supplying ANSI-C, the most important tools of libc, some addons (Which don't add bloat..). Some syscall wrappers.

One reason to use syscall wrappers is type safety. (The kernel doesn't know about type safety, he simply grabs the arguments he's fed with.) Another reason is the problematic of having thecompilers sometimes parts of the code optimized out, when working with inline assembly.

Minilib shall be portable. For now there is Linux x86_64, x86_32, and OSX Darwin. (targeted. The ports might not compile now, I cannot check atm, since my macbook is broken and other things are important to me as well.)

Android is targeted. BSD is targeted.

Minilib should stay tiny. Ideally minilib is going to supply glibc compatible headers. (sort of)

Warning


This library is still under heavy development.
Albite now (2021/06) settling. Ansi C89 is done to 98% 
(not counting localization and floating math)
Some of the most important functions of Posix C are implemented.

Testing in different environments is needed.

I developed this library all over all in more than 15 years,
and I'm using it myself in several projects.

However, I cannot give any guarantees for an opensource library.

 Development

There is some infomation on the development of minilib itself within link:doc/minilib-devel.textile[doc/minilib-devel.textile]

Adding more syscall wrappers is simple.

Please have a look at include/lseek.h for examples. As well as include/syscall_stubs.h. (The file with definitions of simple syscall wrappers)

For bugfixes, please open an issue at github. (github.com/michael105/minilib)

Bugs


CLANG: 
 - strdup doesn't work with optimize -Os
 - fork doesn't work with -Os,-O2
 - signal/abort doesn't work with -Os,-O2,-O1
 - prints (vararg) doesn't work with -O0 (!??), but O1,O2, Os are ok.

GCC:
 versions > 9.3 untested.

 Further Readings

-> https://elinux.org/System_Size - Embedded Linux and size

-> http://muppetlabs.com/~breadbox/software/tiny/teensy.html - About creating tiny binaries

-> https://olimex.wordpress.com/2012/04/04/unix-on-pic32-meet-retrobsd-for-duinomite/

-> https://www.mkompf.com/cplus/cliblist.html - Ansi standard c

-> https://www.etalabs.net/compare_libcs.html - comparison of c/POSIC standard library implementations

See Also


tiny c lib implementations

  klibc
  bionic
  musl
  uLibc
  dietlibc
  minibase
  embedded artistry libc

c standard libraries

  musl
  uClibc
  glibc

 Notes

"It's sometimes smarter to stay special, although in this case this means the opposite"

Regarding my (strangely simplicistic) malloc implementation. Which fit's quite well to the minilib in whole. Being smart with utf-anything, don't know which bordercase and so on - is sometimes overall not smart.

So, having a 500kB "hello world" like with glibc, is not soo smart. Especially when it needs to be compiled again at the target, so most bordercases aren't of any interest.

151 Bytes, as accomplished with minilib - There's still some overhead. But it's another dimension.

Addendum, for non tech's



Since I do need to present myself to non developers as well,
I'd like to say here a few sentences about minilib,
which might be understandable, also if you aren't engadged in 
system development.

Overall, minilib got a quite comprehensive project.
More than 30.000 lines of code,
when counting comments as well,
and only the folders src, include, asm and scripts.

This is quite a lot, pessimistic people do say,
a developer is able to write one line of code per day.
Well, me I'm doing more. What has, to be honest, to do 
with the fact, that I didn't need to exchange and communicate
with other developers. And it is a pessimistic view.

Anyways, if you would like an estimation of the development costs of 
something like minilib in money, the number might have 6 0's, in Dollars.
(Not accounting the fact, more than half of all software projects
do not finish..)
However, to say that again to non opensource developers;
it would not be possible to sell this software for that amount of money.
There are already several other opensource projects with the same target,
and the whole software landscape did change the last decades.
Besides the fact, this project would need to be different, when
having a commercial focus. 
So the value on the market might be 0,
noone would pay money for something, 
what is neither finished, nor stable,
and available in other (stable and mature) implementations for free
as well. 
It's comparable to my academical "book",
written for the dregree "Magister Artium" in philosophy.
Would be quite expensive to write as an order.
But nearly impossible to sell.
I am however looking for sponsors, or partners,
since it "costs" me more and more to engage myself in the development of minilib.
What on the other hand means some sort of compassion and contemplation to me.

Essentially, the basic libc is half of an operating system.
(Again, not nowadays, all the different hardware has to be accounted for and
needs different drivers, but, as I said, 
I'm just trying to make the project understandable)

I have written more than 95% of the sources on my own.
(I know. It is a passion of mine..)
And it is not my first bigger project, this passion accompanies me 
for more than 25 years. .. Some people to build mini railways,
me, I'm building software. 

A few ideas and solutions are genuin.
E.g., the config system, the "regular expression" engine, 
or some parts of howto fit this all together.
But most is just focussed on being of small codesize.
However, in turn, having the possibilities to fine grain the configuration
of the builds, and having executables with less than 200 Bytes 
with the standard gcc toolchain didn't exist yet.

I didn't do this "just for fun".(Quoting Torvalds)
I did have fun, but it is also a reference, and I did learn some things
about system development and executables in a deep fashion.
(I did some bigger projects before, also commercially,
but this has been more than 15 years ago)

By the way.. My Macbook is broken. :(
If you'd like to sponsor minilib, 
a Mackbook (also a used one) would be one of the things 
I'm in need of, and help me alot. :)

If you have some other hardware, you'd like to see minilib running at, 
just contact me. Most of minilib is portable down to 8 bits.
Something like an arduino, or a banana pi are on my wishlist, too.

I do not know, where this whole project is going to go.
I'm already about to rewrite the basic system tools, 
and trying to get a usable "unix" system within 64kB.
(without the kernel) 
This is, again, "outstanding". Other projects
with the same focus do have around 1MB.
But again, it is only possible, since I do strip
the functionality down to the real and mostly used basics.

I do have ideas about a linux distribution, or something like that.
But. Who knows. I do have other obligations and ideas as well.

----------------------------------------------------
    Michael (Misc) Myer, 2012-2021
    misc.myer@zoho.com www.github.com/michael105
    (my "programming enthusiastic persona"..)
----------------------------------------------------

-------------------------------------------------
 Copyright (c) 2012-2021, Michael (misc) Myer 
 (misc.myer@zoho.com, www.github.com/michael105)
 Donations welcome.
 All rights reserved.
-------------------------------------------------

I did this minilib out of enthusiasm.
One side, I`d like to have this project being opensource.
On the other hand, I spent really much time with the development.
In an ideal world, someone would sponsor me for this time.
It is, however, not an ideal world we are living in.
Therefore I do have to insist on attributions and what
I`d like to call `fair use`.

In short, you are allowed to do with this project what you wish,
also commercially, as long you show the copyright notice at a prominent place, 
and include the file NOTICE in distributions.

That`s the intention of the license,
which I`d like to name "Fair use with attribution".

Please have a look into the file LICENSE for the exact licensing terms.

Disclaimer:

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL Michael Myer BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Please contact me, if you are in need of differing licensing terms.

Some files in the folders headers and contrib do have other licensing terms.
Please look in question into the sources.

"ports" contains other software, not written by me,
but ported to compile with minilib.