Open ronsaldo opened 5 years ago
We chose using the Gtk naming convention ( https://github.com/GNOME/gtk/blob/master/docs/CODING-STYLE ), with the BSD/allman indentation style. The prefix selected for structs in the public API is VM, and for functions is vm_.
GitHubMulti-platform toolkit. Contribute to GNOME/gtk development by creating an account on GitHub.
Open source projects tipically follow a set of coding standard guidelines the involves cosmetical and technical guidelines that must be followed. The current VM does not have a proper coding standard, so we need to decided on something to adopt that we and new contributors must follow. Since fixing the codebase to follow a proper coding standard may involve a lot of work, the coding standard resulting from this discussion must be adopted at least for all the new code that we write, and for all the new public C interfaces that we expose to the public. The existent
For starting this discussion, here I am proposing a coding standard document with some specific details that needs to be discussed. The result of this discussion should be hosted at least in the GitHub wiki of this project. Unfortunately, GitHub does not support pull request of wiki pages, so we have to conduct this discussion manually.
Proposal wiki repo link: https://github.com/ronsaldo/opensmalltalk-vm.wiki.git
VM Coding Style Guidelines
We need to define a coding convention standard for the new VM code. The results of this discussion should be located on the wiki or in a new version of the CONTRIBUTING.md document.
Conding style guidelines include cosmetic, functional and quality aspects. From a quality standpoint what is really important on this aspect is to be as consistent as possible on the codebase (i.e. use the same coding style everywhere).
I suggest keeping the following original guidelines from the original VM source code, but this is clearly not enough:
Original VM coding guidelines from CONTRIBUTING.md
C Source Code Formatting
When editing an existing file, please be polite and first follow the guidelines below, and secondly follow the general rule of keeping to the same formatting conventions as exist in the file.
C Function Declarations
C function declarations should have the type on the same line as the function name and parameters, e.g.:
C Function Definitions
C function definitions should have the type on one line and the function name and parameters on the following line, e.g.:
This facilitates searching for function definitions by searching for the name at the beginning of the line.
Tab stops are 4 spaces
In this case the first rule is to keep the existing formatting within a file - we don't want to polute the code history with space changes.
The goal is to use tabs set to 4 spaces.
Additional coding guidelines
Checking existing coding guidelines
The following are some popular coding style that used in the industry and open source/free software projects:
We should take and adapt some of the guidelines that are used by these coding standards.
Indentation style
A coding standard is not the same as an indentation and formatting standard. A coding standard may have some .
For a list of indentation styles check: https://en.wikipedia.org/wiki/Indentation_style . For my own projects I typically use the Allman "BSD style" from that list. Another popular indentation style is to use a variant from the Kernighan and Ritchie style from that list. The K&R styles are more similar to the indentation style that is typically used in Java.
An indentation style must be selected explicitly for the whole VM source code project (excluding third party libraries). The selection of this indentation style must be specified with the command line parameters that have to be passed to the indent Unix utility. New code should be written by following that indentation style, and existing code should be converted by running the indent utility on them. The indentation style typically can be specified on IDEs such as Eclipse, and text editors such as Emacs so that they automatically help on writing code following the indentation.
Specific guidelines
Single line comments
Single line comments using
//
were not a part of the C standard until C99. These comments are supported by the important compilers (gcc, clang, msvc). The advantage of using these comment is that they facilitate using/**/
for comment large blocks of code for debugging purposes. These same large blocks of code can also be commented by#if 0 ... #endif
. Since these comments are supported by the important compilers, then there is no reason for not using them.Include guards and #pragma once
The
#pragma once
preprocessor directive is not part of the C standard. However, mainstream C compilers such as gcc, clang and msvc do support this pragma. The C standard does also mandates that a#pragma
that is not understood by a compiler must be ignore. The intention of this pragma is to improve compilation speed, but because of its non-standard nature it should be used in conjunction with include guards, in the following way:Cast from void* to specific type
In C, casting implicitly from void* is an allowed operation, but this is not allowed in C++. For example, the following is valid code in C, but not in C++:
The following version of the same code is valid in C and in C++:
Since the second version is more explicit, I recommend that we use that version just for the sake of being more explicit.
Order of includes in headers
According to the Google C++ coding standard, header files should be self contained and they should compile by it self. To facilitate this, the order of the includes should be the following:
The intention of this ordering is to produce a compilation error if a project specific header is missing the inclusion of a dependency.
Standard headers that are always present
The C language has the following two compilation modes:
The following headers are mandated by the language standard to be provided by the compiler on free standing mode, so it can be assumed that they are always going to be present even if there is no operating system:
From this list, the most important are, and .
stddef defines ptrdiff_t, size_t, wchar_t, NULL and offsetof. stdarg
defines macros for implementing functions with variable arguments such as
printf. stdint defines integer types with fixed size such as int32_t and
uint32_t. Since these headers are standard and they have to be always
provided by the compiler, then there is no reason to not use them.
Clang and GCC go for full C11 compatibility. MSVC implements only the subsets of C99 and C11 that it also needs to implement a corresponding C++ standard (https://herbsutter.com/2012/05/03/reader-qa-what-about-vc-and-c99/).
Prefixes for namespacing
The lack of namespaces in C, and the usage of a single global symbol table (specially on Linux. DLLs on Windows have their own symbol table. OS X uses something similar to RTLD_DEEPBIND by default) has a huge potential for producing name clashes. New code for the VM should use a prefix for simulating namespaces in C which is a standard practice in C libraries. This prefix could be realized by separating with an underscore:
Or by juxtaposition by using CamelCase:
I suggest using the version with underscores because it allows to have multiple namespaces, and we could use a different one for VM internals that should not be exposed to the public:
We should decide what is this prefix going to be. The following are some suggestions:
Memory management hygiene
The explicit memory management in C is the source of several reliability and security issues. For this reason we need some guidelines when allocating memory, working with them, and disposing it.
alloca should not be used
alloca is a built-in primitive from the C compiler for allocating memory on the stack. The implementation of alloca is done by decrementing the stack pointer. On Linux only the memory for the stack of the main thread can dynamically grow. The memory for threads different to the main thread is allocated on thread creation time and it is fixed for the whole lifetime of the thread. Since alloca can be used with large sizes, there is no guarantee on having enough memory on the stack for fullfilling the alloca, there is a strong potential for provocating a stack underflow and an almost guaranteed segmentation fault or memory corruption. For this reason the usage alloca is typically forbidden by coding standards used in the industry.
In case where alloca is absolutely required, the size of the allocation should be really small (no more than 100s of bytes). A local array with the maximum size should always be preferred to alloca.
alloca should not be used for MAX_PATH for the same reason that a local variable should not be used for MAX_PATH. The two idioms are equivalent.
malloced memory should be always fully inititialized
malloc returns uninitialized memory which can contain random garbage, but in many cases it may look like zeroed memory. Some of this garbage may never be initialized explicitly and could be interpreted in different random ways that vary with the execution of the program. This usage of uninitialized memory may be very hard to replicate and to debug. In some cases this uninitialized memory may introduce subtle security bugs and leak sensitive information like in the case of the infamous hearbleed bug in openssl. For this reason it is recommended to do a memset after a malloc. For example:
calloc should preferred over malloc
calloc always zeroes the allocated memory by doing an implicit memset. For this reason, calloc should be preferred to a raw malloc.
Initialization of structures
Structure should be always fully initialized. Most of the time it is recommended to fully zero initialize structures by using the following idiom:
String handling safeness
The usage of null terminated string literals in C is one of its greatest design errors. This design has performance problems. Another problem with string handling is that the string functions provided by the standard C library are prone to produce buffer overflows (e.g: strcpy, strcat).
Common operations on strings require knowing the length of a string for allocating memory to hold the result, which requires doing a linear iteration per string just for asking for its size. The correct way of fixing this is to prefix a string with its size (a la Pascal), but this introduces a strong penalty on the code. Instead we are going to accept this performance penalty, but we still need to use functions that operates on strings and take care doing a correct allocation of the result buffer, and provide safeness, security and reliability correctness when manipulating string.
The basic operations that should be provided by this library are the followings:
String functions from the standard library that do not receive the size of the target should not be used (e.g: strcpy, strcat, sprintf). Instead versions, of these functions that receive the size of the target buffer must be used (strncat, snprintf). In the case of strncpy, it must be avoided because it is designed for fixed sized strings without a null terminator, and strncpy may not insert the required null terminator at the end location.
Pointers to constants
Functions that receive string arguments that are not mutated must have pointer to constant values type. These types can be written in the following two equivalent ways:
For reasons of consistency, only one of these two ways of writing this type must be used. The first way of defining this type is the most commonly used way, but the second way may be easier to understand, and some files are currently using it.
In C++ unlike C, there is no implicit cast from string literals to
char*
, so for example the following code is legal in C, but not in C++:Using const as much as possible it is recommended because allows using the compiler to detect some errors. In the case of functions that belong to public interface that is exposed to plugins or applications that are embedding the VM, these headers should be usable from C++ code.
Operating system naming convention
The following naming prefixes should be used when writing operating system specific code:
C programming language standards
For reference, here are the draft versions of the current standards of the C language:
These language standards should be used mostly as guidelines for separating platforms independent features from platform specific features. GCC and clang tend to follow the language standard closely, but MSVC tend to only implement the standard partially. C99 features tend to be more widely supported. From C11, the support for atomic operation seems to be the most relevant feature for the VM.