pantoniou / libfyaml

Fully feature complete YAML parser and emitter, supporting the latest YAML spec and passing the full YAML testsuite.
MIT License
243 stars 74 forks source link

using enums in bitfields is not portable, causes crash on NEC Aurora #102

Closed rwirth closed 3 months ago

rwirth commented 10 months ago

The C11 standard states [N1548, 6.7.2.1(5)]

A bit-field shall have a type that is a qualified or unqualified version of _Bool, signed int, unsigned int, or some other implementation-defined type. It is implementation-defined whether atomic types are permitted.

The fy_node struct from fy-doc.c has a member enum fy_node_type type : 2; that is used to store values 0, 1, or 2. The use of the enum as a bit-field is supported by GCC, which uses unsigned int as the underlying type because the enum constants are nonnegative. Both are implementation-defined behaviors.

The compiler for the NEC Aurora architecture of vector processors allows the use of enums as bit-fields as well, but the underlying type is always signed int. Here we run into trouble because signed int : 2 can only store values from -2 to +1. Writing the value 2 (FYNT_MAPPING) overflows and causes undefined behavior. For the switch(fyn->type) inf fy_node_alloc, the compiler emits assembly that extracts the value just stored to the lowest two bits of the 32-bit integer representing type and the following bools by performing an arithmetic left shift by 30 bits (in an effectively 32-bit register) followed by an arithmetic right shift by the same amount. After the left shift, the highest bit is set and the arithmetic right shift copies that bit to all the bits that are shifted back in. This is fine because extracting a value from a signed bit-field should preserve the sign.

Thus, we end up with 0xFFFFFFFE ~= -2, which is compared against 0, 1, and 2 to determine the branch of the switch. Without a match, it drops through and returns a partially initialized object with NULL pointers in the mapping linked list head. Those NULL pointers crash the program once the list gets expanded.

Replacing the enum bit-field by an unsigned int bit-field should help.

rwirth commented 10 months ago

This patch against libfyaml 0.8 fixes the issue:

diff -ru a/src/lib/fy-atom.h b/src/lib/fy-atom.h
--- a/src/lib/fy-atom.h 2023-02-05 14:53:39.000000000 +0000
+++ b/src/lib/fy-atom.h 2023-12-07 13:50:39.000000000 +0000
@@ -66,11 +66,11 @@
        uint64_t tozero;            /* fast way to zero everything here */
        struct {
            /* save a little bit of space with bitfields */
-           enum fy_atom_style style : 8;   /* note that it's a big perf win for bytes */
-           enum fy_atom_chomp chomp : 8;
+           unsigned int style : 8; /* enum fy_atom_style, note that it's a big perf win for bytes */
+           unsigned int chomp : 8; /* enum fy_atom_chomp */
            unsigned int tabsize : 8;
-           enum fy_lb_mode lb_mode : 1;
-           enum fy_flow_ws_mode fws_mode : 1;
+           unsigned int lb_mode : 1; /* enum fy_lb_mode */
+           unsigned int fws_mode : 1; /* enum fy_flow_ws_mode */
            bool direct_output : 1;     /* can directly output */
            bool storage_hint_valid : 1;
            bool empty : 1;         /* atom contains whitespace and linebreaks only if length > 0 */
diff -ru a/src/lib/fy-doc.h b/src/lib/fy-doc.h
--- a/src/lib/fy-doc.h  2022-01-13 16:22:10.000000000 +0000
+++ b/src/lib/fy-doc.h  2023-12-07 13:48:35.000000000 +0000
@@ -61,7 +61,7 @@
    struct fy_node *parent;
    struct fy_document *fyd;
    unsigned int marks;
-   enum fy_node_type type : 2; /* 2 bits are enough for 3 types */
+   unsigned int type : 2;  /* enum fy_node_type: 2 bits are enough for 3 types */
    bool has_meta : 1;
    bool attached : 1;      /* when it's attached somewhere */
    bool synthetic : 1;     /* node has been modified programmaticaly */
pantoniou commented 3 months ago

Thanks, patch applied on 23265a8a5aaa714a462533bd4833a86e5984c2a0