universal-ctags / ctags

A maintained ctags implementation
https://ctags.io
GNU General Public License v2.0
6.47k stars 620 forks source link

Unaligned access UB in Ruby parser #3881

Closed matoro closed 8 months ago

matoro commented 9 months ago

Hi, it would seem that there is an unaligned access issue in the Ruby parser. This is undefined behavior on all platforms, and is completely disallowed (crashes with SIGBUS) on sparc. I used this fact to obtain the backtrace below.

This affects all of the Ruby tests so I just used the simplest one to reproduce it:

./ctags --verbose --options=NONE --fields=-T --optlib-dir=+./Units/parser-ruby.r/simple.rb.d/optlib -o - --options=./Units/parser-ruby.r/simple.rb.d/args.ctags ./Units/parser-ruby.r/simple.rb.d/input.rb

GCC UBSAN at least does catch this (output truncated for brevity):

Reading initial options from command line
Entering configuration stage: loading command line
  Option: --fields=-T
enable field "epoch": no
  Option: --optlib-dir=+./Units/parser-ruby.r/simple.rb.d/optlib
Prepend ./Units/parser-ruby.r/simple.rb.d/optlib to OptlibPathList
  Option: -o -
  Option: --options=./Units/parser-ruby.r/simple.rb.d/args.ctags
Considering option file ./Units/parser-ruby.r/simple.rb.d/args.ctags: reading...
  Option: --fields=+{line}{end}
enable field "line": yes
enable field "end": yes
Reading command line arguments
Get file language for ./Units/parser-ruby.r/simple.rb.d/input.rb
        pattern: input.rb
                #candidates: 1
                        0: Ruby (extension: "rb")
                #candidates after sorting and filtering: 1
                        0: Ruby (extension: "rb")
OPENING ./Units/parser-ruby.r/simple.rb.d/input.rb as Ruby language file [new]
Initialize parser: Ruby
Initialize parser: RSpec
parsers/ruby.c:746:11: runtime error: member access within misaligned address 0x00015472f23c for type 'struct blockData', which requires 8 byte alignment
0x00015472f23c: note: pointer points here
  05 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^ 
parsers/ruby.c:755:11: runtime error: member access within misaligned address 0x00015472f23c for type 'struct blockData', which requires 8 byte alignment
0x00015472f23c: note: pointer points here
  05 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^ 
parsers/ruby.c:764:11: runtime error: member access within misaligned address 0x00015472f23c for type 'struct blockData', which requires 8 byte alignment
0x00015472f23c: note: pointer points here
  05 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^ 

Since this crashes on sparc, I used this to catch the backtrace on exactly where it dies:

Reading initial options from command line
Entering configuration stage: loading command line
  Option: --fields=-T
enable field "epoch": no
  Option: --optlib-dir=+./Units/parser-ruby.r/simple.rb.d/optlib
Prepend ./Units/parser-ruby.r/simple.rb.d/optlib to OptlibPathList
  Option: -o -
  Option: --options=./Units/parser-ruby.r/simple.rb.d/args.ctags
Considering option file ./Units/parser-ruby.r/simple.rb.d/args.ctags: reading...
  Option: --fields=+{line}{end}
enable field "line": yes
enable field "end": yes
Reading command line arguments
Get file language for ./Units/parser-ruby.r/simple.rb.d/input.rb
        pattern: input.rb
                #candidates: 1
                        0: Ruby (extension: "rb")
                #candidates after sorting and filtering: 1
                        0: Ruby (extension: "rb")
OPENING ./Units/parser-ruby.r/simple.rb.d/input.rb as Ruby language file [new]
Initialize parser: Ruby
Initialize parser: RSpec

Program received signal SIGBUS, Bus error.
0x00000100000f4e00 in deleteBlockData (nl=0x100004af228, data=0x0) at parsers/ruby.c:746
746                     && bdata->mixin != NULL
(gdb) p bdata
$1 = (struct blockData *) 0x100004af22c
(gdb) bt full
#0  0x00000100000f4e00 in deleteBlockData (nl=0x100004af228, data=0x0) at parsers/ruby.c:746
        bdata = 0x100004af22c
        e = 0x73
        sub_e = 0x100004af25b
#1  0x000001000020bc74 in nestingLevelsPopFull (nls=0x100004ae4e0, ctxData=0x0) at main/nestlevel.c:98
        nl = 0x100004af228
#2  0x00000100000f6884 in findRubyTags () at parsers/ruby.c:1229
        subparser = 0x0
        cp = 0x100004af25b ""
        expect_separator = false
        line = 0x100004af250 "        end"
        inMultiLineComment = false
        constant = 0x100004ae4c0
        found_rdoc = false
#3  0x0000010000041f3c in createTagsForFile (language=108, passCount=1) at main/parse.c:4052
        lang = 0x10000468e20
        rescan = RESCAN_NONE
#4  0x00000100000427ac in createTagsWithFallback1 (language=108, exclusive_subparser=0x7feffffe734) at main/parse.c:4179
        tagFileResized = false
        numTags = 0
        tagfpos = {type = MIO_TYPE_FILE, impl = {file = {__pos = 0, __state = {__count = 256, __value = {__wch = 2322720, __wchb = "\000#q "}}}, mem = 0}}
        lastPromise = -1
        passCount = 1
        whyRescan = (unknown: 0x237158)
        parser = 0x100004261d0
        corkFlags = 1
        useCork = true
#5  0x0000010000042a80 in createTagsWithFallback (fileName=0x100004a9dc0 "./Units/parser-ruby.r/simple.rb.d/input.rb", language=108, mio=0x0, mtime=0, failureInOpenning=0x7feffffe816) at main/parse.c:4271
        exclusive_subparser = -2
        tagFileResized = false
#6  0x000001000004347c in parseMio (fileName=0x100004a9dc0 "./Units/parser-ruby.r/simple.rb.d/input.rb", language=108, mio=0x0, mtime=0, useSourceFileTagPath=true, clientData=0x0) at main/parse.c:4433
        tagFileResized = false
        failureInOpenning = false
#7  0x0000010000043710 in parseFileWithMio (fileName=0x100004a9dc0 "./Units/parser-ruby.r/simple.rb.d/input.rb", mio=0x0, clientData=0x0) at main/parse.c:4480
        tagFileResized = false
        language = 108
        req = {type = GLR_OPEN, fileName = 0x100004a9dc0 "./Units/parser-ruby.r/simple.rb.d/input.rb", mio = 0x0, mtime = 0}
#8  0x00000100000433e0 in parseFile (fileName=0x100004a9dc0 "./Units/parser-ruby.r/simple.rb.d/input.rb") at main/parse.c:4416
        bRet = 226
#9  0x0000010000028f98 in createTagsForEntry (entryName=0x100004a9dc0 "./Units/parser-ruby.r/simple.rb.d/input.rb") at main/main.c:221
        resize = false
        status = 0x10000420cd8 <file>
#10 0x0000010000028ff0 in createTagsForArgs (args=0x100004238d0) at main/main.c:266
        arg = 0x100004a9dc0 "./Units/parser-ruby.r/simple.rb.d/input.rb"
        resize = false
#11 0x0000010000029610 in batchMakeTags (args=0x100004238d0, user=0x0) at main/main.c:360
        timeStamps = {0, 1099513937480, 8791798050008}
        resize = false
        files = true
#12 0x00000100000293dc in runMainLoop (args=0x100004238d0) at main/main.c:332
No locals.
#13 0x0000010000029d10 in ctags_cli_main (argc=9, argv=0x7fefffff2e0) at main/main.c:586
        args = 0x100004238d0
#14 0x0000010000028a28 in main (argc=9, argv=0x7fefffff2d8) at main/cmd.c:21
No locals.

I wanted to try and track this down to bisect it, but after checking commits all the way back to 2021 I did not find any point at which the issue did not occur.

Downstream bug: https://bugs.gentoo.org/920066

Let me know if there is any additional info I can provide to help.

b4n commented 9 months ago

There's a TODO in the code for this :)

I'm not sure how we should get the proper alignment requirements; but here's an attempt making a slightly less dump assumption (that is, that alignment for void* is OK for everything, rather than sizeof(int)), is this enough?

diff --git a/main/nestlevel.c b/main/nestlevel.c
index 3ce87df09..2387a6d8c 100644
--- a/main/nestlevel.c
+++ b/main/nestlevel.c
@@ -20,8 +20,22 @@

 #include <string.h>

-/* TODO: Alignment */
-#define NL_SIZE(nls) (sizeof(NestingLevel) + (nls)->userDataSize)
+/* This is a dummy struct for guessing the alignment for the user data */
+typedef struct
+{
+   struct NestingLevel nl;  /* not a pointer */
+   /* We make the assumption that void* alignment requirements are OK for
+    * everything else, which is not guaranteed (is it?), but usually are.
+    * TODO: review this alignment.  GObject uses (2*sizeof(size_t)) alignment,
+    * saying it borrows it from glibc, but the implementation is a bit more
+    * generic than this, so I'm not sure it's actually better */
+   void *ptr;
+} NestingLevelAlignmentHelper;
+
+
+/* account for the user data alignment if we have user data, otherwise allocate
+ * exactly what's needed not to waste memory for unneeded alignment */
+#define NL_SIZE(nls) ((nls)->userDataSize ? (offsetof(NestingLevelAlignmentHelper, ptr) + (nls)->userDataSize) : sizeof(NestingLevel))
 #define NL_NTH(nls,n) (NestingLevel *)(((char *)((nls)->levels)) + ((n) * NL_SIZE (nls)))

 /*
@@ -73,7 +87,7 @@ extern NestingLevel * nestingLevelsPush(NestingLevels *nls, int corkIndex)

    nl->corkIndex = corkIndex;
    if (nls->userDataSize > 0)
-       memset (nl->userData, 0, nls->userDataSize);
+       memset (nestingLevelGetUserData (nl), 0, nls->userDataSize);

    return nl;
 }
@@ -117,5 +131,7 @@ extern NestingLevel *nestingLevelsGetNthParent (const NestingLevels *nls, int n)

 extern void *nestingLevelGetUserData (const NestingLevel *nl)
 {
-   return (void *)nl->userData;
+   /* user data (if any) is allocated after the nest level itself, possibly with
+    * padding for alignment. */
+   return (void *)(((char *) nl) + offsetof(NestingLevelAlignmentHelper, ptr));
 }
diff --git a/main/nestlevel.h b/main/nestlevel.h
index c70c5e7e4..9e77b5185 100644
--- a/main/nestlevel.h
+++ b/main/nestlevel.h
@@ -26,7 +26,8 @@ typedef struct NestingLevels NestingLevels;
 struct NestingLevel
 {
    int corkIndex;
-   char userData [];
+   /* user data is allocated at the end of this struct (possibly with some
+    * offset for alignment), get it with nestingLevelGetUserData() */
 };

 struct NestingLevels

A simple more correct implementation would be to actually have a pointer and allocate memory separately for it, so we just have to assume malloc() can allocate anything; but it's a tad bad for fragmentation. Yet, it's better than UB I guess…

[edit] Although, I'm just realizing that this might cause issues as well if the size of the user data is not a multiple of a valid alignment for int itself, in which case we'd have misaligned access to the second nestlevel onward… more alignment required.

masatake commented 9 months ago

Reproduced with:

# yum install libubsan
$ make clean
$ make CFLAGS='-g -O0 -fsanitize=undefined -fno-sanitize-recover=all' LDFLAGS='-fsanitize=undefined -fno-sanitize-recover=all'
$ ./ctags --verbose --options=NONE --fields=-T --optlib-dir=+./Units/parser-ruby.r/simple.rb.d/optlib -o - --options=./Units/parser-ruby.r/simple.rb.d/args.ctags ./Units/parser-ruby.r/simple.rb.d/input.rb
b4n commented 9 months ago

@masatake thanks for the easy reproduction scenario. So, I do reproduce without my patch, and my patch makes it happy. x86_64 here, though.

masatake commented 9 months ago

@b4n Could you make a pull request for the patch? I will merge it. So @matoro can try the change quickly.

Thank you, great hackers.

b4n commented 9 months ago

@masatake will do, I'm just checking whether this is reasonable enough or if I need something a bit better (as mentioned in my edit, this could possibly cause issue if the result overall size doesn't align to something OK again)

matoro commented 9 months ago

Thanks, that gets everything passing here! I can test arbitrary patches, no need for a PR. Let me know if you want me to test a v2.

b4n commented 9 months ago

See #3883. Indeed I had to improve alignment logic, because otherwise it one passed a weird user data size value (e.g. 25), it'd totally mess up the alignment of the second nesting levels onward (until the alignment magically added back to something OK). This wastes more memory for padding, but it's reasonably easy to implement, and I'm guessing that it's OK for this API to be a little less memory-optimized. See the commit message of #3883 for more info, and possible shrinkage if we really want to.