topepo / C5.0

An R package for fitting Quinlan's C5.0 classification model
https://topepo.github.io/C5.0/
50 stars 20 forks source link

Fix undefined null pointer offset #33

Closed DavisVaughan closed 3 years ago

DavisVaughan commented 3 years ago

Running this line https://github.com/topepo/C5.0/blob/ae16ae042c6ba5e30c8a648cf62dd9642844c49f/R/C5.0.R#L110 triggers the runtime error below in CRAN's clang USBAN build.

I have reproduced this locally using the RDcsan version of R compiled with clang USBAN (https://github.com/wch/r-debug). This PR fixes the issue, and I have confirmed with RDcsan that that line no longer throws the runtime error. A devtools::check() passes cleanly as well. Notably, the gcc USBAN build that comes with rhub does not show this runtime error.

The problem was that right here: https://github.com/topepo/C5.0/blob/ae16ae042c6ba5e30c8a648cf62dd9642844c49f/src/siftrules.c#L181 The code was assigning 0 to a pointer (CovByPtr is an array of pointers, so the value at CovByPtr[0] is another pointer). The code is trying to be a bit clever, and at this particular point it is attempting to treat the array of pointers as an array of size_t, accumulating cumulative sizes in the loop following that line. This is strange, but would generally be okay, except for the fact that assigning 0 to a pointer is actually identical to assigning NULL (see https://stackoverflow.com/questions/48288191/assigning-0-vs-null-to-a-pointer-in-c). In the first iteration of the loop, by replacing CovByPtr[i - 1] with NULL, we see that the code was doing NULL + CovBy[i - 1] + Extra, and adding an offset to a NULL pointer is undefined behavior. On most compilers, the NULL pointer will actually be cast to 0 here, which is why it worked at all. However, the clang USBAN build picked up that this was an issue.

The fix is to have a first pass that accumulates the cumulative size that is required to allocate CovByBlock, and then have a second loop that assigns to CovByPtr all the pointers that point into CovByBlock. This ensures that CovByPtr is always treated as an array of pointers.

> ruleModel <- C5.0(churn ~ ., data = mlc_churn[1:3333, ], rules = TRUE)
siftrules.c:183:35: runtime error: applying non-zero offset 4 to null pointer
    #0 0x7f98e53ff3d9 in InvertFires /C5.0/src/siftrules.c:183:35
    #1 0x7f98e53fc3aa in SiftRules /C5.0/src/siftrules.c:79:3
    #2 0x7f98e5346546 in FormRules /C5.0/src/formrules.c:136:3
    #3 0x7f98e531df69 in ConstructClassifiers /C5.0/src/construct.c:172:24
    #4 0x7f98e53e1b09 in c50main /C5.0/src/rc50.c:143:5
    #5 0x7f98e5430fb8 in c50 /C5.0/src/top.c:84:5
    #6 0x7f98f8272da9 in do_dotCode /tmp/r-source/src/main/dotcode.c:1874:2
    #7 0x7f98f83d08b9 in Rf_eval /tmp/r-source/src/main/eval.c:830:9
    #8 0x7f98f84bcb46 in do_set /tmp/r-source/src/main/eval.c:2969:8
    #9 0x7f98f83cfc8c in Rf_eval /tmp/r-source/src/main/eval.c:802:12
    #10 0x7f98f84baf8b in do_begin /tmp/r-source/src/main/eval.c:2517:10
    #11 0x7f98f83cfc8c in Rf_eval /tmp/r-source/src/main/eval.c:802:12
    #12 0x7f98f84af4b6 in R_execClosure /tmp/r-source/src/main/eval.c:1897:22
    #13 0x7f98f84acd58 in Rf_applyClosure /tmp/r-source/src/main/eval.c:1823:16
    #14 0x7f98f83d0e36 in Rf_eval /tmp/r-source/src/main/eval.c:850:12
    #15 0x7f98f84bcb46 in do_set /tmp/r-source/src/main/eval.c:2969:8
    #16 0x7f98f83cfc8c in Rf_eval /tmp/r-source/src/main/eval.c:802:12
    #17 0x7f98f84baf8b in do_begin /tmp/r-source/src/main/eval.c:2517:10
    #18 0x7f98f83cfc8c in Rf_eval /tmp/r-source/src/main/eval.c:802:12
    #19 0x7f98f84af4b6 in R_execClosure /tmp/r-source/src/main/eval.c:1897:22
    #20 0x7f98f84acd58 in Rf_applyClosure /tmp/r-source/src/main/eval.c:1823:16
    #21 0x7f98f866fc1d in applyMethod /tmp/r-source/src/main/objects.c:118:8
    #22 0x7f98f8667492 in dispatchMethod /tmp/r-source/src/main/objects.c:436:16
    #23 0x7f98f86661ef in Rf_usemethod /tmp/r-source/src/main/objects.c:476:10
    #24 0x7f98f866839f in do_usemethod /tmp/r-source/src/main/objects.c:565:9
    #25 0x7f98f83cfc8c in Rf_eval /tmp/r-source/src/main/eval.c:802:12
    #26 0x7f98f84af4b6 in R_execClosure /tmp/r-source/src/main/eval.c:1897:22
    #27 0x7f98f84acd58 in Rf_applyClosure /tmp/r-source/src/main/eval.c:1823:16
    #28 0x7f98f83d0e36 in Rf_eval /tmp/r-source/src/main/eval.c:850:12
    #29 0x7f98f84bcb46 in do_set /tmp/r-source/src/main/eval.c:2969:8
    #30 0x7f98f83cfc8c in Rf_eval /tmp/r-source/src/main/eval.c:802:12
    #31 0x7f98f85b7ef5 in Rf_ReplIteration /tmp/r-source/src/main/main.c:264:2
    #32 0x7f98f85bd443 in R_ReplConsole /tmp/r-source/src/main/main.c:314:11
    #33 0x7f98f85bcffc in run_Rmainloop /tmp/r-source/src/main/main.c:1117:5
    #34 0x7f98f85bd5cd in Rf_mainloop /tmp/r-source/src/main/main.c:1124:5
    #35 0x4c894d in main /tmp/r-source/src/main/Rmain.c:29:5
    #36 0x7f98f738e0b2 in __libc_start_main (/usr/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
    #37 0x41c2dd in _start (/usr/local/RDcsan/lib/R/bin/exec/R+0x41c2dd)

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior siftrules.c:183:35 in 
siftrules.c:187:40: runtime error: applying non-zero offset 108576775406080 to null pointer
    #0 0x7f98e53ff938 in InvertFires /C5.0/src/siftrules.c:187:40
    #1 0x7f98e53fc3aa in SiftRules /C5.0/src/siftrules.c:79:3
    #2 0x7f98e5346546 in FormRules /C5.0/src/formrules.c:136:3
    #3 0x7f98e531df69 in ConstructClassifiers /C5.0/src/construct.c:172:24
    #4 0x7f98e53e1b09 in c50main /C5.0/src/rc50.c:143:5
    #5 0x7f98e5430fb8 in c50 /C5.0/src/top.c:84:5
    #6 0x7f98f8272da9 in do_dotCode /tmp/r-source/src/main/dotcode.c:1874:2
    #7 0x7f98f83d08b9 in Rf_eval /tmp/r-source/src/main/eval.c:830:9
    #8 0x7f98f84bcb46 in do_set /tmp/r-source/src/main/eval.c:2969:8
    #9 0x7f98f83cfc8c in Rf_eval /tmp/r-source/src/main/eval.c:802:12
    #10 0x7f98f84baf8b in do_begin /tmp/r-source/src/main/eval.c:2517:10
    #11 0x7f98f83cfc8c in Rf_eval /tmp/r-source/src/main/eval.c:802:12
    #12 0x7f98f84af4b6 in R_execClosure /tmp/r-source/src/main/eval.c:1897:22
    #13 0x7f98f84acd58 in Rf_applyClosure /tmp/r-source/src/main/eval.c:1823:16
    #14 0x7f98f83d0e36 in Rf_eval /tmp/r-source/src/main/eval.c:850:12
    #15 0x7f98f84bcb46 in do_set /tmp/r-source/src/main/eval.c:2969:8
    #16 0x7f98f83cfc8c in Rf_eval /tmp/r-source/src/main/eval.c:802:12
    #17 0x7f98f84baf8b in do_begin /tmp/r-source/src/main/eval.c:2517:10
    #18 0x7f98f83cfc8c in Rf_eval /tmp/r-source/src/main/eval.c:802:12
    #19 0x7f98f84af4b6 in R_execClosure /tmp/r-source/src/main/eval.c:1897:22
    #20 0x7f98f84acd58 in Rf_applyClosure /tmp/r-source/src/main/eval.c:1823:16
    #21 0x7f98f866fc1d in applyMethod /tmp/r-source/src/main/objects.c:118:8
    #22 0x7f98f8667492 in dispatchMethod /tmp/r-source/src/main/objects.c:436:16
    #23 0x7f98f86661ef in Rf_usemethod /tmp/r-source/src/main/objects.c:476:10
    #24 0x7f98f866839f in do_usemethod /tmp/r-source/src/main/objects.c:565:9
    #25 0x7f98f83cfc8c in Rf_eval /tmp/r-source/src/main/eval.c:802:12
    #26 0x7f98f84af4b6 in R_execClosure /tmp/r-source/src/main/eval.c:1897:22
    #27 0x7f98f84acd58 in Rf_applyClosure /tmp/r-source/src/main/eval.c:1823:16
    #28 0x7f98f83d0e36 in Rf_eval /tmp/r-source/src/main/eval.c:850:12
    #29 0x7f98f84bcb46 in do_set /tmp/r-source/src/main/eval.c:2969:8
    #30 0x7f98f83cfc8c in Rf_eval /tmp/r-source/src/main/eval.c:802:12
    #31 0x7f98f85b7ef5 in Rf_ReplIteration /tmp/r-source/src/main/main.c:264:2
    #32 0x7f98f85bd443 in R_ReplConsole /tmp/r-source/src/main/main.c:314:11
    #33 0x7f98f85bcffc in run_Rmainloop /tmp/r-source/src/main/main.c:1117:5
    #34 0x7f98f85bd5cd in Rf_mainloop /tmp/r-source/src/main/main.c:1124:5
    #35 0x4c894d in main /tmp/r-source/src/main/Rmain.c:29:5
    #36 0x7f98f738e0b2 in __libc_start_main (/usr/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
    #37 0x41c2dd in _start (/usr/local/RDcsan/lib/R/bin/exec/R+0x41c2dd)

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior siftrules.c:187:40 in 
topepo commented 3 years ago

Thanks for figuring this out 💯