r-lib / ansistrings

Manipulation of ANSI colored strings
Other
9 stars 5 forks source link

Crash when ansi reset is used instead of balanced escapes. #14

Closed jimhester closed 6 years ago

jimhester commented 6 years ago
x <- "\033[31m\033[1mfoo bar\033[0m"
ansistrings::ansi_to_html(x, fullpage = F)
#> Process 78370 stopped
#> * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x11eb80840)
#>     frame #0: 0x00000001021f2bd8 ansistrings.so`ansistrings_make_ansi_map1(str=0x00000001095524f8, re_match_start=0x000000010a995228, re_match_end=0x000000010a994590, re_match_match=0x0000000105f69cc8, re_start_start=0x000000010a994750, re_start_match=0x0000000105f69a40, re_end_match=0x0000000105f69968) at map.c:177
#>    174        int j;
#>    175        for (j = 0; j < activeptr; ) {
#>    176          int a = active[j];
#> -> 177          int m = mapmap[a];
#>    178          const char *startcode = CHAR(STRING_ELT(re_start_match, a));
#>    179          int closes = ansistrings_i_re_closes(startcode, endcode);
#>    180          if (closes) {

It looks to me like the active array is not being initialized properly in this case.

(lldb) p active[0]
(int) $0 = 84393080
(lldb) p active[1]
(int) $1 = 1
gaborcsardi commented 6 years ago

Yeah, it basically assumes the the input is coming from crayon, which is balanced, unless you use start and finish. But yeah, we do need a better parser.

brodieG commented 6 years ago

See also #6

jimhester commented 6 years ago

Doesn't look easily adaptable but https://github.com/theZiz/aha has a C implementation.

gaborcsardi commented 6 years ago

@jimhester It is easier to fix the current parser. Or write another one from scratch. IMO.

brodieG commented 6 years ago

FWIW I am in the process of doing just that with fansi. I'll let you know if I make progress there. Hoping to get something useful in the next few weeks.

krlmlr commented 6 years ago

@gaborcsardi: In the tibble case, input comes from crayon.

gaborcsardi commented 6 years ago

@krlmlr how so?

krlmlr commented 6 years ago

Nested styles. This is with f27619b, current master is broken in a different way, PR follows.

$ R --debugger valgrind -q -e 'g <- crayon::green; b <- crayon::blue; ansistrings::ansi_to_html(paste0(g("1"), g(b("1")), g(b(g("1"))), g(b(g(b("1"))))))'
==16146== Memcheck, a memory error detector
==16146== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==16146== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==16146== Command: /usr/lib/R/bin/exec/R -q -e g~+~\<-~+~crayon::green;~+~b~+~\<-~+~crayon::blue;~+~ansistrings::ansi_to_html(paste0(g("1"),~+~g(b("1")),~+~g(b(g("1"))),~+~g(b(g(b("1"))))))
==16146== 
==16146== Conditional jump or move depends on uninitialised value(s)
==16146==    at 0x58051A7: __wcsnlen_avx2 (strlen-avx2.S:261)
==16146==    by 0x5739CAA: wcsrtombs (wcsrtombs.c:104)
==16146==    by 0x56C1740: wcstombs (wcstombs.c:34)
==16146==    by 0x504CF62: wcstombs (stdlib.h:154)
==16146==    by 0x504CF62: tre_parse_bracket_items (tre-parse.c:325)
==16146==    by 0x504CF62: tre_parse_bracket (tre-parse.c:442)
==16146==    by 0x504CF62: tre_parse (tre-parse.c:1369)
==16146==    by 0x5044A15: tre_compile (tre-compile.c:1920)
==16146==    by 0x50421D9: tre_regcompb (regcomp.c:150)
==16146==    by 0x4F69A3B: do_gsub (grep.c:1831)
==16146==    by 0x4F3D427: bcEval (eval.c:6473)
==16146==    by 0x4F4CEBF: Rf_eval (eval.c:624)
==16146==    by 0x4F4F2CE: R_execClosure (eval.c:1614)
==16146==    by 0x4F4D085: Rf_eval (eval.c:747)
==16146==    by 0x4F510D5: do_set (eval.c:2585)
==16146== 
==16146== Conditional jump or move depends on uninitialised value(s)
==16146==    at 0x56AB7F3: internal_utf8_loop (loop.c:298)
==16146==    by 0x56AB7F3: __gconv_transform_internal_utf8 (skeleton.c:609)
==16146==    by 0x5739CDD: wcsrtombs (wcsrtombs.c:110)
==16146==    by 0x56C1740: wcstombs (wcstombs.c:34)
==16146==    by 0x504CF62: wcstombs (stdlib.h:154)
==16146==    by 0x504CF62: tre_parse_bracket_items (tre-parse.c:325)
==16146==    by 0x504CF62: tre_parse_bracket (tre-parse.c:442)
==16146==    by 0x504CF62: tre_parse (tre-parse.c:1369)
==16146==    by 0x5044A15: tre_compile (tre-compile.c:1920)
==16146==    by 0x50421D9: tre_regcompb (regcomp.c:150)
==16146==    by 0x4F69A3B: do_gsub (grep.c:1831)
==16146==    by 0x4F3D427: bcEval (eval.c:6473)
==16146==    by 0x4F4CEBF: Rf_eval (eval.c:624)
==16146==    by 0x4F4F2CE: R_execClosure (eval.c:1614)
==16146==    by 0x4F4D085: Rf_eval (eval.c:747)
==16146==    by 0x4F510D5: do_set (eval.c:2585)
==16146== 
==16146== Conditional jump or move depends on uninitialised value(s)
==16146==    at 0x56AB7FC: internal_utf8_loop (loop.c:303)
==16146==    by 0x56AB7FC: __gconv_transform_internal_utf8 (skeleton.c:609)
==16146==    by 0x5739CDD: wcsrtombs (wcsrtombs.c:110)
==16146==    by 0x56C1740: wcstombs (wcstombs.c:34)
==16146==    by 0x504CF62: wcstombs (stdlib.h:154)
==16146==    by 0x504CF62: tre_parse_bracket_items (tre-parse.c:325)
==16146==    by 0x504CF62: tre_parse_bracket (tre-parse.c:442)
==16146==    by 0x504CF62: tre_parse (tre-parse.c:1369)
==16146==    by 0x5044A15: tre_compile (tre-compile.c:1920)
==16146==    by 0x50421D9: tre_regcompb (regcomp.c:150)
==16146==    by 0x4F69A3B: do_gsub (grep.c:1831)
==16146==    by 0x4F3D427: bcEval (eval.c:6473)
==16146==    by 0x4F4CEBF: Rf_eval (eval.c:624)
==16146==    by 0x4F4F2CE: R_execClosure (eval.c:1614)
==16146==    by 0x4F4D085: Rf_eval (eval.c:747)
==16146==    by 0x4F510D5: do_set (eval.c:2585)
==16146== 
==16146== Conditional jump or move depends on uninitialised value(s)
==16146==    at 0x56AB843: internal_utf8_loop (loop.c:298)
==16146==    by 0x56AB843: __gconv_transform_internal_utf8 (skeleton.c:609)
==16146==    by 0x5739CDD: wcsrtombs (wcsrtombs.c:110)
==16146==    by 0x56C1740: wcstombs (wcstombs.c:34)
==16146==    by 0x504CF62: wcstombs (stdlib.h:154)
==16146==    by 0x504CF62: tre_parse_bracket_items (tre-parse.c:325)
==16146==    by 0x504CF62: tre_parse_bracket (tre-parse.c:442)
==16146==    by 0x504CF62: tre_parse (tre-parse.c:1369)
==16146==    by 0x5044A15: tre_compile (tre-compile.c:1920)
==16146==    by 0x50421D9: tre_regcompb (regcomp.c:150)
==16146==    by 0x4F69A3B: do_gsub (grep.c:1831)
==16146==    by 0x4F3D427: bcEval (eval.c:6473)
==16146==    by 0x4F4CEBF: Rf_eval (eval.c:624)
==16146==    by 0x4F4F2CE: R_execClosure (eval.c:1614)
==16146==    by 0x4F4D085: Rf_eval (eval.c:747)
==16146==    by 0x4F510D5: do_set (eval.c:2585)
==16146== 
> g <- crayon::green; b <- crayon::blue; ansistrings::ansi_to_html(paste0(g("1"), g(b("1")), g(b(g("1"))), g(b(g(b("1"))))))
==16146== Use of uninitialised value of size 8
==16146==    at 0x4052640: ansistrings_make_ansi_map1 (map.c:177)
==16146==    by 0x4F0E839: R_doDotCall (dotcode.c:596)
==16146==    by 0x4F0EDC6: do_dotcall (dotcode.c:1252)
==16146==    by 0x4F4D4DC: Rf_eval (eval.c:728)
==16146==    by 0x4F4FF68: do_begin (eval.c:2192)
==16146==    by 0x4F4D2BB: Rf_eval (eval.c:700)
==16146==    by 0x4F4F2CE: R_execClosure (eval.c:1614)
==16146==    by 0x4F4D085: Rf_eval (eval.c:747)
==16146==    by 0x4FE8618: R_DispatchOrEvalSP (subset.c:623)
==16146==    by 0x4FE8618: do_subset3 (subset.c:1208)
==16146==    by 0x4F4D2BB: Rf_eval (eval.c:700)
==16146==    by 0x4F510D5: do_set (eval.c:2585)
==16146==    by 0x4F4D2BB: Rf_eval (eval.c:700)
==16146== 
==16146== Invalid read of size 4
==16146==    at 0x4052640: ansistrings_make_ansi_map1 (map.c:177)
==16146==    by 0x4F0E839: R_doDotCall (dotcode.c:596)
==16146==    by 0x4F0EDC6: do_dotcall (dotcode.c:1252)
==16146==    by 0x4F4D4DC: Rf_eval (eval.c:728)
==16146==    by 0x4F4FF68: do_begin (eval.c:2192)
==16146==    by 0x4F4D2BB: Rf_eval (eval.c:700)
==16146==    by 0x4F4F2CE: R_execClosure (eval.c:1614)
==16146==    by 0x4F4D085: Rf_eval (eval.c:747)
==16146==    by 0x4FE8618: R_DispatchOrEvalSP (subset.c:623)
==16146==    by 0x4FE8618: do_subset3 (subset.c:1208)
==16146==    by 0x4F4D2BB: Rf_eval (eval.c:700)
==16146==    by 0x4F510D5: do_set (eval.c:2585)
==16146==    by 0x4F4D2BB: Rf_eval (eval.c:700)
==16146==  Address 0x120ad3264 is not stack'd, malloc'd or (recently) free'd
==16146== 

 *** caught segfault ***
address 0x120ad3264, cause 'memory not mapped'

Traceback:
 1: .Call("ansistrings_make_ansi_map1", str, re$.match$start[[1]],     re$.match$end[[1]], re$.match$match[[1]], re$start$start[[1]],     re$start$match[[1]], re$end$match[[1]], PACKAGE = "ansistrings")
 2: make_ansi_map1(str)
 3: FUN(X[[i]], ...)
 4: vapply(text, ansi_to_html1, character(1), USE.NAMES = FALSE)
 5: ansistrings::ansi_to_html(paste0(g("1"), g(b("1")), g(b(g("1"))),     g(b(g(b("1"))))))
An irrecoverable exception occurred. R is aborting now ...
==16146== 
==16146== Process terminating with default action of signal 11 (SIGSEGV)
==16146==    at 0x5478FD9: raise (raise.c:46)
==16146==    by 0x547914F: ??? (in /lib/x86_64-linux-gnu/libpthread-2.26.so)
==16146==    by 0x405263F: ansistrings_make_ansi_map1 (map.c:177)
==16146== 
==16146== HEAP SUMMARY:
==16146==     in use at exit: 59,827,236 bytes in 25,092 blocks
==16146==   total heap usage: 51,602 allocs, 26,510 frees, 117,642,371 bytes allocated
==16146== 
==16146== LEAK SUMMARY:
==16146==    definitely lost: 0 bytes in 0 blocks
==16146==    indirectly lost: 0 bytes in 0 blocks
==16146==      possibly lost: 0 bytes in 0 blocks
==16146==    still reachable: 59,827,236 bytes in 25,092 blocks
==16146==                       of which reachable via heuristic:
==16146==                         newarray           : 4,264 bytes in 1 blocks
==16146==         suppressed: 0 bytes in 0 blocks
==16146== Rerun with --leak-check=full to see details of leaked memory
==16146== 
==16146== For counts of detected and suppressed errors, rerun with: -v
==16146== Use --track-origins=yes to see where uninitialised values come from
==16146== ERROR SUMMARY: 1443 errors from 6 contexts (suppressed: 0 from 0)
Segmentation fault
krlmlr commented 6 years ago

Interestingly, I don't see this behavior in #18.

gaborcsardi commented 6 years ago

Wontfix, use fansi please.