r-devel / r-dev-day

Repo to organize tasks for R Dev Days
4 stars 0 forks source link

Debugging segfaults on quartz() device #17

Open hturner opened 2 weeks ago

hturner commented 2 weeks ago

This an issue that requires debugging C code on macOS and would be working with Paul Murrell (@pmur002).

There have been reports of segfaults on a quartz() device that can be triggered with

library(marquee)
library(grid)
grid.draw(marquee_grob(marquee:::markdown_test, classic_style()))

This is related to the new dev->glyph() API described in the report Rendering Typeset Glyphs in R Graphics

REQUIREMENTS: building R on macOS with debugging enabled (the draft guide for building R on macOS configures R with debugging enabled).

This could be an opportunity to learn more about debugging C code on macOS. The following may be helpful:

pmur002 commented 2 weeks ago

A confirmation of the segfault on a MacOS system that will be at the Dev Day would be a great first step.

Nerwosolek commented 1 week ago

I can confirm that illegal operation has been caught on my Mac (Intel, macOS Monterey 12.7.5) R version 4.4. Here is a log:

library(marquee) library(grid) grid.draw(marquee_grob(marquee:::markdown_test, classic_style()))

caught illegal operation address 0x7ff8011c200a, cause 'illegal opcode'

Traceback: 1: grid.Call.graphics(C_glyph, as.integer(runs$lengths), x$glyphInfo, gx, gy) 2: drawDetails.glyphgrob(x, recording = FALSE) 3: drawDetails(x, recording = FALSE) 4: drawGrob(x) 5: recordGraphics(drawGrob(x), list(x = x), getNamespace("grid")) 6: grid.draw.grob(x$children[[i]], recording = FALSE) 7: grid.draw(x$children[[i]], recording = FALSE) 8: drawGTree(x) 9: recordGraphics(drawGTree(x), list(x = x), getNamespace("grid")) 10: grid.draw.gTree(x$children[[i]], recording = FALSE) 11: grid.draw(x$children[[i]], recording = FALSE) 12: drawGTree(x) 13: recordGraphics(drawGTree(x), list(x = x), getNamespace("grid")) 14: grid.draw.gTree(x$src, recording = FALSE) 15: grid.draw(x$src, recording = FALSE) 16: (function () { cvp <- current.viewport() hjust <- resolveHJust(cvp$just, cvp$hjust) vjust <- resolveVJust(cvp$just, cvp$vjust) pushViewport(viewport(hjust, vjust, just = c(hjust, vjust), mask = "none", layout = cvp$layout, xscale = cvp$xscale, yscale = cvp$yscale), recording = FALSE) grid.draw(x$src, recording = FALSE) popViewport(recording = FALSE)})() 17: .defineGroup(grp$src, grp$op, grp$dst) 18: drawDetails.GridGroup(x, recording = FALSE) 19: drawDetails(x, recording = FALSE) 20: drawGTree(x) 21: recordGraphics(drawGTree(x), list(x = x), getNamespace("grid")) 22: grid.draw.gTree(x$children[[i]], recording = FALSE) 23: grid.draw(x$children[[i]], recording = FALSE) 24: drawGTree(x) 25: recordGraphics(drawGTree(x), list(x = x), getNamespace("grid")) 26: grid.draw.gTree(marquee_grob(marquee:::markdown_test, classic_style())) 27: grid.draw(marquee_grob(marquee:::markdown_test, classic_style()))

Possible actions: 1: abort (with core dump, if enabled) 2: normal R exit 3: exit R without saving workspace 4: exit R saving workspace Selection:

pmur002 commented 1 week ago

Excellent! Well, terrible that it segfaults. But excellent that you can reproduce it! That gives us something to work with.

pmur002 commented 1 week ago

Did you build R with the debug flag (-g)?

We will need to be able to perform the following steps:

Then the fun can start :)

Nerwosolek commented 1 week ago

So the previous log was from CRAN installation.

Now I have compiled the R myself on my Mac with -O0 -g flags. Attached to the process with lldb as you proposed and I got this from lldb:

(lldb) attach 7914
Process 7914 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x00007ff800f8dd1a libsystem_kernel.dylib`__select + 10
libsystem_kernel.dylib`:
->  0x7ff800f8dd1a <+10>: jae    0x7ff800f8dd24            ; <+20>
    0x7ff800f8dd1c <+12>: movq   %rax, %rdi
    0x7ff800f8dd1f <+15>: jmp    0x7ff800f86d79            ; cerror
    0x7ff800f8dd24 <+20>: retq
Target 0: (R) stopped.
Executable module set to "/Library/Frameworks/R.framework/Versions/R-devel-tomek/Resources/bin/exec/R".
Architecture set to: x86_64h-apple-macosx-.

When I go back to terminal with R console I cannot do anything as R is stopped. I debugged some simple C program in lldb previously but it was waaay simpler program than R :)

Nerwosolek commented 1 week ago

I managed to make R process continue and went through the same steps as previously to encounter the problem. This is what lldb returns:

(lldb) process continue
Process 7914 resuming
Process 7914 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x00007ff8011c200a CoreFoundation`CFRelease.cold.1 + 14
CoreFoundation`CFRelease.cold.1:
->  0x7ff8011c200a <+14>: ud2

CoreFoundation`:
    0x7ff8011c200c <+0>:  leaq   0x20eea9(%rip), %rax      ; "*** __CFStringCollectionCopy() called with NULL ***"
    0x7ff8011c2013 <+7>:  movq   %rax, 0x415919f6(%rip)    ; gCRAnnotations + 8
    0x7ff8011c201a <+14>: ud2
Target 0: (R) stopped.
pmur002 commented 1 week ago

(Right. I think the attach stops the R process, so you need to continue in the debugger to proceed. )

Now that you have triggered the problem, it looks like we are deep in Apple land, so we need to go up to see where in R land this problem occurred. Can you find something that looks like a call within the R source code?

pmur002 commented 1 week ago

FYI, I will be travelling for the next week, so we may have to pick this back up again in Salzburg :)

Nerwosolek commented 1 week ago

The same with me :) I'll do what I can this week and I'm looking forward to work with you in Salzburg.

In the meantime I caught what is happening on the edge between R-land and Apple-land:

thread backtrace
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
  * frame #0: 0x00007ff8011c200a CoreFoundation`CFRelease.cold.1 + 14
    frame #1: 0x00007ff8010140f4 CoreFoundation`CFRelease + 108
    frame #2: 0x000000010ab453ae grDevices.so`RQuartz_glyph(n=7, glyphs=0x00007fe166b7099c, x=0x00007fe1676f0708, y=0x00007fe1676f1d08, font=0x00007fe16ba24528, size=10.199999999999999, colour=-16777216, rot=0, dd=0x00007fe16c30f240) at devQuartz.c:3013:5 [opt]
    frame #3: 0x000000010a750e12 libR.dylib`GEGlyph(n=<unavailable>, glyphs=<unavailable>, x=<unavailable>, y=<unavailable>, font=<unavailable>, size=<unavailable>, colour=<unavailable>, rot=<unavailable>, dd=0x00006000020bb900) at engine.c:3898:9 [opt]
    frame #4: 0x000000010ab92564 grid.so`L_glyph at typeset.c:88:9 [opt]

Went to this line 3013 of code in devQuartz.c: CFRelease(cfFontURL); which is in the function:

 void RQuartz_glyph(int n, int *glyphs, double *x, double *y,
                      SEXP font, double size,
                      int colour, double rot, pDevDesc dd)
  {

I set the breakpoint and started investigating step by step but lldb says that grDevices.so is compiled with optimizations so I don't have access to some variables, particularly cfFontURL.

Now I am trying to find the way to recompile grDevices.so without optimizations and with -g.

Nerwosolek commented 1 week ago

OK. I was mistaken, cfFontURL was accessible to inspect.

After some debugging I caught the exact values for which the RQuartz_glyph crashes.

So this function is called for bunch of fonts but it will crash for font: /System/Library/Fonts/Supplemental/Courier New.ttf

This happens for arguments:

(lldb) fr v n rot size colour
(int) n = 7
(double) rot = 0
(double) size = 10.199999999999999
(int) colour = -16777216

The problem arises when trying to get cfFontURL variable, we get NULL and then we try to release it via CFRelease(cfFontURL); in line 3013. It crashes R.

We need to inspect why we get NULL in this line of code: 3009: CFURLRef cfFontURL = CFURLCreateWithString(NULL, cfFontFileName, NULL); cfFontFileName seems to be OK: (CFStringRef) cfFontFileName = 0x0000600002d74500 "file:///System/Library/Fonts/Supplemental/Courier New.ttf"

but cfFontURL is NULL.

I checked that I have this font present in this path on my Mac.

The workaround is to check for NULL before releasing the variable but why we get NULL in the first place?

Nerwosolek commented 1 week ago

After some more tests in XCode it seems that spaces in the URL are not properly encoded. When the string looks like this: file:///System/Library/Fonts/Supplemental/Courier%20New.ttf" cfFontURL is allocated properly.

I guess we need to find a Objective-C function which will do proper encoding for us.

Perhaps CFURLCreateWithFileSystemPath would be a to go function like in this example: CFURLRef cfFontURL = CFURLCreateWithFileSystemPath(NULL, cfFontFileName, kCFURLPOSIXPathStyle , FALSE);

Needs further testing.

pmur002 commented 1 week ago

Nice work! We might also benefit from some input from @s-u

Look forward to meeting up in Salzburg.