linuxwacom / xf86-input-wacom

X.Org driver for Wacom devices
356 stars 45 forks source link

Gimp crash when enabling wacom dynamics #307

Closed DenisSalem closed 11 months ago

DenisSalem commented 1 year ago

Hello

As shown here :

There is an issue with wacom tablet running along with Gimp. On this subreddit someone find out a workaroung by downgrading xf86-input-wacom to 1.1.0.

I don't know if the issue come from Gimp or X11 but I believe there is something to investigate. In the meantime it could be helpful to get back on Gentoo (or any distro) the previous version of the driver which I believe has been removed few days ago.

Best regards

thesamesam commented 1 year ago

I'll go ahead and restore the older version in Gentoo, thanks!

whot commented 1 year ago

There are only 12 patches between 1.1.0 and 1.2.0, so it'd be great if someone could bisect this. Realistically, there are only 4 or 5 commits that could even have an effect, with a rough guess going to fe923e927a8ddf4d2e82ef4757c885b06d47fa03 since it's the most invasive one.

whot commented 1 year ago

Nevermind, ended up having a few spare minutes and yes: https://github.com/linuxwacom/xf86-input-wacom/commit/fe923e927a8ddf4d2e82ef4757c885b06d47fa03 is the source of the issue.

Reproducer is easy, works with a raw X server and gimp which crashes on the first interaction with a canvas (Intuos Pro/PTH-660). Now, the main question is what gimp is unhappy about here, gdb with gimp-2.10.34-1.fc38.x86_64 says

Thread 1 "gimp" received signal SIGABRT, Aborted.
0x00007ffff6bb7844 in __pthread_kill_implementation () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff6bb7844 in __pthread_kill_implementation () at /lib64/libc.so.6
#1  0x00007ffff6b66abe in raise () at /lib64/libc.so.6
#2  0x00007ffff6b4f87f in abort () at /lib64/libc.so.6
#3  0x00007ffff6b5060f in _IO_peekc_locked.cold () at /lib64/libc.so.6
#4  0x00007ffff6c4b979 in  () at /lib64/libc.so.6
#5  0x00007ffff6c4b954 in  () at /lib64/libc.so.6
#6  0x00005555557b30a9 in gimp_device_info_get_device_coords (info=info@entry=0x5555575a8020, window=window@entry=0x555558006b70, coords=coords@entry=0x7fffffffcdb0) at widgets/gimpdeviceinfo-coords.c:181
#7  0x00005555557b3108 in gimp_device_info_get_event_coords (info=0x5555575a8020, window=0x555558006b70, event=<optimized out>, coords=0x7fffffffcdb0) at widgets/gimpdeviceinfo-coords.c:119
#8  0x000055555573529f in gimp_display_shell_get_event_coords (shell=0x555557f31af0, event=0x555558bf2090, display_coords=0x7fffffffcdb0, state=0x7fffffffcda0, time=0x7fffffffcda4) at display/gimpdisplayshell-tool-events.c:2040
#9  0x0000555555a714a6 in gimp_display_shell_canvas_tool_events_internal.constprop.0 (canvas=canvas@entry=0x555557f3d210, event=event@entry=0x555558bf2090, shell=shell@entry=0x555557f31af0, next_event=next_event@entry=0x7fffffffcf20) at display/gimpdisplayshell-tool-events.c:553
#10 0x0000555555729e6e in gimp_display_shell_canvas_tool_events (canvas=0x555557f3d210, event=0x555558bf2090, shell=0x555557f31af0) at display/gimpdisplayshell-tool-events.c:310
#11 0x00007ffff7b5d3ee in _gtk_marshal_BOOLEAN__BOXED () at /lib64/libgtk-x11-2.0.so.0

That commit changed the number of axes from 6 to 8 - I wonder if that triggers a bug, either in gimp because it only expects 6 axes or in any of the stack in between. Indeed, GDK_AXIS_LAST from GDK2 is 7.

gimp_device_info_get_device_coords() where it crashes has a local axes[GDK_AXIS_LAST] array and after the call to gdk_device_get_state() the axes array looks like it's the one that is smashed.

Thread 1 "gimp" hit Breakpoint 1, gimp_device_info_get_device_coords (info=info@entry=0x5555575b16b0, window=window@entry=0x5555580014d0, coords=coords@entry=0x7fffffffcdb0) at widgets/gimpdeviceinfo-coords.c:128
128 {
...
(gdb) p axes
$9 = {1234.3285714285712, 755.93243243243251, 0, 0.007874015748031496, 0.007874015748031496, 0.50027793218454697, -nan(0x8000000000000)}

Note the NaN in position 7 - that's not good. Now, that value could be rubbish sent by the X driver but xinput test-xi2 shows the new axes never even being included in the event - which makes sense since they trigger scroll events only. So right now I'm pointing the finger at either GDK or GIMP itself but I'm unlikely to find time to debug this in detail.

However, after installing gdk2 debuginfos the little bit I can provide for the call to gdk_device_get_state():

#1  0x00007ffff79c1b56 in IA__gdk_device_get_state (device=0x555555f9d4e0, window=0x55555800b970, axes=0x7fffffffcc30, mask=0x0)
    at x11/gdkinput-x11.c:952
...
(gdb) p *((XValuatorState *)input_class)
$17 = {class = 2 '\002', length = 48 '0', num_valuators = 8 '\b', mode = 3 '\003', valuators = 0x555558c34310}
(gdb) p *((XValuatorState *)input_class)->valuators@8
$18 = {34384, 24361, 0, 0, 0, 0, 0, 0}

IOW it looks like the data is correct and the issue is num_valuators = 8 being copied into an array sized 6. A simple fix to extend that array to at least size 8 should get rid of that crash though I wonder how many others are lurking...

TLDR: fix the axes array in GIMP's gimp_device_info_get_device_coords() to be size 8 or larger and try again. Might fix the issue, might move the issue to the next instance...

whot commented 1 year ago

cc @jigpu, @Pinglinux , @Greenscreener - we may end up needing an option akin to the Pressure2K to disable those new axes (and thus panscrolling). I'd be surprised if gimp is the only ones with problems here (well, anything GDK2 probably).

whot commented 1 year ago

FTR the following diff applied to Fedora 38 gimp-2.10.34-1.fc38.x86_64 appears to fix the issue.

diff --git a/app/widgets/gimpdeviceinfo-coords.c b/app/widgets/gimpdeviceinfo-coords.c
index 93c1d90..765a716 100644
--- a/app/widgets/gimpdeviceinfo-coords.c
+++ b/app/widgets/gimpdeviceinfo-coords.c
@@ -126,7 +126,7 @@ gimp_device_info_get_device_coords (GimpDeviceInfo *info,
                                     GdkWindow      *window,
                                     GimpCoords     *coords)
 {
-  gdouble axes[GDK_AXIS_LAST] = { 0, };
+  gdouble axes[16] = { 0, };

   *coords = default_coords;

I tested this diff in F38 and GIMP no longer crashes and I can paint on the canvas, didn't really test much beyond that though.

gasche commented 1 year ago

I looked at this bug out of curiosity. My impression is that the patch sketched above is not the "right" way to fix the GDK, and that there should be a fix to gdk2 that makes it forward-compatible with tablets with more axes, as introduced here.

My understanding is that the GDK model for axes is intentionally restricted, it only recognizes a fixed set of axes "uses", which may be more than what the device actually supports, or less. It has a garbage value GDK_AXIS_IGNORE for axes that it does not know about. (See gdk2 documentation for GdkAxisUse). This model is flexible enough to accomodate devices with more axes, but presumably this was not tested well.

Most code inside Gdk manipulates all axes provided by the device, using dynamic arrays without a size assumption, and should work fine. For example:

https://gitlab.gnome.org/GNOME/gtk/-/blob/gtk-2-24/gdk/x11/gdkinput-x11.c#L184-216

The problem comes from the gdk_device_get_state function that writes axes information into an array provided by the user, precisely in the function pointed out by @whot above:

https://gitlab.gnome.org/GNOME/gimp/-/blob/gimp-2-10/app/widgets/gimpdeviceinfo-coords.c?ref_type=heads#L124-133

To work properly, this code needs the guarantee that the axes array used as an out-parameter for gdk_device_get_state is only populated by one axis per known use, and not populated with GDK_AXIS_IGNORE axes, which should guarantee that it has at most GDK_AXIS_LAST elements (in fact GDK_AXIS_LAST - GDK_AXIS_X, as GDK_AXIS_X is the first meaningful use; other code in gimp uses this, see https://gitlab.gnome.org/GNOME/gimp/-/blob/gimp-2-10/app/widgets/gimpdeviceinfoeditor.c ). Filling this array with more axes (several axes for the same use, or axes for ignored uses) is useless anyway as this array is used with gdk_device_get_axis which takes a "use" argument to locate a given axis.

(Another approach would be to store all device-provided axes, and use the gdk_device_get_n_axes to decide its size dynamically. But this accessor was only added in somewhat recent versions of the GDK, start with 2.22.)

This guarantee on the axes array is currently not respected by the gdk implementation itself, in the per-backend function gdk_input_translate_coordinates https://gitlab.gnome.org/GNOME/gtk/-/blob/gtk-2-24/gdk/x11/gdkinput-x11.c#L528-549 where the out-variable provided by the user is called axis_out.

  for (i=0; i<gdkdev->info.num_axes; i++)
    {
      switch (gdkdev->info.axes[i].use)
    {
    case GDK_AXIS_X:
      axis_out[i] = x_offset + x_scale * (axis_data[x_axis] - x_min);
      if (x_out)
        *x_out = axis_out[i];
      break;
    case GDK_AXIS_Y:
      axis_out[i] = y_offset + y_scale * (axis_data[y_axis] - y_min);
      if (y_out)
        *y_out = axis_out[i];
      break;
    default:
      axis_out[i] =
        (gdkdev->info.axes[i].max * (axis_data[i] - gdkdev->axes[i].min_value) +
         gdkdev->info.axes[i].min * (gdkdev->axes[i].max_value - axis_data[i])) /
        (gdkdev->axes[i].max_value - gdkdev->axes[i].min_value);
      break;
    }
    }

I believe that a better fix to the issue would be to use code as the following, which discardes _IGNORE axes:

  int dst = 0;
  for (i=0; i<gdkdev->info.num_axes; i++)
    {
      switch (gdkdev->info.axes[i].use)
    {
        case GDK_AXIS_IGNORE:
      break;
    case GDK_AXIS_X:
      axis_out[dst] = x_offset + x_scale * (axis_data[x_axis] - x_min);
      if (x_out)
        *x_out = axis_out[dst];
          dst++;
      break;
    case GDK_AXIS_Y:
      axis_out[dst] = y_offset + y_scale * (axis_data[y_axis] - y_min);
      if (y_out)
        *y_out = axis_out[dst];
          dst++;
      break;
    default:
      axis_out[dst++] =
        (gdkdev->info.axes[i].max * (axis_data[i] - gdkdev->axes[i].min_value) +
         gdkdev->info.axes[i].min * (gdkdev->axes[i].max_value - axis_data[i])) /
        (gdkdev->axes[i].max_value - gdkdev->axes[i].min_value);
      break;
    }
    }

Or even, to guarantee that a single axis is output per use:

  int dst = 0;
  int ignore[GDK_AXIS_LAST] = { 0, };
  ignore[GDK_AXIS_IGNORE] = 1;
  for (i=0; i<gdkdev->info.num_axes; i++)
    {
      int use = gdkdev->info.axes[i].use;
      if (ignore[use]) continue;
      ignore[use] = 1; // ignore this use next time
      switch (gdkdev->info.axes[i].use)
    {
    case GDK_AXIS_X:
      axis_out[dst] = x_offset + x_scale * (axis_data[x_axis] - x_min);
      if (x_out)
        *x_out = axis_out[dst];
          dst++;
      break;
    case GDK_AXIS_Y:
      axis_out[dst] = y_offset + y_scale * (axis_data[y_axis] - y_min);
      if (y_out)
        *y_out = axis_out[dst];
          dst++;
      break;
    default:
      axis_out[dst++] =
        (gdkdev->info.axes[i].max * (axis_data[i] - gdkdev->axes[i].min_value) +
         gdkdev->info.axes[i].min * (gdkdev->axes[i].max_value - axis_data[i])) /
        (gdkdev->axes[i].max_value - gdkdev->axes[i].min_value);
      break;
    }
    }

Note: a (small) difficulty is that this code is duplicated for several gdk backends, so it may require fixing in more than gdkinput-x11.c -- at least gdkinput-win32.c has the same issue.

(Note: versions of gimp that use gtk3 are written in a more robust style and should work fine, but most users still use gimp 2.x that uses gtk2.)

@whot: I wonder if I should suggest this change to gdk people upstream, but I don't have an easy way to test the change. If you do, and you find the suggestion reasonable, would you mind trying this? (I understand that your proposal #309 would make the wacom-specific aspect of the issue go away by disabling the extra axes when a specific configuration option is set, but it would still be nice to fix the underlying issue.)

whot commented 1 year ago

My impression is that the patch sketched above is not the "right" way to fix the GDK,

Oh, definitely, apologies for not making this clear, this is a workaround for this particular occurrence of this bug and there may be other parts in the code where the axis number is problematic.

My notes on this match your explanation (though you went deeper, I basically stopped after identifying the guilty GDK function) and your patch proposal for GDK sounds correct. The second patch mixes two different issues though (ignoring + dropping duplicates) so I'd recommend splitting that into two patches. Not sure the duplicate handling is needed, any device/driver with duplicate axes should be considered buggy.

I wonder if I should suggest this change to gdk people upstream

Yes please, this would allow us to keep the discussion in one place too, right now it's spread across a few issues. I probably won't find the time to follow up on this particular instance.

I don't have an easy way to test the change.

Well, at least that part is easy to fix :) a libinput record of my tablet is stylus.txt

$ sudo libinput replay stylus.txt
/dev/input/event6: Wacom Intuos Pro M Pen
Hit enter to start replaying

That gives you one short stroke of the pen. Just make sure the canvas is underneath the starting point and this should crash GIMP. Hit enter to keep re-playing, it'll always produce the same stroke. If you need a different recording, e.g. one that starts in a different position let me know, it takes less than a minute.

gasche commented 1 year ago

I went ahead and proposed a (slightly different) patch against gtk2 at

https://gitlab.gnome.org/GNOME/gtk/-/merge_requests/6045

I will also create an issue on the gimp bugtracker to ask for testing help (I'm... too lazy to setup a gimp development environment on my end), and see if they would be interested in a workaround in the event that the gtk maintainers do not want to fix the gtk2 codebase at this point.

Edit: two workarounds proposed at the gimp level; https://gitlab.gnome.org/GNOME/gimp/-/merge_requests/934, https://gitlab.gnome.org/GNOME/gimp/-/merge_requests/935

Pinglinux commented 1 year ago

Thank you @gasche for the workaround! @whot Feel free to merge your X SmoothPanscrollingEnabled patch.