horizon-eda / horizon

Horizon is a free EDA package
https://horizon-eda.org/
GNU General Public License v3.0

Segfault while running checks after updating planes with 6 inner layers #748

Closed. bsilver8192 closed this 1 year ago.

bsilver8192 commented 1 year ago

EDIT: I realized I can bisect which changes in the board trigger this. I don't think it's the layers; it looks like it's some large new planes covering the entire board. I'm bisecting further and will update again when I isolate something. This should also make it easier for me to work around for now, which is good.

I've got a project which consistently triggers segfaults if I alternate between running checks and updating planes a few times. It has quite a few planes, but I don't think it's particularly large. If I go back to a previous revision which only has 4 inner layers, I do not see the segfaults or what looks like a memory leak (see below for details). The working version is uploaded here; the non-working version is here.

Several stacktraces and values of a few relevant-looking variables are attached. I'm using 2.5.0-1 from https://mirror.selfnet.de/horizon-eda/debian-bullseye bullseye/main amd64.

The first stacktrace looks to me like a dereference of uninitialized memory: it's dereferencing a pointer that's far outside anything valid. GDB doesn't seem to have the correct line numbers for the second and third; I'm guessing something elsewhere in ProcessEdgesAtTopOfScanbeam is what's actually segfaulting.

In the third stacktrace, I looked at the assembly to see which instruction is actually segfaulting. It's this one, which is why I dumped the value of rax (which looks uninitialized):

0x561a7caf5f25 <_ZN10ClipperLib7Clipper27ProcessEdgesAtTopOfScanbeamEx+357>     ucomisd 0x30(%rax),%xmm4

Something surprising is also going on with the memory usage; I'm not sure whether it's expected. Every time I click the button to check rules (after updating planes in between), horizon-imp's memory usage goes up by around 500 MB to 1 GB and then comes down by a few hundred megabytes, so it grows rapidly if I click the button a few times. If I don't update planes in between, the memory usage only goes up once and then seems to stabilize.

bsilver8192 commented 1 year ago

Looks like this was NVIDIA driver problems, sorry for the noise.

After some further debugging, my X11 session crashed... I'm pretty sure it's buggy NVIDIA GPU drivers corrupting userspace memory. I thought I had recently gotten them into a working state, but they were annoying enough that I already had an AMD GPU ready to install. After swapping to the AMD GPU, I can no longer reproduce this.