openscad / openscad

OpenSCAD - The Programmers Solid 3D CAD Modeller
https://www.openscad.org

Crash when `--animate=4` but works when `--animate=3` #4028

Open spuder opened 2 years ago

spuder commented 2 years ago

I'm attempting to use the --animate option to create a gif as shown here

I am finding that it always works when using --animate 3 regardless of the environment (Mac OSX or Docker). However, openscad fails when using a value > 3 inside Docker.

--animate count   OSX   Docker
3                 ✅    ✅
4                 ✅    ❌

The error is X Error of failed request: BadRequest (invalid request code or no such operation)

Full output

Exporting /dev/null...
WARNING: Viewall and autocenter disabled in favor of $vp*
Compiling design (CSG Products normalization)...
Normalized CSG tree has 1 elements
Geometries in cache: 1
Geometry cache size in bytes: 728
CGAL Polyhedrons in cache: 0
CGAL cache size in bytes: 0
Total rendering time: 0:00:00.165
Exporting /dev/null...
Compiling design (CSG Products normalization)...
Normalized CSG tree has 1 elements
Geometries in cache: 1
Geometry cache size in bytes: 728
CGAL Polyhedrons in cache: 0
CGAL cache size in bytes: 0
Total rendering time: 0:00:00.085
Exporting /dev/null...
Compiling design (CSG Products normalization)...
Normalized CSG tree has 1 elements
Geometries in cache: 1
Geometry cache size in bytes: 728
CGAL Polyhedrons in cache: 0
CGAL cache size in bytes: 0
Total rendering time: 0:00:00.088
X Error of failed request:  BadRequest (invalid request code or no such operation)
  Major opcode of failed request:  0 ()
  Serial number of failed request:  39
  Current serial number in output stream:  40

Understandably you can't support every environment, and Docker is one I don't expect you to support. But I would like help understanding the difference in behavior so I can figure out a workaround.

I looked through the original merge request and found this line which seems to indicate there is indeed a different code path with --animate 3 and --animate 4. But I don't quite understand what it is doing differently.

if (cmd.viewOptions.renderer == RenderType::CGAL && root_geom->getDimension() == 3) {

https://github.com/openscad/openscad/pull/3280/files#diff-ad61dad34cefd1548abd12117bfdb48e4f44d4f9a0ca65cca9c217b1d783e16eR491

Steps to reproduce

Works

openscad /dev/null -D '$vpd = 20; $vpr = [0,$t * 360,0];' -o '/foo/foo.png' -D 'cube([2,3,4]);' --imgsize=250,250 --animate 3 --colorscheme 'Tomorrow Night' --viewall --autocenter --preview

Crashes

MYTMPDIR="$(mktemp -d)"
docker run --rm -v $(PWD):/data -v ${MYTMPDIR}:/foo spuder/openscad:latest openscad /dev/null -D '$vpd = 20; $vpr = [0,$t * 360,0];' -o '/foo/foo.png' -D 'cube([2,3,4]);' --imgsize=250,250 --animate 4 --colorscheme 'Tomorrow Night' --viewall --autocenter --preview

What does if (cmd.viewOptions.renderer == RenderType::CGAL && root_geom->getDimension() == 3) { do? My best guess is that it is activating OpenGL which I don't have running inside docker. Any guidance appreciated.

thehans commented 2 years ago

1) I don't believe it is possible to export images without OpenGL.
2) Looks like you are exporting to /dev/null? How do you verify that your command with 3 animation steps is even creating image files?
3) The conditional with root_geom->getDimension() == 3 is checking whether the top level geometry is 3D or 2D, nothing to do with the number of frames (and it uses OpenGL either way).

spuder commented 2 years ago

Thanks for the feedback.

I don't believe it is possible to export images without OpenGL.

This is where I'm a bit confused, because I can successfully render preview images as long as --animate is 3 or less. I would expect that if OpenGL were a hard dependency, it would fail every time.

Here is an example that shows openscad can render a cube. This works every time and is portable so anyone in the world with docker should get the same result.

MYTMPDIR="$(mktemp -d)"
echo $MYTMPDIR
docker run --rm -v $(PWD):/data -v ${MYTMPDIR}:/foo spuder/openscad:latest openscad /dev/null -D '$vpd = 20; $vpr = [0,$t * 360,0];' -o '/foo/foo.png' -D 'cube([2,3,4]);' --imgsize=250,250 --animate 3 --colorscheme 'Tomorrow Night' --viewall --autocenter --preview
echo $MYTMPDIR
ls $MYTMPDIR
foo00000.png foo00001.png foo00002.png

Looks like you are exporting to /dev/null? How do you verify that your command with 3 animation steps is even creating image files?

The docker container has 2 bind mounts -v $(PWD):/data -v ${MYTMPDIR}:/foo
-v $(PWD):/data is the input folder (not applicable in the above example).
-v ${MYTMPDIR}:/foo is the output folder and is shared between the docker container and the host.

ls $MYTMPDIR
foo00000.png foo00001.png foo00002.png


Changing --animate 3 to --animate 4 will give the error shown above

docker run --rm -v $(PWD):/data -v ${MYTMPDIR}:/foo spuder/openscad:latest openscad /dev/null -D '$vpd = 20; $vpr = [0,$t * 360,0];' -o '/foo/foo.png' -D 'cube([2,3,4]);' --imgsize=250,250 --animate 4 --colorscheme 'Tomorrow Night' --viewall --autocenter --preview
Executing /OpenSCAD.AppImage --appimage-extract-and-run "/dev/null -D $vpd = 20; $vpr = [0,$t * 360,0]; -o /foo/foo.png -D cube([2,3,4]); --imgsize=250,250 --animate 4 --colorscheme Tomorrow Night --viewall --autocenter --preview"

I'm scratching my head over why 3 or fewer frames works but 4 or more crashes. I've verified this isn't a memory/CPU issue. I'll keep exploring the OpenGL route, but I'm skeptical that it will fix the issue unless you are aware of some different code path openscad takes when the count is 4 or more. It may be possible to skip the --animate option and render the frames 1-3 at a time, but I'd much rather understand the crash than work around it.
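
The per-frame workaround mentioned above could look roughly like the following sketch. It only prints one command line per frame and runs nothing. It rests on two assumptions that this thread does not confirm: that `$t` can be overridden with `-D`, and that `--animate N` spaces frames evenly at `$t = i/N`. `frame_t` and `frame_cmds` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: animation time for frame i of n, assuming even spacing.
frame_t() {
    awk -v i="$1" -v n="$2" 'BEGIN { printf "%.6f\n", i / n }'
}

# Print (but do not run) one openscad command line per frame.
# Assumption: $t can be set with -D like any other variable.
frame_cmds() {
    n="$1"
    i=0
    while [ "$i" -lt "$n" ]; do
        t="$(frame_t "$i" "$n")"
        printf "openscad /dev/null -D '\$t = %s; \$vpd = 20; \$vpr = [0,\$t * 360,0];' -D 'cube([2,3,4]);' -o /foo/foo%05d.png --imgsize=250,250 --preview\n" "$t" "$i"
        i=$((i + 1))
    done
}

frame_cmds 4
```

Each printed line starts a fresh openscad process, so a failure in one frame would not take the remaining frames down with it.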

t-paul commented 2 years ago

This is probably just not getting the display connection causing this: #3368.

t-paul commented 2 years ago

Hmm, no, that's not the issue. I've tried a couple of times and got various different X error codes, so something fishy is going on, though it does seem to be random. On Linux I can sometimes generate 8 images fine; sometimes it crashes early.
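
Because the failure appears to be random, one way to put a number on it is to run the same command several times and count the failures. This is a minimal sketch, assuming a crashed run exits with a non-zero status; `count_failures` is a hypothetical helper, and `true`/`false` stand in for the real openscad invocation.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and print how many runs failed.
count_failures() {
    runs="$1"
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard output; only the exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Stand-in commands; substitute the real openscad command line.
count_failures 5 true
count_failures 5 false
```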

spuder commented 2 years ago

I switched from my personal docker container, to the official one that was just released today. I'm finding the same results with the openscad/openscad:2021.01 container.

After running half a dozen times I found that it crashes after 5-7 images (never the same amount twice). Furthermore the last image is always corrupted (0 bytes).

MYTMPDIR="$(mktemp -d)"
rm $MYTMPDIR/*.png
docker run --init --rm -v $(PWD):/data -v ${MYTMPDIR}:/foo openscad/openscad:2021.01 xvfb-run -a openscad /dev/null -D '$vpd = 20; $vpr = [0,$t * 360,0];' -o '/foo/foo.png' -D 'cube([2,3,4]);' --imgsize=250,250 --animate 30 --colorscheme 'Tomorrow Night' --viewall --autocenter --preview
ls $MYTMPDIR
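
The zero-byte final frame described above can be spotted without opening each image. The sketch below lists empty PNG files under a directory; `empty_frames` is a hypothetical helper, and the demo uses a scratch directory, not real OpenSCAD output.

```shell
#!/bin/sh
# Hypothetical helper: list zero-byte PNG files under a directory.
empty_frames() {
    find "$1" -type f -name '*.png' -size 0
}

# Demo on a scratch directory with one good and one truncated frame.
demo_dir="$(mktemp -d)"
printf 'data' > "$demo_dir/foo00000.png"
: > "$demo_dir/foo00001.png"
empty_frames "$demo_dir"
```

Pointing `empty_frames` at `$MYTMPDIR` after a run would show which frame, if any, was left truncated.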

I suspect it has something to do with xvfb and a race condition, since I have no trouble running --animate 30 or even --animate 60 on OSX with 2021.01.

spuder commented 2 years ago

I've done more testing to try to identify any command line arguments that would be causing this crash. It looks like it is the --animate option.

This works to export a single PNG.

echo "cube([3,4,5]);" > foo.scad

docker run --init \
    -v $(pwd):/input \
    -v "$(pwd):/output" \
     openscad/openscad:2021.01 xvfb-run -a openscad "/input/foo.scad" -o "/output/foo.png"

This fails ❌

echo "cube([3,4,5]);" > foo.scad

docker run --init \
    -v $(pwd):/input \
    -v "$(pwd):/output" \
     openscad/openscad:2021.01 xvfb-run -a openscad "/input/foo.scad" -o "/output/foo.png" --animate 60
Geometry cache size in bytes: 728
CGAL Polyhedrons in cache: 0
CGAL cache size in bytes: 0
Total rendering time: 0:00:00.125
X Error of failed request:  BadLength (poly request too large or internal Xlib length error)
  Major opcode of failed request:  31 (X_GrabKeyboard)
  Serial number of failed request:  37
  Current serial number in output stream:  38
make: *** [gif] Error 1

Note that I'm using xvfb-run -a openscad

spuder commented 2 years ago

I've gathered more debugging information with valgrind and strace (I haven't yet figured out how to get a GDB dump).

I need some assistance interpreting these logs. I do see some memory leaks, but I don't know if they are related.

docker run -it openscad/openscad:2021.01 /bin/bash
apt install strace valgrind gdb -y

valgrind --trace-children=yes --trace-syscalls=yes xvfb-run --error-file /input/error.log --server-args="-screen 0 1024x768x24" -a openscad /input/CAD/foo.scad -o /input/foo.png --animate 60 --debug=all > /input/valgrind.txt
strace -o /tmp/strace.txt xvfb-run --error-file /input/error.log --server-args="-screen 0 1024x768x24" -a openscad /input/CAD/foo.scad -o /input/foo.png --animate 60 --debug=all

More interesting is this output about wrong rounding. Could this be a compile-time issue creating a rounding error? Or is this just a valgrind quirk?

root@767a65a94755:/openscad# valgrind --log-file=/input/valgrind.txt --trace-children=yes --trace-syscalls=yes xvfb-run --error-file /input/error.log --server-args="-screen 0 1024x768x24" -a openscad /input/CAD/foo.scad -o /input/foo.png --animate 60 --debug=all
terminate called after throwing an instance of 'CGAL::Assertion_exception'
  what():  CGAL ERROR: assertion violation!
Expr: -CGAL_IA_MUL(-1.1, 10.1) != CGAL_IA_MUL(1.1, 10.1)
File: /usr/include/CGAL/Interval_nt.h
Line: 210
Explanation: Wrong rounding: did you forget the  -frounding-math  option if you use GCC (or  -fp-model strict  for Intel)?
Aborted

The other interesting output is at the end of strace.txt: -1 EBADF (Bad file descriptor)

close(4)                                = 0
stat("/usr/local/sbin/fmt", 0x7ffe0b57dca0) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/fmt", 0x7ffe0b57dca0) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/fmt", 0x7ffe0b57dca0)   = -1 ENOENT (No such file or directory)
stat("/usr/bin/fmt", {st_mode=S_IFREG|0755, st_size=43680, ...}) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fbc3dfa6850) = 507
close(3)                                = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=507, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 0
close(-1)                               = -1 EBADF (Bad file descriptor)
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 506
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 507
dup2(11, 1)                             = 1
close(11)                               = 0
exit_group(5)                           = ?
+++ exited with 5 +++
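
Lines like the EBADF one are easy to miss among the ENOENT results that ordinary PATH lookups produce. The following sketch filters a trace down to failing syscalls other than ENOENT; `failed_syscalls` is a hypothetical helper, and the sample lines are copied from the trace above.

```shell
#!/bin/sh
# Hypothetical helper: show syscalls that returned an error, skipping the
# ENOENT noise that PATH and library lookups produce.
failed_syscalls() {
    grep -E ' = -1 E[A-Z]+ ' "$1" | grep -v ' ENOENT '
}

# Demo on three lines copied from the trace above.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
stat("/usr/sbin/fmt", 0x7ffe0b57dca0)   = -1 ENOENT (No such file or directory)
close(-1)                               = -1 EBADF (Bad file descriptor)
close(11)                               = 0
EOF
failed_syscalls "$sample"
```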

Full dumps below :point_down:

error.log strace.txt


spuder commented 2 years ago

Improved straces (since the previous strace was for xvfb, not openscad)

docker run -it -v $(pwd):/input openscad/openscad:2021.01 /bin/bash
apt install strace valgrind gdb -y
xvfb-run --error-file /input/error.log --server-args="-screen 0 1024x768x24" -a strace -o /input/strace.txt openscad /input/CAD/foo.scad -o /input/foo.png --animate 60 --debug=all

strace.txt

Strace with -f to follow child processes

docker run -it -v $(pwd):/input openscad/openscad:2021.01 /bin/bash
apt install strace valgrind gdb -y
xvfb-run --error-file /input/error.log --server-args="-screen 0 1024x768x24" -a strace  -f -o /input/strace.txt openscad /input/CAD/foo.scad -o /input/foo.png --animate 60 --debug=all

strace.txt

rcolyer commented 2 years ago

Thanks for the updated straces @spuder. Those are much better. From the strace, this appears to be an OpenGL issue. It looks like the OpenGL library has some sort of problem, tries to report the error, and then segfaults while trying to clean up memory mapped for the OpenGL Vendor Neutral Dispatch library.

spuder commented 2 years ago

openscad --info

root@5e048edb6f83:/openscad# xvfb-run -a openscad --info
QObject::startTimer: Timers can only be used with threads started with QThread
OpenSCAD Version: 2021.01 (git 41f58fe57)
System information: Linux 5.10.76-linuxkit #1 SMP Mon Nov 8 10:21:19 UTC 2021 x86_64 Debian GNU/Linux 10 (buster) 4 CPUs 1.94 GB RAM
User Agent: OpenSCAD/2021.01 (git 41f58fe57) (Linux x86_64; Debian GNU/Linux 10 (buster))
Compiler: GCC "8.3.0" 64bit
MinGW build: No
Debug build: No
Boost version: 1_67
Eigen version: 3.3.7
CGAL version, kernels: 4.13, Cartesian<Gmpq>, Extended_cartesian<Gmpq>, Epeck
OpenCSG version: OpenCSG 1.4.2
Qt version: 5.11.3
QScintilla version: 2.10.4
InputDrivers: 
GLib version: 2.58.3
lodepng version: 20180910
libzip version: 1.5.1
fontconfig version: 2.13.1
freetype version: 2.9.1
harfbuzz version: 2.3.1
cairo version: 1.16.0
lib3mf version: 1.8.1
Features: input-driver-dbus, lazy-union
Application Path: /usr/local/bin
Documents Path: /root/.local/share
User Documents Path: /root
Resource Path: /usr/local/share/openscad
User Library Path: 
User Config Path: 
Backup Path: 
OPENSCADPATH: <not set>
OpenSCAD library path:

  /usr/local/share/openscad/libraries

OPENSCAD_FONT_PATH: <not set>
OpenSCAD font path:
  /usr/share/fonts
  /usr/X11R6/lib/X11/fonts
  /usr/local/share/fonts
  /root/.local/share/fonts
  /root/.fonts
  /usr/X11/lib/X11/fonts
  /System/Library/Fonts
  /Library/Fonts
  /root/Library/Fonts
  /usr/share/fonts/truetype
  /usr/share/fonts/truetype/dejavu

GLEW version: 2.1.0
OpenGL Version: 3.1 Mesa 18.3.6
GL Renderer: llvmpipe (LLVM 7.0, 256 bits)
GL Vendor: VMware, Inc.
RGBA(8888), depth(24), stencil(8)
GL_ARB_framebuffer_object: yes
GL_EXT_framebuffer_object: yes
GL_EXT_packed_depth_stencil: yes
GL context creator: GLX
PNG generator: lodepng
GLX version: 1.4
OS info: Linux 5.10.76-linuxkit #1 SMP Mon Nov 8 10:21:19 UTC 2021
Machine: x86_64

spuder commented 2 years ago

I tried building images based on the nvidia/opengl docker image to see if they would have the same issue. They crash just the same as the debian:buster images.

docker run -it -v $(pwd):/input openscad/openscad:2021.01-focal-opengl
xvfb-run -a openscad /input/CAD/foo.scad -o /input/foo.png --animate 60 --debug=all --preview
...
Total rendering time: 0:00:00.077
export_png: export_png_preview_common
X Error of failed request:  BadRequest (invalid request code or no such operation)
  Major opcode of failed request:  0 ()
  Serial number of failed request:  38
  Current serial number in output stream:  39

spuder commented 2 years ago

I tried building a docker image with symbols enabled to see if gdb could generate a backtrace

Unfortunately I'm unable to generate a backtrace. I'm not sure if this is an inexperience problem on my part or a configuration problem.

docker run --ulimit core=-1 -it -v $(pwd):/input docker.io/openscad/openscad:2021.01-debug /bin/bash
apt update; apt install gdb -y
xvfb-run gdb --ex run --args openscad -o x.png -D 'cube([2,3,4]);' --imgsize=250,250 --animate 60 /input/CAD/foo.scad

X Error of failed request:  BadRequest (invalid request code or no such operation)
  Major opcode of failed request:  0 ()
  Serial number of failed request:  37
  Current serial number in output stream:  38
[Thread 0x7f57eee10700 (LWP 309) exited]
[Thread 0x7f57ef611700 (LWP 308) exited]
[Thread 0x7f57efe12700 (LWP 307) exited]
[Thread 0x7f57f4dca680 (LWP 243) exited]
[Inferior 1 (process 243) exited with code 01]
(gdb) bt
No stack.
(gdb) quit

nophead commented 2 years ago

Looks like the X server, a separate process, returns an error and OpenSCAD cleanly exits. To get a stack trace you would need to set a breakpoint where the error is handled in the OpenSCAD process.
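
One way to set such a breakpoint is a gdb command file that stops before the process exits. The sketch below is an assumption, not something verified against this build: `_XError` is taken to be the Xlib-internal function that dispatches protocol errors, `exit` is added as a fallback, and `write_gdb_script` is a hypothetical helper. Pending breakpoints are enabled because the shared libraries are not loaded until the program starts.

```shell
#!/bin/sh
# Hypothetical helper: write a gdb command file that stops where Xlib
# handles a protocol error (or at exit) and prints a backtrace there.
write_gdb_script() {
    cat > "$1" <<'EOF'
set breakpoint pending on
break _XError
break exit
run
bt
EOF
}

script="$(mktemp)"
write_gdb_script "$script"
cat "$script"
# Possible use inside the container (not run here):
#   xvfb-run -a gdb -x "$script" --args openscad /input/CAD/foo.scad -o /input/foo.png --animate 60
```

X errors are reported asynchronously, so the backtrace may show where the error was received and not necessarily the request that caused it.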