naqvis / crystal-vips

Crystal bindings for the libvips image processing library.
MIT License
41 stars 5 forks source link

Segfault in SourceCustom #9

Closed xendk closed 8 months ago

xendk commented 8 months ago

I got a segfault in my pet project and I'm 95% sure it showed up after upgrading to Crystal 1.11.2.

It's reproduce-able with the specs:

  Vips::SourceCustom
    can create a custom source
    can load a custom sourceProgram exited because of an invalid memory access

But it wasn't caught by Github actions as mentioned in #8.

naqvis commented 8 months ago

Yeah, seems there is a regression on Mac for Crystal version 1.11.2. I tried to investigate the reason, but couldn't find a reasonable reason to identify the root cause.

I tried the same on Linux distributions (Ubuntu and Alpine) and specs are going green on latest crystal version.

For CI it was got disabled after multiple failures due to GA Ubuntu failing to install libvips. I enabled CI and tried to push a dummy PR, action was triggered, but as usual it got failed on steps dealing with installing libvips.

xendk commented 8 months ago

Well, it's not just Mac, I get it on Debian testing. It worked with an earlier Crystal (sadly I didn't note down the previous version, but I think it was 1.9-ish). I've tried upgrading all packages, for good measure, but that didn't change anything.

And now I got to deal with the fallout of upgrading everything. :grinning:

naqvis commented 8 months ago

I tried with docker images from both crystallang/crystal and 84codes/crystal with variation of different arch/os/version. Specs works on them. But with recent crystal version on Mac M1 it get stuck on the very same sourceCustom test.

I've gone through the changlog of crystal releases and i doubt if this problem is caused by PR#13989 though I couldn't prove any relation between the two. But looking the history of that PR , it also had impact on webassembly integration.

I spent whole day on investigating the root cause for this issue, but couldn't figure out what could be causing that. All I can see is that program hangs in the mid of custom source read operation. And that portion of code is invoked by external libvips.

xendk commented 8 months ago

OK, this is annoying. Trying in a container using these steps:

$ docker run --rm -it -v $PWD:/vips --name vips debian:trixie bash
# apt update
# apt install curl
# curl -fsSL https://crystal-lang.org/install.sh | bash
# apt install libvips-dev
# cd /vips
# crystal spec

Gives me another error:

root@9b3e652cc881:/vips# crystal spec   
.............................................................................F...........................*.......................

Pending:
  Vips::Connection Vips::SourceCustom can write an image to a user output stream

Failures:

  1) Vips::Image test colourspace
     Failure/Error: pixel[3].should eq(42)

       Expected: 42
            got: 0.16470588743686676

     # spec/image_spec.cr:615

Finished in 692.93 milliseconds
129 examples, 1 failures, 0 errors, 1 pending

Failed examples:

crystal spec spec/image_spec.cr:594 # Vips::Image test colourspace

Which is not a segfault, but might interest you.

I've checked Crystal, LLVM and libvips versions, and they're the same:

root@9b3e652cc881:/vips# apt list \*vips\*
Listing... Done
gir1.2-vips-8.0/testing,now 8.15.1-1 amd64 [installed,automatic]
libvips-dev/testing,now 8.15.1-1 amd64 [installed]
libvips-doc/testing,now 8.15.1-1 all [installed,automatic]
libvips-tools/testing,now 8.15.1-1 amd64 [installed,automatic]
libvips42/testing,now 8.15.1-1 amd64 [installed,automatic]
ruby-vips/testing 2.1.4-1 all
root@9b3e652cc881:/vips# crystal --version
Crystal 1.11.2 [fda656c71] (2024-01-18)

LLVM: 15.0.7
Default target: x86_64-unknown-linux-gnu

But with recent crystal version on Mac M1 it get stuck on the very same sourceCustom test.

While hang and Program exited because of an invalid memory access is two different things, it's interesting they're both in the same place. You tried with different versions of vips?

I have no idea how to debug segfaults. Loading the core dump in gdb doesn't help much:

(gdb) bt
#0  0x0000557c1e797600 in ?? ()
#1  0x00007f6e8ce4dc80 in ?? ()
#2  0x0000000000000000 in ?? ()
naqvis commented 8 months ago

pushed changes and published a new release. Tested these changes on Mac M1 and Ubuntu. CI is green

@xendk Can you please confirm if pushed changes work fine your Debian box. If so please help to mark this issue closed.

xendk commented 8 months ago

Yes and no? It's a new error:


  Vips::SourceCustom
    can create a custom source
    can load a custom source
    can load a source stream
    can create a user output stream
    can write an image to a user output streamCollecting from unknown thread
Program received and didn't handle signal ABRT (6)

My pet project still segfaults. Both dies on write_to_target, the spec writes to a Vips::TargetCustom and my own thing just a regular IO.

But at least I'm getting the same error when running in debian:trixie container now.

naqvis commented 8 months ago

My pet project still segfaults. Both dies on write_to_target, the spec writes to a Vips::TargetCustom and my own thing just a regular IO.

Are you getting segfaults when executing the shards specs (as is)? or its happening when change is made to any of them? Per my knowledge libvips from 8.15 has changed to multi thread writes and this causes any IO writes on on_write event.

So if you might have notice, I have changed connection_spec test case https://github.com/naqvis/crystal-vips/pull/12/files#diff-dace5f01d4de95816ed5ef1ad37992aed96d66280117a013268f3f71c63fb4c6

to use IO::Memory as an intermediary between vips and IOs.

with latest version of vips, you have to follow above approach or best bet would be revert back to libvips version 8.14.2 or bit lower.

xendk commented 8 months ago

Are you getting segfaults when executing the shards specs (as is)?

When running the unmodified specs I get the Collecting from unknown thread Program received and didn't handle signal ABRT (6). Googling suggests this might be just as bad as a segfault (talk about malloc discovering it's internal structures has been munged).

to use IO::Memory as an intermediary between vips and IOs.

Uh, that's bad news, I was rather liking that vips could pipe data rather than keep it all in memory. Isn't there anything crystal-vips can do to "de-thread" the writes, so regular IOs work?

naqvis commented 8 months ago

Isn't there anything crystal-vips can do to "de-thread" the writes, so regular IOs work?

Not I could think of at this stage. Problem is Gobject callbacks are all managed and invoked by the LibVips, so even if we are single threaded, doesn't fix that segfaults. So it might be something to do with how Crystal integrates with C ABI. I did try approaches like thread-safety measures in crystal side, but that's help.

So most safe bet for you to move back to previous version of libvips, where they don't do threading operations on few image formats like jpeg/png.

Though I notice regular IO still works good with format like avif seems libvips isn't doing any threaded operations on this format.

xendk commented 8 months ago

Not I could think of at this stage.

Bummer. I read a bit up on libvips, and I can see how their whole threading model makes things eyewateringly complex.

So most safe bet for you to move back to previous version of libvips, where they don't do threading operations on few image formats like jpeg/png.

Well, apparently Debian testing only has 8.15.1. And I'd rather like to be able to produce a binary that'll be runnable on anything with (a decent version of) libvips.

Though I notice regular IO still works good with format like avif seems libvips isn't doing any threaded operations on this format.

So there's a killswitch, but it's probably not one we can get at.

to use IO::Memory as an intermediary between vips and IOs.

I couldn't get this to work. Shouldn't

        io2 = IO::Memory.new
        image.write_to_target(io2, ".png")

work?

But thinking it through (it's late, so not going so quick), I might be able to live with using write_to_file (which shouldn't require the memory overhead of keeping the entire image in memory), and reading it afterwards.

Anyway, if the way vips processes images is fundamentally incompatible with IO, maybe TargetStream should be removed?

naqvis commented 8 months ago

I couldn't get this to work. Shouldn't

        io2 = IO::Memory.new
        image.write_to_target(io2, ".png")

work?

Above won't work, as LibVips doesn't work with Crystal IO. Instead you can use

target = Vips::Target.new_to_memory
image.write_to_target(target, "%.pngf")
bytes = target.blob # bytes is Slice containing all of the image data and you can go further with creating your IO or write directly to some regular IO

Target.new_to_memory uses the memory managed by LibVips

Or if you prefer to use Crystal IO then you can use TargetCustom and provide your own callbacks for on_write and on_finish.

io = IO::Memory.new
target = Vips::TargetCustom.new

target.on_write { |slice| io.write(slice); slice.size.to_i64 }

target.on_finish do
        io.rewind
       # here you can do anything with this IO like
       # IO.Copy(io, your_destination_io)
 end

image.write_to_target(target, "%.png")

Go with option which suits your needs.

xendk commented 8 months ago

Above won't work, as LibVips doesn't work with Crystal IO.

I don't get it. Image#write_to_target(stream : IO, format : String, **kwargs) just creates a TargetStream from the IO and passes that towrite_to_target. TargetStream seem to me to do the exact same thing as you do in your latter example?

In any case, the original issue has been resolved, so I'll close this.

naqvis commented 8 months ago

I don't get it. Image#write_to_target(stream : IO, format : String, **kwargs) just creates a TargetStream from the IO and passes that towrite_to_target. TargetStream seem to me to do the exact same thing as you do in your latter example?

Aaah I missed that helper method 👍