libvips / php-vips

php binding for libvips
MIT License
615 stars 25 forks source link

feat: Implement true streaming and signal handling #191

Closed L3tum closed 1 year ago

L3tum commented 1 year ago

Hey @jcupitt , I thought I'd give this a try. Let me know what you think.

I had a few issues that I'd ideally like you to look at. I stuck pretty closely to pyvips which may not have resulted in the best API for PHP.

There's still some stuff to do because this is just a first draft. In particular:

I'm actually a bit surprised since the code seems to be more readable than pyvips when interacting with FFI.

(Ref #101 )

jcupitt commented 1 year ago

Hey, this is very cool @L3tum! I'll try to take a look.

@kleisauke @chregu you might be interested too.

L3tum commented 1 year ago

Hey @jcupitt , thank you!

I've added a few tests for the behaviour, and so far the built-in Sources and Targets seem to work.

Unfortunately, I also stumbled upon a pretty bad roadblock. Apparently the PHP FFI Callback Magic checks the number of arguments that the C callback type expects and, if they are less than the number of arguments the PHP function expects, rejects it.

(This is if I understand the PHP source correctly. You can take a look here, though fair warning, the file is like 8000 lines long and takes a while to load).

Because GCallback technically does not have any arguments, it rejects the callbacks provided by vips. Casting the PHP-callback to a GCallback like in pyvips just breaks the FFI parser.

I need to have a think how we could solve this. If you have any suggestions, please fire away. I'm not very knowledgable in C-callback syntax or behaviour.

After reading a bit it seems like g_signal_connect_closure is the relevant thing here.

In the case of C programs, a closure usually just holds a pointer to a function and maybe a data argument, and the marshaller converts between GValue and native C types. The GObject library provides the GCClosure type for this purpose. Bindings for other languages need marshallers which convert between GValues and suitable representations in the runtime of the language in order to use functions written in that language as callbacks. Use g_closure_set_marshal() to set the marshaller on such a custom closure implementation.

So setting a custom marshaller for a "GPHPClosure" would probably make the PHP FFI happy? Otherwise, would it be possible to add some glue-code to libvips itself for the different signals?

L3tum commented 1 year ago

So the optional-argument hack does not work, unfortunately.

Scratch THAT Scratch what I said about the pleasantness of working with PHP FFI 😆 So far the FFI accepts the mismatch in the number of arguments if I make them optional. However, it does not support passing a closure to a `void*` datatype. Instead, (I think) I need to create the C-type for these callbacks, then create a C-array via `FFI::new`, store those arrays to clean them up later, and pass the address of that array to `g_signal_connect_data`. Alternatively, since these are PHP Closures and PHP FFI seems to be a *little* more involved than Python FFI, I think I could also just `use` the callback and save them inside the Closure. I need to check whether that would leak memory (it shouldn't), but seems like a better idea than the array monstrosity above. It would allocate more functions than the current (and Python) implementation would, but at least it would work. Of course, this is all under the assumption that the optional-argument hack actually gets me the arguments and doesn't just ignore them. I hope my next commit will be the final working commit^^
jcupitt commented 1 year ago

Oh huh now you say, I half-remember looking at php-ffi callbacks and backing away in horror. They are pretty ugly in pyvips too.

I'll try to find time to experiment as well.

L3tum commented 1 year ago

I think I've managed to make it work with Closures. It's actually not that bad, although I'd still have hoped for PHP FFI to provide some native way of exposing a function with a defined set of types, regardless of what the C-type definition does.

I'll need to do the other Callbacks as well and then test the whole thing again especially in regards to pointers and memory leaks (I'd guess right now the GClosure is always leaked but ¯\_(ツ)_/¯ ) and figure out the right size of a GClosure cause FFI::sizeof does not work (likely because of internal fields not in the definition/documentation).

$marshalers['write'] = static function (CData $gClosure, CData $returnValue, int $numberOfParams, CData $params, CData $hint, ?CData $data) use ($callback): void {
   assert($numberOfParams === 4);
   $bufferPointer = FFI::gobject()->g_value_get_pointer(\FFI::addr($params[1]));
   $bufferLength = (int) FFI::gobject()->g_value_get_int64(\FFI::addr($params[2]));
   $buffer = \FFI::string($bufferPointer, $bufferLength);
   FFI::gobject()->g_value_set_int64($returnValue, $callback($buffer));
};
[...]
$go = \FFI::cast(FFI::ctypes('GObject'), $this->pointer);
$gc = FFI::gobject()->g_closure_new_simple(64, null);
$gc->marshal = $marshalers[$name];
FFI::gobject()->g_signal_connect_closure($go, $name, $gc, 0);
L3tum commented 1 year ago

Okay @jcupitt, I need some feedback again (sorry, I tend to be pretty spammy when it comes to PRs^^).

The code works as it is now, but it's not optimal. i tried to stick to the pyvips interface. However, because PHP has no Buffer or equivalent, I've opted to use string, like the other buffer APIs do. Because the underying string does not seem to be shared, there is now a need to copy the buffer over one more time than necessary for the read callback. This is obviously not ideal. I could change this, but it would also change the callback API from receiving a buffer and returning the length written, to instead receive the length to be written and return a buffer, which is then strlen by the "marshaler" in GObject.

The second API change I'd like to make would be to make VipsSourceCustom and VipsTargetCustom abstract and instead have the callbacks as abstract functions that need to be extended (with a small wrapper around them in the *Custom classes to do the actual signal connection).

Let me know what you think about those two changes, and the PR in general. I still need to add the docs to it but I consider that a "nonfunctional" (albeit still important) thing.

jcupitt commented 1 year ago

Great! I have a big deadline on Thursday, but I should have some time on Friday to look at this.

L3tum commented 1 year ago

I've run the AutoDoc generator for the source methods as well now and added a (basic and ugly) benchmark. The results seem promising already. Apparently the overhead for calling PHP callbacks from C is less than the equivalent in Python (from the benchmarks I found online). I'd guess with the proposal to change the read API (or waiting for the implementation of substrings in PHP 😆 ) it would be an even greater lead. Seems like it's less of an issue than I thought though.

$ php examples/streaming-bench.php 
1.9231050014496 Seconds for Streaming with callbacks
1.7192220687866 Seconds for Streaming with builtin source/target
1.1590650081635 Seconds for Streaming Thumbnail with callbacks
1.1540429592133 Seconds for Streaming Thumbnail with builtin source/target
1.3935730457306 Seconds for Thumbnail API
1.9780020713806 Seconds for Classic API
jcupitt commented 1 year ago

Wow, those benchmarks are very encouraging, nice job!

L3tum commented 1 year ago

Since my work mostly employs 4k images I've done some benchmarks with those and found out that PHP Callbacks do seem to carry a significant overhead. I guess smaller images don't require as many calls and thus the speedup by the streaming hides the extra latency.

Moving to 4k gives this result for the current implementation:

$ php examples/streaming-bench.php 
9.4509589672089 Seconds for Streaming with callbacks
7.8189198970795 Seconds for Streaming with builtin source/target
6.7259550094604 Seconds for Streaming Thumbnail with callbacks
5.5654470920563 Seconds for Streaming Thumbnail with builtin source/target
5.910943031311 Seconds for Thumbnail API
7.9930689334869 Seconds for Classic API
154.6068239212 Seconds for Thumbnail API Remote Source
131.57986712456 Seconds for Streaming Thumbnail API Remote Source

As a test I've implemented the API change for the read callback I proposed locally and got these numbers:

$ php examples/streaming-bench.php 
9.1749761104584 Seconds for Streaming with callbacks
7.8914790153503 Seconds for Streaming with builtin source/target
6.388160943985 Seconds for Streaming Thumbnail with callbacks
5.5694668292999 Seconds for Streaming Thumbnail with builtin source/target
5.9077990055084 Seconds for Thumbnail API
8.0515778064728 Seconds for Classic API
167.04970622063 Seconds for Thumbnail API Remote Source
136.30600118637 Seconds for Streaming Thumbnail API Remote Source

(Note that any benchmark without callbacks exposes run-to-run variance and can serve as a check).

A couple of things I've noticed:

I'll focus on my actual dayjob now to give you some time to review this PR^^ If I get some freetime I'll give it a think if we could further reduce the callback overhead. I'm not sure if there's an expert in the libvips team for the GObject stuff, but if there is it'd be nice if they could also give it a lookover to see if there's some potential there.

L3tum commented 1 year ago

I've done some cleanup on GObject and added the last few docs that were missing. From my POV this PR is now good to go.

jcupitt commented 1 year ago

Hi @L3tum, ooooof, sorry for the delay, I had a huge deadline for 31 March and then a small family holiday. I'll try to go over this PR this weekend. Thanks again!

jcupitt commented 1 year ago

I tried removing the GClosure:

https://github.com/libvips/php-vips/tree/callback-experiment

Here's a test prog:

#!/usr/bin/php
<?php

require __DIR__ . '/vendor/autoload.php';
use Jcupitt\Vips;

function print_progress($progress)
{
    echo "progress:\n";
    echo "  progress->run = $progress->run\n";
    echo "  progress->eta = $progress->eta\n";
    echo "  progress->tpels = $progress->tpels\n";
    echo "  progress->npels = $progress->npels\n";
    echo "  progress->percent = $progress->percent\n";
}   

$image = Vips\Image::black(1, 100000);
$image->setProgress(true);

$image->signalConnect("preeval", function($image, $progress) {   
    echo "preeval:\n";
    echo "  image = $image\n";
    print_progress($progress);
});
$image->signalConnect("eval", function($image, $progress) {     
    echo "eval:\n";
    echo "  image = $image\n";
    print_progress($progress);
});

$image->signalConnect("posteval", function($image, $progress) {   
    echo "posteval:\n";
    echo "  image = $image\n";
    print_progress($progress);
});

// trigger evaluation
$image->avg();

I see:

$ ./progress2.php
preeval:
  image = {"width":1,"height":100000,"bands":1,"format":"uchar","interpretation"
:"multiband"}
progress:
  progress->run = 0
  progress->eta = 0
  progress->tpels = 100000
  progress->npels = 0
  progress->percent = 0
eval:
  image = {"width":1,"height":100000,"bands":1,"format":"uchar","interpretation"
:"multiband"}
progress:
  progress->run = 0
  progress->eta = 0
  progress->tpels = 100000
  progress->npels = 192
  progress->percent = 0
.... snip ....
eval:
  image = {"width":1,"height":100000,"bands":1,"format":"uchar","interpretation":"multiband"}
progress:
  progress->run = 0
  progress->eta = 0
  progress->tpels = 100000
  progress->npels = 12512
  progress->percent = 12
posteval:
  image = {"width":1,"height":100000,"bands":1,"format":"uchar","interpretation":"multiband"}
progress:
  progress->run = 0
  progress->eta = 0
  progress->tpels = 100000
  progress->npels = 100000
  progress->percent = 100

So it seems to work -- it's getting the progress pointer, and you can see the % complete ticking up to 100.

What do you think? It's a bit cleaner without the GClosure anyway, and should be slightly quicker.

I only implemented the progress signals, we'd need to add more cases to buildMarshal() for read / write / seek / etc.

jcupitt commented 1 year ago

Oh sigh I tried to add read() and ran into difficulties. Let me try again.

jcupitt commented 1 year ago

I spent a bit more time looking into this and I think you're right -- your solution is the best one possible with php-ffi.

I think we should probably merge this PR and fix any nits in a follow-up. What do you think?

jcupitt commented 1 year ago

FWIW, I made a test prog for progress reporting:

#!/usr/bin/php 
<?php

require __DIR__ . '/../vendor/autoload.php';
use Jcupitt\Vips;

#Vips\Config::setLogger(new Vips\DebugLogger());

$image = Vips\Image::black(1, 1000000);
$image->setProgress(true);

$image->signalConnect("preeval", function($image, $progress) {
    echo "preeval:\n";
});
$image->signalConnect("eval", function($image, $progress) {
    echo "eval: $progress->percent % complete\r";
}); 

$image->signalConnect("posteval", function($image, $progress) {
    echo "\nposteval:\n";
});

// trigger evaluation
$image->avg();

$image = null;

Vips\FFI::shutDown();

Then:

$ VIPS_LEAK=1 ./progress.php 
preeval:
eval: 100 % complete
posteval:
vips_threadset_free: peak of 17 threads
memory: high-water mark 768 bytes

So no leaks and it seems to work well. I had to add setProgress, but it's a one-liner.

jcupitt commented 1 year ago

I made a custom source/target copy as well:

#!/usr/bin/env php
<?php

require dirname(__DIR__) . '/vendor/autoload.php';

use Jcupitt\Vips;

if (count($argv) != 4) {
    echo "usage: $argv[0] IN-FILE OUT-FILE FORMAT\n";
    echo "  eg.: $argv[0] ~/pics/k2.jpg x.tif .tif[tile,pyramid]\n";
    exit(1);
}

$in_file = fopen($argv[1], 'r');
$source = new Vips\VipsSourceCustom();
$source->onRead(function ($bufferLength) use (&$in_file) {
    // return 0 for EOF, -ve for read error
    return fread($in_file, $bufferLength);
});
// seek is optional
$source->onSeek(function ($offset, $whence) use (&$in_file) {
    if (fseek($in_file, $offset, $whence)) {
        return -1;
    }

    return ftell($in_file);
});

// open for write and read ... formats like tiff need to be able to seek back
// in the output and update bytes later 
$out_file = fopen($argv[2], 'w+');
$target = new Vips\VipsTargetCustom();
$target->onWrite(function ($buffer) use (&$out_file) {
    $result = fwrite($out_file, $buffer);
    if ($result === false) {
        // IO error
        return -1;
    }
    else
        return $result;
});
// read and seek are optional
$target->onSeek(function ($offset, $whence) use (&$out_file) {
    if (fseek($out_file, $offset, $whence)) {
        return -1;
    }

    return ftell($out_file);
});
$target->onRead(function ($bufferLength) use (&$out_file) {
    return fread($out_file, $bufferLength);
});

$image = Vips\Image::newFromSource($source);
$image->writeToTarget($target, $argv[3]);

Then eg.:

$ ./streaming-custom.php ~/pics/k2.jpg x.tif .tif[tile,pyramid] 

Writes an image pyramid correctly, so the seek and read handler on target are working too.

L3tum commented 1 year ago

Hi @L3tum, ooooof, sorry for the delay, I had a huge deadline for 31 March and then a small family holiday. I'll try to go over this PR this weekend. Thanks again!

No worries, we're all here in our free time :)

I spent a bit more time looking into this and I think you're right -- your solution is the best one possible with php-ffi.

I think we should probably merge this PR and fix any nits in a follow-up. What do you think?

Sounds good to me. I've seen that you experimented a bit, but likely got to the same issues as me, that callbacks with different signatures confuse PHP FFI.

If there's nothing blocking this PR then I'd do a followup when I got time to fix the stuff you mentioned (and possibly experiment with your attempt as well, looks more promising than my attempt at it).

adoy commented 1 year ago

Hi @L3tum @jcupitt. Good job for this awesome PR.

I think we should probably merge this PR and fix any nits in a follow-up. What do you think?

Any news on merging this ?

jcupitt commented 1 year ago

Sorry, I've been sitting on this :( I'm trapped in an awful deadline hell on yet another project. I'll try to look at this again towards the end of next week.

adoy commented 1 year ago

Sorry, I've been sitting on this :( I'm trapped in an awful deadline hell on yet another project. I'll try to look at this again towards the end of next week.

No worry :) Thanks for you hard work.

L3tum commented 1 year ago

@jcupitt I've pushed some commits fixing the various notes/issues you noted. In particular I've removed the GClosure FFI specification and replaced the ->marshal setter with the g_closure_set_marshal function. It's probably worse for performance, but I don't think it matters. One thing to note is that I "hardcoded" the GClosure size to what I thought it would be, and it was larger. I'm semi-sure it's because of padding because the extra 4 bytes push it to a neat 32-bytes rather than 28-bytes. I'd guess the padding is less on 32-bit systems. The particular worry I have is other architectures like ARM or more exotic ones. I don't have any way to test them either. But as long as we give a higher sizeof in than gobject expects it will allocate the GClosure regardless. This is (IMO) a really bad API by them and makes us not as efficient on systems with less padding, but I don't think that should really be a concern.

jcupitt commented 1 year ago

Hello again, I finished that project, phew. I'll look through this PR again now.

jcupitt commented 1 year ago

I think it's great!

Let's merge, close this PR, and do any further polish and testing in a few follow-up commits.

jcupitt commented 1 year ago

... and thank you very much for doing all this work @L3tum. Nice job.

adoy commented 1 year ago

Nice job to both of you. I have some needs for this feature. I'll try to do some tests this week.

jcupitt commented 1 year ago

I added the examples from this PR, updated the CHANGELOG, updated the enums, and fixed a couple of tiny things.

Let's kick the tyres for a week or two, then make an official release.