jcupitt opened 4 years ago
I would be very interested in this. I am using image_processing with shrine and direct uploads, so my workflow is: Google Cloud Storage bucket (= image from URL) => image processing with libvips => multiple outputs to Google Cloud Storage (= HTTP requests).

I guess this would need multiple changes in this gem as well as in Shrine & the Shrine storages, but I think the first step would be to see how efficient (time spent, memory usage…) this would be.
Yes, I wonder how helpful it will be in practice.
Uploads have to give a content-length, so you need the finished result before you can start writing. If the upload is over 5MB you can chunk it, but I think most uploads will be smaller than that.
We need benchmarks!
I added custom input and output streams to ruby-vips and merged to master.
You can now write this, for example:
```ruby
class Mystreami < Vips::Streamiu
  def initialize(filename)
    @filename = filename
    @contents = File.open(@filename, "rb").read
    @length = @contents.length
    @read_point = 0

    super()

    signal_connect "read" do |buf, len|
      bytes_available = @length - @read_point
      bytes_to_copy = [bytes_available, len].min
      buf.put_bytes(0, @contents, @read_point, bytes_to_copy)
      @read_point += bytes_to_copy
      bytes_to_copy
    end
  end
end
```
`signal_connect` lets you attach a handler that does the read.
You can use it like this:
```ruby
input_stream = Mystreami.new("some/file")
image = Vips::Image.thumbnail_stream input_stream, 128
```
There's a more complete example here, with output as well, and implementing seek:
https://github.com/libvips/ruby-vips/blob/master/example/stream.rb
It should be easy to hook it up to something that does read/write to URIs.
@jcupitt Thanks a lot for working on this! It definitely sounds exciting, and we might add support for it in the near future.
If I wanted to feed data from an IO-like object that already implements the `IO#read` and `IO#seek` methods into the input stream, and retrieve chunks of processed file data from the output stream, would something like the following work?
```ruby
source_io #=> some kind of IO-like object

input_stream = Vips::Streamiu.new
input_stream.signal_connect "read" do |buf, len|
  chunk = source_io.read(len)
  buf.put_bytes(0, chunk, 0, chunk.bytesize)
  chunk.bytesize
end
input_stream.signal_connect "seek" do |offset, whence|
  source_io.seek(offset, whence)
end

image = Vips::Image.new_from_stream input_stream, "", access: "sequential"

destination_io #=> some kind of IO-like object

output_stream = Vips::Streamou.new
output_stream.signal_connect "write" do |buf, len|
  chunk = buf.get_bytes(0, len)
  destination_io.write(chunk)
  chunk.bytesize
end
output_stream.signal_connect "finish" do
  destination_io.close
end

image.write_to_stream(output_stream, ".png")
```
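For what it's worth, the handler contract above can be sanity-checked without libvips at all. This is a pure-Ruby sketch in which `source_io` and `destination_io` are `StringIO` stand-ins (my invention, not part of ruby-vips), and the lambdas play the role of the "read" and "seek" blocks:

```ruby
require "stringio"

# StringIO stand-ins for the IO-like objects in the example above.
source_io      = StringIO.new("hello stream")
destination_io = StringIO.new

# "read" hands back up to len bytes; "seek" moves the read position.
read_handler = ->(len) { source_io.read(len) || "" }   # "" would signal EOF
seek_handler = ->(offset, whence) { source_io.seek(offset, whence) }

seek_handler.call(6, IO::SEEK_SET)           # skip past "hello "
destination_io.write(read_handler.call(6))   # copy the remaining bytes

puts destination_io.string   #=> "stream"
```

The same back-and-forth (libvips asks for bytes, the block pulls them from the IO object) is what the real signal handlers do, just driven from the C side.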
For Shrine it would be a very common use case. In that sense I think it would be very beneficial to simplify this in ruby-vips, for example automatically reading/writing from/to the buffer, so that one could do something like this:
```ruby
input_stream = Vips::Streamiu.new
input_stream.on_read { |length| source.read(length) } # buffer writing would happen under-the-hood
input_stream.on_seek { |offset, whence| source.seek(offset, whence) }

image = Vips::Image.new_from_stream input_stream, "", access: "sequential"

output_stream = Vips::Streamou.new
output_stream.on_write { |chunk| dest.write(chunk) } # chunk is automatically read from the buffer
output_stream.on_finish { dest.close }

image.write_to_stream output_stream, ".png"
```
I'm not suggesting this should be the API, just that eventually having something like this in ruby-vips might significantly lower the barrier to entry for using streams. But until the initial streams release in libvips I think the current API would suffice, just wanted to put the idea out there 😃
Hiya, yes, your first example should work, from a quick look.
Yes, you need to use the FFI `get_bytes` and `put_bytes` at the moment. It's ugly, but I think it's necessary if you want to avoid a copy.
It's true, we could add a simple no-FFI layer over it. I think you'd just need:
```ruby
module Vips
  class Streamiu
    def on_read &block
      signal_connect "read" do |buf, len|
        chunk = block.call(len)
        buf.put_bytes(0, chunk, 0, chunk.bytesize)
        chunk.bytesize
      end
    end
  end
end
```
Or would you need to add a `Proc` to capture the outer block? I always get confused.
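(A minimal pure-Ruby check of the `Proc` question, with no libvips involved: capturing the caller's block with `&block` already turns it into a `Proc` object, so the inner signal-handler block can simply close over it. `on_read` here is a toy stand-in, not the real ruby-vips method.)

```ruby
# &block converts the caller's block into a Proc; the inner block then
# closes over that local -- no explicit Proc.new wrapping is needed.
def on_read(&block)
  inner = proc { |len| block.call(len) }   # inner block captures `block`
  inner.call(4)
end

result = on_read { |length| "data!"[0, length] }
puts result   #=> "data"
```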
Thanks a lot for the review. Yes, your example would work. When libvips 8.9 is released and when we start working on adding the streaming support to ImageProcessing, I can help with adding the extra convenience layer to ruby-vips.
I have one more question. When we pass the content string to `put_bytes` (2nd argument), does libvips take a copy of it, making it safe to deallocate that string afterwards on the Ruby side (e.g. using `chunk.clear`)? Same question for chunks retrieved from `get_bytes`.
I would like to make this streaming memory-efficient by deallocating all content chunks we've allocated for sending to libvips. If this works, then I believe we have everything we need for adding support to ImageProcessing 💪
`put_bytes` is just a thin FFI layer over the C `memcpy` -- there's no memory alloc or free there.

`get_bytes` takes a copy, unfortunately, so you're right, we might need to be careful if we want to keep good GC behaviour.
`VipsStreamo` does simple write buffering, so generally `write` will be called on 8kb chunks of data (rather than 1000s of tiny writes), which should make it simpler.
I added an `IO`-style layer, so this now runs:
```ruby
#!/usr/bin/ruby

require 'vips'

source = File.open ARGV[0], "rb"
input_stream = Vips::Streamiu.new
input_stream.on_read { |length| source.read length }
input_stream.on_seek { |offset, whence| source.seek(offset, whence) }

dest = File.open ARGV[1], "w"
output_stream = Vips::Streamou.new
output_stream.on_write { |chunk| dest.write(chunk) }
output_stream.on_finish { dest.close }

image = Vips::Image.new_from_stream input_stream, "", access: "sequential"
image.write_to_stream output_stream, ".png"
```
E.g.:

```
./stream.rb ~/pics/k2.jpg x.png
```
Thanks for the idea!
> I added an `IO`-style layer, so this now runs:
That's awesome! Thanks for adding it so quickly 😃
> `get_bytes` takes a copy, unfortunately, so you're right, we might need to be careful if we want to keep good GC behaviour.
Did you mean "returns a copy"? I think that totally makes sense, as long as one can deallocate that string once writing it to the destination.
```ruby
output_stream.on_write do |chunk|
  dest.write(chunk)
  chunk.clear
end
```
As I understood, the above isn't expected to raise any error within libvips, so I think we're all good 😉
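(A tiny pure-Ruby illustration of what the `chunk.clear` trick buys: `String#clear` empties the string in place, so its byte buffer can be reclaimed immediately rather than waiting for the `String` object itself to be garbage-collected.)

```ruby
chunk = "x" * (8 * 1024)        # pretend this came from get_bytes
id_before = chunk.object_id

chunk.clear                     # drops the 8kb of contents in place

puts chunk.bytesize             #=> 0
puts chunk.object_id == id_before   #=> true -- same object, contents gone
```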
Sorry for not checking these things myself, I haven't gotten to installing libvips master on my Mac yet. I will try to figure out whether I can get Homebrew to install it, so that I can do it in a single command.
I build on a Mac by installing their vips to pull in all the dependencies, then removing it again and doing the usual `git clone && ./autogen.sh --prefix=/Users/john/vips && make && make install` etc.
I added your `chunk.clear` trick, and something to stop exceptions being thrown across FFI boundaries.
Hello all,

git master libvips has a new feature which might be relevant to the image_processing gem: true streaming. There's a short note about it here: https://libvips.github.io/libvips/2019/11/29/True-streaming-for-libvips.html
You can now connect libvips image processing pipelines directly to any source or destination and it'll process data in a fully streaming manner. For example, you could fetch bytes from one URI (perhaps an S3 bucket), process, and write to another URI (perhaps a different bucket) in parallel and with no buffering.
We don't have any benchmarks yet, but we hope that this will produce a useful drop in latency. It should be fun to experiment with anyway. I don't think (as far as I know) anyone has done this before.
This branch of ruby-vips adds support: https://github.com/libvips/ruby-vips/tree/add-stream
You can see examples of the proposed API here:
https://github.com/libvips/ruby-vips/blob/add-stream/spec/stream_spec.rb
Sample:
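A sketch of what that sample presumably looks like, assuming the add-stream branch API (the `Vips::Streami`/`Vips::Streamo` class names and the `new_from_descriptor` constructor are my reading of the linked spec file, not verified):

```ruby
# Assumed add-stream branch API: read any supported image format from
# file descriptor 12 and stream the result to descriptor 13 as PNG.
streami = Vips::Streami.new_from_descriptor 12
streamo = Vips::Streamo.new_from_descriptor 13

image = Vips::Image.new_from_stream streami, "", access: "sequential"
image.write_to_stream streamo, ".png"
```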
So that'll stream pixels between file descriptors 12 and 13 (perhaps they are pipes connected to S3 URIs), reading any supported image format, and writing as PNG.
You can do anything in the pipeline (blur, rotate, composite, etc.). There's `thumbnail_stream`, which can efficiently thumbnail images directly from a stream source. You can also open streams multiple times, examine the header, and adjust parameters without actually processing pixels.

Questions! How does image_processing hook up to storage systems like S3 at the moment?