Closed BilalReffas closed 7 years ago
Sorry, I don't have one :(
sharp is very similar (libvips on node.js), perhaps some of the docs there could help?
I can be a bit more helpful. Once you have the data from S3 or wherever as a string, you can decode with new_from_buffer:
http://www.rubydoc.info/gems/ruby-vips/1.0.3/Vips/Image#new_from_buffer-class_method
You can then process as you please, perhaps with resize. Once you have a final image, write back to a string with write_to_buffer:
http://www.rubydoc.info/gems/ruby-vips/1.0.3/Vips/Image#write_to_buffer-instance_method
and then send to S3 or to the client. I'll try to make a small example tonight.
git master libvips has thumbnail_buffer, which will resize directly from a string to a small image. It's extremely fast and will automatically do things like rendering PDFs at the correct resolution, exploiting shrink-on-load in webp and jpeg, correctly handling premultiplication in transparent images, and so on. That's due out in March.
It would be great if you could add a small Rails example. Currently I use Paperclip + Google Cloud Storage, with ImageMagick for processing, which is not so fast. An S3 example would be fine as well. :)
This is my example model :)
class User
  include Mongoid::Document
  include Mongoid::Paperclip
  include Mongoid::Timestamps
  include Mongoid::Search

  has_mongoid_attached_file :image,
    :styles => { :medium => "350x350" },
    :storage => :fog,
    :fog_public => true,
    :fog_directory => 'profilimages',
    :path => "images/:id/:style/:basename.:extension",
    :fog_credentials => { :provider => 'Google',
                          :google_storage_access_key_id => '',
                          :google_storage_secret_access_key => '' }

  validates_attachment_content_type :image, :content_type => ['image/jpg', 'image/jpeg', 'image/png', 'image/gif']
end
Here's a simple example:
#!/usr/bin/env ruby
require 'vips'
byte_string = open(ARGV[0], "rb") {|f| f.read}
# Run the vips sniffer on the string and return the name of the operator that
# vips will use to load the image. You can use this to decide what options (if
# any) to pass the loader. It'll be something like "VipsForeignLoadJpegBuffer".
loader = Vips::Foreign.find_load_buffer byte_string
puts "will load image with #{loader}"
# You can add load options to the hash, or embed them in the "" string, perhaps
# "shrink=4,fail"
# Load options are passed to the loader that vips picks to decode the string.
im = Vips::Image.new_from_buffer byte_string, "", :access => :sequential
# Resize to 400 pixels across.
#
# .resize JUST does the resize, it won't do things like colour management,
# premultiplication of transparency, shrink-on-load, and so on. A proper
# thumbnailer will need to do some more work here. The vips_thumbnail() operator
# is a good place to look for ideas on how to expand this.
#
# ruby-vips with libvips 8.5 has a .thumbnail_buffer operation which does
# everything for you.
scale = 400.0 / im.width
im = im.resize scale
# The string is the suffix for the image format you want to write as. You can
# append options to the suffix, eg. ".jpg[Q=80]", or put them in the end hash,
# eg, :Q => 80
new_byte_string = im.write_to_buffer ".jpg"
open(ARGV[1], "wb") {|f| f.write new_byte_string}
This reads and writes the strings to files, you'd need to hook that up to S3 or whatever.
If you're interested in image resizing, libvips 8.5 should make this much simpler.
This is libvips 8.5 right ? :)
That works with current ruby-vips. 8.5 is getting new thumbnail code, but I've not used it.
@BilalReffas You might like https://github.com/ioquatix/vips-thumbnail
For reference, 8.5 is out now and the equivalent code is:
require 'vips'
# load from S3 perhaps, but here we just read a file in
byte_string = open(ARGV[0], "rb") {|f| f.read}
thumbnail_image = Vips::Image.thumbnail_buffer byte_string, 200
thumbnail_bytes = thumbnail_image.write_to_buffer ".jpg"
open(ARGV[1], "wb") {|f| f.write thumbnail_bytes}
Summary docs:
http://www.rubydoc.info/gems/ruby-vips/2.0.0/Vips/Image#thumbnail_buffer-class_method
More detail:
http://jcupitt.github.io/libvips/API/current/libvips-resample.html#vips-thumbnail
anyone have a simple-ish way to stream result directly to s3 without writing to local file system first?
@jcupitt your examples seem to read/write the whole thing into RAM first, no?
byte_string = open(ARGV[0], "rb") {|f| f.read}
Okay, now the whole input is in RAM in byte_string, no? And,
thumbnail_bytes = thumbnail_image.write_to_buffer ".jpg"
Now the whole thing is in RAM in thumbnail_bytes. Or do I misunderstand?
Is there a way to do this streaming, without reading/writing the entire contents into RAM at once?
Hello, if you read from a file, it will only read in small chunks.
If you read from memory, it needs to have the whole compressed image available at once. It will only decompress small parts as it needs them, but it does need the whole of the compressed image.
There's a branch that adds true streaming, so you can work directly off a socket (for example), but it's not been merged to master for various reasons. There's a lot of discussion here:
https://github.com/lovell/sharp/issues/30#issuecomment-46960443
I guess you could use named pipes, would that help? It should certainly work for write with most file formats, though perhaps not for read.
OK, so read from a file to have vips stream; there's no option for streaming at present except by reading from a file on disk, though there are some possibilities in a branch. If I understand that right.
What about writing? Is there a way to write output in chunks to some destination (like S3 for instance). I think what would be needed is an API that I call, that yields output data in chunks to a block I pass in. Like:
thumbnail_image.write_in_chunks do |chunk|
my_thing.stream_to_wherever_i_want(chunk)
end
The block would be called multiple times in sequence, with chunks of bytes in sequence. Perhaps the API would let me ask for how many bytes in a chunk or something, or perhaps I'd just leave that to vips' judgement. Anything like that available?
(Ruby stdlib/core actually has pretty poor common API/class/duck-type for streaming!).
No, sorry, though as I said you could try using named pipes. Would that work?
For example, on the linux command-line:
$ mkfifo banana
$ vips jpegsave fred.tif banana &
Now banana is a pipe and vips will write to it in JPEG format. From another program you can now run:
$ cp banana x.jpg
And it will read from the pipe and make a file on disc (though you could send the pipe anywhere). There is no intermediate file, and the intermediate JPG never exists all at once in memory.
You'd need to think of a way to link a named pipe up to an S3 bucket. I'm not sure I can help there, sorry.
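On the Ruby side, reading the pipe in chunks could look roughly like this. It's only a sketch: each_chunk_from_pipe is a made-up helper (not ruby-vips API), the chunk size is arbitrary, and a forked child writing dummy bytes stands in for the vips jpegsave process so that it runs without libvips installed.

```ruby
require "tmpdir"

# Hypothetical helper: read a named pipe in fixed-size chunks and hand each
# chunk to the caller's block. Nothing here is ruby-vips API.
def each_chunk_from_pipe(path, chunk_size = 64 * 1024)
  File.open(path, "rb") do |pipe|
    # IO#read(n) returns nil once the writer has closed its end of the pipe.
    while (chunk = pipe.read(chunk_size))
      yield chunk
    end
  end
end

Dir.mktmpdir do |dir|
  fifo = File.join(dir, "banana")
  File.mkfifo(fifo)

  # Stand-in for `vips jpegsave fred.tif banana &`: a child process that
  # writes some bytes into the pipe. With libvips installed you would start
  # the real command instead, e.g. spawn("vips", "jpegsave", "fred.tif", fifo).
  payload = "x" * 200_000
  writer = fork { File.binwrite(fifo, payload) }

  received = 0
  each_chunk_from_pipe(fifo) do |chunk|
    # Forward the chunk to S3, a socket, or wherever it needs to go.
    received += chunk.bytesize
  end
  Process.wait(writer)

  puts "read #{received} bytes from the pipe"
end
```

Only one chunk is held in Ruby at a time, so memory use stays flat however large the output is. How you feed those chunks to an S3 upload is a separate question.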
Named pipes intro, if you've not come across them before:
Okay. I think allowing actual streaming (processing without loading into RAM, both input and output, from various sources/destination) in a true rubyish way would probably be the 'killer feature' for ruby-vips. (It is of course one of the killer features of vips in general!) Without being able to do that, there isn't really anything I can do with ruby-vips I can't do by shelling out to command line I think.
If you are going to be doing a series of operations, especially with large images, then ruby-vips will be faster than shelling out.
For example, running the vips-bench programs with a 10k x 10k jpg image:
john@mm-jcupitt5 /data/john/pics $ time ./vips.sh wtc.jpg x.jpg
real 0m4.389s
user 0m22.581s
sys 0m0.607s
john@mm-jcupitt5 /data/john/pics $ time ./ruby-vips.rb wtc.jpg x.jpg
real 0m2.701s
user 0m21.404s
sys 0m0.324s
You'll see a larger difference for more complicated programs, plus the ruby-vips version will run without needing lots of disc space for the intermediate files.
I'm not totally following. Can you show me source of your vips.sh vs ruby-vips.rb files you are benchmarking there? That might clear it up.
Does one single command line invocation of vips cli create 'intermediate files'? I thought it did not.
It was just the test programs here:
https://github.com/jcupitt/vips-bench
So vips.sh is this terrible bash script:
#!/bin/bash
width=$(vipsheader -f Xsize $1)
height=$(vipsheader -f Ysize $1)
width=$((width - 200))
height=$((height - 200))
# set -x
vips crop $1 t1.v 100 100 $width $height
vips reduce t1.v t2.v 1.111 1.111 --kernel linear
cat > mask.con <<EOF
3 3 8 0
-1 -1 -1
-1 16 -1
-1 -1 -1
EOF
vips conv t2.v $2 mask.con --precision integer
rm t1.v t2.v
doing crop / shrink / sharpen with two intermediate files.
Oh, another win for ruby-vips would be if you're processing off S3 and have limited disk space --- it's very common to be allowed 2 or 4gb of ram, but only 512mb of rather slow disc.
If you use ruby-vips you can fetch from S3 to memory, then process to another memory buffer, then write back to S3 all without touching the disc at all, and only needing enough memory for the compressed images. This means you can handle significantly larger files.
OK, i still don't understand how you'd fetch from S3 to memory, "only needing enough memory for the compressed images."
If I understand the examples above, they load the whole original into memory, not just a "compressed image". Or maybe I don't know what "compressed image" means.
What impresses me about the vips command line is its ability to handle very large images without needing to keep the whole image in memory. If there's a way to do that with ruby-vips, I don't understand it. I think it would require true streaming support. I'll try to find time to check out that branch. (I'd like to say I'd submit a PR, but I might need to understand the C vips library and C better than I do to do that). For my context, processing multi-hundred-meg TIFFs, anything that requires that whole thing to be in memory is kind of off the table, and a single disk read/write is acceptable, although streaming with neither would be ideal. (ImageMagick can't process my sources at all without giving it huge amounts of RAM; vips cli can).
In your benchmarking examples, I think vips.sh does things in a less efficient way than is possible even with the command line -- I think you could do the crop and reduce in a single command line invocation, eliminating the temporary write to disk? (I'm not sure if cli vips can read/write from stdin/stdout, if so piping two cli vips invocations together would be another way. Or, as you say, named pipes). So I think the comparison may be cli at its not-best compared to a better use of ruby-vips.
I definitely see the theoretical advantage of ruby-vips, letting you do things way more flexibly and sensibly. But for my uses, I think I'd really only get substantial advantage if it supported true streaming. So I could for instance send the output straight to S3 without it ever touching disk at all, or needing to be all in RAM at all.
A 10k x 10k pixel JPG image is about 300MB when uncompressed (10,000 x 10,000 x 3 bytes), but typically around 15mb when held as a compressed JPG image.
You can fetch the 15mb JPG from S3 and keep it as a ruby string, process to another JPG image held as a string via some sort of pipeline, and write the new image back to S3, all without ever needing 300MB of memory. You'll just need enough for the compressed source and destination, plus a small amount of working space.
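The arithmetic behind those figures, as a quick sketch. The 15 MB compressed size is just the "typical" figure quoted above, not a measurement, and the variable names are only for illustration.

```ruby
# Back-of-envelope numbers for a 10k x 10k 8-bit RGB image.
width = 10_000
height = 10_000
bands = 3 # R, G and B at one byte each

uncompressed_bytes = width * height * bands
compressed_bytes = 15 * 1_000_000 # assumed typical JPG size from the comment above

puts "uncompressed: #{uncompressed_bytes / 1_000_000} MB"        # 300 MB
puts "compressed:   #{compressed_bytes / 1_000_000} MB"          # 15 MB
puts "ratio:        #{uncompressed_bytes / compressed_bytes}x"   # 20x
```

So holding the compressed source and the compressed result as strings costs a small fraction of what the decoded pixels would.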
The vips command-line can only run a single operation each time. If you want to do three operations A then B then C, you'll need to go out to disk and back between AB and BC. It only supports pipes for write; you can't read from a pipe (since most image format readers will try to seek on files).
I thought of something else: TIFF can't be streamed, unfortunately. You'll find libtiff does a lot of seeking around as it loads. The true streaming libvips branch only really supported PNG and JPG.
I get it, yeah, cool. Reading from file is actually working fine for me at the moment, so all is well.
Also, it's interesting you say that TIFF technically can't be streamed -- all I can tell you is RAM usage for creating a JPG thumbnail from an enormous TIFF (read from file via vips cli) is literally an order of magnitude (if not more) less than it was using IM or GM for the same operation. So, I dunno, but I'm happy!
I definitely would not want to read the entire TIFF (whether uncompressed or compressed with LZW or DEFLATE) into RAM. Some of my sources are hundreds of megs. vips cli does not seem to be requiring those hundreds of megs of RAM; if it is, I'd still rather it be done through vips' well-written and optimized C code than through garbage-collected Ruby in my controlling Ruby process.
Letting the vips command line do a pipeline, like crop then resize, like I think IM can do (?) would probably be a cool improvement.
But yeah, I do see some use cases for multi operations then where ruby-vips might be an advantage. For what I'm actually doing right now, everything is actually working great with cli.
From what I know, IM image buffers are pretty inefficient, I think last time I looked it used floating point pixels for the internal format?
IM6 used 16-bit ints for pixel values by default, IM7 has switched to float everywhere, so it needs about twice as much memory.
A format is streamable if you can process it by just reading bytes sequentially from a pipe. JPG, PNG and some others are designed to work like this. It's an important property for the web, where you want to start displaying an image before you've transferred the whole thing.
TIFF is an older format and you need to be able to jump about in the image to decode it. Pipes don't support seek() operations, so you can't read TIFF from a pipe, except by copying the whole thing to disc or memory, and you can't start processing pixels until you've downloaded the whole file.
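You can see the seek limitation from plain Ruby, no libvips needed. A minimal sketch using only the standard library; the strings are dummy data, not real image files.

```ruby
require "tempfile"

# A pipe is a one-way byte stream: reading works, seeking does not.
reader, writer = IO.pipe
writer.write("some image bytes")
writer.close

pipe_seek_error = begin
  reader.seek(5)
  nil
rescue Errno::ESPIPE => e
  e.class
end
puts "seek on a pipe raised: #{pipe_seek_error.inspect}"
reader.close

# A regular file supports seek, which is what a TIFF reader relies on.
Tempfile.create("seekable") do |file|
  file.write("some image bytes")
  file.seek(5)
  puts "seek on a file, then read: #{file.read(5).inspect}"
end
```

A decoder that only ever calls read can work on either kind of handle; one that needs seek is limited to files or in-memory buffers.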
In case anyone comes across this old thread, libvips 8.9 now has true streaming. You can now (for example) read from one pipe and write to another without ever having everything in memory.
Summary notes and Ruby example here:
https://libvips.github.io/libvips/2019/12/11/What's-new-in-8.9.html#true-streaming
require 'vips'
source = File.open "some/file/name", "rb"
input_stream = Vips::Streamiu.new
input_stream.on_read { |length| source.read length }
dest = File.open ARGV[1], "wb"
output_stream = Vips::Streamou.new
output_stream.on_write { |chunk| dest.write(chunk) }
output_stream.on_finish { dest.close }
thumb = Vips::Image.thumbnail_stream input_stream, 128
thumb.write_to_stream output_stream, ".png"
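The on_read block above just forwards to IO#read, so it inherits that method's contract: return at most length bytes, and nil once there is nothing left. A stdlib-only sketch of that behaviour, with StringIO standing in for the file. The on_read lambda here is only an illustration, and treating nil as end-of-stream is my reading of the example rather than something stated in this thread.

```ruby
require "stringio"

# StringIO stands in for the File in the example above.
source = StringIO.new("0123456789".b)

# Same shape as the on_read block: asked for up to `length` bytes, return a
# string, or nil once the source is exhausted.
on_read = ->(length) { source.read(length) }

chunks = []
while (chunk = on_read.call(4))
  chunks << chunk
end

p chunks            # ["0123", "4567", "89"]
p on_read.call(4)   # nil
```

Anything that answers to read(length) in this way (a socket, an HTTP response body, a pipe) could sit behind the same block.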
What are Streamiu and Streamou?
I jumped the gun, 8.9 is still not quite out ;( another day or two. Streamiu etc. are in ruby-vips master and will be updated with 8.9.
Yes, but what are they? They are not obvious from their names, which, IMHO, are a bit odd. Great feature, but can we make the class names a little bit more obvious?
Sure, not too late to change (just). We've been through a couple of names already :( It's stream / input / user, so VipsStreamiu, or Streamiu in Ruby and Python.
The blog post introducing the feature has some background:
https://libvips.github.io/libvips/2019/11/29/True-streaming-for-libvips.html
The user/developer centric name should be InputStream and OutputStream. If you need to add the word user, InputUserStream and OutputUserStream. However, I think User as a qualifier is redundant since it's obviously for the user...
In some ways it's misleading to use the word User since what you probably mean is "provided by the external system interfacing with Vips" which isn't necessarily a user, but perhaps ExternalInputStream and ExternalOutputStream make more sense, if you already have concepts for Internal*Streams.
Probably the best place to debate a name change would be the release issue: https://github.com/libvips/libvips/issues/1494
I'll @ you over there.
Is there maybe an example for Rails using S3 or Google Cloud Storage? Btw, thanks for this really fast processor! :)