Closed Nakilon closed 6 years ago
Is my only option to always map .first over each pixel after .to_a, or even use LHC or smth and then map it too?
Or is it really a bug? That docs line is misleading me.
Hello @Nakilon,
:b_w means some sort of greyscale image, but it can still have many bands. It's a hint about how the image should be shown to the user rather than solid information about the number of channels. For example, you could have an extra band of alpha.
I tried with your image and it comes out as :srgb for me.
irb(main):001:0> require 'vips'
=> true
irb(main):002:0> x = Vips::Image.new_from_file "temp.png"
=> #<Image 1703x896 uchar, 4 bands, srgb>
irb(main):003:0> x.interpretation
=> :srgb
If you load a greyscale PNG with alpha, you'll get a two-band :b_w image, with the alpha as band 2. You can use .flatten to flatten out any alpha before calling .to_a, or extract just the band you want.
Could you give more detail about the problem you have run into?
I took about 1000 images from a social network and fingerprinted them with my idhash thing. It is doing something like this:
image = Vips::Image.new_from_file filename
image = image.resize(8.fdiv(image.width), vscale: 8.fdiv(image.height)).colourspace("b-w")
array = image.to_a.map &:flatten
...
The fingerprint produced for one image out of the 1000 turned out to be larger than 32 bytes, and that was caught by an assert during fingerprint comparison.
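A minimal sketch of why the fingerprint can grow, assuming the image came back with two bands (grey plus alpha). The nested array below is a hand-written stand-in for what .to_a returns (rows, then pixels, then bands); SAMPLE_PIXELS, rows_flattened and rows_first_band are hypothetical names used only for illustration.

```ruby
# Stand-in for Image#to_a on a 2x2 greyscale image with alpha:
# rows -> pixels -> bands, here [grey, alpha] per pixel.
SAMPLE_PIXELS = [
  [[10, 255], [20, 255]],
  [[30, 128], [40, 0]],
].freeze

# Flattening each row mixes the alpha values in with the grey values,
# so every row ends up twice as long as the image is wide.
def rows_flattened(pixels)
  pixels.map(&:flatten)
end

# Taking the first band of every pixel keeps exactly one value per pixel.
def rows_first_band(pixels)
  pixels.map { |row| row.map(&:first) }
end

p rows_flattened(SAMPLE_PIXELS)
p rows_first_band(SAMPLE_PIXELS)
```

With a single-band image both approaches give the same rows, which would explain why only some of the images trip the assert.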
I now wonder what will be faster:
1) .resize.colourspace("b-w").to_a.map { |row| row.map(&:first) }
2) .flatten.resize.colourspace("b-w").to_a.map &:flatten
3) .resize.flatten.colourspace("b-w").to_a.map &:flatten
4) .resize.colourspace("b-w").flatten.to_a.map &:flatten
Also I could apply colourspace before resizing but that probably would even change the result a bit (and I guess I tried this before and it didn't work well).
My benchmark did not notice a difference between 1 and 4.
2 and 3 raise:
Vips::Error: vips_colourspace: no known route from 'multiband' to 'b-w'
(off-topic)
Yesterday I was investigating huge memory consumption (it eats all of the available 1.5gb of RAM and only then, after ~100 images, seems to start GC):
1%, 166mb RAM used, 22% RAM free
2%, 277mb RAM used, 20% RAM free
3%, 351mb RAM used, 18% RAM free
4%, 527mb RAM used, 15% RAM free
5%, 609mb RAM used, 14% RAM free
6%, 670mb RAM used, 13% RAM free
7%, 861mb RAM used, 10% RAM free
8%, 948mb RAM used, 9% RAM free
9%, 1041mb RAM used, 7% RAM free
10%, 1414mb RAM used, 2% RAM free
11%, 1609mb RAM used, 0% RAM free
12%, 1055mb RAM used, 0% RAM free
13%, 945mb RAM used, 2% RAM free
14%, 643mb RAM used, 2% RAM free
15%, 796mb RAM used, 4% RAM free
16%, 936mb RAM used, 3% RAM free
17%, 1054mb RAM used, 6% RAM free
18%, 1115mb RAM used, 7% RAM free
19%, 1108mb RAM used, 7% RAM free
20%, 1094mb RAM used, 7% RAM free
21%, 1087mb RAM used, 7% RAM free
22%, 1086mb RAM used, 6% RAM free
...
While fingerprinting these 1000 images, I noticed that if I randomly interrupt the program, it always stops inside AutoPointer.new (for some reason at the line with multiple calls of ptr.kind_of), so I assume this constructor is slow. What is an AutoPointer? Does it differ from Pointer only in that you can pass a destructor? I tried to rewrite the write_to_memory method from
ptr = FFI::AutoPointer.new(ptr, GLib::G_FREE)
ptr.get_bytes 0, len[:value]
to
ptr = FFI::Pointer.new ptr
ptr.get_bytes(0, len[:value]).tap do
  GLib::g_free ptr
end
Not sure if it's correct and won't segfault, but the interrupted program began to stop in more random places. Neither speed nor memory consumption got visibly better, though.
I think I would do:
image = Vips::Image.thumbnail filename, 8, height: 8, size: :force
image = image.colourspace("b-w")
image = image.flatten if image.bands > 1
array = image.to_a
I remember you preferred resize to thumbnail, but there's a huge speed and memory advantage with thumbnail, so I'd still be tempted to use that if possible. The flatten will multiply any alpha into band 0.
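As a rough illustration of "multiply any alpha into band 0", here is a pure-Ruby model of flattening a [grey, alpha] pixel onto a black background. It assumes 8-bit values and a black background; it is not the libvips implementation, and flatten_onto_black and flatten_pixel_rows are hypothetical helper names.

```ruby
# Rough model of flattening one [grey, alpha] pixel onto black.
# Assumes 8-bit values (0..255); an illustration, not libvips code.
def flatten_onto_black(grey, alpha)
  (grey * alpha / 255.0).round
end

# Apply the model to nested rows shaped like Image#to_a output
# (rows -> pixels -> [grey, alpha]), giving one value per pixel.
def flatten_pixel_rows(rows)
  rows.map { |row| row.map { |grey, alpha| flatten_onto_black(grey, alpha) } }
end

p flatten_pixel_rows([[[200, 255], [200, 128], [200, 0]]])
```

Fully opaque pixels keep their grey value and fully transparent ones go to black, so the result has one band per pixel whatever the input had.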
This won't work for CMYK images, but I don't know how common they are in your case. You'd need to handle them via an ICC transform.
On memory use, have you turned off the libvips cache? It won't help your style of batch processing and will use memory.
Add Vips::cache_set_max(0) somewhere near the start of your program.
Yes, AutoPointer is an FFI class that lets you attach a destructor to a pointer. You can give it a pointer to a C function and it can call it directly rather than going back into Ruby again, which is useful.
Vips::Error: ruby-vips: enum 'VipsSize' has no member 'force', should be one of: both, up, down
cache_set_max did not help -- still reaching 1000mb in about 10 seconds:
1%, 166mb RAM used, 22% RAM free
2%, 278mb RAM used, 20% RAM free
3%, 352mb RAM used, 18% RAM free
4%, 525mb RAM used, 16% RAM free
5%, 663mb RAM used, 14% RAM free
6%, 667mb RAM used, 14% RAM free
7%, 857mb RAM used, 11% RAM free
8%, 944mb RAM used, 10% RAM free
9%, 1037mb RAM used, 8% RAM free
10%, 1410mb RAM used, 3% RAM free
11%, 1607mb RAM used, 0% RAM free
12%, 1673mb RAM used, 0% RAM free
13%, 1563mb RAM used, 1% RAM free
force was added in 8.6, I guess you have an older libvips.
I'll see if I can make a small example that has the memory problem.
Argh stupid trackpad, sorry.
I tried this:
ARGV.each do |filename|
  puts "#{filename} = #{DHashVips::DHash::calculate filename}"
end
And tested with:
$ mkdir samples
$ for i in {1..1000}; do cp ~/pics/k2.jpg samples/$i.jpg; done
$ time ./soak-dhash.rb samples/*
real 0m42.568s
user 0m42.942s
sys 0m4.054s
Watching in top, it runs in a fairly steady 340mb of memory. k2.jpg is a 1400 x 2048 pixel RGB jpg.
If I change pixelate to be:
def pixelate file, hash_size, kernel = nil
  image = Vips::Image.thumbnail file, hash_size + 1, height: hash_size, size: :force
  image.colourspace("b-w")
end
It runs in 20s and 50mb of memory. If I add Vips::cache_set_max 0 just before the loop, it runs in 20s still, but 34mb of memory.
... anyway, I don't see uncontrolled memory growth with dhash, so I guess the problem is in idhash. Maybe the to_a is not releasing memory?
Switching to thumbnail should give you a nice drop in memory use and a speedup. Turning off the cache should get memory use down further.
Upgraded vips.
I'm trying to make a minimal leaking code example and that's insane -- it leaks when I uncomment the line
YAML.load File.read "../labeled.yaml"
even if the loaded data (750kb) is not used.
Without that line:
1%, 261mb RAM used, 23% RAM free
2%, 442mb RAM used, 20% RAM free
3%, 575mb RAM used, 18% RAM free
4%, 686mb RAM used, 16% RAM free
5%, 584mb RAM used, 18% RAM free
6%, 532mb RAM used, 18% RAM free
7%, 489mb RAM used, 19% RAM free
8%, 494mb RAM used, 19% RAM free
9%, 361mb RAM used, 21% RAM free
10%, 359mb RAM used, 21% RAM free
11%, 347mb RAM used, 21% RAM free
12%, 382mb RAM used, 21% RAM free
13%, 419mb RAM used, 20% RAM free
14%, 503mb RAM used, 19% RAM free
15%, 549mb RAM used, 18% RAM free
16%, 414mb RAM used, 20% RAM free
With:
1%, 275mb RAM used, 18% RAM free
2%, 450mb RAM used, 15% RAM free
3%, 585mb RAM used, 12% RAM free
4%, 697mb RAM used, 10% RAM free
5%, 845mb RAM used, 8% RAM free
6%, 963mb RAM used, 6% RAM free
7%, 1040mb RAM used, 5% RAM free
8%, 1093mb RAM used, 4% RAM free
9%, 1102mb RAM used, 4% RAM free
10%, 1096mb RAM used, 4% RAM free
11%, 1056mb RAM used, 5% RAM free
12%, 1178mb RAM used, 3% RAM free
13%, 1323mb RAM used, 0% RAM free
14%, 1476mb RAM used, 0% RAM free
15%, 1564mb RAM used, 0% RAM free
Like the yaml stdlib breaks GC. Gonna ask somewhere.
UPD: hm, there is a 2012 issue mentioning Ruby 1.9.3. Seems like my leak is not directly related to vips, but GC stops working well during vips (ffi?) manipulations.
UPD2: in 2012 it could be patched by require "psych", but currently it does nothing and didn't help me even after I upgraded it from 2.1.0 to 3.0.2.
Just calling GC.start after each .fingerprint helps:
1%, 175mb RAM used, 13% RAM free
2%, 229mb RAM used, 12% RAM free
3%, 231mb RAM used, 12% RAM free
4%, 253mb RAM used, 12% RAM free
5%, 260mb RAM used, 12% RAM free
6%, 233mb RAM used, 12% RAM free
7%, 221mb RAM used, 13% RAM free
8%, 260mb RAM used, 12% RAM free
9%, 248mb RAM used, 12% RAM free
10%, 234mb RAM used, 12% RAM free
11%, 228mb RAM used, 12% RAM free
12%, 181mb RAM used, 13% RAM free
13%, 217mb RAM used, 13% RAM free
14%, 209mb RAM used, 13% RAM free
15%, 245mb RAM used, 12% RAM free
I.e. the solution is to call GC.start explicitly if both YAML and ruby-vips are used in a program.
Thank you for the advice -- I'll try it out later. The only thing I still don't understand is:
Vips::Error: vips_colourspace: no known route from 'multiband' to 'b-w'
I saw it earlier when I tried to save b/w images or smth.
The "no route" message usually means you have hit a CMYK image. You'll need to add some extra code to import with an ICC profile for this case.
Constant GCing will probably hurt performance :( Perhaps every 100 images? It depends how much you need to keep memory down I guess.
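A minimal sketch of the "every 100 images" idea, using only core Ruby. each_with_periodic_gc is a hypothetical helper name, and the block body stands in for the real fingerprinting call; the interval of 100 is just the number suggested above, not a tuned value.

```ruby
# Hypothetical helper: yields each item and forces a GC run after every
# `every` items. Returns how many times GC.start was called.
def each_with_periodic_gc(items, every: 100)
  gc_runs = 0
  items.each_with_index do |item, index|
    yield item
    if (index + 1) % every == 0
      GC.start
      gc_runs += 1
    end
  end
  gc_runs
end

# Usage with a stand-in for the real per-image work.
processed = []
runs = each_with_periodic_gc((1..250).to_a, every: 100) { |n| processed << n }
puts "processed #{processed.size} items, forced GC #{runs} times"
```

A smaller interval keeps memory lower at the cost of more time spent collecting, so the right value depends on how tight memory is.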
Docs say:
but when I apply .to_a to the https://drive.google.com/file/d/0B3BLwu7Vb2U-MnNqdHV4MzFSX2s/view?usp=sharing image there are two channels (the other one is probably alpha):