Closed: Morgane55440 closed this 6 months ago
next i'm changing the `(i16, i16)` to `Point`, benching `grayscale_erode` and `grayscale_dilate`, and testing different implementations
What about this? Similar performance. I'm a fan of the `image` methods; the devs have already come up with effective access to pixels:
```rust
let elements = image
    .enumerate_pixels()
    .filter(|(_, _, &p)| p[0] != 0)
    .map(|(x, y, _)| (x as i16 - center_x as i16, y as i16 - center_y as i16))
    .collect();
Self { elements }
```
thanks a ton! it is just as fast, if not slightly faster (seems maybe 5-10 microseconds faster on average, but there was overlap)
i am not super knowledgeable on all the `image` methods, but i'll look into them more. i think i probably should look into the other constructors
```
running 4 tests
test morphology::benches::bench_grayscale_diamond_mask ... bench: 185,236 ns/iter (+/- 19,262)
test morphology::benches::bench_grayscale_disk_mask ... bench: 112,540 ns/iter (+/- 36,400)
test morphology::benches::bench_grayscale_mask_from_image ... bench: 47,545 ns/iter (+/- 1,532)
test morphology::benches::bench_grayscale_square_mask ... bench: 43,016 ns/iter (+/- 1,629)
```
(they all make masks of the same size)
The new release is scheduled for May 14th, if anything. I think you will make it in time
```
running 4 tests
test morphology::benches::bench_grayscale_diamond_mask ... bench: 17,375 ns/iter (+/- 1,134)
test morphology::benches::bench_grayscale_disk_mask ... bench: 33,289 ns/iter (+/- 3,166)
test morphology::benches::bench_grayscale_mask_from_image ... bench: 50,766 ns/iter (+/- 2,226)
test morphology::benches::bench_grayscale_square_mask ... bench: 40,882 ns/iter (+/- 2,205)
```
pretty happy with this. going to sleep. Tomorrow i'll work on making the disk mask code clearer, and then i'll move on to `erode` and `dilate`
@cospectrum if you don't mind i would really appreciate your take on my last commit, i kinda struggled with making `disk` readable
initial state of the benches:
```
test morphology::benches::bench_grayscale_op_erode_big_image_diamond ... bench: 87,424,590 ns/iter (+/- 3,543,468)
test morphology::benches::bench_grayscale_op_erode_big_image_point ... bench: 2,013,255 ns/iter (+/- 65,107)
test morphology::benches::bench_grayscale_op_erode_medium_image_diamond ... bench: 3,414,165 ns/iter (+/- 91,476)
test morphology::benches::bench_grayscale_op_erode_medium_image_large_square ... bench: 133,544,210 ns/iter (+/- 30,206,688)
test morphology::benches::bench_grayscale_op_erode_medium_image_point ... bench: 81,142 ns/iter (+/- 7,243)
test morphology::benches::bench_grayscale_op_erode_small_image_diamond ... bench: 242,079 ns/iter (+/- 58,845)
test morphology::benches::bench_grayscale_op_erode_small_image_large_square ... bench: 6,978,410 ns/iter (+/- 267,406)
test morphology::benches::bench_grayscale_op_erode_small_image_point ... bench: 5,356 ns/iter (+/- 636)
test morphology::benches::bench_grayscale_op_dilate_big_image_diamond ... bench: 87,889,700 ns/iter (+/- 24,124,426)
test morphology::benches::bench_grayscale_op_dilate_big_image_point ... bench: 2,287,300 ns/iter (+/- 64,361)
test morphology::benches::bench_grayscale_op_dilate_medium_image_diamond ... bench: 3,316,540 ns/iter (+/- 50,421)
test morphology::benches::bench_grayscale_op_dilate_medium_image_large_square ... bench: 134,237,150 ns/iter (+/- 29,765,530)
test morphology::benches::bench_grayscale_op_dilate_medium_image_point ... bench: 99,185 ns/iter (+/- 10,606)
test morphology::benches::bench_grayscale_op_dilate_small_image_diamond ... bench: 213,802 ns/iter (+/- 5,383)
test morphology::benches::bench_grayscale_op_dilate_small_image_large_square ... bench: 7,203,760 ns/iter (+/- 2,410,854)
test morphology::benches::bench_grayscale_op_dilate_small_image_point ... bench: 6,268 ns/iter (+/- 1,050)
```
> @cospectrum if you don't mind i would really appreciate your take on my last commit, i kinda struggled with making `disk` readable

The `master` version is much more beautiful.
it is, but it's 3-4 times slower.
what i did there was, instead of just checking whether each pixel is inside or outside the disk, i computed the edges of the disk and then filled it in.
it is messier, but much faster. i would like to make it less messy without necessarily sacrificing speed
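That "compute the edges, then fill" idea can be sketched roughly like this (a hypothetical standalone version using `(i16, i16)` offsets; `disk_offsets` is a made-up helper, not the PR's actual `disk` code):

```rust
// Sketch of the "compute the row extents, then fill" approach for a disk mask.
// One square root per row instead of one distance test per pixel.
fn disk_offsets(radius: i16) -> Vec<(i16, i16)> {
    let r2 = (radius as i32) * (radius as i32);
    let mut elements = Vec::new();
    for y in -radius..=radius {
        // Widest x still inside the disk on this row: floor(sqrt(r^2 - y^2)).
        let half_width = ((r2 - (y as i32) * (y as i32)) as f64).sqrt() as i16;
        // Fill the whole horizontal span instead of testing every pixel.
        for x in -half_width..=half_width {
            elements.push((x, y));
        }
    }
    elements
}
```

The speedup over the per-pixel check comes from touching each row span once; it also naturally emits points row by row, matching the row-major ordering the mask wants.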
the new commit is faster by about 20-30% for big images, but it is utterly unreadable; i will rework it to make it even faster and significantly more readable.
also, i did not know about the minimum supported version; i will rewrite the offending code
You can write simple reference implementations (in `mod tests`) and test against them. They will also be useful in the future.
The `chunks` finding is interesting. This crate originally aimed to work on `GenericImage`s but I gave up on that when the performance overhead was too high. Now that we know we're working with image buffers we can probably find a few places where existing functions can be sped up.

As this PR doesn't change any APIs and comes with a significant speedup I'd like to get this merged for the next release - @Morgane55440 let me know when you're ready for this to be merged. We can always simplify or clarify implementations later if necessary.
If it works, then I can simplify/refactor later myself, after the merge
@theotherphil @cospectrum sorry for the wait, was busy with life stuff. i finally got a decent optimization in, from 2x to almost 40x depending on the image and mask size/shape (the bigger, the better the optimization).

i'm certain significant further improvements can be made, but i'm ok with shipping things as they are, except for the `lines` method, which is a mess. i'm working on it.

also, the `apply` method is now unused. i think it's possible it might be useful later, although it seems unlikely because it's just a very slow way to do things. should i remove it?
Brilliant, thanks. Yes, remove `apply` if it's not used.
i have to say (working on `lines` atm), `chunk_by` is exactly what i need, that's really a shame. i'm on a good path to figuring out something good though
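For context, `slice::chunk_by` was only stabilized in Rust 1.77, so under an older minimum supported version a hand-rolled stand-in might look something like this (a sketch, not the PR's actual code; `chunk_by_runs` is a made-up name):

```rust
// Minimal replacement for slice::chunk_by: split a slice into maximal runs
// of adjacent elements for which `same` returns true.
fn chunk_by_runs<T>(slice: &[T], same: impl Fn(&T, &T) -> bool) -> Vec<&[T]> {
    let mut chunks = Vec::new();
    let mut start = 0;
    for i in 1..slice.len() {
        // A new run starts wherever two neighbors are not "the same".
        if !same(&slice[i - 1], &slice[i]) {
            chunks.push(&slice[start..i]);
            start = i;
        }
    }
    if !slice.is_empty() {
        chunks.push(&slice[start..]);
    }
    chunks
}
```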
new benches for `erode()` and `dilate()`:
```
test morphology::benches::bench_grayscale_op_erode_big_image_diamond ... bench: 1,906,060 ns/iter (+/- 57,854)
test morphology::benches::bench_grayscale_op_erode_big_image_point ... bench: 71,312 ns/iter (+/- 28,722)
test morphology::benches::bench_grayscale_op_erode_medium_image_diamond ... bench: 146,737 ns/iter (+/- 4,326)
test morphology::benches::bench_grayscale_op_erode_medium_image_large_square ... bench: 8,119,980 ns/iter (+/- 1,010,798)
test morphology::benches::bench_grayscale_op_erode_medium_image_point ... bench: 4,376 ns/iter (+/- 3,639)
test morphology::benches::bench_grayscale_op_erode_small_image_diamond ... bench: 42,703 ns/iter (+/- 3,684)
test morphology::benches::bench_grayscale_op_erode_small_image_large_square ... bench: 1,232,370 ns/iter (+/- 361,772)
test morphology::benches::bench_grayscale_op_erode_small_image_point ... bench: 1,036 ns/iter (+/- 83)
test morphology::benches::bench_grayscale_op_dilate_big_image_diamond ... bench: 1,837,840 ns/iter (+/- 30,849)
test morphology::benches::bench_grayscale_op_dilate_big_image_point ... bench: 64,265 ns/iter (+/- 1,396)
test morphology::benches::bench_grayscale_op_dilate_medium_image_diamond ... bench: 158,506 ns/iter (+/- 11,497)
test morphology::benches::bench_grayscale_op_dilate_medium_image_large_square ... bench: 8,156,330 ns/iter (+/- 2,925,125)
test morphology::benches::bench_grayscale_op_dilate_medium_image_point ... bench: 4,266 ns/iter (+/- 115)
test morphology::benches::bench_grayscale_op_dilate_small_image_diamond ... bench: 41,219 ns/iter (+/- 777)
test morphology::benches::bench_grayscale_op_dilate_small_image_large_square ... bench: 1,202,150 ns/iter (+/- 300,326)
test morphology::benches::bench_grayscale_op_dilate_small_image_point ... bench: 1,025 ns/iter (+/- 18)
```
ok, so it all seems good, just 2 things i was thinking about:
- merging the `erode` and `dilate` code into a single function, something like `per_pixel_mask_based_reduce(image: &GrayImage, mask: &Mask, neutral: u8, operator: Fn(u8, u8) -> u8)`
- i'd like a second opinion to be sure, especially on the name

then it should all be ready to merge
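A rough sketch of what such a shared helper could look like (hypothetical: it operates on a plain row-major `&[u8]` buffer with a `width` parameter and `(dx, dy)` mask offsets instead of the real `GrayImage`/`Mask` types, and ignores the out-of-bounds conventions the real code may use):

```rust
// erode and dilate differ only in the neutral element and the combining
// operator, so both can delegate to one per-pixel mask reduction.
fn per_pixel_mask_based_reduce(
    image: &[u8],        // grayscale pixels, row-major
    width: usize,
    mask: &[(i16, i16)], // mask offsets relative to the current pixel
    neutral: u8,
    operator: fn(u8, u8) -> u8,
) -> Vec<u8> {
    let height = image.len() / width;
    let mut out = vec![0u8; image.len()];
    for y in 0..height as i32 {
        for x in 0..width as i32 {
            let mut acc = neutral;
            for &(dx, dy) in mask {
                let (nx, ny) = (x + dx as i32, y + dy as i32);
                if nx >= 0 && ny >= 0 && (nx as usize) < width && (ny as usize) < height {
                    acc = operator(acc, image[ny as usize * width + nx as usize]);
                }
            }
            out[y as usize * width + x as usize] = acc;
        }
    }
    out
}

// erode: min with neutral 255; dilate: max with neutral 0.
fn erode(image: &[u8], width: usize, mask: &[(i16, i16)]) -> Vec<u8> {
    per_pixel_mask_based_reduce(image, width, mask, u8::MAX, |a, b| a.min(b))
}
fn dilate(image: &[u8], width: usize, mask: &[(i16, i16)]) -> Vec<u8> {
    per_pixel_mask_based_reduce(image, width, mask, u8::MIN, |a, b| a.max(b))
}
```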
i did the refactor. if anyone has a better function name idea, i'm all ears; otherwise, i think this could be good to merge
it's not user-facing, so it's not a big deal
@ripytide all of the comments remaining are about `Mask::new()`.
this was not part of my intended code; i added it due to a suggestion by @cospectrum, and i think it was a mistake on my part
You can add a private `new` and call it instead of `Self { elements }` everywhere:

```rust
fn new(elements: Vec<Point<i16>>) -> Self {
    assert!(elements.len() <= (511 * 511) as usize);
    Self { elements }
}
```
the code that i want to make guarantees that, for all `Mask`s:
- all the integer values will be strictly between -512 and 512
- all tuples will be sorted in reverse lexicographic order, line by line ((-1,-1),(0,-1),(1,-1),(-1,0),(0,0),...)
- no tuple shall appear twice

this shall be true for all `Mask`s, and is guaranteed by the current implementation of all the user-facing constructors.
i think the `new` should just be removed. the creation of custom masks is already possible through `from_image`. maybe i could add tests that check the conditions are always met?
Ah okay, that makes sense. Since there are tests already I would expect them to fail if any of the constructors stopped abiding by the correct ordering so you probably don't need to write explicit tests for checking the ordering as that is just an implementation detail.
well then it should be good as-is.
i could add a comment above the initializations, something like:

```rust
// direct initialization stands as guarantee that all conditions mentioned
// by the doc are fulfilled for the struct and each of its parts
Self { elements }
```
thanks a ton for the typos @ripytide. i fixed the others it showed me, including some in `distance_transform`; because i wrote them, they seemed relevant, and there are no other PRs on the subject atm
Why did you remove `new`?
@cospectrum because it didn't do anything and was confusing.
each constructor guarantees the struct it returns is valid in its own way.
the existence of the `new` method implied that it was validating the data. it wasn't, and making it truly validate the data would require it to iterate through and check the whole vector, which would slow down all the constructors just to double-check something that was already assured.
replacing `Mystruct { my_data }` by `Mystruct::new(my_data)` is great, except if you're already in a constructor which is doing its own validation
i think i have an alternative solution that may help documentation, to make clearer what's happening:

```rust
/// new_unchecked creates a new mask without doing any form of data validation
///
/// # Safety
///
/// this method is not user-facing and marked `unsafe` because it may lead
/// to invalid state if called with the wrong arguments.
///
/// By calling new_unchecked, you guarantee that:
/// - all the integer values in `elements` will be strictly between -512 and 512
/// - the maximum L_inf distance between any 2 points in `elements` is 512
/// - all Points in `elements` are sorted in reverse lexicographic order, line by line ((-1,-1),(0,-1),(1,-1),(-1,0),(0,0),...)
/// - no point appears twice in `elements`
unsafe fn new_unchecked(elements: Vec<Point<i16>>) -> Self {
    Self { elements }
}
```

none of it is user-facing, so it's not a huge deal, but it probably would be helpful for maintainability to have the danger laid out by the `unsafe` block and all the documentation stored in the same place
> except if you're already in a constructor which is doing its own validation

None of your constructors do real validation. Comments prove nothing, asserts do.

> the existence of the new method implied that it was validating the data. It wasn't

It does 1 check. One is better than nothing. And it may have more in the future.

> and making it truly validate the data would require it to iterate through and check the whole vector

That's why `debug_assert` exists
so you would prefer a `new` method with `debug_assert`s? i can make it if you think it's the best way to guarantee safety.
i'm just trying to make the code as good as possible, following the advice and criticisms i'm given; this is all work in progress.
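A `new` along those lines might look like this (a sketch combining the earlier `assert` suggestion with `debug_assert`s for the invariants listed above; `Point` and `Mask` are stubbed here and this is not the PR's actual code):

```rust
#[derive(Clone, Copy)]
struct Point<T> { x: T, y: T }

struct Mask { elements: Vec<Point<i16>> }

impl Mask {
    fn new(elements: Vec<Point<i16>>) -> Self {
        // Cheap always-on check, as in the suggested `new`.
        assert!(elements.len() <= (511 * 511) as usize);
        // Full invariant checks: O(n), but compiled out of release builds,
        // so the constructors only pay for them in tests and debug runs.
        debug_assert!(elements.iter().all(|p| p.x.abs() < 512 && p.y.abs() < 512));
        // Strictly increasing in (y, x) order enforces both the line-by-line
        // sorting and the absence of duplicates in one pass.
        debug_assert!(elements
            .windows(2)
            .all(|w| (w[0].y, w[0].x) < (w[1].y, w[1].x)));
        Self { elements }
    }
}
```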
Thanks @Morgane55440 ! It’ll be interesting to see if we can copy the chunks approach in other functions for decent speed ups and less unsafe code.
This PR has been made following issue #597: optimization of grayscale morphology operators.

it aims to rework the internal implementation of the `Mask` and associated functions to optimize execution time, as well as add parallel versions of some public functions using rayon.

in the first commit, i added benches for the `Mask` creation methods and reworked the implementation of the `from_image` method, which allowed a 4x speed increase (200 µs to 50 µs) for a large image (200 by 200 pixels). i would expect the improvement to be less significant for smaller images, although i have not tested it.

it was brought to my attention that i was accessing image pixels in col-major order. That was due to a misunderstanding of the image layout on my part. In light of this, i reworked the implementation of the `Mask` to facilitate and incentivize access in row-major order by storing the positions row by row.

i have also been advised to change `(i16, i16)` to `Point` for readability, which i will do in a further commit.

Finally, through my testing, i tried to use `unsafe_get_pixel` to optimize my `get_pixel` calls. While it was very efficient, i found that accessing the buffer directly and using the `chunks` method was faster by around 17% (~65 µs to ~50 µs). i tested it several times, and the slowest `chunks` time was faster than the fastest `unsafe_get_pixel` time by 10 µs.

i believe it might be interesting to look at this for other uses of `unsafe_get_pixel` which traverse the image line by line, as it potentially could be the case that directly traversing a `[u8]` allows the compiler to do better optimizations than when these accesses are obfuscated through `unsafe_get_pixel`.
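As a tiny illustration of the row-major `chunks` access pattern described above (a sketch on a raw grayscale buffer; `count_nonzero_pixels` and its parameters are made up for the example):

```rust
// Traverse a row-major 8-bit grayscale buffer one row at a time with
// `chunks`. The compiler sees plain slice iteration, so it can elide
// per-pixel bounds checks that x/y pixel getters tend to obscure.
fn count_nonzero_pixels(buf: &[u8], width: usize) -> usize {
    buf.chunks(width) // one slice per image row
        .map(|row| row.iter().filter(|&&p| p != 0).count())
        .sum()
}
```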