Closed: Morgane55440 closed this 6 months ago
next i'm changing the `(i16, i16)` to `Point`, benching `grayscale_erode` and `grayscale_dilate`, and testing different implementations
What about this? Similar performance. I'm a fan of the `image` methods; the devs have already come up with effective access to pixels:
```rust
let elements = image
    .enumerate_pixels()
    .filter(|(_, _, &p)| p[0] != 0)
    .map(|(x, y, _)| (x as i16 - center_x as i16, y as i16 - center_y as i16))
    .collect();
Self { elements }
```
thanks a ton! it is just as fast, if not slightly faster (seems maybe 5-10 microseconds faster on average, but there was overlap)
i am not super knowledgeable on all the `image` methods, but i'll look into them more. i think i probably should look into the other constructors
```
running 4 tests
test morphology::benches::bench_grayscale_diamond_mask ... bench: 185,236 ns/iter (+/- 19,262)
test morphology::benches::bench_grayscale_disk_mask ... bench: 112,540 ns/iter (+/- 36,400)
test morphology::benches::bench_grayscale_mask_from_image ... bench: 47,545 ns/iter (+/- 1,532)
test morphology::benches::bench_grayscale_square_mask ... bench: 43,016 ns/iter (+/- 1,629)
```
(they all make masks of the same size)
The new release is scheduled for May 14th, if anything. I think you will make it in time
```
running 4 tests
test morphology::benches::bench_grayscale_diamond_mask ... bench: 17,375 ns/iter (+/- 1,134)
test morphology::benches::bench_grayscale_disk_mask ... bench: 33,289 ns/iter (+/- 3,166)
test morphology::benches::bench_grayscale_mask_from_image ... bench: 50,766 ns/iter (+/- 2,226)
test morphology::benches::bench_grayscale_square_mask ... bench: 40,882 ns/iter (+/- 2,205)
```
pretty happy with this. going to sleep. Tomorrow i'll work on making the disk mask code clearer, and then i'll move on to `erode` and `dilate`
@cospectrum if you don't mind i would really appreciate your take on my last commit, i kinda struggled with making `disk` readable
initial state of the benches:
```
test morphology::benches::bench_grayscale_op_erode_big_image_diamond ... bench: 87,424,590 ns/iter (+/- 3,543,468)
test morphology::benches::bench_grayscale_op_erode_big_image_point ... bench: 2,013,255 ns/iter (+/- 65,107)
test morphology::benches::bench_grayscale_op_erode_medium_image_diamond ... bench: 3,414,165 ns/iter (+/- 91,476)
test morphology::benches::bench_grayscale_op_erode_medium_image_large_square ... bench: 133,544,210 ns/iter (+/- 30,206,688)
test morphology::benches::bench_grayscale_op_erode_medium_image_point ... bench: 81,142 ns/iter (+/- 7,243)
test morphology::benches::bench_grayscale_op_erode_small_image_diamond ... bench: 242,079 ns/iter (+/- 58,845)
test morphology::benches::bench_grayscale_op_erode_small_image_large_square ... bench: 6,978,410 ns/iter (+/- 267,406)
test morphology::benches::bench_grayscale_op_erode_small_image_point ... bench: 5,356 ns/iter (+/- 636)
test morphology::benches::bench_grayscale_op_dilate_big_image_diamond ... bench: 87,889,700 ns/iter (+/- 24,124,426)
test morphology::benches::bench_grayscale_op_dilate_big_image_point ... bench: 2,287,300 ns/iter (+/- 64,361)
test morphology::benches::bench_grayscale_op_dilate_medium_image_diamond ... bench: 3,316,540 ns/iter (+/- 50,421)
test morphology::benches::bench_grayscale_op_dilate_medium_image_large_square ... bench: 134,237,150 ns/iter (+/- 29,765,530)
test morphology::benches::bench_grayscale_op_dilate_medium_image_point ... bench: 99,185 ns/iter (+/- 10,606)
test morphology::benches::bench_grayscale_op_dilate_small_image_diamond ... bench: 213,802 ns/iter (+/- 5,383)
test morphology::benches::bench_grayscale_op_dilate_small_image_large_square ... bench: 7,203,760 ns/iter (+/- 2,410,854)
test morphology::benches::bench_grayscale_op_dilate_small_image_point ... bench: 6,268 ns/iter (+/- 1,050)
```
> @cospectrum if you don't mind i would really appreciate your take on my last commit, i kinda struggled with making `disk` readable

The `master` version is much more beautiful.
it is, but it's 3-4 times slower.
what i did there was, instead of just checking whether each pixel is inside or outside the disk, i computed the edges of the disk and then filled it in.
it is messier, but much faster. i would like to make it less messy without necessarily sacrificing speed
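That "compute the edges, then fill" idea can be sketched roughly like this (a hypothetical standalone version using `(i16, i16)` offsets; `disk_offsets` is a made-up helper, not the PR's actual `disk` code):

```rust
// Sketch of the "compute the row extents, then fill" approach for a disk mask.
// One square root per row instead of one distance test per pixel.
fn disk_offsets(radius: i16) -> Vec<(i16, i16)> {
    let r2 = (radius as i32) * (radius as i32);
    let mut elements = Vec::new();
    for y in -radius..=radius {
        // Widest x still inside the disk on this row: floor(sqrt(r^2 - y^2)).
        let half_width = ((r2 - (y as i32) * (y as i32)) as f64).sqrt() as i16;
        // Fill the whole horizontal span instead of testing every pixel.
        for x in -half_width..=half_width {
            elements.push((x, y));
        }
    }
    elements
}
```

The speedup over the per-pixel check comes from touching each row span once; it also naturally emits points row by row, matching the row-major ordering the mask wants.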
the new commit is faster by about 20-30% for big images, but it is utterly unreadable; i will rework it to make it even faster and significantly more readable.
also, i did not know about the minimum supported version; i will rewrite the offending code
You can write simple reference implementations (in `mod tests`) and test against them. They will also be useful in the future.
The `chunks` finding is interesting. This crate originally aimed to work on `GenericImage`s but I gave up on that when the performance overhead was too high. Now that we know we're working with image buffers we can probably find a few places where existing functions can be sped up.

As this PR doesn't change any APIs and comes with a significant speedup I'd like to get this merged for the next release - @Morgane55440 let me know when you're ready for this to be merged. We can always simplify or clarify implementations later if necessary.
If it works, then I can simplify/refactor later myself, after the merge
@theotherphil @cospectrum sorry for the wait, was busy with life stuff. i finally got a decent optimization in, from 2x to almost 40x depending on the image and mask size/shape (the bigger, the better the optimization).

i'm certain significant further improvements can be made, but i'm ok with shipping things as they are, except for the `lines` method, which is a mess. i'm working on it.

also, the `apply` method is now unused. i think it's possible it might be useful later, although it seems unlikely because it's just a very slow way to do things. should i remove it?
Brilliant, thanks. Yes, remove `apply` if it's not used.
i have to say (working on `lines` atm), `chunk_by` is exactly what i need, that's really a shame. i'm on a good path to figuring out something good though
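For context, `slice::chunk_by` was only stabilized in Rust 1.77, so under an older minimum supported version a hand-rolled stand-in might look something like this (a sketch, not the PR's actual code; `chunk_by_runs` is a made-up name):

```rust
// Minimal replacement for slice::chunk_by: split a slice into maximal runs
// of adjacent elements for which `same` returns true.
fn chunk_by_runs<T>(slice: &[T], same: impl Fn(&T, &T) -> bool) -> Vec<&[T]> {
    let mut chunks = Vec::new();
    let mut start = 0;
    for i in 1..slice.len() {
        // A new run starts wherever two neighbors are not "the same".
        if !same(&slice[i - 1], &slice[i]) {
            chunks.push(&slice[start..i]);
            start = i;
        }
    }
    if !slice.is_empty() {
        chunks.push(&slice[start..]);
    }
    chunks
}
```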
new benches for `erode()` and `dilate()`:
```
test morphology::benches::bench_grayscale_op_erode_big_image_diamond ... bench: 1,906,060 ns/iter (+/- 57,854)
test morphology::benches::bench_grayscale_op_erode_big_image_point ... bench: 71,312 ns/iter (+/- 28,722)
test morphology::benches::bench_grayscale_op_erode_medium_image_diamond ... bench: 146,737 ns/iter (+/- 4,326)
test morphology::benches::bench_grayscale_op_erode_medium_image_large_square ... bench: 8,119,980 ns/iter (+/- 1,010,798)
test morphology::benches::bench_grayscale_op_erode_medium_image_point ... bench: 4,376 ns/iter (+/- 3,639)
test morphology::benches::bench_grayscale_op_erode_small_image_diamond ... bench: 42,703 ns/iter (+/- 3,684)
test morphology::benches::bench_grayscale_op_erode_small_image_large_square ... bench: 1,232,370 ns/iter (+/- 361,772)
test morphology::benches::bench_grayscale_op_erode_small_image_point ... bench: 1,036 ns/iter (+/- 83)
test morphology::benches::bench_grayscale_op_dilate_big_image_diamond ... bench: 1,837,840 ns/iter (+/- 30,849)
test morphology::benches::bench_grayscale_op_dilate_big_image_point ... bench: 64,265 ns/iter (+/- 1,396)
test morphology::benches::bench_grayscale_op_dilate_medium_image_diamond ... bench: 158,506 ns/iter (+/- 11,497)
test morphology::benches::bench_grayscale_op_dilate_medium_image_large_square ... bench: 8,156,330 ns/iter (+/- 2,925,125)
test morphology::benches::bench_grayscale_op_dilate_medium_image_point ... bench: 4,266 ns/iter (+/- 115)
test morphology::benches::bench_grayscale_op_dilate_small_image_diamond ... bench: 41,219 ns/iter (+/- 777)
test morphology::benches::bench_grayscale_op_dilate_small_image_large_square ... bench: 1,202,150 ns/iter (+/- 300,326)
test morphology::benches::bench_grayscale_op_dilate_small_image_point ... bench: 1,025 ns/iter (+/- 18)
```
ok, so it all seems good, just 2 things i was thinking about:
- merging the `erode` and `dilate` code into a single function, something like `per_pixel_mask_based_reduce(image: &GrayImage, mask: &Mask, neutral: u8, operator: Fn(u8, u8) -> u8)`
- i'd like a second opinion to be sure, especially on the name

then it should all be ready to merge
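A rough sketch of what such a shared helper could look like (hypothetical: it operates on a plain row-major `&[u8]` buffer with a `width` parameter and `(dx, dy)` mask offsets instead of the real `GrayImage`/`Mask` types, and ignores the out-of-bounds conventions the real code may use):

```rust
// erode and dilate differ only in the neutral element and the combining
// operator, so both can delegate to one per-pixel mask reduction.
fn per_pixel_mask_based_reduce(
    image: &[u8],        // grayscale pixels, row-major
    width: usize,
    mask: &[(i16, i16)], // mask offsets relative to the current pixel
    neutral: u8,
    operator: fn(u8, u8) -> u8,
) -> Vec<u8> {
    let height = image.len() / width;
    let mut out = vec![0u8; image.len()];
    for y in 0..height as i32 {
        for x in 0..width as i32 {
            let mut acc = neutral;
            for &(dx, dy) in mask {
                let (nx, ny) = (x + dx as i32, y + dy as i32);
                if nx >= 0 && ny >= 0 && (nx as usize) < width && (ny as usize) < height {
                    acc = operator(acc, image[ny as usize * width + nx as usize]);
                }
            }
            out[y as usize * width + x as usize] = acc;
        }
    }
    out
}

// erode: min with neutral 255; dilate: max with neutral 0.
fn erode(image: &[u8], width: usize, mask: &[(i16, i16)]) -> Vec<u8> {
    per_pixel_mask_based_reduce(image, width, mask, u8::MAX, |a, b| a.min(b))
}
fn dilate(image: &[u8], width: usize, mask: &[(i16, i16)]) -> Vec<u8> {
    per_pixel_mask_based_reduce(image, width, mask, u8::MIN, |a, b| a.max(b))
}
```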
i did the refactor. if anyone has a better function name idea, i'm all ears; otherwise, i think this could be good to merge
it's not user-facing, so it's not a big deal
@ripytide all of the comments remaining are about `Mask::new()`.
this was not part of my intended code; i added it due to a suggestion by @cospectrum, and i think it was a mistake on my part
You can add a private `new` and call it instead of `Self { elements }` everywhere:

```rust
fn new(elements: Vec<Point<i16>>) -> Self {
    assert!(elements.len() <= (511 * 511) as usize);
    Self { elements }
}
```
the code that i want to make guarantees that, for all `Mask`s:
- all the integer values will be strictly between -512 and 512
- all tuples will be sorted in reverse lexicographic order, line by line ((-1,-1),(0,-1),(1,-1),(-1,0),(0,0),...)
- no tuple shall appear twice

this shall be true for all `Mask`s, and is guaranteed by the current implementation of all the user-facing constructors.
i think the `new` should just be removed. the creation of custom masks is already possible through `from_image`. maybe i could add tests that check the conditions are always met?
Ah okay, that makes sense. Since there are tests already I would expect them to fail if any of the constructors stopped abiding by the correct ordering so you probably don't need to write explicit tests for checking the ordering as that is just an implementation detail.
well then it should be good as-is.
i could add a comment above the initializations, something like:

```rust
// direct initialization stands as guarantee that all conditions mentioned
// by the doc are fulfilled for the struct and each of its parts
Self { elements }
```
thanks a ton for the typos @ripytide. i fixed the others it showed me, including some in `distance_transform`; because i wrote them, they seemed relevant, and there are no other PRs on the subject atm
Why did you remove `new`?
@cospectrum because it didn't do anything and was confusing.
each constructor guarantees the struct it returns is valid in its own way.
the existence of the `new` method implied that it was validating the data. it wasn't, and making it truly validate the data would require it to iterate through and check the whole vector, which would slow down all the constructors just to double-check something that was already assured.
replacing `Mystruct { my_data }` by `Mystruct::new(my_data)` is great, except if you're already in a constructor which is doing its own validation
i think i have an alternative solution that may help documentation, to make clearer what's happening:

```rust
/// new_unchecked creates a new mask without doing any form of data validation
///
/// # Safety
///
/// this method is not user-facing and marked `unsafe` because it may lead
/// to invalid state if called with the wrong arguments.
///
/// By calling new_unchecked, you guarantee that:
/// - all the integer values in `elements` will be strictly between -512 and 512
/// - the maximum L_inf distance between any 2 points in `elements` is 512
/// - all Points in `elements` are sorted in reverse lexicographic order, line by line ((-1,-1),(0,-1),(1,-1),(-1,0),(0,0),...)
/// - no point appears twice in `elements`
unsafe fn new_unchecked(elements: Vec<Point<i16>>) -> Self {
    Self { elements }
}
```

none of it is user-facing, so it's not a huge deal, but it probably would be helpful for maintainability to have the danger laid out by the `unsafe` block and all the documentation stored in the same place
> except if you're already in a constructor which is doing its own validation

None of your constructors do real validation. Comments prove nothing, asserts do.

> the existence of the new method implied that it was validating the data. It wasn't

It does 1 check. One is better than nothing. And it may have more in the future.

> and making it truly validate the data would require it to iterate through and check the whole vector

That's why `debug_assert` exists
so you would prefer a `new` method with `debug_assert`s? i can make it if you think it's the best way to guarantee safety.
i'm just trying to make the code as good as possible, following the advice and criticisms i'm given; this is all work in progress.
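A `new` along those lines might look like this (a sketch combining the earlier `assert` suggestion with `debug_assert`s for the invariants listed above; `Point` and `Mask` are stubbed here and this is not the PR's actual code):

```rust
#[derive(Clone, Copy)]
struct Point<T> { x: T, y: T }

struct Mask { elements: Vec<Point<i16>> }

impl Mask {
    fn new(elements: Vec<Point<i16>>) -> Self {
        // Cheap always-on check, as in the suggested `new`.
        assert!(elements.len() <= (511 * 511) as usize);
        // Full invariant checks: O(n), but compiled out of release builds,
        // so the constructors only pay for them in tests and debug runs.
        debug_assert!(elements.iter().all(|p| p.x.abs() < 512 && p.y.abs() < 512));
        // Strictly increasing in (y, x) order enforces both the line-by-line
        // sorting and the absence of duplicates in one pass.
        debug_assert!(elements
            .windows(2)
            .all(|w| (w[0].y, w[0].x) < (w[1].y, w[1].x)));
        Self { elements }
    }
}
```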
Thanks @Morgane55440 ! It’ll be interesting to see if we can copy the chunks approach in other functions for decent speed ups and less unsafe code.
This PR has been made following issue #597: optimization of grayscale morphology operators.

it aims to rework the internal implementation of the `Mask` and associated functions to optimize execution time, as well as add parallel versions of some public functions using rayon.

in the first commit, i added benches for the `Mask` creation methods and reworked the implementation of the `from_image` method, which allowed a 4x speed increase (200 µs to 50 µs) for a large image (200 by 200 pixels). i would expect the improvement to be less significant for smaller images, although i have not tested it.

it was brought to my attention that i was accessing image pixels in col-major order. That was due to a misunderstanding of the image layout on my part. In light of this, i reworked the implementation of the `Mask` to facilitate and incentivize access in row-major order by storing the positions row by row.

i have also been advised to change `(i16, i16)` to `Point` for readability, which i will do in a further commit.

Finally, through my testing, i tried to use `unsafe_get_pixel` to optimize my `get_pixel` calls. While it was very efficient, i found that accessing the buffer directly and using the `chunks` method was faster by around 17% (~65 µs to ~50 µs). i tested it several times, and the slowest `chunks` time was faster than the fastest `unsafe_get_pixel` time by 10 µs.

i believe it might be interesting to look at this for other uses of `unsafe_get_pixel` which traverse the image line by line, as it potentially could be the case that directly traversing a `[u8]` allows the compiler to do better optimizations than when these accesses are obfuscated through `unsafe_get_pixel`.
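As a tiny illustration of the row-major `chunks` access pattern described above (a sketch on a raw grayscale buffer; `count_nonzero_pixels` and its parameters are made up for the example):

```rust
// Traverse a row-major 8-bit grayscale buffer one row at a time with
// `chunks`. The compiler sees plain slice iteration, so it can elide
// per-pixel bounds checks that x/y pixel getters tend to obscure.
fn count_nonzero_pixels(buf: &[u8], width: usize) -> usize {
    buf.chunks(width) // one slice per image row
        .map(|row| row.iter().filter(|&&p| p != 0).count())
        .sum()
}
```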