mthh / contour-rs

Contour polygon creation in Rust (using marching squares algorithm)
https://crates.io/crates/contour
Apache License 2.0
52 stars 10 forks source link

Make value type generic and independent from coordinate type #14

Open netthier opened 7 months ago

netthier commented 7 months ago

This PR adds a new trait called ContourValue (though I'm not too happy with that name and would love better suggestions), which is defined as

pub trait ContourValue: PartialOrd + Copy + Num + NumCast{}
impl<T> ContourValue for T where T: PartialOrd + Copy + Num + NumCast {}

All method signatures that took some type of values before are now rewritten to something such as

pub fn contours<V: ContourValue>(&self, values: &[V], thresholds: &[V]) -> Result<Vec<Contour<V>>>

This allows for substantial memory and performance improvements in cases where the additional precision is not needed. The PR is currently missing tests however, and it includes the changes from https://github.com/mthh/contour-rs/pull/13, meaning that should be merged first.

netthier commented 7 months ago

The geojson feature is also currently not working, because V needs to be converted to a serde_json::Value in those functions. Adding a Serialize bound shouldn't be an issue, but would make those functions fallible, adding another breaking change :thinking:

mthh commented 7 months ago

Thanks for your PR and sorry it took me a while to reply.

Indeed, making value type generic and independent from coordinate type is a great thing :+1: (and it was probably a debatable design choice to have linked the numerical type of coordinates and the numerical type of values in the input grid as we saw in issue #12).

Adding a Serialize bound shouldn't be an issue, but would make those functions fallible, adding another breaking change

If this is the only way to keep the geojson feature working, we might as well take advantage of the breaking changes this PR is already making to make this one too, what do you thing ?

a new trait called ContourValue (though I'm not too happy with that name and would love better suggestions)

I don't have any particular problems with ContourValue. From a user point of view, we could also talk about GridValue (since it designates the type of input values) or simply Value (since this library has a single, fairly clear purpose, that of making contours from a grid of values, there's not necessarily too much confusion possible).

mthh commented 7 months ago

Hi @netthier! Is there anything I can do to help finalise this PR? Would you like me to continue the work you have started?

netthier commented 7 months ago

If this is the only way to keep the geojson feature working, we might as well take advantage of the breaking changes this PR is already making to make this one too, what do you thing ?

I'll do so then. Who knows what kind of numbers people might put in there, maybe some can indeed fail serialization.

I don't have any particular problems with ContourValue. From a user point of view, we could also talk about GridValue (since it designates the type of input values) or simply Value (since this library has a single, fairly clear purpose, that of making contours from a grid of values, there's not necessarily too much confusion possible).

I prefer GridValue over ContourValue. Value feels a bit too generic tbh.

I'll try to finalize this PR tommorow, but I'm still unsure about the part of the code I commented on in smooth_linear. I believe that the cast should be infallible in normal usage, as NumCast wlll just return +Inf or -Inf if the input is outside the float's range. But it is conceivable that someone could implement NumCast for their own number type and have it fail for whatever reason. Maybe it makes sense to create a new error kind and have smooth_linear return a Result? Most other methods seem to already do so, so it wouldn't be that extreme of a change.

mthh commented 7 months ago

Who knows what kind of numbers people might put in there, maybe some can indeed fail serialization.

I don't really know either but let's say "better safe than sorry".

I prefer GridValue over ContourValue. Value feels a bit too generic tbh.

Let's go with GridValue :+1:

Maybe it makes sense to create a new error kind and have smooth_linear return a Result? Most other methods seem to already do so, so it wouldn't be that extreme of a change.

This seems like a good idea to me, with a new error type to be explicit about the reason for the error if it occurs (and since the smooth_linear caller already returns a Result it won't change the API for the user).

netthier commented 7 months ago

This could use some tests tbh, but I'm not sure if I have the time and knowledge required to write good ones, let me know how you want to proceed there :thinking: I also found out that you cannot edit this PR because I made it from an organization repo, while edits by maintainers are apparently only possible on personal repos. I'll have to find out how to do it better in the future.

netthier commented 7 months ago

I'm also unsure how my changes to ContourBuilder::smooth_linear affect performance. I added more casts than previously, as I noticed that numbers without negative values might panic at the value - v0 and v1 - v0 parts.

mthh commented 7 months ago

Thanks!

This could use some tests tbh, but I'm not sure if I have the time and knowledge required to write good ones

I don't really have an idea of how to make meaningful tests for all this yet, but I'm thinking about it a bit.

I'm also unsure how my changes to ContourBuilder::smooth_linear affect performance. I added more casts than previously, as I noticed that numbers without negative values might panic at the value - v0 and v1 - v0 parts.

I'm going to set up some benchmarks over the weekend to try to measure the difference in performance between 0.13.0 and your PR, and review your PR in the process.

netthier commented 7 months ago

I guess a slowdown would be expected with a lot of more handling of results now. I'm also seeing differences when running cargo bench locally, with the worst difference I've seen being 10%. The use case I implemented this for was one that didn't require floating point precision, and in those cases performance is far better than with floats, meaning the overhead added is irrelevant. One thing I noticed is that if smoothing is disabled, a lot of functions are infallible and returning results for no reason. It should be possible to optimize this more :thinking: And a cast to a float failing seems like such an esoteric thing (I dont think any std types fail this) that adding all of that overhead just for that seems bad. If users upgrade and want to keep using floats, they'd get a slowdown in exchange for having error handling on a bunch of f32->f32 conversions D: I wonder if using some infallible conversion trait instead of NumCast or straight up panicking would be better.

EDIT: Unwrapping and not propagating results at all isn't much faster, I wonder where all that overhead is coming from

netthier commented 7 months ago

I'll have to profile this, I'm a bit surprised that even the no_smoothing benchmark is slower when I remove all the result propagation :thinking:

mthh commented 7 months ago

if users upgrade and want to keep using floats, they'd get a slowdown in exchange for having error handling on a bunch of f32->f32 conversions I wonder if using some infallible conversion trait instead of NumCast or straight up panicking would be better.

Unwrapping and not propagating results at all isn't much faster, I wonder where all that overhead is coming from

Indeed, there's no rush to merge this, let's investigate a little bit the overhead it adds first.

I'll have to profile this, I'm a bit surprised that even the no_smoothing benchmark is slower when I remove all the result propagation 🤔

Great if you get the chance to do this!