verpeteren / rust-simd-noise

SIMD noise library for Rust
https://crates.io/crates/simdnoise
252 stars 20 forks source link

Support for f64s? #19

Closed TheHellBox closed 1 year ago

TheHellBox commented 4 years ago

Hi! I'm using the library for the procedural planet surface generation, and the scales are huge(Up to the earth size). f32s precision is not enough there unfortunately, so having a support for f64s would be great.

I have no idea what are the limitations for simd though, but I think it's possible to implement f64s for scalar.

Thanks for the great library!

jackmott commented 4 years ago

Hey there, no fundamental reason why you can't do f64 noise here. The speedup due to simd is cut in half but will still be there. PRs are welcome!

However I do wonder if you really are going to need it. Like are you modeling the geometry of earth down to 1 meter scale or something? If you are attempting to do that globally, how are you going to render that much data, and store it, and so on?

Normally such a prospect would be done in chunks. When viewing the planet from afar you only need noise at very large scale, when up close you need to zoom in the detail but you don't need the whole planet any more. I would think you could just layer f32 noise and that might be more practical than using f64 noise to model the whole thing at once?

TheHellBox commented 4 years ago

Thanks for the answer!

Normally such a prospect would be done in chunks. When viewing the planet from afar you only need noise at very large scale, when up close you need to zoom in the detail but you don't need the whole planet any more. I would think you could just layer f32 noise and that might be more practical than using f64 noise to model the whole thing at once?

Yes, the data is stored in chunks. But the problem is different, it's is about coordinates precision. The sphere has a huge radius(2/10 of the earth scale in my case, or 1275620m), so when I want to sample a point on it, the f32 precision for a point coordinates is not enough.

If you take a f64 point on a planet, for example x: 1039566.7102027689 y: 682214.653570567 z: -284764.1232537858 the f32 representation of this point rounds to x: 1039566.7 y: 682214.6 z: -284764.13

When samples are close to each other, this precision error becomes very noticeable, especially with sensitive stuff like normal maps(See example below, the error is in those black/white dots, those are the broken normals) image

And this is how it's supposed to look like(Captured when I was using noise-rs) image

Hey there, no fundamental reason why you can't do f64 noise here.

So it's possible to pass f64 values with current library functions? If so, that's great! But what about scalar implementation? Sadly it only allows for f32 coordinates.

I can make a PR and add f64 functionality to scalar if you wish. Should it be a new set of functions, like simdnoise::scalar::fbm_3d_f64, or there is a better way of doing that?

My code is mostly a reimplementation of Ralith's planetmap, I still haven't published my code anywhere, but his code is very similar (https://github.com/Ralith/planetmap), so if you need more of the details, you can check how it works, everything is very well commented. And sorry for a poor Engilsh if that's a problem

jackmott commented 4 years ago

You would need to write all new 64 bit functions for everything. A lot of grunt work, largely copy-pasta.

I think your precision issue could be solved by this common approach to solar system scale games: https://www.gamasutra.com/blogs/RobertWooden/20150413/240938/Overcoming_Floating_Point_Precision_Errors.php

TheHellBox commented 4 years ago

This won't work for surface sampling unfortunately.

All of those techniques only work for graphics, and in my case, I just round f64s transform matrices into f32s before sending onto gpu. OpenGL's camera position is always a 0, so precision is not a problem after the conversion. (See: https://github.com/Ralith/planetmap/blob/master/src/chunk.rs#L147) That's not possible with most of the game engines as they use f32s for transforms, so this is why people are using such workaround, but as my engine is self-written, and I can just store everything in f64s with no need for workarounds.

Sampling is different, if you want to get a sample on a very specific point, you can only pass this point's coordinate, and this is why precision matters here

Ralith commented 4 years ago

The fundamental problem here is that a noise function is needed which can give accurate results for a spherical domain with extremely high precision, because the player will often be viewing an extremely small portion of the sphere and it's desirable to have significant texture variation on the scale of the player's view. Layering doesn't help because you still need that single high-precision domain; any noise whose domain doesn't gracefully embed a sphere will have artifacts or discontinuities.

Ralith commented 4 years ago

Also, the main reason @TheHellBox is having trouble with normal maps here is because he's computing normals by finite difference method. An analytic approach would buy another order of magnitude or two of precision, which might be enough.

jackmott commented 4 years ago

cool, well if anyone wants to submit a PR to add f64 support I'm happy to help guide the process. mostly a task of copy-pasta the existing functions and changing the types, there will likely be a few spots you have to adjust some of the simd stuff which I can help with. Don't think it can quite be made generic though proof of concepts of that would be interesting.

TheHellBox commented 4 years ago

About that, I've tried to look at what I can do. It seems like set1_epi64 is missing from simdeez(While setzero_epi64 is there), not sure if that's intended or not

jackmott commented 4 years ago

we can fix that, we have the technology

jackmott commented 4 years ago

set1_epi64 added to simdeez

TheHellBox commented 4 years ago

Thanks (: Now there are a bunch of other things missing, there is a short list of what I've found so far:

cvtpd_epi64
cvtepi64_pd
add_epi64
i64gather_epi64
sub_epi64
jackmott commented 4 years ago

added!

TheHellBox commented 4 years ago

Thanks! I've implemented simplex3d, it works just fine, now it's about copy-pasting all the other stuff

TheHellBox commented 4 years ago

There are also those functions missing

mullo_epi64
fast_floor_pd

fast_floor_pd is used in simplex_1d mullo_epi64 is used in cellular

jackmott commented 4 years ago

cool, on it

jackmott commented 4 years ago

ok done. keep in mind that mullo_epi64 is going to be a bit slow, as its just doing it in scalar fashion. This can maybe be sped up a bit with some trickery like so: https://stackoverflow.com/questions/17863411/sse-multiplication-of-2-64-bit-integers but I want to take some time with that. For now just wanted to get it working. Also not clear the trickery will be faster on all/any cpus.

jackmott commented 4 years ago

Got your changes merged, plan to test a bit to see what needs fixing. Still some work to do to make 64 bit versions of all the helper functions.

TheHellBox commented 4 years ago

As I said in a pull request, avx2 has quite a lot of artifacts on bigger scale, and sse2 seem to not work at all. So those are the things you might want to look at first. I have a limited understanding of how SIMD actually works, so I could make some mistakes there

jackmott commented 4 years ago

could be some bugs in some of the functions I added to simdeez too, I'll get it sorted. is your test case in the PR? if not can you post a snippet here maybe?

TheHellBox commented 4 years ago

I've tested the thing inside my engine, the code there looks like

Self::Fbm { lac, gain, octaves } => unsafe {
    if is_x86_feature_detected!("avx2") {
        let x = _mm_set1_pd(x);
        let y = _mm_set1_pd(y);
        let z = _mm_set1_pd(z);
        let lac = _mm_set1_pd(*lac);
        let gain = _mm_set1_pd(*gain);

        let s = simdeez::sse2::F64x2(simdnoise::sse2::fbm_3d_f64(x, y, z, lac, gain, *octaves, seed as i64));
        let mut r: f64 = 0.0;
        simdeez::sse2::Sse2::storeu_pd(&mut r, s);
        r
    }
    else{
        simdnoise::scalar::fbm_3d_f64(x, y, z, *lac, *gain, *octaves, seed as i64)
    }
}

I can publish the code for the whole engine, I'm planning to do it anyway, but I idk if it'll be any usefull for you

jackmott commented 4 years ago

that snippet is perfect, that's all I need, thanks.

Ralith commented 4 years ago

As of #29, finite differences are no longer needed, at least in principle and when using the low-level primitives. Should we revisit the inclusion of f64, given the significant maintenance hazard (e.g. #28) of having multiple almost-identical copies of the same code?

virtualritz commented 4 years ago

Maybe some macros could be used to avoid code duplication for the f64 case? This case is very useful for large scale terrain generation. I'm happy it is included now.

Ralith commented 4 years ago

This case is very useful for large scale terrain generation.

Can you elaborate? I'm currently working on using the f32 version to generate a 1:1 earth-scale planet and I'm not having any precision issues.

virtualritz commented 4 years ago

@Ralith Did you see the message that started this thread?

Ralith commented 4 years ago

Yes, I am personally familiar with @TheHellBox's use case. As discussed mid-thread, f64 was only required due to the use of finite difference methods to approximate derivatives, which is now no longer necessary, at least when using the primitives directly.

verpeteren commented 1 year ago

It seems that:

I just ran a check and here is the listing of all 32 and 64 functions:

scalar sse2 sse4.1 avx2
cellular_2d cellular_2d cellular_2d cellular_2d
cellular_2d_f64 cellular_2d_f64 cellular_2d_f64 cellular_2d_f64
cellular_3d cellular_3d cellular_3d cellular_3d
cellular_3d_f64 cellular_3d_f64 cellular_3d_f64 cellular_3d_f64
fbm_1d fbm_1d fbm_1d fbm_1d
fbm_1d_f64 fbm_1d_f64 fbm_1d_f64 fbm_1d_f64
fbm_2d fbm_2d fbm_2d fbm_2d
fbm_2d_f64 fbm_2d_f64 fbm_2d_f64 fbm_2d_f64
fbm_3d fbm_3d fbm_3d fbm_3d
fbm_3d_f64 fbm_3d_f64 fbm_3d_f64 fbm_3d_f64
fbm_4d fbm_4d fbm_4d fbm_4d
fbm_4d_f64 fbm_4d_f64 fbm_4d_f64 fbm_4d_f64
get_1d_noise get_1d_noise get_1d_noise get_1d_noise
get_1d_scaled_noise get_1d_scaled_noise get_1d_scaled_noise get_1d_scaled_noise
get_2d_noise get_2d_noise get_2d_noise get_2d_noise
get_2d_scaled_noise get_2d_scaled_noise get_2d_scaled_noise get_2d_scaled_noise
get_3d_noise get_3d_noise get_3d_noise get_3d_noise
get_3d_scaled_noise get_3d_scaled_noise get_3d_scaled_noise get_3d_scaled_noise
get_4d_noise get_4d_noise get_4d_noise get_4d_noise
get_4d_scaled_noise get_4d_scaled_noise get_4d_scaled_noise get_4d_scaled_noise
ridge_1d ridge_1d ridge_1d ridge_1d
ridge_1d_f64 ridge_1d_f64 ridge_1d_f64 ridge_1d_f64
ridge_2d ridge_2d ridge_2d ridge_2d
ridge_2d_f64 ridge_2d_f64 ridge_2d_f64 ridge_2d_f64
ridge_3d ridge_3d ridge_3d ridge_3d
ridge_3d_f64 ridge_3d_f64 ridge_3d_f64 ridge_3d_f64
ridge_4d ridge_4d ridge_4d ridge_4d
ridge_4d_f64 ridge_4d_f64 ridge_4d_f64 ridge_4d_f64
simplex_1d simplex_1d simplex_1d simplex_1d
simplex_1d_f64 simplex_1d_f64 simplex_1d_f64 simplex_1d_f64
simplex_2d simplex_2d simplex_2d simplex_2d
simplex_2d_f64 simplex_2d_f64 simplex_2d_f64 simplex_2d_f64
simplex_3d simplex_3d simplex_3d simplex_3d
simplex_3d_f64 simplex_3d_f64 simplex_3d_f64 simplex_3d_f64
simplex_4d simplex_4d simplex_4d simplex_4d
simplex_4d_f64 simplex_4d_f64 simplex_4d_f64 simplex_4d_f64
turbulence_1d turbulence_1d turbulence_1d turbulence_1d
turbulence_1d_f64 turbulence_1d_f64 turbulence_1d_f64 turbulence_1d_f64
turbulence_2d turbulence_2d turbulence_2d turbulence_2d
turbulence_2d_f64 turbulence_2d_f64 turbulence_2d_f64 turbulence_2d_f64
turbulence_3d turbulence_3d turbulence_3d turbulence_3d
turbulence_3d_f64 turbulence_3d_f64 turbulence_3d_f64 turbulence_3d_f64
turbulence_4d turbulence_4d turbulence_4d turbulence_4d
turbulence_4d_f64 turbulence_4d_f64 turbulence_4d_f64 turbulence_4d_f64

It seems to me that we can close this issue. @TheHellBox & @Ralith Ok?

verpeteren commented 1 year ago

I'll close this issue now; feel free to reopen with details of what is still missing. (I'll fix some more issues the upcoming weeks and then prepare a new crate release)