Closed TheHellBox closed 1 year ago
Hey there, no fundamental reason why you can't do f64 noise here. The speedup due to simd is cut in half but will still be there. PRs are welcome!
However I do wonder if you really are going to need it. Like are you modeling the geometry of earth down to 1 meter scale or something? If you are attempting to do that globally, how are you going to render that much data, and store it, and so on?
Normally such a prospect would be done in chunks. When viewing the planet from afar you only need noise at very large scale, when up close you need to zoom in the detail but you don't need the whole planet any more. I would think you could just layer f32 noise and that might be more practical than using f64 noise to model the whole thing at once?
Thanks for the answer!
Normally such a prospect would be done in chunks. When viewing the planet from afar you only need noise at very large scale, when up close you need to zoom in the detail but you don't need the whole planet any more. I would think you could just layer f32 noise and that might be more practical than using f64 noise to model the whole thing at once?
Yes, the data is stored in chunks. But the problem is different, it's is about coordinates precision. The sphere has a huge radius(2/10 of the earth scale in my case, or 1275620m), so when I want to sample a point on it, the f32 precision for a point coordinates is not enough.
If you take a f64 point on a planet, for example
x: 1039566.7102027689 y: 682214.653570567 z: -284764.1232537858
the f32 representation of this point rounds to
x: 1039566.7 y: 682214.6 z: -284764.13
When samples are close to each other, this precision error becomes very noticeable, especially with sensitive stuff like normal maps(See example below, the error is in those black/white dots, those are the broken normals)
And this is how it's supposed to look like(Captured when I was using noise-rs)
Hey there, no fundamental reason why you can't do f64 noise here.
So it's possible to pass f64 values with current library functions? If so, that's great! But what about scalar
implementation? Sadly it only allows for f32 coordinates.
I can make a PR and add f64 functionality to scalar if you wish. Should it be a new set of functions, like simdnoise::scalar::fbm_3d_f64
, or there is a better way of doing that?
My code is mostly a reimplementation of Ralith's planetmap, I still haven't published my code anywhere, but his code is very similar (https://github.com/Ralith/planetmap), so if you need more of the details, you can check how it works, everything is very well commented. And sorry for a poor Engilsh if that's a problem
You would need to write all new 64 bit functions for everything. A lot of grunt work, largely copy-pasta.
I think your precision issue could be solved by this common approach to solar system scale games: https://www.gamasutra.com/blogs/RobertWooden/20150413/240938/Overcoming_Floating_Point_Precision_Errors.php
This won't work for surface sampling unfortunately.
All of those techniques only work for graphics, and in my case, I just round f64s transform matrices into f32s before sending onto gpu. OpenGL's camera position is always a 0, so precision is not a problem after the conversion. (See: https://github.com/Ralith/planetmap/blob/master/src/chunk.rs#L147) That's not possible with most of the game engines as they use f32s for transforms, so this is why people are using such workaround, but as my engine is self-written, and I can just store everything in f64s with no need for workarounds.
Sampling is different, if you want to get a sample on a very specific point, you can only pass this point's coordinate, and this is why precision matters here
The fundamental problem here is that a noise function is needed which can give accurate results for a spherical domain with extremely high precision, because the player will often be viewing an extremely small portion of the sphere and it's desirable to have significant texture variation on the scale of the player's view. Layering doesn't help because you still need that single high-precision domain; any noise whose domain doesn't gracefully embed a sphere will have artifacts or discontinuities.
Also, the main reason @TheHellBox is having trouble with normal maps here is because he's computing normals by finite difference method. An analytic approach would buy another order of magnitude or two of precision, which might be enough.
cool, well if anyone wants to submit a PR to add f64 support I'm happy to help guide the process. mostly a task of copy-pasta the existing functions and changing the types, there will likely be a few spots you have to adjust some of the simd stuff which I can help with. Don't think it can quite be made generic though proof of concepts of that would be interesting.
About that, I've tried to look at what I can do. It seems like set1_epi64
is missing from simdeez(While setzero_epi64
is there), not sure if that's intended or not
we can fix that, we have the technology
set1_epi64 added to simdeez
Thanks (: Now there are a bunch of other things missing, there is a short list of what I've found so far:
cvtpd_epi64
cvtepi64_pd
add_epi64
i64gather_epi64
sub_epi64
added!
Thanks! I've implemented simplex3d, it works just fine, now it's about copy-pasting all the other stuff
There are also those functions missing
mullo_epi64
fast_floor_pd
fast_floor_pd
is used in simplex_1d
mullo_epi64
is used in cellular
cool, on it
ok done. keep in mind that mullo_epi64 is going to be a bit slow, as its just doing it in scalar fashion. This can maybe be sped up a bit with some trickery like so: https://stackoverflow.com/questions/17863411/sse-multiplication-of-2-64-bit-integers but I want to take some time with that. For now just wanted to get it working. Also not clear the trickery will be faster on all/any cpus.
Got your changes merged, plan to test a bit to see what needs fixing. Still some work to do to make 64 bit versions of all the helper functions.
As I said in a pull request, avx2 has quite a lot of artifacts on bigger scale, and sse2 seem to not work at all. So those are the things you might want to look at first. I have a limited understanding of how SIMD actually works, so I could make some mistakes there
could be some bugs in some of the functions I added to simdeez too, I'll get it sorted. is your test case in the PR? if not can you post a snippet here maybe?
I've tested the thing inside my engine, the code there looks like
Self::Fbm { lac, gain, octaves } => unsafe {
if is_x86_feature_detected!("avx2") {
let x = _mm_set1_pd(x);
let y = _mm_set1_pd(y);
let z = _mm_set1_pd(z);
let lac = _mm_set1_pd(*lac);
let gain = _mm_set1_pd(*gain);
let s = simdeez::sse2::F64x2(simdnoise::sse2::fbm_3d_f64(x, y, z, lac, gain, *octaves, seed as i64));
let mut r: f64 = 0.0;
simdeez::sse2::Sse2::storeu_pd(&mut r, s);
r
}
else{
simdnoise::scalar::fbm_3d_f64(x, y, z, *lac, *gain, *octaves, seed as i64)
}
}
I can publish the code for the whole engine, I'm planning to do it anyway, but I idk if it'll be any usefull for you
that snippet is perfect, that's all I need, thanks.
As of #29, finite differences are no longer needed, at least in principle and when using the low-level primitives. Should we revisit the inclusion of f64
, given the significant maintenance hazard (e.g. #28) of having multiple almost-identical copies of the same code?
Maybe some macros could be used to avoid code duplication for the f64
case? This case is very useful for large scale terrain generation. I'm happy it is included now.
This case is very useful for large scale terrain generation.
Can you elaborate? I'm currently working on using the f32 version to generate a 1:1 earth-scale planet and I'm not having any precision issues.
@Ralith Did you see the message that started this thread?
Yes, I am personally familiar with @TheHellBox's use case. As discussed mid-thread, f64
was only required due to the use of finite difference methods to approximate derivatives, which is now no longer necessary, at least when using the primitives directly.
It seems that:
f64s
functions.I just ran a check and here is the listing of all 32 and 64 functions:
scalar | sse2 | sse4.1 | avx2 |
---|---|---|---|
cellular_2d | cellular_2d | cellular_2d | cellular_2d |
cellular_2d_f64 | cellular_2d_f64 | cellular_2d_f64 | cellular_2d_f64 |
cellular_3d | cellular_3d | cellular_3d | cellular_3d |
cellular_3d_f64 | cellular_3d_f64 | cellular_3d_f64 | cellular_3d_f64 |
fbm_1d | fbm_1d | fbm_1d | fbm_1d |
fbm_1d_f64 | fbm_1d_f64 | fbm_1d_f64 | fbm_1d_f64 |
fbm_2d | fbm_2d | fbm_2d | fbm_2d |
fbm_2d_f64 | fbm_2d_f64 | fbm_2d_f64 | fbm_2d_f64 |
fbm_3d | fbm_3d | fbm_3d | fbm_3d |
fbm_3d_f64 | fbm_3d_f64 | fbm_3d_f64 | fbm_3d_f64 |
fbm_4d | fbm_4d | fbm_4d | fbm_4d |
fbm_4d_f64 | fbm_4d_f64 | fbm_4d_f64 | fbm_4d_f64 |
get_1d_noise | get_1d_noise | get_1d_noise | get_1d_noise |
get_1d_scaled_noise | get_1d_scaled_noise | get_1d_scaled_noise | get_1d_scaled_noise |
get_2d_noise | get_2d_noise | get_2d_noise | get_2d_noise |
get_2d_scaled_noise | get_2d_scaled_noise | get_2d_scaled_noise | get_2d_scaled_noise |
get_3d_noise | get_3d_noise | get_3d_noise | get_3d_noise |
get_3d_scaled_noise | get_3d_scaled_noise | get_3d_scaled_noise | get_3d_scaled_noise |
get_4d_noise | get_4d_noise | get_4d_noise | get_4d_noise |
get_4d_scaled_noise | get_4d_scaled_noise | get_4d_scaled_noise | get_4d_scaled_noise |
ridge_1d | ridge_1d | ridge_1d | ridge_1d |
ridge_1d_f64 | ridge_1d_f64 | ridge_1d_f64 | ridge_1d_f64 |
ridge_2d | ridge_2d | ridge_2d | ridge_2d |
ridge_2d_f64 | ridge_2d_f64 | ridge_2d_f64 | ridge_2d_f64 |
ridge_3d | ridge_3d | ridge_3d | ridge_3d |
ridge_3d_f64 | ridge_3d_f64 | ridge_3d_f64 | ridge_3d_f64 |
ridge_4d | ridge_4d | ridge_4d | ridge_4d |
ridge_4d_f64 | ridge_4d_f64 | ridge_4d_f64 | ridge_4d_f64 |
simplex_1d | simplex_1d | simplex_1d | simplex_1d |
simplex_1d_f64 | simplex_1d_f64 | simplex_1d_f64 | simplex_1d_f64 |
simplex_2d | simplex_2d | simplex_2d | simplex_2d |
simplex_2d_f64 | simplex_2d_f64 | simplex_2d_f64 | simplex_2d_f64 |
simplex_3d | simplex_3d | simplex_3d | simplex_3d |
simplex_3d_f64 | simplex_3d_f64 | simplex_3d_f64 | simplex_3d_f64 |
simplex_4d | simplex_4d | simplex_4d | simplex_4d |
simplex_4d_f64 | simplex_4d_f64 | simplex_4d_f64 | simplex_4d_f64 |
turbulence_1d | turbulence_1d | turbulence_1d | turbulence_1d |
turbulence_1d_f64 | turbulence_1d_f64 | turbulence_1d_f64 | turbulence_1d_f64 |
turbulence_2d | turbulence_2d | turbulence_2d | turbulence_2d |
turbulence_2d_f64 | turbulence_2d_f64 | turbulence_2d_f64 | turbulence_2d_f64 |
turbulence_3d | turbulence_3d | turbulence_3d | turbulence_3d |
turbulence_3d_f64 | turbulence_3d_f64 | turbulence_3d_f64 | turbulence_3d_f64 |
turbulence_4d | turbulence_4d | turbulence_4d | turbulence_4d |
turbulence_4d_f64 | turbulence_4d_f64 | turbulence_4d_f64 | turbulence_4d_f64 |
It seems to me that we can close this issue. @TheHellBox & @Ralith Ok?
I'll close this issue now; feel free to reopen with details of what is still missing. (I'll fix some more issues the upcoming weeks and then prepare a new crate release)
Hi! I'm using the library for the procedural planet surface generation, and the scales are huge(Up to the earth size). f32s precision is not enough there unfortunately, so having a support for f64s would be great.
I have no idea what are the limitations for simd though, but I think it's possible to implement f64s for scalar.
Thanks for the great library!