robbmcleod / pyfastnoisesimd

Python module wrapping C++ FastNoiseSIMD
BSD 3-Clause "New" or "Revised" License
39 stars 6 forks source link

PyFastNoiseSIMD

PyFastNoiseSIMD is a wrapper around Jordan Peck's synthetic noise library https://github.com/Auburns/FastNoise-SIMD which has been accelerated with SIMD instruction sets. It may be installed via pip:

pip install pyfastnoisesimd

Parallelism is further enhanced by the use of concurrent.futures to multi-thread the generation of noise for large arrays. Thread scaling is generally in the range of 50-90 %, depending largely on the vectorized instruction set used. The number of threads, defaults to the number of virtual cores on the system. The ideal number of threads is typically the number of physical cores, irrespective of Intel Hyperthreading®.

Here is a simple example to generate Perlin-style noise on a 3D rectilinear grid::

import pyfastnoisesimd as fns
import numpy as np
shape = [512, 512, 512]
seed = np.random.randint(2**31)
N_threads = 4

perlin = fns.Noise(seed=seed, numWorkers=N_threads)
perlin.frequency = 0.02
perlin.noiseType = fns.NoiseType.Perlin
perlin.fractal.octaves = 4
perlin.fractal.lacunarity = 2.1
perlin.fractal.gain = 0.45
perlin.perturb.perturbType = fns.PerturbType.NoPerturb
result = perlin.genAsGrid(shape)

where result is a 3D numpy.ndarray of dtype 'float32'. Alternatively, the user can provide coordinates, which is helpful for tasks such as custom bump-mapping a tessellated surface, via Noise.getFromCoords(coords).

More extensive examples are found in the examples folder on the Github repository.

Parallelism is further enhanced by the use of concurrent.futures to multi-thread the generation of noise for large arrays. Thread scaling is generally in the range of 50-90 %, depending largely on the vectorized instruction set used. The number of threads, defaults to the number of virtual cores on the system. The ideal number of threads is typically the number of physical cores, irrespective of Intel Hyperthreading®.

Documentation

Check it out at:

http://pyfastnoisesimd.readthedocs.io

Installation

pyfastnoisesimd is available on PyPI, and may be installed via pip::

pip install --upgrade pip
pip install --upgrade setuptools
pip install -v pyfastnoisesimd

Wheels are provided for Windows, Mac OSX, and Linux via manywheels.

Building from source on Windows will require the appropriate MS Visual Studio version for your version of Python, or MSVC Build Tools:

http://landinghub.visualstudio.com/visual-cpp-build-tools

On Linux or OSX, only a source distribution is provided and installation requires gcc or clang. For AVX512 support with GCC, GCC7.2+ is required, lower versions will compile with AVX2/SSE4.1/SSE2 support only. GCC earlier than 4.7 disables AVX2 as well. Note that pip does not respect the $CC environment variable, so to clone and build from source with gcc-7:

git clone https://github.com/robbmcleod/pyfastnoisesimd.git
alias gcc=gcc-7; alias g++=g++-7
pip install -v ./pyfastnoisesimd

Installing GCC7.2 on Ubuntu (with sudo or as root)::

add-apt-repository ppa:ubuntu-toolchain-r/test
apt update
apt install gcc-7 g++-7

Benchmarks


Generally speaking thread scaling is higher on machines with SSE4 support only, as most CPUs throttle clock speed down to limit heat generation with AVX2. As such, AVX2 is only about 1.5x faster than SSE4 whereas on a pure SIMD instruction length basis (4 versus 8) you would expect it to be x2 faster.

The first test is used the default mode, a cubic grid, Noise.genAsGrid(), from examples\gridded_noise.py:

Array shape: [8,1024,1024] CPU: Intel i7-7820X Skylake-X (8 cores, 3.6 GHz), Windows 7 SIMD level supported: AVX2 & FMA3

Single-threaded mode

Computed 8388608 voxels cellular noise in 0.298 s
    35.5 ns/voxel
Computed 8388608 voxels Perlin noise in 0.054 s
    6.4 ns/voxel

Multi-threaded (8 threads) mode

Computed 8388608 voxels cellular noise in 0.044 s 5.2 ns/voxel 685.0 % thread scaling Computed 8388608 voxels Perlin noise in 0.013 s 1.5 ns/voxel 431.3 % thread scaling

The alternative mode is Noise.getFromCoords() where the user provides the coordinates in Cartesian-space, from examples\GallPeters_projection.py:

Single threaded mode

Generated noise from 2666000 coordinates with 1 workers in 1.766e-02 s
    6.6 ns/pixel

Multi-threaded (4 threads) mode

Generated noise from 2666000 coordinates with 4 workers in 6.161e-03 s 2.3 ns/pixel 286.6 % thread scaling

Release Notes

0.4.3

0.4.2

0.4.1

0.4.0

0.3.2

0.3.1

0.3.0

0.2.1

0.2.0

0.1.5

0.1.4

0.1.2

FastNoiseSIMD library

If you want a more direct interface with the underlying library you may use the pyfastsimd._ext module, which is a function-for-function mapping to the C++ code.

FastNoiseSIMD is implemented by Jordan Peck, and may be found at:

https://github.com/Auburns/FastNoiseSIMD

It aims to provide faster performance through the use of intrinsic(SIMD) CPU functions. Vectorisation of the code allows noise functions to process data in sets of 4/8/16 increasing performance by 700% in some cases (Simplex).

See the Wiki for usage information on the noise types:

https://github.com/Auburns/FastNoiseSIMD/wiki

Download links for a GUI-based reference noise generator may be found at:

https://github.com/Auburns/FastNoiseSIMD/releases

Authors

Robert A. McLeod wrote the Python wrapper, implemented multi-threading, and wrote the documentation.

Elliott Sales de Andrade contributed a number of fixes to allow building to succeed on many platforms.

Jordan Peck wrote the underlying library FastNoiseSIMD.