robbmcleod / pyfastnoisesimd

Python module wrapping C++ FastNoiseSIMD
BSD 3-Clause "New" or "Revised" License

Aligned Memory Allocation disabled on Win32 #19

Closed robbmcleod closed 5 years ago

robbmcleod commented 5 years ago

Memory allocated with the Windows function _aligned_malloc must be released with its counterpart _aligned_free, otherwise the program seg-faults. NumPy calls free on array destruction, so there is no way to pass ownership of such memory to NumPy.

Past discussion by Python-dev and NumPy on the subject of aligned memory:

numpy/numpy#5312

https://bugs.python.org/issue18835

It looks like the issues have gone stale.

Presently we are not using aligned memory on Win32, which likely carries a performance penalty on that platform.

robbmcleod commented 5 years ago

Now on Windows with unaligned allocation, calls to genFromCoords are faulting.

robbmcleod commented 5 years ago

I created a new branch aligned_mem. This is a very nettlesome problem. Trying to subclass numpy arrays leads to issues when slicing and performing similar array operations.

I also tried simply allocating memory as bytes in NumPy arrays, slicing to get the correct memory start address, and then using .view(np.float32) to reinterpret it. But this consistently raises a general protection fault on _mm256_store_ps.
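For readers following along, the over-allocate, slice, and view approach described above can be sketched roughly as follows. This is only an illustration, not the code in the aligned_mem branch: the function name empty_aligned and the default alignment of 32 bytes (the width of a 256-bit AVX register) are assumptions.

```python
import numpy as np

def empty_aligned(n_floats, alignment=32):
    """Return an uninitialized float32 array whose data pointer is a
    multiple of `alignment` bytes (hypothetical helper, not library API)."""
    nbytes = n_floats * np.dtype(np.float32).itemsize
    # Over-allocate by `alignment` bytes so an aligned start address
    # is guaranteed to exist somewhere inside the buffer.
    raw = np.empty(nbytes + alignment, dtype=np.uint8)
    # Number of bytes to skip to reach the next aligned address.
    offset = (-raw.ctypes.data) % alignment
    # The view keeps `raw` alive through its .base attribute, so NumPy
    # still frees the original allocation with its own allocator.
    return raw[offset:offset + nbytes].view(np.float32)

arr = empty_aligned(1000)
assert arr.ctypes.data % 32 == 0
```

Because the returned array is a view, NumPy remains the owner of the underlying buffer, which sidesteps the _aligned_malloc / _aligned_free ownership problem at the cost of a few wasted bytes per allocation.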

At some point I may simply have to turn off the use of aligned memory in the FastNoiseSIMD library on Windows.

robbmcleod commented 5 years ago

Ok, progress in https://github.com/robbmcleod/pyfastnoisesimd/commit/60d7d5a62f0c354fb8d31747906017e5d8d0a957. The general protection fault comes from _mm*_store_ps, which requires a much stricter alignment than I was providing: the full vector length in bytes. So 32 bytes for AVX2, for example.

The chunking for multi-threaded generation also needs to provide chunks whose start addresses are aligned on the same boundary.
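As a rough illustration of that chunking constraint (not the library's actual implementation), the sketch below splits an array into per-thread ranges whose start indices are multiples of the vector length. The name aligned_chunks and the default vector_len=8 (eight float32 values in a 256-bit register) are assumptions, and the starts only map to aligned addresses if the array's base pointer is itself aligned.

```python
def aligned_chunks(n_elements, n_threads, vector_len=8):
    """Split range(n_elements) into at most n_threads (start, stop) pairs
    whose start indices are multiples of vector_len (hypothetical helper)."""
    if n_elements <= 0:
        return []
    # Count vectors rather than elements, rounding up for a partial tail.
    n_vectors = -(-n_elements // vector_len)
    # Vectors handed to each thread, again rounded up.
    per_thread = -(-n_vectors // n_threads)
    step = per_thread * vector_len
    # Only the last chunk may end on a non-multiple of vector_len.
    return [(start, min(start + step, n_elements))
            for start in range(0, n_elements, step)]

chunks = aligned_chunks(100, 4)
assert all(start % 8 == 0 for start, _ in chunks)
```

Rounding up per thread means the last chunk can be shorter than the others, and fewer than n_threads chunks are produced when there is not enough work to go around.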

robbmcleod commented 5 years ago

Branch aligned_mem has been merged back into master and released as 0.4.0, so closing.