Closed rhettinger closed 6 years ago
msg196194: Antoine Pitrou: "Note that the current small object allocator, if not disabled, *should* already return you aligned memory, by construction (each allocation size has dedicated pools from which memory blocks are carved)."
In my current implementation of _PyObject_AlignedAlloc(), I only call pymalloc_alloc() for alignment \<= pymalloc ALIGNMENT. Maybe we can do better?
I agree, if the two primary users say they won't use it, it doesn't make sense for us to maintain 600 hundred lines of low-level C code.
Yes, it may be better not to add it.
To summarize the problems again: --------------------------------
C11 aligned_alloc() / free() would be more comfortable but isn't available on MSVC.
posix_memalign() performance isn't that great.
hand-rolled aligned_calloc() is the fastest.
The feature could still be useful for fixing bpo-31912 and bpo-27987, *if* someone has an idea how to integrate the aligned version.
"C11 aligned_alloc() / free() would be more comfortable but isn't available on MSVC."
Is it common to require the allocated memory block to be filled with zeros? In Python, calloc() is not the most common allocator. I only found 5 places where calloc is used:
Modules/_ssl.c:5172: _ssl_locks = PyMem_Calloc(_ssl_locks_count, Modules/gcmodule.c:1689: g = (PyGC_Head *)PyObject_Calloc(1, size); Modules/gcmodule.c:1717:_PyObject_GC_Calloc(size_t basicsize) Objects/bytesobject.c:83: op = (PyBytesObject *)PyObject_Calloc(1, PyBytesObject_SIZE + size); Objects/listobject.c:176: op->ob_item = (PyObject **) PyMem_Calloc(size, sizeof(PyObject *));
"posix_memalign() performance isn't that great. hand-rolled aligned_calloc() is the fastest."
I'm not sure that the cost of the memory allocator itself defeats the gain of aligned memory on algorithms. I expect data processing to be much more expensive than the memory allocation, no?
Again, the unknown remains the benchmark result.
Can anyone write a quick benchmark to measure the gain of aligned memory? 4 years ago, Raymond Hettinger wanted to use it for the set and collection.deque types.
About the API itself, I'm not sure that PyMem_AlignedAlloc(alignment, size) is flexible enough. If we want to get *data* aligned in a Python object, we would have to pass an offset to the data, since Python objects have headers of variable size (depending on the type).
Windows has such API:
void * _aligned_offset_malloc(
size_t size,
size_t alignment,
size_t offset
);
This function is based on malloc, so likely adds padding bytes for you depending on size, alignment and offset.
https://msdn.microsoft.com/fr-fr/library/ec852tkw.aspx
See bpo-27987: "obmalloc's 8-byte alignment causes undefined behavior".
I'm not sure that the cost of the memory allocator itself defeats the gain of aligned memory on algorithms. I expect data processing to be much more expensive than the memory allocation, no?
I guess this issue isn't easy to focus due to the vast variety of use cases. So the is only about numpy/ndtypes:
What you write is true, but I'm simply getting cold feet w.r.t locking myself into memset(). calloc() uses mmap() for large allocations, so I think one can happily allocate a large number of huge arrays without any cost on Linux, as long as they're not accessed.
At least that's what my tests indicate.
Couple that with the fact that one has to use aligned_free() anyway, and that posix_memalign() isn't that great, and the use case seems less solid **for scientific computing**.
So I rather waste a couple of bytes per allocation and deal with some Valgrind macros to get proper bounds checking.
Note that CPython still knows any allocation from ndtypes, because ndt_callocfunc will be set to PyMem_Calloc() [1] and the custom ndt_aligned_calloc() uses ndt_callocfunc.
[1] If bpo-31912 is solved.
Because of the lack of strong motivition compared to the size of the change and the cost to maintain it, I reject this issue. For the archives, I attach a copy of the latest version of my implementation.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = None closed_at =
created_at =
labels = ['interpreter-core', 'type-feature', '3.7']
title = 'Add PyMem_AlignedAlloc()'
updated_at =
user = 'https://github.com/rhettinger'
```
bugs.python.org fields:
```python
activity =
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date =
closer = 'vstinner'
components = ['Interpreter Core']
creation =
creator = 'rhettinger'
dependencies = []
files = ['31543', '47285']
hgrepos = []
issue_num = 18835
keywords = ['patch']
message_count = 57.0
messages = ['196174', '196176', '196194', '196196', '196202', '196693', '196700', '196713', '196828', '196834', '217229', '232201', '232209', '232215', '232219', '232221', '232222', '232223', '232225', '234086', '286074', '304803', '304805', '304807', '304826', '304834', '304837', '304843', '304844', '304859', '304878', '304989', '305003', '305009', '305012', '305114', '305122', '305136', '305160', '305177', '305301', '305302', '305304', '305308', '305310', '305311', '305327', '305331', '305339', '305344', '305345', '305366', '305397', '305417', '305418', '305466', '306864']
nosy_count = 11.0
nosy_names = ['tim.peters', 'rhettinger', 'pitrou', 'vstinner', 'benjamin.peterson', 'trent', 'njs', 'skrah', 'neologix', 'xdegaye', 'wscullin']
pr_nums = ['4089', '4200', '15333']
priority = 'normal'
resolution = 'rejected'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue18835'
versions = ['Python 3.7']
```