Open glessard opened 1 year ago
@valeriyvan you can assign yourself this issue if you want; I couldn't do it myself.
I cannot assign myself to this issue either.
It occurred to a colleague and me that platforms usually have well-tuned implementations of the forward version of this algorithm in the form of the `memchr` function from libc. `memchr` is an expected dependency for the swift build, so we can always use it. It would be more economical in effort and code size to call that from `_customIndexOfEquatableElement()`. (The LLVM repository contains hand-tuned assembly versions contributed by ARM.)

The reverse algorithm is a different story, but we can gauge tuning success against a forward version that forwards to `memchr`.
> It occurred to a colleague and me that platforms usually have well-tuned implementations of the forward version of this algorithm in the form of the `memchr` function from libc. `memchr` is an expected dependency for the swift build, so we can always use it. It would be more economical in effort and code size to call that from `_customIndexOfEquatableElement()`. (The LLVM repository contains hand-tuned assembly versions contributed by ARM.)
I thought usage of libc is discouraged. Are there any guidelines about what can be used from libc, and when? What about intrinsics? Could intrinsics be used?
> The reverse algorithm is a different story, but we can gauge tuning success against a forward version that forwards to `memchr`.
What about `memrchr`?
> I thought usage of libc is discouraged. Are there any guidelines about what can be used from libc, and when? What about intrinsics? Could intrinsics be used?
It is strongly discouraged for everyone except the Swift standard library.
> What about `memrchr`?
`memrchr` isn't part of the approved/necessary set, which can be found in `utils/check_freestanding_dependencies.py`. It also isn't part of the C standard, which would make it an unreliable dependency.
Re-opening, since getting the new benchmark merged is only the first step.
`UnsafeRawBufferPointer` and `UnsafeMutableRawBufferPointer` perform element-by-element scans in `firstIndex(of: Element)` and `lastIndex(of: Element)`. It appears that these types should be amenable to some improvements, by using vectorized loops.

**Motivation**

`UnsafeRawBufferPointer.firstIndex(of:)` might be faster and/or more efficient with a specialized implementation.

**Solution**

Benchmarks should be created, and then implementations of the `Collection` customization points `_customIndexOfEquatableElement()` and `_customLastIndexOfEquatableElement()` should be investigated.

**Alternatives considered**

If no substantial improvements are found, then drop it. That fact should be documented so that someone else doesn't repeat this work in the future.