Open glessard opened 1 year ago
@valeriyvan you can assign yourself this issue if you want; I couldn't do it myself.
I cannot assign myself to this issue either.
It occurred to a colleague and me that platforms usually have well-tuned implementations of the forward version of this algorithm in the form of the `memchr` function from libc. `memchr` is an expected dependency for the swift build, so we can always use it. It would be more economical in effort and code size to call that from `_customIndexOfEquatableElement()`. (The LLVM repository contains hand-tuned assembly versions contributed by ARM.)

The reverse algorithm is a different story, but we can gauge tuning success against a forward version that forwards to `memchr`.
> It occurred to a colleague and me that platforms usually have well-tuned implementations of the forward version of this algorithm in the form of the `memchr` function from libc. `memchr` is an expected dependency for the swift build, so we can always use it. It would be more economical in effort and code size to call that from `_customIndexOfEquatableElement()`. (The LLVM repository contains hand-tuned assembly versions contributed by ARM.)
I thought usage of libc is discouraged. Are there any guidelines about what can be used from libc, and when? What about intrinsics? Could intrinsics be used?
> The reverse algorithm is a different story, but we can gauge tuning success against a forward version that forwards to `memchr`.
What about `memrchr`?
> I thought usage of libc is discouraged. Are there any guidelines about what can be used from libc, and when? What about intrinsics? Could intrinsics be used?
It is strongly discouraged for everyone except the Swift standard library.
> What about `memrchr`?
`memrchr` isn't part of the approved/necessary set, which can be found in `utils/check_freestanding_dependencies.py`. It also isn't part of the C standard, which would make it an unreliable dependency.
Re-opening, since getting the new benchmark merged is only the first step.
`UnsafeRawBufferPointer` and `UnsafeMutableRawBufferPointer` perform element-by-element scans in `firstIndex(of: Element)` and `lastIndex(of: Element)`. It appears that these types should be amenable to some improvements, by using vectorized loops.

**Motivation**

`UnsafeRawBufferPointer.firstIndex(of:)` might be faster and/or more efficient with a specialized implementation.

**Solution**

Benchmarks should be created, and then implementations of the `Collection` customization points `_customIndexOfEquatableElement()` and `_customLastIndexOfEquatableElement()` should be investigated.

**Alternatives considered**

If no substantial improvements are found, then drop it. That fact should be documented so that someone else doesn't repeat this work in the future.