zeam-vm / pelemay

Pelemay is a native compiler for Elixir, which generates SIMD instructions. It has a plan to generate for GPU code.
Apache License 2.0
186 stars 13 forks source link

Create String.replace SIMD function #98

Closed zacky1972 closed 4 years ago

zacky1972 commented 4 years ago

I guess to implement String.replace/4 using SIMD instructions will be as follows:

  1. Allocate a memory area for replaced string because memory area in Elixir should be immutable.
    • Keep the same size to the original memory area if the length of the pattern is larger than that of the replacement.
    • Otherwise, keep some larger size than the original memory area.
  2. Copy the original memory area into the replaced string memory area until the pattern is matched to the original memory area or the end of the area is reached, with auto-vectorization and SIMD instructions.
  3. Copy the rest of the original area into the replaced area if the length of the rest is smaller than the pattern with auto-vectorization and SIMD instructions, and Finish.
  4. Match the whole of the pattern and the original area with auto-vectorization and SIMD instructions
    • If matched, confirm and reallocate if necessary the size of the rest of the replaced string area to have an enough capacity, and copy the replacement into the replaced string area with auto-vectorization and SIMD instructions.
    • If not, copy the unmatched original area into the replaced string area with auto-vectorization and SIMD instructions.
  5. Go back 2

This issue is a part of #49

branch name is string_replace.

zacky1972 commented 4 years ago

MacBook Air

StringReplaceBench

benchmark name iterations average time StringReplace.replace 10000000 0.35 µs/op String.replace 1000000 1.26 µs/op

SIMD version is 3.6x faster than original String.replace!!!

zacky1972 commented 4 years ago

iMac Pro

## StringReplaceBench
benchmark name         iterations   average time 
StringReplace.replace    10000000   0.20 µs/op
String.replace           10000000   0.67 µs/op

3.35x faster!

piacerex commented 4 years ago

Windows 10 (Intel CORE m3: 2 Pysical cores / 4 Logical cores)

## LogisticMapBench
benchmark name          iterations   average time
Pelemay                       1000   1438.00 ツオs/op
Enum                           500   4626.00 ツオs/op
Flow                           200   7890.00 ツオs/op
## StringReplaceBench
benchmark name          iterations   average time
Pelemay String.replace     1000000   1.58 ツオs/op
Enum String.replace         500000   4.62 ツオs/op
Flow String.replace           5000   662.40 ツオs/op

2.92x faster (This was close to mac result)

I'll try data preprocessing with Esuna using Pelemay :-)