unum-cloud / usearch

Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍
https://unum-cloud.github.io/usearch/
Apache License 2.0

Bug: Rust build does not use simsimd (`index.hardware_acceleration()` reports `serial`) #421

Closed jrcavani closed 3 months ago

jrcavani commented 6 months ago

Describe the bug

The Rust build reports serial as the hardware acceleration.

The Python package installed through pip reports SIMD acceleration as expected:

--> python -c 'from usearch.index import Index; print(Index(ndim=512, metric="ip", dtype="f16").hardware_acceleration)'
haswell

--> python -c 'from usearch.index import Index; print(Index(ndim=512, metric="l2sq", dtype="f32").hardware_acceleration)'
skylake

I've been able to find the relevant build code paths:

This is the line that prints serial: https://github.com/unum-cloud/usearch/blob/5ea48c87c56a25ab57634a8f207f80ae675ed58e/include/usearch/index_plugins.hpp#L1492

This is the line in build.rs that defines USEARCH_USE_SIMSIMD when the simsimd feature is turned on: https://github.com/unum-cloud/usearch/blob/5ea48c87c56a25ab57634a8f207f80ae675ed58e/build.rs#L28

I confirmed that the build script runs that code block (by writing a log file to disk from build.rs), but the output of index.hardware_acceleration() does not change.
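A lighter-weight way to confirm that a feature-gated branch of a build script runs is to print a `cargo:warning=` line instead of writing a file. The sketch below assumes the feature is named `simsimd`, in which case Cargo exposes it to the build script as the `CARGO_FEATURE_SIMSIMD` environment variable. The helper names `feature_report` and `report_simsimd_feature` are hypothetical and are not part of usearch.

```rust
use std::env;
use std::ffi::OsString;

/// Hypothetical helper: formats a diagnostic line for a build script.
/// `value` is whatever `env::var_os("CARGO_FEATURE_SIMSIMD")` returned;
/// Cargo sets that variable only when the `simsimd` feature is enabled.
fn feature_report(value: Option<OsString>) -> String {
    match value {
        Some(_) => "cargo:warning=simsimd feature is ON for this build".to_string(),
        None => "cargo:warning=simsimd feature is OFF for this build".to_string(),
    }
}

/// Hypothetical entry point, intended to be called from `fn main()` of a build.rs.
/// Printing a `cargo:warning=` line asks Cargo to surface the message.
fn report_simsimd_feature() {
    println!("{}", feature_report(env::var_os("CARGO_FEATURE_SIMSIMD")));
}
```

Cargo normally displays such warnings only for local (path) packages. For a dependency pulled from the registry, running `cargo build -vv` should show the build script's output.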

I tried to build usearch manually:

--> USEARCH_USE_SIMSIMD=1 RUST_LOG=info cargo build -r --package usearch --manifest-path ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/usearch-2.12.0/Cargo.toml --target-dir target/
   Compiling usearch v2.12.0 (/home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/usearch-2.12.0)
warning: usearch@2.12.0: In file included from include/usearch/index_dense.hpp:16,
warning: usearch@2.12.0:                  from rust/lib.hpp:10,
warning: usearch@2.12.0:                  from /target/release/build/usearch-5cd0aef1eab6fb5c/out/cxxbridge/sources/usearch/rust/lib.rs.cc:1:
warning: usearch@2.12.0: include/usearch/index_plugins.hpp:52: warning: "SIMSIMD_DYNAMIC_DISPATCH" redefined
warning: usearch@2.12.0:    52 | #define SIMSIMD_DYNAMIC_DISPATCH 0
warning: usearch@2.12.0:       |
warning: usearch@2.12.0: <command-line>: note: this is the location of the previous definition
warning: usearch@2.12.0: In file included from include/usearch/index_dense.hpp:16,
warning: usearch@2.12.0:                  from rust/lib.hpp:10,
warning: usearch@2.12.0:                  from rust/lib.cpp:1:
warning: usearch@2.12.0: include/usearch/index_plugins.hpp:52: warning: "SIMSIMD_DYNAMIC_DISPATCH" redefined
warning: usearch@2.12.0:    52 | #define SIMSIMD_DYNAMIC_DISPATCH 0
warning: usearch@2.12.0:       |
warning: usearch@2.12.0: <command-line>: note: this is the location of the previous definition
    Finished `release` profile [optimized] target(s) in 9.53s

Do these warnings mean something?

Steps to reproduce

This is the feature list in Cargo.toml:

[package]
name = "assigner"
version = "0.1.0"
edition = "2021"

[dependencies]

anyhow = "1.0"
async-channel = "2.1.1"
clap = { version = "4.4.10", features = ["derive"] }
env_logger = "0.11.0"
indicatif = { version = "0.17.7", features = ["tokio"] }
log = "0.4.20"
rayon = "1.8.1"
usearch = { version = "2.12.0", features = ["simsimd", "fp16lib"] }
itertools = "0.13.0"
half = "2.4.1"
simsimd = "4.3.1"

This is the source code for the test:

use clap::Parser;
use half::f16;
use itertools::Itertools;
use std::fs::File;
use std::io::{BufReader, Read};
use log::info;

use anyhow::Result;
use usearch::{new_index, Index, IndexOptions, MetricKind, ScalarKind};

#[derive(Parser)]
struct Cli {
    /// Input path
    #[clap(long)]
    bin_path: String,
}

fn main() -> Result<()> {
    env_logger::init();
    let args = Cli::parse();

    let options = IndexOptions {
        dimensions: 512,               // necessary for most metric kinds
        metric: MetricKind::IP,        // or MetricKind::L2sq, MetricKind::Cos ...
        quantization: ScalarKind::F32, // or ScalarKind::F16, ScalarKind::I8, ScalarKind::B1x8 ...
        connectivity: 32,              // zero for auto
        expansion_add: 128,            // zero for auto
        expansion_search: 128,         // zero for auto
        multi: false,
    };

    let index: Index = new_index(&options)?;

    assert!(index.reserve(100).is_ok());
    assert!(index.capacity() >= 100);
    assert!(index.connectivity() != 0);
    assert_eq!(index.dimensions(), 512);
    assert_eq!(index.size(), 0);

    let reader = BufReader::new(File::open(&args.bin_path)?);

    let mut some_vecs = vec![];
    for (idx, v) in reader
        .bytes()
        .chunks(2)
        .into_iter()
        .map(|chunk| {
            let chunk = chunk.try_collect::<u8, Vec<u8>, _>().unwrap();
            f16::from_le_bytes(chunk.try_into().unwrap()).to_f32()
        })
        .chunks(512)
        .into_iter()
        .enumerate()
        .take(100)
    {
        let v = v.collect::<Vec<_>>();
        some_vecs.push((idx as u64, v));
    }

    for (idx, v) in some_vecs.iter() {
        index.add(*idx, v.as_slice())?;
    }

    assert_eq!(index.size(), 100);

    // Read back the tags
    let mut results = index.search(&some_vecs[0].1, 10).unwrap();
    assert_eq!(results.keys.len(), 10);
    info!("Results keys: {:?}", results.keys);
    results.distances.iter_mut().for_each(|x| *x = 1f32 - *x);
    info!("Results vals: {:?}", results.distances);

    info!("Hardware acceleration: {}", index.hardware_acceleration());
    info!("Memory usage: {}", index.memory_usage());
    Ok(())
}

I am getting serial as the acceleration. Is this right? This applies to f32, f16 and i8.

[2024-05-24T02:47:06Z INFO  test_usearch] Results keys: [0, 29, 32, 79, 52, 58, 15, 20, 42, 61]
[2024-05-24T02:47:06Z INFO  test_usearch] Results vals: [0.99995106, 0.10014129, 0.08763337, 0.07960814, 0.075494945, 0.07314652, 0.07152343, 0.068568826, 0.060367405, 0.056860268]
[2024-05-24T02:47:06Z INFO  test_usearch] Hardware acceleration: serial
[2024-05-24T02:47:06Z INFO  test_usearch] Memory usage: 20980448

--> apt list gcc
Listing... Done
gcc/jammy,now 4:11.2.0-1ubuntu1 amd64 [installed,automatic]

--> rustc --version
rustc 1.78.0 (9b00956e5 2024-04-29)

--> cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.2 LTS"

Expected behavior

SIMD acceleration is expected.

USearch version

2.12.0

Operating System

Ubuntu 22.04

Hardware architecture

x86

Which interface are you using?

Other bindings

ashvardanian commented 6 months ago

Hi @jrcavani! That is definitely a bug. The warning isn't telling us much, especially as it redefines the macro with the same value. Can you also report the lscpu outputs, to make sure your CPU supports all the right features?

jrcavani commented 6 months ago

The macro that got overwritten was originally specified to be SIMSIMD_DYNAMIC_DISPATCH=1

    if cfg!(feature = "simsimd") {
        build
            .define("USEARCH_USE_SIMSIMD", "1")
            .define("SIMSIMD_DYNAMIC_DISPATCH", "1")
            .define("SIMSIMD_NATIVE_F16", "0");
    } else {

But it was redefined to be 0:

warning: usearch@2.12.0:    52 | #define SIMSIMD_DYNAMIC_DISPATCH 0

Again, I haven't looked closely enough to know which one was an env var, and which one was a compile time constant, and how they interact between build.rs and C++ code.
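As a rough illustration of how the two definitions meet: the `.define(name, value)` calls in the build.rs snippet above reach the C++ compiler as `-Dname=value` command-line flags, which is what the `<command-line>` note in the warning refers to. The unconditional `#define` on line 52 of index_plugins.hpp is processed afterwards, and GCC reports the redefinition as a warning and uses the later value. The sketch below only mimics the flag formatting with hypothetical helpers (`to_compiler_flags`, `simsimd_defines`); it does not call the `cc` crate, and the thread does not confirm that this redefinition was the root cause.

```rust
/// Hypothetical helper: mimics how (name, value) pairs become `-D` compiler flags.
fn to_compiler_flags(defines: &[(&str, &str)]) -> Vec<String> {
    defines
        .iter()
        .map(|(name, value)| format!("-D{}={}", name, value))
        .collect()
}

/// The three defines quoted from build.rs when the `simsimd` feature is on.
fn simsimd_defines() -> Vec<(&'static str, &'static str)> {
    vec![
        ("USEARCH_USE_SIMSIMD", "1"),
        ("SIMSIMD_DYNAMIC_DISPATCH", "1"),
        ("SIMSIMD_NATIVE_F16", "0"),
    ]
}
```

Under these assumptions, neither definition is an environment variable at C++ compile time: one is a command-line macro and the other is a macro defined in the header.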

Here is the lscpu output:

--> lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  64
  On-line CPU(s) list:   0-63
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
    CPU family:          6
    Model:               106
    Thread(s) per core:  2
    Core(s) per socket:  32
    Socket(s):           1
    Stepping:            6
    BogoMIPS:            5799.96
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonst
                         op_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnow
                         prefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni
                         avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd ida arat avx512vbmi pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq rdpid md_clear flus
                         h_l1d arch_capabilities
Virtualization features:
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):
  L1d:                   1.5 MiB (32 instances)
  L1i:                   1 MiB (32 instances)
  L2:                    40 MiB (32 instances)
  L3:                    54 MiB (1 instance)
NUMA:
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-63
Vulnerabilities:
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Mitigation; Clear CPU buffers; SMT Host state unknown
  Retbleed:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
  Srbds:                 Not affected
  Tsx async abort:       Not affected

ashvardanian commented 3 months ago

Hey @jrcavani, is this still the case?

jrcavani commented 3 months ago

Yes. Just upgraded both simsimd and usearch:

usearch = { version = "2.13.2", features = ["simsimd", "fp16lib"] }
simsimd = "5.0.0"

[2024-08-13T04:55:48Z INFO  test_usearch] Hardware acceleration: serial

simsimd prints expected flags:

use simsimd::capabilities;
use simsimd::ComplexProducts;
use simsimd::SpatialSimilarity;

fn main() {
    let vector_a: Vec<f32> = vec![1.0, 2.0, 3.0, 4.0];
    let vector_b: Vec<f32> = vec![5.0, 6.0, 7.0, 8.0];

    // Compute the inner product between vector_a and vector_b
    let inner_product =
        SpatialSimilarity::dot(&vector_a, &vector_b).expect("Vectors must be of the same length");

    println!("Inner Product: {}", inner_product);

    // Compute the complex inner product between vector_a and vector_b, read as interleaved (re, im) pairs
    let complex_inner_product =
        ComplexProducts::dot(&vector_a, &vector_b).expect("Vectors must be of the same length");

    let complex_conjugate_inner_product =
        ComplexProducts::vdot(&vector_a, &vector_b).expect("Vectors must be of the same length");

    println!("Complex Inner Product: {:?}", complex_inner_product); // -18, 68
    println!(
        "Complex C. Inner Product: {:?}",
        complex_conjugate_inner_product
    ); // 70, -8

    println!("uses neon: {}", capabilities::uses_neon());
    println!("uses sve: {}", capabilities::uses_sve());
    println!("uses haswell: {}", capabilities::uses_haswell());
    println!("uses skylake: {}", capabilities::uses_skylake());
    println!("uses ice: {}", capabilities::uses_ice());
    println!("uses sapphire: {}", capabilities::uses_sapphire());
}

Inner Product: 70
Complex Inner Product: (-18.0, 68.0)
Complex C. Inner Product: (70.0, -8.0)
uses neon: false
uses sve: false
uses haswell: true
uses skylake: true
uses ice: true
uses sapphire: false

ashvardanian commented 3 months ago

So weird, the Python version prints everything correctly, so it shouldn't be coming from the core implementation:

python -c 'from usearch.index import Index; print(Index(ndim=768, metric="cos", dtype="f16").hardware_acceleration)'

I am playing around with build.rs, but don't see where the issue is coming from.

jrcavani commented 3 months ago

It did work!!

[2024-08-18T03:24:04Z INFO  test_usearch] Hardware acceleration: skylake