lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0

import lancedb fails with illegal instruction on older intel CPU #2195

Open eware-godaddy opened 5 months ago

eware-godaddy commented 5 months ago

After a pip install on Python 3.9, 3.10, or 3.11 on my older Intel CPU, running import lancedb makes Python dump core with an illegal instruction error.

This is on ubuntu 20.04.

$uname -a
Linux myserver 5.4.0-176-generic #196-Ubuntu SMP Fri Mar 22 16:46:39 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

CPU details from lscpu below:

$ lscpu
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
Address sizes:                      36 bits physical, 48 bits virtual
CPU(s):                             8
On-line CPU(s) list:                0-7
Thread(s) per core:                 2
Core(s) per socket:                 4
Socket(s):                          1
NUMA node(s):                       1
Vendor ID:                          GenuineIntel
CPU family:                         6
Model:                              42
Model name:                         Intel(R) Core(TM) i7-2635QM CPU @ 2.00GHz
Stepping:                           7
CPU MHz:                            798.204
CPU max MHz:                        2900.0000
CPU min MHz:                        800.0000
BogoMIPS:                           3991.15
Virtualization:                     VT-x
L1d cache:                          128 KiB
L1i cache:                          128 KiB
L2 cache:                           1 MiB
L3 cache:                           6 MiB
NUMA node0 CPU(s):                  0-7
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        KVM: Mitigation: Split huge pages
Vulnerability L1tf:                 Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:                  Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown:             Mitigation; PTI
Vulnerability Mmio stale data:      Unknown: No mitigations
Vulnerability Retbleed:             Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d

the core dump:

$python3 -c 'import lancedb'
Illegal instruction (core dumped)
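
Because the crash kills the interpreter outright, it cannot be caught with try/except. A minimal sketch of probing the import in a child process instead, assuming a POSIX system; the helper name `crashes_with_sigill` is hypothetical and not part of lancedb:

```python
import signal
import subprocess
import sys


def crashes_with_sigill(code: str) -> bool:
    """Run `code` in a child interpreter and report whether SIGILL killed it."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, a return code of -N means the child was terminated by signal N.
    return result.returncode == -signal.SIGILL


if __name__ == "__main__":
    # Only the child process dies if the import hits an unsupported instruction.
    print(crashes_with_sigill("import lancedb"))
```

The parent process survives either way, so a script could fall back to another code path or print a clearer message than a bare core dump.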

after some gdb:


(gdb) x/10i 0x00007ffff4d6ccd0
=> 0x7ffff4d6ccd0:      vpbroadcastq 0x1d07b8f(%rip),%xmm3        # 0x7ffff6a74868
   0x7ffff4d6ccd9:      movzbl -0x1(%rsi,%r14,1),%edx
   0x7ffff4d6ccdf:      cmp    $0x10,%rax
   0x7ffff4d6cce3:      jae    0x7ffff4d6cd06
   0x7ffff4d6cce5:      cmp    %r15,%rbp
   0x7ffff4d6cce8:      je     0x7ffff4d6ce2b
   0x7ffff4d6ccee:      xor    %edi,%edi
   0x7ffff4d6ccf0:      cmp    %dl,(%rcx,%rdi,1)
   0x7ffff4d6ccf3:      je     0x7ffff4d6cdc2
   0x7ffff4d6ccf9:      inc    %rdi

Looks like a lack of AVX2: vpbroadcastq is an AVX2 instruction, and the flags above list avx but not avx2.
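
The same conclusion can be checked without gdb by looking at the CPU flags directly. The following is a rough sketch, not anything shipped by lance; the function names are hypothetical, and the required features (`avx2`, `fma`) are taken from the maintainers' reply further down the thread:

```python
def parse_cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # /proc/cpuinfo writes "flags", lscpu writes "Flags".
        if key.strip().lower() == "flags":
            return set(value.split())
    return set()


def missing_features(flags, required=("avx2", "fma")):
    """List the required features that are absent, in the order given."""
    return [name for name in required if name not in flags]


# Excerpt of the flags reported above for the i7-2635QM.
SAMPLE = "flags : sse4_1 sse4_2 x2apic popcnt aes xsave avx lahf_lm"

if __name__ == "__main__":
    print(missing_features(parse_cpu_flags(SAMPLE)))
```

On a Linux machine the text would normally come from reading `/proc/cpuinfo`; a non-empty result means the prebuilt wheel is likely to hit an illegal instruction.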

eware-godaddy commented 5 months ago

Related issues:

wjones127 commented 5 months ago

@eware-godaddy As a user of older CPUs, what are the best ways you have seen libraries make compatible binaries available for users like you? We'd prefer an approach that doesn't hurt functionality for the majority of users, whose CPUs are from the last 10 years.

eddyxu commented 5 months ago

@eware-godaddy we build lance with haswell as the minimum CPU target, which has AVX2 and FMA; we use both for vectorization in vector search.

One quick way to use lance on an old CPU is to build from source after removing this line: https://github.com/lancedb/lance/blob/main/.cargo/config.toml#L31

We also found that building for newer instruction sets (especially after Skylake / Ice Lake) can bring a significant speedup. Similarly, Red Hat and Ubuntu are testing x86-64-v3 / v4 for their next LTS releases. We haven't reached a conclusion yet on how we want to handle this.
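
To make the x86-64-v3 / v4 levels concrete, here is a minimal sketch that classifies a set of CPU flags into a microarchitecture level. The flag names are an assumption based on how Linux reports features in /proc/cpuinfo (for example, SSE3 appears as `pni` and LZCNT as `abm`), and `x86_64_level` is a hypothetical helper, not something lance provides:

```python
# Flags assumed to be required for each x86-64 microarchitecture level,
# using Linux /proc/cpuinfo naming. Level 1 is the x86-64 baseline.
LEVELS = [
    (2, {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    (3, {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    (4, {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def x86_64_level(flags) -> int:
    """Return the highest level whose requirements, and all lower ones, are met."""
    level = 1  # assumes the CPU is already 64-bit capable
    for candidate, required in LEVELS:
        if not required.issubset(flags):
            break
        level = candidate
    return level


if __name__ == "__main__":
    sandy_bridge = {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1",
                    "sse4_2", "ssse3", "avx", "xsave"}
    print(x86_64_level(sandy_bridge))
```

A Sandy Bridge CPU like the one in this report has AVX but neither AVX2 nor FMA, so it stops short of v3. A loader could use such a level to pick between differently targeted builds, though the thread leaves open whether lance will do that.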

drahoslavzan commented 4 months ago

@eddyxu I tried the approach you suggested: I commented out the line with the optimizations, but when I try to build the project I get a bunch of errors such as:

error[E0599]: no method named `optimize_indices` found for struct `DatasetWriteGuard<'_>` in the current scope
    --> /Users/dzan/.cargo/git/checkouts/lancedb-daf1dd3257b225ca/c0ea44d/rust/lancedb/src/table.rs:1063:14
     |
1060 | /         self.dataset
1061 | |             .get_mut()
1062 | |             .await?
1063 | |             .optimize_indices(options)
     | |             -^^^^^^^^^^^^^^^^ method not found in `DatasetWriteGuard<'_>`
     | |_____________|
     | 
     |
    ::: /Users/dzan/.cargo/git/checkouts/lancedb-daf1dd3257b225ca/c0ea44d/rust/lancedb/src/table/dataset.rs:254:1
     |
254  |   pub struct DatasetWriteGuard<'a> {
     |   -------------------------------- method `optimize_indices` not found for this struct
     |
     = help: items from traits can only be used if the trait is in scope
help: the following trait is implemented but not in scope; perhaps add a `use` for it:
     |
17   + use lance_index::traits::DatasetIndexExt;
     |

error[E0599]: no method named `load_indices` found for struct `DatasetReadGuard<'_>` in the current scope
    --> /Users/dzan/.cargo/git/checkouts/lancedb-daf1dd3257b225ca/c0ea44d/rust/lancedb/src/table.rs:1177:56
     |
1177 |         let (indices, mf) = futures::try_join!(dataset.load_indices(), dataset.latest_manifest())?;
     |                                                        ^^^^^^^^^^^^ method not found in `DatasetReadGuard<'_>`
     |
    ::: /Users/dzan/.cargo/git/checkouts/lancedb-daf1dd3257b225ca/c0ea44d/rust/lancedb/src/table/dataset.rs:239:1
     |
239  | pub struct DatasetReadGuard<'a> {
     | ------------------------------- method `load_indices` not found for this struct
     |
     = help: items from traits can only be used if the trait is in scope
help: the following trait is implemented but not in scope; perhaps add a `use` for it:
     |
17   + use lance_index::traits::DatasetIndexExt;

...

I am quite new to Rust, so I am not sure whether I am doing something wrong or whether those optimizations are somehow necessary.

Projoke commented 2 months ago

@eware-godaddy How did you finally solve this problem?