spotify / annoy

Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
Apache License 2.0
13.1k stars 1.16k forks source link

Initializing an AnnoyIndex crashes on AMD processors #472

Open convoliution opened 4 years ago

convoliution commented 4 years ago

I was trying to get a pip-installed Annoy v1.16.3 running on an m5a.large AWS Instance, but it crashed with Illegal instruction (core dumped) trying to execute

annoy.AnnoyIndex(512, "angular")

This doesn't occur on Intel-based m5.large Instances.

Doing some digging (details below), it seems to be because Annoy requires the AVX-512 instruction set, which isn't available on AMD processors.


m5a.large's AMD EPYC 7000 series processor supports these instruction sets (from /proc/cpuinfo):

fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt nrip_save

The Annoy library seems to use these instruction sets (using this utility):

    MODE64 (call)
    AVX512 (vmovdqa64)
    VLX (vmovdqa64)
    AVX2 (vextracti128)
    AVX (vmovups)
    NOVLX (vmovups)
    CMOV (cmovns)
CPU Generation: Unknown
erikbern commented 4 years ago

Did you build it from source?

convoliution commented 4 years ago

I did not!

erikbern commented 4 years ago

ok, how did you install it then? if it was compiled elsewhere then it probably explains it

conradoverta commented 4 years ago

We build in a different instance than we run, which might end up in a different instance type due to our mix. Is there an option to relax the compilation so that we don't limit to the architecture it was compiled in? Something like using mtune instead of march. I'm not familiar with annoy's compilation process.

erikbern commented 4 years ago

Seems likely that would work. Maybe you can add something to setup.py that lets you control the flag using an environment variable or similar?

conradogarciaberrotaran commented 4 years ago

Any news on this? im having the same issue.

erikbern commented 4 years ago

sounds like maybe moving the -march=native flag should do the trick. can you try that and let me know? i might remove it by default if it's causing issues for many people

conradogarciaberrotaran commented 4 years ago

I don't know about that flag, how should i pass it on to the installer?

conradoverta commented 4 years ago

We created a wrapper around gcc/g++ to replace march to mtune. Just updating the setup.py in this repo wouldn't be enough for our case since we need to support existing versions of annoy.

erikbern commented 4 years ago

i'll take a look at changing march to mtune in setup.py. just want to make sure there's no major performance implication.

conradogarciaberrotaran commented 4 years ago

As a temporary fix, i ended up installing it on the same instance i use it, instead of on a different instance (where my docker image was being built).

conradoverta commented 4 years ago

I would recommend leaving mtune/march as an option if possible (maybe an environment flag?). I see value of locking with march in cases where you want to optimize for performance vs mtune where you might be optimizing for other characteristics such as availability. march would be a good choice for a lot of dev environments, while mtune could be necessary for production to spread into multiple machine pools in a heterogenous cluster (our case).

erikbern commented 4 years ago

you can actually set the ANNOY_COMPILER_ARGS environment variable to whatever flags you want! it's not documented. see https://github.com/spotify/annoy/blob/master/setup.py

conradoverta commented 4 years ago

Oh! That is indeed a much better solution. Somehow I completely missed it when I skimmed the code.

conradogarciaberrotaran commented 4 years ago

Oh! That is indeed a much better solution. Somehow I completely missed it when I skimmed the code.

Can you share how you'd install annoy to avoid this issue?

bikegirl commented 4 years ago

This is from setup.py:

# Not all CPUs have march as a tuning parameter                                                                   
1 cputune = ['-march=native',]                                                                                      
2 if platform.machine() == 'ppc64le':                                                                               
3     extra_compile_args += ['-mcpu=native',]                                                                       
4                                                                                                                  
5 if platform.machine() == 'x86_64':                                                                                6 
6     extra_compile_args += cputune                                                                                 
7                                                                                                                  
8 if os.name != 'nt':                                                                                               
9     extra_compile_args += ['-O3', '-ffast-math', '-fno-associative-math']

Do I comment lines 1-6 or just delete the variable inside the brackets?

Also, I tried to run setup.py and there are command line arguments? I think they are required, so I can't get past the argparser. Do we have to provide those command line arguments?

...update: So since I last wrote, I cloned repo, commented lines 1-6 for now in setup.py, cd into annoy source folder and did pip install ., gonna see if I can run the rest of my scripts and build the index without getting and illegal instruction error

erikbern commented 4 years ago

Ok, thanks for experimenting with it! I'm going to see if there's a better solution for this. Enough people have been running into issues for me to think maybe I should consider a different approach. I might give it a shot to just use Eigen for dot products, or something similar.

bikegirl commented 4 years ago

Okay, so in case this helps anyone else, steps I took to by-pass the 'illegal instruction' error:

  1. activate your virtenv (source ~/ann/bin/activate)
  2. git clone annoy repo
  3. `cd annoy/'
  4. `vim setup.py' Inside setup.py I comment out lines 1-6 from the file (NOTE: line #'s here do not correspond to the line #'s in the actual setup.py file I just did it below for ease of understanding)
    # Not all CPUs have march as a tuning parameter                                                                   
    1 cputune = ['-march=native',]                                                                                      
    2 if platform.machine() == 'ppc64le':                                                                               
    3     extra_compile_args += ['-mcpu=native',]                                                                       
    4                                                                                                                  
    5 if platform.machine() == 'x86_64':                                                                                6 
    6     extra_compile_args += cputune                                                                                 
    7                                                                                                                  
    8 if os.name != 'nt':                                                                                               
    9     extra_compile_args += ['-O3', '-ffast-math', '-fno-associative-math']
  5. :wq
  6. pip install .
  7. delete repo you don't need it anymore (ensure that you did get a confirmation message that it did install though)

I used .build_on_disk() and was able to build an annoy index successfully. Will keep you updated if something goes wrong.

erikbern commented 4 years ago

Great! I'll investigate if there's some way to fix the underlying issue

mmaybeno commented 4 years ago

This may be the problem I have been seeing on our setup as well. Everything is built in docker images, so originally I wasn't sure why I was getting failures. And since Annoy is integrated into our sagemaker pipeline, failures were some what opaque.

Plus, images built in our CI/CD pipeline were the only instances which would cause the error, but local docker builds would be fine. I'll experiment with these flags to see if the issue is resolved. Side note, the only reason why I guess it was a problem with annoy was comparing the two docker images with https://github.com/GoogleContainerTools/container-diff.

mmaybeno commented 4 years ago

For anyone that has issues with this and builds annoy in docker, I used this line to override the ANNOY_COMPILER_ARGS environment variable per @erikbern's suggestion. Thanks!

ENV ANNOY_COMPILER_ARGS -D_CRT_SECURE_NO_WARNINGS,-DANNOYLIB_MULTITHREADED_BUILD,-mtune=native

erikbern commented 4 years ago

If anyone wants to give it a shot and try to fix this, I think the best approach is something like

  1. Make the distance function calculation a template argument of AnnoyIndex (but not AnnoyIndexInterface)
  2. At the point of instantiation of the index https://github.com/spotify/annoy/blob/master/src/annoymodule.cc#L153 – add (a) and #ifdef that checks if AVX is supported (b) checks if (__builtin_cpu_supports("avx2")) in runtime

Maybe there's other solutions too!

bikegirl commented 4 years ago

Yes, I know for a fact our University's cluster of CPUs does not support avx2, only regular avx, and I think this is where that error stems from also (I'm assuming that differentiation between avx2 and avx is also significant).

bikegirl commented 4 years ago

These are some other erros I have come accross along side trying to find a workaround for illegal instruction:

1)RuntimeError: Didn't find engine for operation quantized::linear_prepack NoQEngine (operator() at /pytorch/aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp:202)

2)RuntimeError: Your CPU does not support FBGEMM

"FBGEMM requires gcc 5+ and a CPU with support for avx2 instruction set or higher." -->> from googling this issue

My guess would be that you need to disable FBGEMM, at least when you run on architectures that don't support avx2. There appears to be a USE_FBGEMM environment variable which can be set to "0" to disable the use of FBGEMM.

erikbern commented 4 years ago

are those last ones related to Annoy? seems like it says something about pytorch

bikegirl commented 4 years ago

I remember experiencing it as one of the many errors during the illegal instruction issue I was having, but you're right... this might not be particular to that and maybe more to do my virtenv. Not sure, thought I'd mention it.

italo1983 commented 3 years ago

For anyone that has issues with this and builds annoy in docker, I used this line to override the ANNOY_COMPILER_ARGS environment variable per @erikbern's suggestion. Thanks!

ENV ANNOY_COMPILER_ARGS -D_CRT_SECURE_NO_WARNINGS,-DANNOYLIB_MULTITHREADED_BUILD,-mtune=native

Yeahhhhhhh! You saved my mental health 😃 Solution Working on AWS ECS

gary-x0pa commented 2 years ago

Hello may i know the exact line that you used to install annoy in docker? Not sure how to pass this argument into my dockerfile

mmaybeno commented 2 years ago

Hello may i know the exact line that you used to install annoy in docker? Not sure how to pass this argument into my dockerfile

It has been a few years, and there is probably a better way to do this now. But originally included these in my dockerfile

FROM python:3.7.5-slim-stretch

ENV ANNOY_COMPILER_ARGS -D_CRT_SECURE_NO_WARNINGS,-DANNOYLIB_MULTITHREADED_BUILD,-mtune=native

From there you can include anything else in your dockerfile including installing annoy.

ankit-db commented 1 year ago

@mmaybeno is there a better way to do this now?

mmaybeno commented 1 year ago

Good question. It has been over a year since I built this. Are you encountering the same issue? I'm surprised this issue is still open to be honest :).

ankit-db commented 1 year ago

We are 😞 - we're doing Docker builds on an ubuntu linux environment that I suspect is using Intel CPUs and observing that using that image on AMD CPUs on a different AWS instance results in core dumped when initializing the index. Your fix here does work though - thanks for that! It's just a bit unfortunate that there's no better way... (btw, what led you to figure out -mtune=native)

mmaybeno commented 1 year ago

Well bummer. It was not me, mostly everyone else in the thread: https://github.com/spotify/annoy/issues/472#issuecomment-624603252 I just highlighted how one can build it in docker using the env vars they highlighted.

CristhianBoujon commented 4 months ago

Hey @erikbern, Quiao126 merged the definitive solution to master. Could you release a new version? I would really appreciate it since we are experiencing the same issue. Unfortunately, we can't access to the dockerfile for adding env var you suggested.

drazvan commented 3 months ago

In my case, I had to use the following:

ENV ANNOY_COMPILER_ARGS="-D_CRT_SECURE_NO_WARNINGS,-DANNOYLIB_MULTITHREADED_BUILD,-march=x86-64"

Notice the -march=x86-64. This is because the machine on which the image was built (as part of CI) had Intel CPU, whereas the deployment/testing machine had AMD CPU.