Open convoliution opened 4 years ago
Did you build it from source?
I did not!
ok, how did you install it then? if it was compiled elsewhere then it probably explains it
We build in a different instance than we run, which might end up in a different instance type due to our mix. Is there an option to relax the compilation so that we don't limit to the architecture it was compiled in? Something like using mtune instead of march. I'm not familiar with annoy's compilation process.
Seems likely that would work. Maybe you can add something to setup.py that lets you control the flag using an environment variable or similar?
Any news on this? im having the same issue.
sounds like maybe moving the -march=native
flag should do the trick. can you try that and let me know? i might remove it by default if it's causing issues for many people
I don't know about that flag, how should i pass it on to the installer?
We created a wrapper around gcc/g++ to replace march
to mtune
. Just updating the setup.py
in this repo wouldn't be enough for our case since we need to support existing versions of annoy.
i'll take a look at changing march to mtune in setup.py. just want to make sure there's no major performance implication.
As a temporary fix, i ended up installing it on the same instance i use it, instead of on a different instance (where my docker image was being built).
I would recommend leaving mtune/march as an option if possible (maybe an environment flag?). I see value of locking with march in cases where you want to optimize for performance vs mtune where you might be optimizing for other characteristics such as availability. march
would be a good choice for a lot of dev environments, while mtune
could be necessary for production to spread into multiple machine pools in a heterogenous cluster (our case).
you can actually set the ANNOY_COMPILER_ARGS
environment variable to whatever flags you want! it's not documented. see https://github.com/spotify/annoy/blob/master/setup.py
Oh! That is indeed a much better solution. Somehow I completely missed it when I skimmed the code.
Oh! That is indeed a much better solution. Somehow I completely missed it when I skimmed the code.
Can you share how you'd install annoy to avoid this issue?
This is from setup.py:
# Not all CPUs have march as a tuning parameter
1 cputune = ['-march=native',]
2 if platform.machine() == 'ppc64le':
3 extra_compile_args += ['-mcpu=native',]
4
5 if platform.machine() == 'x86_64': 6
6 extra_compile_args += cputune
7
8 if os.name != 'nt':
9 extra_compile_args += ['-O3', '-ffast-math', '-fno-associative-math']
Do I comment lines 1-6 or just delete the variable inside the brackets?
Also, I tried to run setup.py and there are command line arguments? I think they are required, so I can't get past the argparser. Do we have to provide those command line arguments?
...update: So since I last wrote, I cloned repo, commented lines 1-6 for now in setup.py
, cd into annoy source folder
and did pip install .
, gonna see if I can run the rest of my scripts and build the index without getting and illegal instruction error
Ok, thanks for experimenting with it! I'm going to see if there's a better solution for this. Enough people have been running into issues for me to think maybe I should consider a different approach. I might give it a shot to just use Eigen for dot products, or something similar.
Okay, so in case this helps anyone else, steps I took to by-pass the 'illegal instruction' error:
source ~/ann/bin/activate
)# Not all CPUs have march as a tuning parameter
1 cputune = ['-march=native',]
2 if platform.machine() == 'ppc64le':
3 extra_compile_args += ['-mcpu=native',]
4
5 if platform.machine() == 'x86_64': 6
6 extra_compile_args += cputune
7
8 if os.name != 'nt':
9 extra_compile_args += ['-O3', '-ffast-math', '-fno-associative-math']
:wq
pip install .
I used .build_on_disk() and was able to build an annoy index successfully. Will keep you updated if something goes wrong.
Great! I'll investigate if there's some way to fix the underlying issue
This may be the problem I have been seeing on our setup as well. Everything is built in docker images, so originally I wasn't sure why I was getting failures. And since Annoy is integrated into our sagemaker pipeline, failures were some what opaque.
Plus, images built in our CI/CD pipeline were the only instances which would cause the error, but local docker builds would be fine. I'll experiment with these flags to see if the issue is resolved. Side note, the only reason why I guess it was a problem with annoy was comparing the two docker images with https://github.com/GoogleContainerTools/container-diff.
For anyone that has issues with this and builds annoy in docker, I used this line to override the ANNOY_COMPILER_ARGS
environment variable per @erikbern's suggestion. Thanks!
ENV ANNOY_COMPILER_ARGS -D_CRT_SECURE_NO_WARNINGS,-DANNOYLIB_MULTITHREADED_BUILD,-mtune=native
If anyone wants to give it a shot and try to fix this, I think the best approach is something like
AnnoyIndex
(but not AnnoyIndexInterface
)if (__builtin_cpu_supports("avx2"))
in runtimeMaybe there's other solutions too!
Yes, I know for a fact our University's cluster of CPUs does not support avx2, only regular avx, and I think this is where that error stems from also (I'm assuming that differentiation between avx2 and avx is also significant).
These are some other erros I have come accross along side trying to find a workaround for illegal instruction:
1)RuntimeError: Didn't find engine for operation quantized::linear_prepack NoQEngine (operator() at /pytorch/aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp:202)
2)RuntimeError: Your CPU does not support FBGEMM
"FBGEMM requires gcc 5+ and a CPU with support for avx2 instruction set or higher." -->> from googling this issue
My guess would be that you need to disable FBGEMM, at least when you run on architectures that don't support avx2. There appears to be a USE_FBGEMM environment variable which can be set to "0" to disable the use of FBGEMM.
are those last ones related to Annoy? seems like it says something about pytorch
I remember experiencing it as one of the many errors during the illegal instruction issue I was having, but you're right... this might not be particular to that and maybe more to do my virtenv. Not sure, thought I'd mention it.
For anyone that has issues with this and builds annoy in docker, I used this line to override the
ANNOY_COMPILER_ARGS
environment variable per @erikbern's suggestion. Thanks!
ENV ANNOY_COMPILER_ARGS -D_CRT_SECURE_NO_WARNINGS,-DANNOYLIB_MULTITHREADED_BUILD,-mtune=native
Yeahhhhhhh! You saved my mental health 😃 Solution Working on AWS ECS
Hello may i know the exact line that you used to install annoy in docker? Not sure how to pass this argument into my dockerfile
Hello may i know the exact line that you used to install annoy in docker? Not sure how to pass this argument into my dockerfile
It has been a few years, and there is probably a better way to do this now. But originally included these in my dockerfile
FROM python:3.7.5-slim-stretch
ENV ANNOY_COMPILER_ARGS -D_CRT_SECURE_NO_WARNINGS,-DANNOYLIB_MULTITHREADED_BUILD,-mtune=native
From there you can include anything else in your dockerfile including installing annoy.
@mmaybeno is there a better way to do this now?
Good question. It has been over a year since I built this. Are you encountering the same issue? I'm surprised this issue is still open to be honest :).
We are 😞 - we're doing Docker builds on an ubuntu linux environment that I suspect is using Intel CPUs and observing that using that image on AMD CPUs on a different AWS instance results in core dumped when initializing the index. Your fix here does work though - thanks for that! It's just a bit unfortunate that there's no better way... (btw, what led you to figure out -mtune=native
)
Well bummer. It was not me, mostly everyone else in the thread: https://github.com/spotify/annoy/issues/472#issuecomment-624603252 I just highlighted how one can build it in docker using the env vars they highlighted.
Hey @erikbern, Quiao126 merged the definitive solution to master. Could you release a new version? I would really appreciate it since we are experiencing the same issue. Unfortunately, we can't access to the dockerfile for adding env var you suggested.
In my case, I had to use the following:
ENV ANNOY_COMPILER_ARGS="-D_CRT_SECURE_NO_WARNINGS,-DANNOYLIB_MULTITHREADED_BUILD,-march=x86-64"
Notice the -march=x86-64
. This is because the machine on which the image was built (as part of CI) had Intel CPU, whereas the deployment/testing machine had AMD CPU.
I was trying to get a pip-installed Annoy v1.16.3 running on an m5a.large AWS Instance, but it crashed with
Illegal instruction (core dumped)
trying to executeThis doesn't occur on Intel-based m5.large Instances.
Doing some digging (details below), it seems to be because Annoy requires the AVX-512 instruction set, which isn't available on AMD processors.
m5a.large's AMD EPYC 7000 series processor supports these instruction sets (from
/proc/cpuinfo
):The Annoy library seems to use these instruction sets (using this utility):