lyst / lightfm

A Python implementation of LightFM, a hybrid recommendation algorithm.
Apache License 2.0

Illegal hardware instruction (core dumped) when installing via conda-forge #559

Closed SimonCW closed 3 years ago

SimonCW commented 4 years ago

Issue: I get the error Illegal hardware instruction (core dumped) when importing lightfm after installing it from conda-forge. Installing with pip works fine. To reproduce, build an environment with conda create -n test -c conda-forge python=3.7 lightfm, activate it with conda activate test, start a Python interpreter, and try to import lightfm.

As an fyi, I also opened an issue on the feedstock: https://github.com/conda-forge/lightfm-feedstock/issues/7


Environment (conda list):

```
$ conda list
#
# Name                    Version     Build                  Channel
_libgcc_mutex             0.1         conda_forge            conda-forge
_openmp_mutex             4.5         1_gnu                  conda-forge
brotlipy                  0.7.0       py37h8f50634_1000      conda-forge
ca-certificates           2020.6.20   hecda079_0             conda-forge
certifi                   2020.6.20   py37hc8dfbb8_0         conda-forge
cffi                      1.14.3      py37h2b28604_0         conda-forge
chardet                   3.0.4       py37hc8dfbb8_1007      conda-forge
cryptography              3.1.1       py37hb09aad4_0         conda-forge
idna                      2.10        pyh9f0ad1d_0           conda-forge
ld_impl_linux-64          2.35        h769bd43_9             conda-forge
libblas                   3.8.0       17_openblas            conda-forge
libcblas                  3.8.0       17_openblas            conda-forge
libffi                    3.2.1       he1b5a44_1007          conda-forge
libgcc-ng                 9.3.0       h24d8f2e_16            conda-forge
libgfortran-ng            7.5.0       hdf63c60_16            conda-forge
libgomp                   9.3.0       h24d8f2e_16            conda-forge
liblapack                 3.8.0       17_openblas            conda-forge
libopenblas               0.3.10      pthreads_hb3c22a3_4    conda-forge
libstdcxx-ng              9.3.0       hdf63c60_16            conda-forge
lightfm                   1.15        py37h161383b_1002      conda-forge
ncurses                   6.2         he1b5a44_1             conda-forge
numpy                     1.19.1      py37h7ea13bd_2         conda-forge
openssl                   1.1.1h      h516909a_0             conda-forge
pip                       20.2.3      py_0                   conda-forge
pycparser                 2.20        pyh9f0ad1d_2           conda-forge
pyopenssl                 19.1.0      py_1                   conda-forge
pysocks                   1.7.1       py37hc8dfbb8_1         conda-forge
python                    3.7.8       h6f2ec95_1_cpython     conda-forge
python_abi                3.7         1_cp37m                conda-forge
readline                  8.0         he28a2e2_2             conda-forge
requests                  2.24.0      pyh9f0ad1d_0           conda-forge
scipy                     1.5.2       py37hb14ef9d_0         conda-forge
setuptools                49.6.0      py37hc8dfbb8_1         conda-forge
six                       1.15.0      pyh9f0ad1d_0           conda-forge
sqlite                    3.33.0      h4cf870e_0             conda-forge
tk                        8.6.10      hed695b0_0             conda-forge
urllib3                   1.25.10     py_0                   conda-forge
wheel                     0.35.1      pyh9f0ad1d_0           conda-forge
xz                        5.2.5       h516909a_1             conda-forge
zlib                      1.2.11      h516909a_1009          conda-forge
```


Details about conda and system ( conda info ):

```
$ conda info

     active environment : test3_persrec
    active env location : /home/simon/miniconda3/envs/test3_persrec
            shell level : 4
       user config file : /home/simon/.condarc
 populated config files : /home/simon/.condarc
          conda version : 4.8.4
    conda-build version : not installed
         python version : 3.7.6.final.0
       virtual packages : __glibc=2.27
       base environment : /home/simon/miniconda3  (writable)
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/simon/miniconda3/pkgs
                          /home/simon/.conda/pkgs
       envs directories : /home/simon/miniconda3/envs
                          /home/simon/.conda/envs
               platform : linux-64
             user-agent : conda/4.8.4 requests/2.24.0 CPython/3.7.6 Linux/5.4.0-48-generic linuxmint/19.3 glibc/2.27
                UID:GID : 1000:1000
             netrc file : None
           offline mode : False
```

SimonCW commented 4 years ago

Temporarily solved by switching to the cf202003 label of the conda-forge channel, i.e. using conda-forge/label/cf202003 as channel with channel priority set to strict. However, as far as I understand it, this means that we are restricted to the packages and dependencies with the cf202003 label.

maciejkula commented 4 years ago

This is likely because the machine on which the conda wheels are compiled has a different set of instructions available than the machine you're trying to run on.

There is some allowance for this here but clearly there are still cases where this fails.
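To check this kind of mismatch on a Linux machine, one option is to compare the CPU flags the kernel reports against the instruction-set extensions a tuned build might use. The following is only a minimal sketch: the `FLAGS_OF_INTEREST` list and the helper names are assumptions chosen for illustration, not something taken from LightFM or conda-forge.

```python
from pathlib import Path

# Extensions a natively tuned build might rely on (an assumed, illustrative list).
FLAGS_OF_INTEREST = ("sse4_2", "avx", "avx2", "fma", "avx512f")


def parse_cpu_flags(cpuinfo_text):
    """Return the flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def report(flags):
    """Map each flag of interest to whether the CPU advertises it."""
    return {name: name in flags for name in FLAGS_OF_INTEREST}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():  # /proc/cpuinfo is Linux-specific
        for name, present in report(parse_cpu_flags(cpuinfo.read_text())).items():
            print(f"{name:10s} {'yes' if present else 'no'}")
```

Running this on both the build machine and the target machine and comparing the output shows whether the build host advertises extensions that the target lacks.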

SimonCW commented 4 years ago

Hi @maciejkula, thank you for your reply! I suspected something along those lines.

However, we are using pretty standard machines: AWS Batch for production and CI/CD and Lenovo Thinkpad Carbon X1 6th Gen for some Dev work. Here the cpuinfo for the Thinkpad:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 142
model name      : Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
stepping        : 10
microcode       : 0xd6
cpu MHz         : 2400.003
cache size      : 8192 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit srbds
bogomips        : 3999.93
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

Processors 1 to 7 report the same values (including identical flags and bugs) except for the following fields:

processor  cpu MHz   core id  apicid  initial apicid
1          2400.022  1        2       2
2          2400.009  2        4       4
3          2400.073  3        6       6
4          2400.014  0        1       1
5          2400.004  1        3       3
6          2400.000  2        5       5
7          2400.022  3        7       7

I feel that these are neither very old nor exotic, and hence I would expect the wheels provided on conda-forge to work for them.

Although I realize that this is not an issue with the LightFM source code, do you have any hints on how to dig a little deeper into the bug to find out which dependency / instruction is causing the illegal hardware instruction error?

Also, do you happen to know what changed in conda / conda-forge? I didn't find anything indicating a change in their infrastructure or compiler toolchain (unfortunately, I am not very proficient with the whole conda package creation toolchain). Heck, I didn't even find a good description of the cf202003 label (what exactly is frozen, when this is executed, etc.).
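One way to narrow down which dependency triggers the illegal instruction is to import each candidate in its own interpreter, so that a crash kills only the child process. The sketch below is an illustration under that assumption; `probe_import` and `first_failing_import` are hypothetical helper names, not part of LightFM or conda.

```python
import subprocess
import sys


def probe_import(module_name):
    """Import module_name in a fresh interpreter; return (returncode, stderr).

    On POSIX, a child killed by a signal has a negative return code,
    for example -signal.SIGILL for an illegal instruction.
    """
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", f"import {module_name}"],
        capture_output=True,  # requires Python 3.7 or newer
        text=True,
    )
    return result.returncode, result.stderr


def first_failing_import(module_names):
    """Return the first module whose import exits non-zero, or None."""
    for name in module_names:
        returncode, _ = probe_import(name)
        if returncode != 0:
            return name
    return None
```

Calling `first_failing_import(["numpy", "scipy", "lightfm"])` in the affected environment would then show whether the crash already occurs in one of the dependencies or only when lightfm itself is loaded.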

tonyhammainen commented 4 years ago

fyi, my problem is not related to conda-forge, but figured I'd continue this thread since I suspect the root cause is the same.

I am getting the same error, Illegal instruction (core dumped), when trying to use LightFM().fit().

This happens with the following Dockerfile:

RUN pip3 install lightfm==1.15
CMD python3 script_that_runs_lightfm.py

i.e. building the container on AWS CodeBuild and executing LightFM().fit() when the container is run on AWS ECS

However, if I move the installation of lightfm to the runtime environment, i.e., Dockerfile: CMD pip3 install lightfm==1.15 && python3 script_that_runs_lightfm.py - everything works.

So, it seems like lightfm installation is very picky with regard to the installation environment (cpu?), never had any other library acting like this between AWS resources.

Any suggestions on how to avoid having to use this anti-pattern are welcome!

SimonCW commented 4 years ago

I checked on a brand-new laptop and have the same problem. Running with PYTHONFAULTHANDLER=1, I get the error below. Maybe that helps @maciejkula?

(py37) ➜  debug_lightfm export PYTHONFAULTHANDLER=1
(py37) ➜  debug_lightfm python lfm_get_started.py
Fatal Python error: Illegal instruction

Current thread 0x00007f709d65a740 (most recent call first):
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1043 in create_module
  File "<frozen importlib._bootstrap>", line 583 in module_from_spec
  File "<frozen importlib._bootstrap>", line 670 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "/home/simon/miniconda3/envs/py37/lib/python3.7/site-packages/lightfm/_lightfm_fast.py", line 3 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "/home/simon/miniconda3/envs/py37/lib/python3.7/site-packages/lightfm/lightfm.py", line 8 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "/home/simon/miniconda3/envs/py37/lib/python3.7/site-packages/lightfm/__init__.py", line 4 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 728 in exec_module
  File "<frozen importlib._bootstrap>", line 677 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 967 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 983 in _find_and_load
  File "lfm_get_started.py", line 1 in <module>
[1]    26992 illegal hardware instruction  python lfm_get_started.py
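The faulthandler output lists the most recent call first, so the frames that matter are the first ones that are not importlib internals. As a rough reading aid, the sketch below filters such a trace down to those frames; the regular expression and the `user_frames` name are assumptions made for this illustration.

```python
import re

# Matches faulthandler frame lines such as:  File "path", line 3 in <module>
FRAME = re.compile(r'File "(?P<path>[^"]+)", line (?P<line>\d+) in (?P<func>\S+)')


def user_frames(trace_text):
    """Return (path, line) pairs for frames outside the frozen importlib modules."""
    frames = []
    for match in FRAME.finditer(trace_text):
        if not match["path"].startswith("<frozen "):
            frames.append((match["path"], int(match["line"])))
    return frames
```

Applied to the trace above, the first remaining frame is lightfm/_lightfm_fast.py, line 3, reached directly from importlib's create_module step. That is consistent with the crash happening while a compiled extension module is being loaded rather than in pure Python code.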

Could you point me towards the best place to address this? The conda-forge Gitter?

xhochy commented 3 years ago

The machines used for conda-forge builds are whatever Azure Pipelines provides, and they have quite a decent / broad set of instructions. Having had a short look at setup.py, it seems that setting LIGHTFM_NO_CFLAGS in the feedstock build would solve this by using conda-forge's standard set of CFLAGS.
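To illustrate the idea, here is a minimal sketch of how a build script could let an environment variable switch off host-specific compiler flags. It is not LightFM's actual setup.py: the function name `choose_compile_args` and the specific flags are assumptions; only the variable name LIGHTFM_NO_CFLAGS comes from the comment above.

```python
import os


def choose_compile_args(environ=None):
    """Return extra compiler flags for a hypothetical extension build."""
    environ = os.environ if environ is None else environ
    if environ.get("LIGHTFM_NO_CFLAGS"):
        # Add nothing, so the CFLAGS supplied by the packaging
        # toolchain (for example conda-forge's defaults) apply unchanged.
        return []
    # Assumed host-specific tuning: -march=native lets the compiler emit
    # any instruction the build machine supports.
    return ["-ffast-math", "-march=native"]
```

With a flag like -march=native, a binary built on one machine can contain instructions that a different machine does not implement, which would also explain why building in one environment and running in another fails while installing at runtime works.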

SimonCW commented 3 years ago

I added a PR in the feedstock @maciejkula.

Thanks a lot @xhochy!

maciejkula commented 3 years ago

Thanks for figuring this out - changing the Conda build parameters sounds great.

SimonCW commented 3 years ago

I'm closing this since it was solved by: https://github.com/lyst/lightfm/pull/563, https://github.com/conda-forge/lightfm-feedstock/pull/9, and https://github.com/conda-forge/lightfm-feedstock/pull/11

Installing from conda-forge should work now (I had to do conda clean --all --force-pkgs-dirs but I think I had a few messed up intermediate versions in cache).