microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

Official installation guide results in non-working install on non-ancient Linux systems. #3392

Open Ark-kun opened 6 years ago

Ark-kun commented 6 years ago

The documentation tells me: https://docs.microsoft.com/en-us/cognitive-toolkit/setup-linux-python?tabs=cntkpy251

CNTK requires OpenMPI 1.10.x to be installed on your system. On Ubuntu 16.04 install it like this:

sudo apt-get install openmpi-bin

This results in a broken installation:

$ sudo apt-get install openmpi-bin
...
Setting up openmpi-bin (2.1.1-7) ...
update-alternatives: using /usr/bin/mpirun.openmpi to provide /usr/bin/mpirun (mpirun) in auto mode
Processing triggers for libc-bin (2.24-12) ...

$ python3 -c 'import cntk as c;print(c.__version__)'
Traceback (most recent call last):
  File "/usr/local/google/home/avolkov/.local/lib/python3.5/site-packages/cntk/cntk_py.py", line 18, in swig_import_helper
    return importlib.import_module(mname)
  File "/usr/lib/python3.5/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 986, in _gcd_import
  File "<frozen importlib._bootstrap>", line 969, in _find_and_load
  File "<frozen importlib._bootstrap>", line 956, in _find_and_load_unlocked
ImportError: No module named 'cntk._cntk_py'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/google/home/avolkov/.local/lib/python3.5/site-packages/cntk/__init__.py", line 17, in <module>
    from . import cntk_py
  File "/usr/local/google/home/avolkov/.local/lib/python3.5/site-packages/cntk/cntk_py.py", line 21, in <module>
    _cntk_py = swig_import_helper()
  File "/usr/local/google/home/avolkov/.local/lib/python3.5/site-packages/cntk/cntk_py.py", line 20, in swig_import_helper
    return importlib.import_module('_cntk_py')
  File "/usr/lib/python3.5/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: libmpi_cxx.so.1: cannot open shared object file: No such file or directory
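
The last line of the traceback shows that the failure comes from the dynamic loader, not from Python itself. A minimal sketch for confirming this outside of CNTK, assuming only the standard `ctypes` module: the helper `can_load` is a hypothetical name, and `libmpi_cxx.so.20` is the library name reported later in this thread for the Open MPI 2.1.1 package.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Raised when the library (or one of its dependencies) cannot be found.
        return False


if __name__ == "__main__":
    # libmpi_cxx.so.1 is the name from the ImportError above;
    # libmpi_cxx.so.20 is the name listed later in this thread.
    for name in ("libmpi_cxx.so.1", "libmpi_cxx.so.20"):
        print(name, "loadable" if can_load(name) else "NOT loadable")
```

If the first name is reported as not loadable, `import cntk` is expected to fail in the same way regardless of how the Python package itself was installed.
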

thiagocrepaldi commented 6 years ago

Hello Ark-kun, could you run the following command and post its output? It lists all the MPI entries on your library search path:

ldconfig -p | grep -i mpi

If you can, please also run the following command to determine which version of OpenMPI is installed on your system:

dpkg -s openmpi-bin
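
For anyone who wants to collect the same information from a script, here is a rough Python sketch of the library-cache check. It assumes the usual `name (flags) => path` line layout of `ldconfig -p`; the function names are hypothetical. Unlike `grep -i mpi`, it matches on the library name only, not on the path.

```python
import subprocess


def mpi_entries(ldconfig_output):
    """Return (soname, path) pairs whose soname contains 'mpi' (case-insensitive)."""
    entries = []
    for line in ldconfig_output.splitlines():
        if "=>" not in line:
            continue  # skip the summary line and anything else without a mapping
        left, path = line.split("=>", 1)
        parts = left.split()
        if not parts:
            continue
        soname = parts[0]
        if "mpi" in soname.lower():
            entries.append((soname, path.strip()))
    return entries


def run_ldconfig():
    """Run `ldconfig -p` and return its standard output as text."""
    result = subprocess.run(
        ["ldconfig", "-p"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    try:
        for soname, path in mpi_entries(run_ldconfig()):
            print(soname, "=>", path)
    except (OSError, subprocess.CalledProcessError) as exc:
        # ldconfig may be missing from PATH for non-root users on some systems.
        print("could not run ldconfig:", exc)
```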

Ark-kun commented 6 years ago

@thiagocrepaldi Please note that I know how to solve this problem, but not everyone does.

OpenMPI 1.x has not been available in the package repositories since Ubuntu 17. This might be unfortunate, but this is our reality.

$ ldconfig -p | grep -i mpi
    libompitrace.so.20 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libompitrace.so.20
    libnvidia-ptxjitcompiler.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1
    libnvidia-ptxjitcompiler.so.1 (libc6) => /usr/lib/i386-linux-gnu/libnvidia-ptxjitcompiler.so.1
    libmpi_usempif08.so.20 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libmpi_usempif08.so.20
    libmpi_usempi_ignore_tkr.so.20 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libmpi_usempi_ignore_tkr.so.20
    libmpi_mpifh.so.20 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libmpi_mpifh.so.20
    libmpi_java.so.20 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libmpi_java.so.20
    libmpi_cxx.so.20 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libmpi_cxx.so.20
    libmpi.so.20 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libmpi.so.20
    libexempi.so.3 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libexempi.so.3

$ dpkg -s openmpi-bin
Package: openmpi-bin
Status: deinstall ok config-files
Priority: extra
Section: net
Installed-Size: 447
Maintainer: Alastair McKinstry <mckinstry@debian.org>
Architecture: amd64
Source: openmpi
Version: 2.1.1-7
Config-Version: 2.1.1-7
Depends: libc6 (>= 2.8), libhwloc5 (>= 1.11.8), libopenmpi2, openmpi-common (= 2.1.1-7)
Suggests: gfortran
Conflicts: openmpi-bin
Conffiles:
 /etc/openmpi/openmpi-default-hostfile ef9b3fad0bd8bcb7bdbafe1881f068d3
 /etc/openmpi/openmpi-mca-params.conf 6bc5e3a70d815f1ea27cb9a030b0fcec
 /etc/openmpi/openmpi-totalview.tcl c04f4ef3c7ef59feb19ced2acf0a258b
Description: high performance message passing library -- binaries
 Open MPI is a project combining technologies and resources from several other
 projects (FT-MPI, LA-MPI, LAM/MPI, and PACX-MPI) in order to build the best
 MPI library available. A completely new MPI-3.1 compliant implementation, Open
 MPI offers advantages for system and software vendors, application developers
 and computer science researchers.
 .
 Features:
  * Full MPI-3.1 standards conformance
  * Thread safety and concurrency
  * Dynamic process spawning
  * High performance on all platforms
  * Reliable and fast job management
  * Network and process fault tolerance
  * Support network heterogeneity
  * Single library supports all networks
  * Run-time instrumentation
  * Many job schedulers supported
  * Internationalized error messages
  * Component-based design, documented APIs
 .
 This package contains the Open MPI utility programs.
Homepage: http://www.open-mpi.org/
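
The two outputs above explain the ImportError: CNTK asks for `libmpi_cxx.so.1`, while the cache only contains `libmpi_cxx.so.20`. The number after `.so` is the library's major version, and the loader does not substitute one major version for another. A small Python sketch of that comparison follows; the function names are hypothetical, and it assumes sonames of the form `name.so.N`.

```python
import re

# Matches "libfoo.so.N" and captures the base name and the major version N.
SONAME_RE = re.compile(r"^(?P<base>.+\.so)\.(?P<major>\d+)")


def split_soname(soname):
    """Split 'libmpi_cxx.so.1' into ('libmpi_cxx.so', 1)."""
    match = SONAME_RE.match(soname)
    if match is None:
        raise ValueError("not a versioned soname: %r" % soname)
    return match.group("base"), int(match.group("major"))


def installed_majors(required, available):
    """Return the sorted major versions in `available` that share `required`'s base name."""
    base, _ = split_soname(required)
    majors = []
    for name in available:
        match = SONAME_RE.match(name)
        if match and match.group("base") == base:
            majors.append(int(match.group("major")))
    return sorted(majors)


if __name__ == "__main__":
    required = "libmpi_cxx.so.1"  # from the ImportError in this issue
    available = ["libmpi_cxx.so.20", "libmpi.so.20", "libexempi.so.3"]  # from ldconfig above
    wanted = split_soname(required)[1]
    found = installed_majors(required, available)
    if wanted not in found:
        print("need major version", wanted, "but only found", found)
```
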

saurbhc commented 2 years ago

Any update on this? I'm getting the same error.