wuhuhu800 opened this issue 6 years ago
Request a GPU quota
Instance: n1-highmem-4 recommended
https://cloud.google.com/compute/docs/gpus/add-gpus
#!/bin/bash
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! dpkg-query -W cuda-8-0; then
# The 16.04 installer works with 16.10.
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
dpkg -i ./cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
apt-get update
apt-get install cuda-8-0 -y
fi
# Enable persistence mode
nvidia-smi -pm 1
Connect to the instance
From the command line (requires the gcloud command-line tool to be installed):
https://www.imooc.com/article/22947?block_id=tuijian_wz https://medium.com/@jamsawamsa/running-a-google-cloud-gpu-for-fast-ai-for-free-5f89c707bae6
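A minimal sketch of the same connection with the gcloud CLI, forwarding the notebook port over SSH so Jupyter can be reached at http://localhost:8888 on your own machine. INSTANCE and ZONE are placeholder values (assumptions); the script only prints the command unless RUN=1 is set.

```shell
#!/bin/bash
# Placeholder values (assumptions): replace with your instance name and zone.
INSTANCE="${INSTANCE:-my-gpu-instance}"
ZONE="${ZONE:-us-west1-b}"

# Arguments after "--" are passed to ssh; -L forwards local port 8888
# to port 8888 on the instance, where Jupyter listens.
cmd=(gcloud compute ssh "$INSTANCE" --zone "$ZONE" -- -L 8888:localhost:8888)

if [ "${RUN:-0}" = 1 ]; then
  "${cmd[@]}"       # actually connect
else
  echo "${cmd[*]}"  # dry run: only show the command
fi
```

With the tunnel open, start Jupyter on the instance and browse to http://localhost:8888 locally.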
wget https://raw.githubusercontent.com/fastai/courses/master/setup/install-gpu.sh
sudo sh install-gpu.sh
sudo reboot
sudo modprobe nvidia
nvidia-smi
jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root
After creating an image, upgrade Python 3 again:
https://www.howtoing.com/how-to-install-the-anaconda-python-distribution-on-ubuntu-16-04
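cd /tmp
curl -O https://repo.continuum.io/archive/Anaconda3-5.1.0-Linux-x86_64.sh
We can now check the installer's data integrity through its SHA-256 checksum, a cryptographic hash. Use the sha256sum command with the script's filename and compare the output with the hash published by Anaconda:
sha256sum Anaconda3-5.1.0-Linux-x86_64.sh
bash Anaconda3-5.1.0-Linux-x86_64.sh
Answer yes to every prompt.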
Prepending PATH=/home/sammy/anaconda3/bin to PATH in /home/sammy/.bashrc
A backup will be made to: /home/sammy/.bashrc-anaconda3.bak
...
source ~/.bashrc
Once that is done, you can verify the installation with a conda command, for example list:
conda list
conda search "^python$"
sudo su
chown suncan anaconda2
chown suncan .conda
su suncan
conda search "^python$"
conda install python=3.6 fails:
UnsatisfiableError: The following specifications were found to be in conflict:
- python=3.6
- ssl_match_hostname -> python[version='>=2.7,<2.8.0a0']
Use "conda info <package>" to see the dependencies for each package.
conda update anaconda does not help either.
While at it, create a virtual environment: conda create --name py36 python=3.6
Running conda info --envs shows that conda is installing into anaconda2.
Final fix: open ~/.bashrc (vim ~/.bashrc) and change
export PATH="/home/suncan/anaconda2/bin:$PATH"
to
export PATH="/home/suncan/anaconda3/bin:$PATH"
Then reload it with source ~/.bashrc.
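The PATH fix above can be scripted. A sketch assuming GNU sed and an `export PATH=...anaconda2/bin...` line like the one shown; it keeps the previous file as ~/.bashrc.bak. Both helper names are hypothetical.

```shell
#!/bin/bash
# Rewrite "export PATH=..." lines so they point at anaconda3 instead of anaconda2.
# GNU sed; -i.bak edits in place and saves the original as <file>.bak.
use_anaconda3() {
  local rc="${1:-$HOME/.bashrc}"
  sed -i.bak '/^export PATH=/s#/anaconda2/bin#/anaconda3/bin#' "$rc"
}

# Show which conda and python the current PATH resolves to.
show_active_python() {
  command -v conda || echo "conda: not on PATH"
  command -v python || echo "python: not on PATH"
}
```

For example: use_anaconda3 && source ~/.bashrc && show_active_python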
Result: running jupyter notebook --ip 0.0.0.0 --port 8888 now fails with:
Traceback (most recent call last):
File "/home/suncan/anaconda3/bin/jupyter-notebook", line 11, in <module>
sys.exit(main())
File "/home/suncan/anaconda3/lib/python3.6/site-packages/jupyter_core/application.py", line 266, in launch_instance
return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
File "/home/suncan/anaconda3/lib/python3.6/site-packages/traitlets/config/application.py", line 657, in launch_instance
app.initialize(argv)
File "<decorator-gen-7>", line 2, in initialize
File "/home/suncan/anaconda3/lib/python3.6/site-packages/traitlets/config/application.py", line 87, in catch_config_error
return method(app, *args, **kwargs)
File "/home/suncan/anaconda3/lib/python3.6/site-packages/notebook/notebookapp.py", line 1501, in initialize
super(NotebookApp, self).initialize(argv)
File "<decorator-gen-6>", line 2, in initialize
File "/home/suncan/anaconda3/lib/python3.6/site-packages/traitlets/config/application.py", line 87, in catch_config_error
return method(app, *args, **kwargs)
File "/home/suncan/anaconda3/lib/python3.6/site-packages/jupyter_core/application.py", line 242, in initialize
self.migrate_config()
File "/home/suncan/anaconda3/lib/python3.6/site-packages/jupyter_core/application.py", line 168, in migrate_config
migrate()
File "/home/suncan/anaconda3/lib/python3.6/site-packages/jupyter_core/migrate.py", line 247, in migrate
with open(os.path.join(env['jupyter_config'], 'migrated'), 'w') as f:
PermissionError: [Errno 13] Permission denied: '/home/suncan/.jupyter/migrated'
Same old problem, insufficient permissions. Fix the ownership of ~/.jupyter:
sudo su
chown suncan .jupyter
su suncan
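The chown fix above as a reusable check, sketched under the assumption of GNU stat (Linux). It compares numeric user IDs, calls sudo only when the directory belongs to someone else, and chowns recursively, which goes slightly further than the single chown above. The function names are hypothetical.

```shell
#!/bin/bash
# Succeed if the path is owned by the current user (GNU stat, numeric UIDs).
owned_by_me() {
  [ "$(stat -c %u "$1")" = "$(id -u)" ]
}

# Hand a directory back to the current user if another user (e.g. root) owns it.
# Only the directory itself is checked; when it is chowned, -R also fixes its contents.
reclaim_dir() {
  local dir="$1"
  [ -e "$dir" ] || return 0            # nothing to fix
  if ! owned_by_me "$dir"; then
    sudo chown -R "$(id -u):$(id -g)" "$dir"
  fi
}
```

For example: reclaim_dir ~/.jupyter; reclaim_dir ~/.conda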
git clone https://github.com/tensorflow/tensorflow
cd ~/tensorflow
./configure
You have bazel 0.12.0 installed.
Please specify the location of python. [Default is /home/suncan/anaconda3/bin/python]:
Found possible Python library paths:
/home/suncan/anaconda3/lib/python3.6/site-packages
Please input the desired Python library path to use. Default is [/home/suncan/anaconda3/lib/python3.6/site-packages]
Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: y
jemalloc as malloc support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: y
Google Cloud Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with GDR support? [y/N]: nb
Invalid selection: nb
Do you wish to build TensorFlow with GDR support? [y/N]: n
No GDR support will be enabled for TensorFlow.
Do you wish to build TensorFlow with VERBS support? [y/N]: n
No VERBS support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.1
Please specify the location where CUDA 9.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 5
Please specify the location where cuDNN 5 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.7]
Do you want to use clang as CUDA compiler? [y/N]: y
Clang will be used as CUDA compiler.
Please specify which clang should be used as device and host compiler. [Default is ]:
Invalid clang path: cannot be found.
Please specify which clang should be used as device and host compiler. [Default is ]:
Invalid clang path: cannot be found.
Please specify which clang should be used as device and host compiler. [Default is ]: ^CTraceback (most recent call last):
File "configure.py", line 81, in get_input
answer = raw_input(question)
NameError: name 'raw_input' is not defined
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "configure.py", line 1367, in <module>
main()
File "configure.py", line 1329, in main
set_clang_cuda_compiler_path(environ_cp)
File "configure.py", line 555, in set_clang_cuda_compiler_path
default_clang_path)
File "configure.py", line 539, in get_from_env_or_user_or_default
var = get_input(ask_for_var)
File "configure.py", line 83, in get_input
answer = input(question) # pylint: disable=bad-builtin
KeyboardInterrupt
suncan@deeplearning-woody3:~/tensorflow$ ./configure
You have bazel 0.12.0 installed.
Please specify the location of python. [Default is /home/suncan/anaconda3/bin/python]:
Found possible Python library paths:
/home/suncan/anaconda3/lib/python3.6/site-packages
Please input the desired Python library path to use. Default is [/home/suncan/anaconda3/lib/python3.6/site-packages]
Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: y
jemalloc as malloc support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: y
Google Cloud Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with GDR support? [y/N]: n
No GDR support will be enabled for TensorFlow.
Do you wish to build TensorFlow with VERBS support? [y/N]: n
No VERBS support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.1
Please specify the location where CUDA 9.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 5
Please specify the location where cuDNN 5 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.7]
Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Add "--config=mkl" to your bazel command to build with MKL support.
Please note that MKL on MacOS or windows is still not supported.
If you would like to use a local MKL instead of downloading, please set the environment variable "TF_MKL_ROOT" every time before build.
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.
Configuration finished
sudo apt-get install libcurl3 libcurl3-dev
https://blog.csdn.net/tintinetmilou/article/details/78756304 https://www.youtube.com/watch?v=abEf3wQJBmE http://www.52nlp.cn/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0%E4%B8%BB%E6%9C%BA%E7%8E%AF%E5%A2%83%E9%85%8D%E7%BD%AE-ubuntu16-04-geforce-gtx1080-tensorflow
Run:
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
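This fails with:
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
Note that the configuration above pairs CUDA 9.1 with cuDNN 5; cuDNN 5 was only released for CUDA 7.5 and 8.0, so this mismatch is a likely cause.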
CUDA
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libcudnn6_6.0.21-1%2Bcuda8.0_amd64.deb
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libcudnn6-dev_6.0.21-1%2Bcuda8.0_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo dpkg -i libcudnn6_6.0.21-1+cuda8.0_amd64.deb
sudo dpkg -i libcudnn6-dev_6.0.21-1+cuda8.0_amd64.deb
sudo apt-get update
sudo apt-get install cuda=8.0.61-1
sudo apt-get install libcudnn6-dev
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Building the TensorFlow GPU version then fails with: Cannot find libdevice.10.bc under /usr/local/cuda-8.0
Solution:
Rename /usr/local/cuda-8.0/nvvm/libdevice/libdevice.compute_50.10.bc to libdevice.10.bc and put a copy in /usr/local/cuda-8.0/
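The rename-and-copy workaround, as a sketch. It uses cp instead of mv so the original compute_50 file stays in place (an assumption that keeping it is harmless), and takes the CUDA root as an argument so it can be tried on a scratch directory first. Writing under /usr/local needs root. The function name is hypothetical.

```shell
#!/bin/bash
# Provide libdevice.10.bc for a CUDA 8.0 tree by copying the compute_50 variant
# into nvvm/libdevice/ and into the CUDA root, as the note above describes.
fix_libdevice() {
  local cuda_root="${1:-/usr/local/cuda-8.0}"
  local libdev="$cuda_root/nvvm/libdevice"
  local src="$libdev/libdevice.compute_50.10.bc"
  if [ ! -f "$src" ]; then
    echo "not found: $src" >&2
    return 1
  fi
  cp "$src" "$libdev/libdevice.10.bc"
  cp "$src" "$cuda_root/libdevice.10.bc"
}
```

To run it as root: sudo bash -c "$(declare -f fix_libdevice); fix_libdevice /usr/local/cuda-8.0"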
. ~/.bashrc
nano .bashrc
(fastai) suncan@deeplearning-6:~$ sudo ldconfig
/sbin/ldconfig.real: /usr/local/cuda-9.1/targets/x86_64-linux/lib/libcudnn.so.7 is not a symbolic link
https://blog.csdn.net/langb2014/article/details/54376716
sudo ln -sf /usr/local/cuda-9.1/targets/x86_64-linux/lib/libcudnn.so.7.0.5 /usr/local/cuda-9.1/targets/x86_64-linux/lib/libcudnn.so.7
sudo ln -sf /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so.7.0.5 /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so.7
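The two ln -sf commands generalize to a small helper: ldconfig expects libcudnn.so.7 to be a symlink to the versioned library. A sketch; the versioned filename is an argument because it depends on which cuDNN release is installed, and the link is created relative to the library directory rather than with the absolute path used above. The function name is hypothetical.

```shell
#!/bin/bash
# Make <libdir>/libcudnn.so.7 a symlink to <libdir>/<versioned>,
# replacing a regular file of that name if one is there.
relink_cudnn() {
  local libdir="$1" versioned="$2"
  local soname="libcudnn.so.7"
  if [ ! -f "$libdir/$versioned" ]; then
    echo "not found: $libdir/$versioned" >&2
    return 1
  fi
  if [ ! -L "$libdir/$soname" ]; then
    ln -sf "$versioned" "$libdir/$soname"   # relative link inside libdir
  fi
}
```

For example: sudo bash -c "$(declare -f relink_cudnn); relink_cudnn /usr/local/cuda-9.1/targets/x86_64-linux/lib libcudnn.so.7.0.5" && sudo ldconfig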
suncan@deeplearning-7:~$ sudo apt-get install python2.7-dev python3.5-dev python3.6-dev pylint
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package python3.6-dev
E: Couldn't find any package by glob 'python3.6-dev'
E: Couldn't find any package by regex 'python3.6-dev'
Fix:
sudo add-apt-repository ppa:jonathonf/python-3.6
sudo apt-get update
sudo apt-get install python3.6 libpython3.6
sudo apt-get update
sudo apt-get upgrade
https://www.howtoing.com/how-to-install-the-anaconda-python-distribution-on-ubuntu-16-04
curl -O https://repo.continuum.io/archive/Anaconda3-5.1.0-Linux-x86_64.sh
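We can now check the installer's data integrity through its SHA-256 checksum, a cryptographic hash. Use the sha256sum command with the script's filename and compare the output with the hash published by Anaconda: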
sha256sum Anaconda3-5.1.0-Linux-x86_64.sh
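sha256sum only prints the hash; comparing it with the published value is up to you. A sketch that does the comparison with sha256sum -c (GNU coreutils). No digest is hard-coded here: take the expected value from Anaconda's published hash list. The function name is hypothetical.

```shell
#!/bin/bash
# Succeed only if the file's SHA-256 digest equals the expected hex digest.
verify_sha256() {
  local file="$1" expected="$2"
  # sha256sum -c reads "<digest>  <file>" lines (two spaces); --status
  # suppresses output so only the exit code reports the result.
  echo "${expected}  ${file}" | sha256sum -c --status -
}
```

For example: verify_sha256 Anaconda3-5.1.0-Linux-x86_64.sh "<published digest>" && bash Anaconda3-5.1.0-Linux-x86_64.sh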
bash Anaconda3-5.1.0-Linux-x86_64.sh
Answer yes to every prompt.
Prepending PATH=/home/sammy/anaconda3/bin to PATH in /home/sammy/.bashrc
A backup will be made to: /home/sammy/.bashrc-anaconda3.bak
source ~/.bashrc
Once that is done, you can verify the installation with a conda command, for example list:
conda list
lspci | grep -i nvidia
uname -m && cat /etc/*release
sudo apt-get install build-essential
sudo apt-get install cmake git unzip zip
sudo apt-get install pylint
uname -r
sudo apt-get install linux-headers-$(uname -r)
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda-9.1
sudo reboot
nano ~/.bashrc
At the end of the file, add:
export PATH=/usr/local/cuda-9.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Press Ctrl+X, then Y and Enter, to save and exit.
source ~/.bashrc
sudo ldconfig
nvidia-smi
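nvidia-smi confirms the driver; the toolkit version on PATH is a separate question, and it is the number ./configure asks for. A sketch that extracts the release from nvcc --version output, assuming the usual "Cuda compilation tools, release X.Y, ..." line. The function name is hypothetical.

```shell
#!/bin/bash
# Read nvcc --version output on stdin and print the CUDA release, e.g. "9.1".
cuda_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}
```

For example, nvcc --version | cuda_release should print the same version you later give ./configure (9.1 after the steps above).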
Go to https://developer.nvidia.com/cudnn to download cuDNN (NVIDIA developer membership required).
After logging in, download the following:
cuDNN v7.1.3 Runtime Library for Ubuntu16.04 (Deb)
cuDNN v7.1.3 Developer Library for Ubuntu16.04 (Deb)
cuDNN v7.1.3 Code Samples and User Guide for Ubuntu16.04 (Deb)
In a terminal, go to the download folder and run:
sudo dpkg -i libcudnn7_7.1.3.16-1+cuda9.1_amd64.deb
sudo dpkg -i libcudnn7-dev_7.1.3.16-1+cuda9.1_amd64.deb
sudo dpkg -i libcudnn7-doc_7.1.3.16-1+cuda9.1_amd64.deb
Verifying cuDNN installation:
cp -r /usr/src/cudnn_samples_v7/ $HOME
cd $HOME/cudnn_samples_v7/mnistCUDNN
make clean && make
./mnistCUDNN
If cuDNN is properly installed and running on your Linux system, you will see a message similar to the following:
Test passed!
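./configure later asks for the cuDNN version. A sketch that reads it from the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines in the cuDNN header; the default path /usr/include/cudnn.h is an assumption for the Debian packages, so pass another path if yours differs. The function name is hypothetical.

```shell
#!/bin/bash
# Print MAJOR.MINOR.PATCHLEVEL from the #define lines of a cuDNN header.
cudnn_version() {
  local header="${1:-/usr/include/cudnn.h}"
  awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { ma = $3 }
       $1 == "#define" && $2 == "CUDNN_MINOR"      { mi = $3 }
       $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { pl = $3 }
       END { print ma "." mi "." pl }' "$header"
}
```

With the 7.1.3 packages above this should print 7.1.3.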
libcupti (required)
sudo apt-get install libcupti-dev
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
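Appending with >> adds another copy of the line every time this step is re-run. A sketch of an idempotent append (the helper name is hypothetical):

```shell
#!/bin/bash
# Append a line to a file only if that exact line is not already present.
append_once() {
  local line="$1" file="${2:-$HOME/.bashrc}"
  touch "$file"
  # -x: whole-line match, -F: fixed string (no regex), -q: quiet
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example: append_once 'export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH'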
Bazel (required)
sudo apt-get install pkg-config zip g++ zlib1g-dev unzip
sudo apt-get install openjdk-8-jdk
wget https://github.com/bazelbuild/bazel/releases/download/0.11.1/bazel_0.11.1-linux-x86_64.deb
sudo dpkg -i bazel_0.11.1-linux-x86_64.deb
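Earlier in these notes ./configure reported bazel 0.12.0, while this walkthrough installs 0.11.1, so it can help to check which Bazel is actually on PATH. A sketch using GNU sort -V for the comparison; the 0.11.1 threshold is an assumption taken from the step above, not TensorFlow's documented minimum. The function names are hypothetical.

```shell
#!/bin/bash
# Succeed if version $1 >= version $2 (GNU sort version ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Extract X.Y.Z from "bazel version" output ("Build label: X.Y.Z").
bazel_label() {
  sed -n 's/^Build label: //p'
}
```

For example: version_ge "$(bazel version | bazel_label)" 0.11.1 || echo "bazel is older than 0.11.1"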
To install these packages for Python 3.n, issue the following command:
sudo apt-get install python3-numpy python3-dev python3-pip python3-wheel
source ~/.bashrc
sudo ldconfig
wget https://github.com/tensorflow/tensorflow/archive/v1.7.0.zip
unzip v1.7.0.zip
cd tensorflow-1.7.0
./configure
Enter the Python path at:
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3
Press Enter twice.
Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: Y
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: Y
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
Do you wish to build TensorFlow with Apache Kafka Platform support? [y/N]: N
Do you wish to build TensorFlow with XLA JIT support? [y/N]: N
Do you wish to build TensorFlow with GDR support? [y/N]: N
Do you wish to build TensorFlow with VERBS support? [y/N]: N
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N
Do you wish to build TensorFlow with CUDA support? [y/N]: Y
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.1
Please specify the location where CUDA 9.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.1.3
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/lib/x86_64-linux-gnu
Do you wish to build TensorFlow with TensorRT support? [y/N]: N
Now enter your GPU's compute capability (listed at https://developer.nvidia.com/cuda-gpus), e.g. 5.0:
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 5.0] 5.0
Do you want to use clang as CUDA compiler? [y/N]: N
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: /usr/bin/gcc
Do you wish to build TensorFlow with MPI support? [y/N]: N
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: -march=native
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:N
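configure.py checks environment variables before prompting (the traceback earlier in these notes shows it calling get_from_env_or_user_or_default), so the answers above can be preset for repeat builds. The variable names below are assumptions based on TensorFlow 1.x's configure.py; check them against your checkout. Anything left unset, such as the Python library path, is still prompted for.

```shell
#!/bin/bash
# Preset ./configure answers (names assumed from TF 1.x configure.py).
# Values mirror the interactive session above.
export PYTHON_BIN_PATH=/usr/bin/python3
export TF_NEED_JEMALLOC=1
export TF_NEED_GCP=1
export TF_NEED_HDFS=0
export TF_NEED_S3=0
export TF_NEED_KAFKA=0
export TF_ENABLE_XLA=0
export TF_NEED_GDR=0
export TF_NEED_VERBS=0
export TF_NEED_OPENCL_SYCL=0
export TF_NEED_CUDA=1
export TF_CUDA_VERSION=9.1
export CUDA_TOOLKIT_PATH=/usr/local/cuda
export TF_CUDNN_VERSION=7
export CUDNN_INSTALL_PATH=/usr/lib/x86_64-linux-gnu
export TF_NEED_TENSORRT=0
export TF_CUDA_COMPUTE_CAPABILITIES=5.0
export TF_CUDA_CLANG=0
export GCC_HOST_COMPILER_PATH=/usr/bin/gcc
export TF_NEED_MPI=0
export CC_OPT_FLAGS=-march=native
export TF_SET_ANDROID_WORKSPACE=0
# Then run ./configure from the tensorflow-1.7.0 directory.
```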
To fix the "math_functions.hpp not found" error, create a symbolic link at cuda/include/math_functions.hpp pointing to cuda/include/crt/math_functions.hpp:
sudo ln -s /usr/local/cuda/include/crt/math_functions.hpp /usr/local/cuda/include/math_functions.hpp
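A guarded version of the symlink step, as a sketch: it links only when the crt header exists and nothing already sits at the old path, so re-running it is harmless (a plain ln -s fails the second time). The include directory is an argument defaulting to /usr/local/cuda/include; creating the link there needs root. The function name is hypothetical.

```shell
#!/bin/bash
# Create <inc>/math_functions.hpp -> <inc>/crt/math_functions.hpp if missing.
link_math_functions() {
  local inc="${1:-/usr/local/cuda/include}"
  local src="$inc/crt/math_functions.hpp" dst="$inc/math_functions.hpp"
  if [ ! -f "$src" ]; then
    echo "not found: $src" >&2
    return 1
  fi
  [ -e "$dst" ] || ln -s "$src" "$dst"
}
```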
bazel build --config=opt --config=cuda --incompatible_load_argument_is_label=false //tensorflow/tools/pip_package:build_pip_package
This takes a long time: 1–2 hours, possibly more.
The bazel build command produces a script named build_pip_package. Run it as follows to build a .whl file in the tensorflow_pkg directory:
bazel-bin/tensorflow/tools/pip_package/build_pip_package tensorflow_pkg
Activate your virtual environment here if you use one.
To install tensorflow with pip:
cd tensorflow_pkg
# python3 is the default
pip install tensorflow*.whl
python
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
Result:
Hello, TensorFlow!
Start Jupyter Notebook:
jupyter notebook --ip 0.0.0.0 --port 8888
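"SSD backed PD Capacity" is the charge for keeping the SSD persistent disk, which continues after the VM is shut down. Taking a snapshot and then deleting the VM reduces this cost.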
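A sketch of the snapshot-then-delete step with the gcloud CLI. All names and the zone are placeholders (assumptions). The snapshot comes first because deleting the instance also deletes a boot disk whose auto-delete flag is set, which is the default for a boot disk created with the VM. The script prints the commands unless RUN=1 is set.

```shell
#!/bin/bash
# Placeholders (assumptions): set these to your own VM, disk, zone and snapshot name.
INSTANCE="${INSTANCE:-my-gpu-instance}"
DISK="${DISK:-$INSTANCE}"   # a boot disk created with the VM usually shares its name
ZONE="${ZONE:-us-west1-b}"
SNAPSHOT="${SNAPSHOT:-${DISK}-backup}"

# Print a command, or run it when RUN=1.
run() {
  if [ "${RUN:-0}" = 1 ]; then
    "$@"
  else
    echo "$@"
  fi
}

# Snapshot first, then delete the VM only if the snapshot succeeded.
run gcloud compute disks snapshot "$DISK" --zone "$ZONE" --snapshot-names "$SNAPSHOT" &&
  run gcloud compute instances delete "$INSTANCE" --zone "$ZONE"
```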
tf-nightly-gpu