wuhuhu800 / test-one-on-desktop

this is my first desktop practice

test for picture #1

Open wuhuhu800 opened 6 years ago

wuhuhu800 commented 6 years ago

test

wuhuhu800 commented 6 years ago

xgb_fit_tree xgb_train_tree

wuhuhu800 commented 6 years ago
  1. Firewall image

image

image

wuhuhu800 commented 6 years ago

GPU quota request image

image

wuhuhu800 commented 6 years ago

Instance: n1-highmem-4 recommended image

image

image

image

https://cloud.google.com/compute/docs/gpus/add-gpus

image

#!/bin/bash
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! dpkg-query -W cuda-8-0; then
  # The 16.04 installer works with 16.10.
  curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  dpkg -i ./cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  apt-get update
  apt-get install cuda-8-0 -y
fi
# Enable persistence mode
nvidia-smi -pm 1

image

wuhuhu800 commented 6 years ago

Connect to the instance image

image Command line (requires the gcloud command to be installed) image

wuhuhu800 commented 6 years ago

https://www.imooc.com/article/22947?block_id=tuijian_wz
https://medium.com/@jamsawamsa/running-a-google-cloud-gpu-for-fast-ai-for-free-5f89c707bae6

Set up the instance environment: install CUDA and the deep learning libraries, then configure Jupyter Notebook.

wget https://raw.githubusercontent.com/fastai/courses/master/setup/install-gpu.sh
sudo sh install-gpu.sh
sudo reboot

Then reconnect to the server.

Check that CUDA installed successfully:

sudo modprobe nvidia
nvidia-smi

Check Jupyter:

jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root

image

It turns out that Python 2 was installed.

After creating an image of the instance, upgrade to Python 3. image

wuhuhu800 commented 6 years ago

https://www.howtoing.com/how-to-install-the-anaconda-python-distribution-on-ubuntu-16-04

cd /tmp
curl -O https://repo.continuum.io/archive/Anaconda3-5.1.0-Linux-x86_64.sh

Verify the installer's integrity against its SHA-256 checksum by running sha256sum on the script file, then run the installer:

sha256sum Anaconda3-5.1.0-Linux-x86_64.sh
bash Anaconda3-5.1.0-Linux-x86_64.sh

Answer yes to every prompt.

Prepending PATH=/home/sammy/anaconda3/bin to PATH in /home/sammy/.bashrc
A backup will be made to: /home/sammy/.bashrc-anaconda3.bak
...

source ~/.bashrc

Once that is done, verify the installation with a conda command such as list:

conda list
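
The checksum step above only helps if the printed digest is actually compared with the one published for the installer. A minimal Python sketch of that comparison follows; the function names are hypothetical, and the expected digest has to be copied from Anaconda's published hash list (it is deliberately not filled in here).

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so a large installer is never loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def installer_matches(path, expected_hex):
    """Compare the computed digest with the published one, ignoring case and surrounding whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, installer_matches("/tmp/Anaconda3-5.1.0-Linux-x86_64.sh", published_digest) should be True before the installer is run with bash, where published_digest is the value taken from the download page.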

wuhuhu800 commented 6 years ago

conda search "^python$" image

Switch to root:

sudo su

chown suncan anaconda2
chown suncan .conda

Switch back to the suncan user:

su suncan

conda search "^python$" image

Upgrade to Python 3.6:

conda install python=3.6

This fails with:

UnsatisfiableError: The following specifications were found to be in conflict:
  - python=3.6
  - ssl_match_hostname -> python[version='>=2.7,<2.8.0a0']
Use "conda info <package>" to see the dependencies for each package.

conda update anaconda

That does not help either.

While at it, create a virtual environment:

conda create --name py36 python=3.6

Check with conda info --envs: image

It turns out the environment was created under anaconda2.

#
# To activate this environment, use:
# > source activate py36
#
# To deactivate an active environment, use:
# > source deactivate
#

Final solution: edit ~/.bashrc

vim ~/.bashrc image

Change

export PATH="/home/suncan/anaconda2/bin:$PATH"

to

export PATH="/home/suncan/anaconda3/bin:$PATH"

Then reload it:

source ~/.bashrc

Result: image
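
The PATH edit works because the shell uses the first matching executable it finds, scanning PATH entries from left to right. A small Python sketch of that lookup, using shutil.which with an explicit search path; the helper name resolve_on_path is hypothetical.

```python
import os
import shutil


def resolve_on_path(command, path_entries):
    """Return the first executable named `command` found in `path_entries`, searched in order."""
    return shutil.which(command, path=os.pathsep.join(path_entries))
```

With both installs present, resolve_on_path("python", ["/home/suncan/anaconda3/bin", "/home/suncan/anaconda2/bin"]) would be expected to return the anaconda3 interpreter, and swapping the two entries would return the anaconda2 one.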

wuhuhu800 commented 6 years ago

Running Jupyter at this point:

jupyter notebook --ip 0.0.0.0 --port 8888

Traceback (most recent call last):
  File "/home/suncan/anaconda3/bin/jupyter-notebook", line 11, in <module>
    sys.exit(main())
  File "/home/suncan/anaconda3/lib/python3.6/site-packages/jupyter_core/application.py", line 266, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/home/suncan/anaconda3/lib/python3.6/site-packages/traitlets/config/application.py", line 657, in launch_instance
    app.initialize(argv)
  File "<decorator-gen-7>", line 2, in initialize
  File "/home/suncan/anaconda3/lib/python3.6/site-packages/traitlets/config/application.py", line 87, in catch_config_error
    return method(app, *args, **kwargs)
  File "/home/suncan/anaconda3/lib/python3.6/site-packages/notebook/notebookapp.py", line 1501, in initialize
    super(NotebookApp, self).initialize(argv)
  File "<decorator-gen-6>", line 2, in initialize
  File "/home/suncan/anaconda3/lib/python3.6/site-packages/traitlets/config/application.py", line 87, in catch_config_error
    return method(app, *args, **kwargs)
  File "/home/suncan/anaconda3/lib/python3.6/site-packages/jupyter_core/application.py", line 242, in initialize
    self.migrate_config()
  File "/home/suncan/anaconda3/lib/python3.6/site-packages/jupyter_core/application.py", line 168, in migrate_config
    migrate()
  File "/home/suncan/anaconda3/lib/python3.6/site-packages/jupyter_core/migrate.py", line 247, in migrate
    with open(os.path.join(env['jupyter_config'], 'migrated'), 'w') as f:
PermissionError: [Errno 13] Permission denied: '/home/suncan/.jupyter/migrated'

The same old problem: insufficient permissions.

sudo su
chown suncan .jupyter
su suncan
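
Since this ownership problem has now come up for anaconda2, .conda and .jupyter, it may be quicker to list every entry in the home directory that belongs to another user (typically root, after a sudo install). A minimal sketch, with a hypothetical helper name; it only inspects the top level of the directory.

```python
import os
import pwd


def entries_not_owned_by(directory, username):
    """List names directly under `directory` whose owner is not `username`."""
    uid = pwd.getpwnam(username).pw_uid
    foreign = []
    for entry in os.scandir(directory):
        # Stat the entry itself rather than the target of a symlink.
        if entry.stat(follow_symlinks=False).st_uid != uid:
            foreign.append(entry.name)
    return sorted(foreign)
```

For example, entries_not_owned_by("/home/suncan", "suncan") returns the names that still need a chown.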

wuhuhu800 commented 6 years ago

git clone https://github.com/tensorflow/tensorflow

cd ~/tensorflow

git checkout r1.5

./configure

You have bazel 0.12.0 installed.
Please specify the location of python. [Default is /home/suncan/anaconda3/bin/python]:

Found possible Python library paths:
  /home/suncan/anaconda3/lib/python3.6/site-packages
Please input the desired Python library path to use.  Default is [/home/suncan/anaconda3/lib/python3.6/site-packages]

Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: y
jemalloc as malloc support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: y
Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]: nb
Invalid selection: nb
Do you wish to build TensorFlow with GDR support? [y/N]: n
No GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: n
No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.1

Please specify the location where CUDA 9.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 5

Please specify the location where cuDNN 5 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.7]

Do you want to use clang as CUDA compiler? [y/N]: y
Clang will be used as CUDA compiler.

Please specify which clang should be used as device and host compiler. [Default is ]:

Invalid clang path:  cannot be found.
Please specify which clang should be used as device and host compiler. [Default is ]:

Invalid clang path:  cannot be found.
Please specify which clang should be used as device and host compiler. [Default is ]: ^CTraceback (most recent call last):
  File "configure.py", line 81, in get_input
    answer = raw_input(question)
NameError: name 'raw_input' is not defined

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "configure.py", line 1367, in <module>
    main()
  File "configure.py", line 1329, in main
    set_clang_cuda_compiler_path(environ_cp)
  File "configure.py", line 555, in set_clang_cuda_compiler_path
    default_clang_path)
  File "configure.py", line 539, in get_from_env_or_user_or_default
    var = get_input(ask_for_var)
  File "configure.py", line 83, in get_input
    answer = input(question)  # pylint: disable=bad-builtin
KeyboardInterrupt
suncan@deeplearning-woody3:~/tensorflow$  ./configure
You have bazel 0.12.0 installed.
Please specify the location of python. [Default is /home/suncan/anaconda3/bin/python]:

Found possible Python library paths:
  /home/suncan/anaconda3/lib/python3.6/site-packages
Please input the desired Python library path to use.  Default is [/home/suncan/anaconda3/lib/python3.6/site-packages]

Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: y
jemalloc as malloc support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: y
Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]: n
No GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: n
No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.1

Please specify the location where CUDA 9.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 5

Please specify the location where cuDNN 5 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.7]

Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:

Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:

Add "--config=mkl" to your bazel command to build with MKL support.
Please note that MKL on MacOS or windows is still not supported.
If you would like to use a local MKL instead of downloading, please set the environment variable "TF_MKL_ROOT" every time before build.

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.

Configuration finished

sudo apt-get install libcurl3 libcurl3-dev

https://blog.csdn.net/tintinetmilou/article/details/78756304
https://www.youtube.com/watch?v=abEf3wQJBmE
http://www.52nlp.cn/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0%E4%B8%BB%E6%9C%BA%E7%8E%AF%E5%A2%83%E9%85%8D%E7%BD%AE-ubuntu16-04-geforce-gtx1080-tensorflow

Run:

bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

Error:

Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.

wuhuhu800 commented 6 years ago

cuda

wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libcudnn6_6.0.21-1%2Bcuda8.0_amd64.deb
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libcudnn6-dev_6.0.21-1%2Bcuda8.0_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo dpkg -i libcudnn6_6.0.21-1+cuda8.0_amd64.deb
sudo dpkg -i libcudnn6-dev_6.0.21-1+cuda8.0_amd64.deb
sudo apt-get update
sudo apt-get install cuda=8.0.61-1
sudo apt-get install libcudnn6-dev

export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

wuhuhu800 commented 6 years ago

Building the TensorFlow GPU version from source fails with: Cannot find libdevice.10.bc under /usr/local/cuda-8.0

Fix:

Rename /usr/local/cuda-8.0/nvvm/libdevice/libdevice.compute_50.10.bc to libdevice.10.bc and copy it into /usr/local/cuda-8.0/
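
A minimal Python sketch of that workaround, assuming it is enough to place a copy named libdevice.10.bc in the CUDA root while leaving the original file in nvvm/libdevice untouched. The function name is hypothetical, and writing into /usr/local/cuda-8.0 requires root.

```python
import os
import shutil


def provide_libdevice(cuda_root, source_name="libdevice.compute_50.10.bc"):
    """Copy nvvm/libdevice/<source_name> to <cuda_root>/libdevice.10.bc and return the new path."""
    source = os.path.join(cuda_root, "nvvm", "libdevice", source_name)
    target = os.path.join(cuda_root, "libdevice.10.bc")
    # copyfile raises FileNotFoundError if the source bitcode file is missing.
    shutil.copyfile(source, target)
    return target
```

It would be called as provide_libdevice("/usr/local/cuda-8.0") before re-running the bazel build.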

wuhuhu800 commented 6 years ago

fast ai v2

https://medium.com/@howkhang/ultimate-guide-to-setting-up-a-google-cloud-machine-for-fast-ai-version-2-f374208be43

curl https://raw.githubusercontent.com/howkhang/fastai-v2-setup/master/setup.sh | bash

wuhuhu800 commented 6 years ago

. ~/.bashrc

nano .bashrc

wuhuhu800 commented 6 years ago

(fastai) suncan@deeplearning-6:~$ sudo ldconfig
/sbin/ldconfig.real: /usr/local/cuda-9.1/targets/x86_64-linux/lib/libcudnn.so.7 is not a symbolic link

https://blog.csdn.net/langb2014/article/details/54376716

sudo ln -sf /usr/local/cuda-9.1/targets/x86_64-linux/lib/libcudnn.so.7.0.5 /usr/local/cuda-9.1/targets/x86_64-linux/lib/libcudnn.so.7

sudo ln -sf /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so.7.0.5 /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so.7
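
The ldconfig warning appears when libcudnn.so.7 is a regular file (for example a copied library) rather than a link to the versioned library, which is what the ln -sf commands above repair. A small sketch for checking the state of such a path before and after the fix; the helper name is hypothetical.

```python
import os


def link_status(path):
    """Describe `path` as a symlink (with its target), a regular file, or missing."""
    if os.path.islink(path):
        return "symlink -> " + os.readlink(path)
    if os.path.exists(path):
        return "regular file"
    return "missing"
```

After the fix, link_status("/usr/local/cuda-9.1/targets/x86_64-linux/lib/libcudnn.so.7") should report a symlink pointing at libcudnn.so.7.0.5.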

wuhuhu800 commented 6 years ago

suncan@deeplearning-7:~$ sudo apt-get install python2.7-dev python3.5-dev python3.6-dev pylint
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package python3.6-dev
E: Couldn't find any package by glob 'python3.6-dev'
E: Couldn't find any package by regex 'python3.6-dev'

Fix:

sudo add-apt-repository ppa:jonathonf/python-3.6
sudo apt-get update
sudo apt-get install python3.6 libpython3.6

wuhuhu800 commented 6 years ago

Reference article: How to install Tensorflow GPU with CUDA Toolkit 9.1 and cuDNN 7.1.2 for Python 3 on Ubuntu 16.04-64bit

Final setup: Ubuntu 16.04 LTS + Anaconda3 + CUDA 9.1 + cuDNN 7.1.3

1、Set up Ubuntu 16.04

2、Upgrade the installed packages

sudo apt-get update 
sudo apt-get upgrade

3、Install Anaconda3

https://www.howtoing.com/how-to-install-the-anaconda-python-distribution-on-ubuntu-16-04
curl -O https://repo.continuum.io/archive/Anaconda3-5.1.0-Linux-x86_64.sh

Verify the installer's integrity against its SHA-256 checksum by running sha256sum on the script file, then run the installer:

sha256sum Anaconda3-5.1.0-Linux-x86_64.sh
bash Anaconda3-5.1.0-Linux-x86_64.sh

Answer yes to every prompt.

Prepending PATH=/home/sammy/anaconda3/bin to PATH in /home/sammy/.bashrc
A backup will be made to: /home/sammy/.bashrc-anaconda3.bak

source ~/.bashrc

Once that is done, verify the installation with a conda command such as list:

conda list

4、 Verify You Have a CUDA-Capable GPU:

lspci | grep -i nvidia

5、Verify You Have a Supported Version of Linux

uname -m && cat /etc/*release
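
The same check can be scripted. The sketch below parses the KEY=value lines that cat /etc/*release prints (the os-release format) and tests for the combination this guide targets, x86_64 with Ubuntu 16.04. The function names are hypothetical, and relying on the ID and VERSION_ID fields is an assumption about which fields to inspect.

```python
import shlex


def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # Values may be shell-quoted; shlex.split removes the quotes.
        parts = shlex.split(raw)
        fields[key] = parts[0] if parts else ""
    return fields


def is_ubuntu_1604_x86_64(os_release_text, machine):
    """True when the machine is x86_64 and the release fields say Ubuntu 16.04."""
    fields = parse_os_release(os_release_text)
    return (
        machine == "x86_64"
        and fields.get("ID") == "ubuntu"
        and fields.get("VERSION_ID") == "16.04"
    )
```

On the VM, the text would come from reading /etc/os-release and the machine string from platform.machine(), which reports the same value as uname -m.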

6、Install Dependencies

sudo apt-get install build-essential 
sudo apt-get install cmake git unzip zip 
sudo apt-get install  pylint

7、Install Linux kernel headers

uname -r
sudo apt-get install linux-headers-$(uname -r)

8、Download the NVIDIA CUDA Toolkit:

wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.1.85-1_amd64.deb

sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub

sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda-9.1

9、Reboot the system to load the NVIDIA drivers

sudo reboot

10、Go to terminal and type:

nano ~/.bashrc

At the end of the file, add:

export PATH=/usr/local/cuda-9.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Press Ctrl+X, then Y, to save and exit.

source ~/.bashrc
sudo ldconfig
nvidia-smi
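
The ${PATH:+:${PATH}} form in the two export lines appends a colon and the old value only when the variable is already set and non-empty, so a fresh LD_LIBRARY_PATH does not end in a stray colon (an empty entry there is treated as the current directory by the dynamic loader). A Python sketch of the same rule, with a hypothetical function name:

```python
def prepend_to_path(new_dir, current):
    """Mimic NEW${VAR:+:${VAR}}: add ':' + current only when current is set and non-empty."""
    return new_dir + (":" + current if current else "")
```

So prepend_to_path("/usr/local/cuda-9.1/lib64", "") gives just "/usr/local/cuda-9.1/lib64", while a non-empty current value is kept after a single colon.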

11、Install cuDNN 7.1.3

Go to https://developer.nvidia.com/cudnn (membership required).

After logging in, download the following:

cuDNN v7.1.3 Runtime Library for Ubuntu16.04 (Deb)

cuDNN v7.1.3 Developer Library for Ubuntu16.04 (Deb)

cuDNN v7.1.3 Code Samples and User Guide for Ubuntu16.04 (Deb)

Go to the download folder and run the following in a terminal:

sudo dpkg -i libcudnn7_7.1.3.16-1+cuda9.1_amd64.deb
sudo dpkg -i libcudnn7-dev_7.1.3.16-1+cuda9.1_amd64.deb
sudo dpkg -i libcudnn7-doc_7.1.3.16-1+cuda9.1_amd64.deb

Verifying cuDNN installation:

cp -r /usr/src/cudnn_samples_v7/ $HOME
cd  $HOME/cudnn_samples_v7/mnistCUDNN
make clean && make
./mnistCUDNN

If cuDNN is properly installed and running on your Linux system, you will see a message similar to the following:

Test passed!
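
Besides running the sample, it can help to confirm which cuDNN version the headers declare, since ./configure asks for it later. The sketch below reads the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines from the text of cudnn.h. The function name is hypothetical, and the header location is an assumption (often /usr/include/cudnn.h for the .deb packages), so the text is passed in rather than a hard-coded path.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from the CUDNN_* #define lines of a cudnn.h file."""
    values = {}
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            r"^\s*#\s*define\s+" + name + r"\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            raise ValueError(name + " not found in header text")
        values[name] = int(match.group(1))
    return values["CUDNN_MAJOR"], values["CUDNN_MINOR"], values["CUDNN_PATCHLEVEL"]
```

For the packages installed in this step the expected result is (7, 1, 3).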

12、Install Dependencies

libcupti (required)

sudo apt-get install libcupti-dev

echo 'export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc

Bazel (required)

sudo apt-get install pkg-config zip g++ zlib1g-dev unzip

sudo apt-get install openjdk-8-jdk
wget https://github.com/bazelbuild/bazel/releases/download/0.11.1/bazel_0.11.1-linux-x86_64.deb
sudo dpkg -i bazel_0.11.1-linux-x86_64.deb

To install these packages for Python 3.n, issue the following command:

sudo apt-get install python3-numpy python3-dev python3-pip python3-wheel

13、 Configure Tensorflow from source:

source ~/.bashrc
sudo ldconfig
wget https://github.com/tensorflow/tensorflow/archive/v1.7.0.zip
unzip v1.7.0.zip
cd tensorflow-1.7.0
./configure

Enter the Python path at this prompt:

Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3

Press Enter twice.

Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: Y
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: Y
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
Do you wish to build TensorFlow with Apache Kafka Platform support? [y/N]: N
Do you wish to build TensorFlow with XLA JIT support? [y/N]: N
Do you wish to build TensorFlow with GDR support? [y/N]: N
Do you wish to build TensorFlow with VERBS support? [y/N]: N
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N
Do you wish to build TensorFlow with CUDA support? [y/N]: Y
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.1
Please specify the location where CUDA 9.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.1.3
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/lib/x86_64-linux-gnu
Do you wish to build TensorFlow with TensorRT support? [y/N]: N

Next, enter the compute capability of your GPU, e.g. 5.0 (listed at https://developer.nvidia.com/cuda-gpus):

Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 5.0] 5.0
Do you want to use clang as CUDA compiler? [y/N]: N
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: /usr/bin/gcc
Do you wish to build TensorFlow with MPI support? [y/N]: N
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: -march=native
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:N

14、Build Tensorflow using bazel

Create a symbolic link at cuda/include/math_functions.hpp that points to cuda/include/crt/math_functions.hpp, which fixes the "math_functions.hpp is not found" error:

sudo ln -s /usr/local/cuda/include/crt/math_functions.hpp /usr/local/cuda/include/math_functions.hpp
bazel build --config=opt --config=cuda --incompatible_load_argument_is_label=false //tensorflow/tools/pip_package:build_pip_package

The build takes a long time: 1 to 2 hours, possibly more.

The bazel build command builds a script named build_pip_package. Running this script as follows builds a .whl file in the tensorflow_pkg directory:

bazel-bin/tensorflow/tools/pip_package/build_pip_package tensorflow_pkg

Activate your virtual environment here, if you use one.

To install tensorflow with pip:

cd tensorflow_pkg
# python3 is the default
pip install tensorflow*.whl

15、Verify Tensorflow installation

python

import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

Result:

Hello, TensorFlow!
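
The hello-world check passes on a CPU-only build as well, so it does not prove the GPU is being used. A minimal sketch of an extra check using tf.test.gpu_device_name(), which returns an empty string when TensorFlow sees no GPU; the wrapper function is hypothetical and returns None when TensorFlow is not importable.

```python
import importlib.util


def tensorflow_gpu_name():
    """Return TensorFlow's GPU device name, '' if it sees no GPU, or None if TensorFlow is absent."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    return tf.test.gpu_device_name()
```

On a working CUDA build the returned name should mention a GPU device; an empty string suggests the CUDA or cuDNN libraries were not picked up at runtime.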

Start Jupyter Notebook:

jupyter notebook --ip 0.0.0.0 --port 8888

wuhuhu800 commented 6 years ago

image

wuhuhu800 commented 6 years ago

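"SSD backed PD Capacity" is the charge for keeping the disk on SSD storage, which continues after the VM is shut down. Backing up with a snapshot and then deleting the VM reduces this charge.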

wuhuhu800 commented 6 years ago

tf-nightly-gpu