mit-quest / necstlab-damage-segmentation

MIT License

upgrade software versions #75

Closed: rak5216 closed this issue 3 years ago

rak5216 commented 3 years ago

Ubuntu 20.x (and the matching Python), pip >= 19, TensorFlow 2.3

CarolinaFurtado commented 3 years ago

The workflow runs fine with TensorFlow 2.3. However, there are some issues with the metrics: they seem to be reported in a different order from one version to the other.

CarolinaFurtado commented 3 years ago

I am trying to compare the results from TensorFlow 2.1 and 2.3, and I created tf-debug config files for that purpose. However, we need to make sure that the training set and the starting points for training are the same.

Using encoder_weights: 'imagenet' raises an error regarding the input_shape:

Traceback (most recent call last):
  File "train_segmentation_model.py", line 212, in <module>
    train(**argparser.parse_args().__dict__)
  File "train_segmentation_model.py", line 94, in train
    train_config['optimizer'])
  File "/home/cfurtado/necstlab-damage-segmentation/models.py", line 113, in generate_compiled_segmentation_model
    model = Unet(input_shape=(None, None, 1), classes=num_classes, **model_parameters)
  File "/home/cfurtado/.local/lib/python3.5/site-packages/segmentation_models/__init__.py", line 34, in wrapper
    return func(*args, **kwargs)
  File "/home/cfurtado/.local/lib/python3.5/site-packages/segmentation_models/models/unet.py", line 226, in Unet
    **kwargs,
  File "/home/cfurtado/.local/lib/python3.5/site-packages/segmentation_models/backbones/backbones_factory.py", line 103, in get_backbone
    model = model_fn(*args, **kwargs)
  File "/home/cfurtado/.local/lib/python3.5/site-packages/classification_models/models_factory.py", line 78, in wrapper
    return func(*args, **new_kwargs)
  File "/home/cfurtado/.local/lib/python3.5/site-packages/keras_applications/vgg16.py", line 99, in VGG16
    weights=weights)
  File "/home/cfurtado/.local/lib/python3.5/site-packages/keras_applications/imagenet_utils.py", line 316, in _obtain_input_shape
    '`input_shape=' + str(input_shape) + '`')
ValueError: The input must have 3 channels; got `input_shape=(None, None, 1)`

Josh's thoughts on that: "I believe channels here corresponds to image colors. It looks like the ImageNet-pretrained backbone is expecting 3 channels (colors) and we're just using 1 (just gray rather than red/green/blue). So I'd start by looking into whether that pretrained backbone can actually support 1 channel or not. And if not, does that mean that getting it to work in the past required us to use 3 channels?"

Reed: the default instantiation is input_shape=(None, None, 3), so we immediately violate that by setting (None, None, 1). ImageNet is indeed full color, so I think we will just ignore this option, since with our grayscale input it would be an apples-to-oranges comparison.
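Because the mismatch only surfaces deep inside keras_applications, a cheap pre-flight check can fail earlier with a clearer message. This is a minimal sketch, assuming the training config carries an `encoder_weights` value and an input shape whose last entry is the channel count; `check_encoder_input` is a hypothetical helper, not part of the repository.

```python
# Hypothetical pre-flight check (not part of the repository). It assumes the
# config exposes encoder_weights and an input shape ending in the channel count.
def check_encoder_input(encoder_weights, input_shape):
    """Raise early if ImageNet weights are requested for non-RGB input."""
    channels = input_shape[-1]
    if encoder_weights == "imagenet" and channels != 3:
        raise ValueError(
            f"encoder_weights='imagenet' expects 3 input channels, "
            f"got input_shape={input_shape}"
        )
    return True


# Grayscale input without pretrained weights passes the check.
check_encoder_input(None, (None, None, 1))

# Grayscale input with ImageNet weights fails before any model is built.
try:
    check_encoder_input("imagenet", (None, None, 1))
except ValueError as err:
    print(err)
```

Running this check before `Unet(...)` is called would turn the traceback into a single, config-level error.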

So we will try a different approach to get models to start with the same initial weights.

CarolinaFurtado commented 3 years ago

We are now trying to work around it by allowing training to start from the weights of a previously trained model (i.e., fixed weights that are the outcome of an earlier training run). See issue #32, "Enable pre-training by initializing new model with previously trained weights".

CarolinaFurtado commented 3 years ago

Running on CPU (to get repeatable results):

tf 2.1 loss: 0.002412608 0.001155156 0.000974846 0.000862088

tf 2.1 val_loss: 0.02849 0.002013 0.106169 0.002202

tf 2.3 loss: 0.002411151 0.001134973 0.000977466 0.000861835

tf 2.3 val_loss: 0.022246 0.002309 0.005034 0.002559


The loss seems almost equal, but the val_loss does not. Any thoughts @Josh-Joseph @rak5216?
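To put a number on "almost equal", here is a small sketch that computes the relative difference between the tf 2.1 and tf 2.3 values reported in this comment, using the tf 2.1 value as the reference. The `runs` dict and `relative_diffs` helper are illustrative names, not project code.

```python
# Loss values copied from the CPU runs reported in this comment.
runs = {
    "loss": {
        "tf2.1": [0.002412608, 0.001155156, 0.000974846, 0.000862088],
        "tf2.3": [0.002411151, 0.001134973, 0.000977466, 0.000861835],
    },
    "val_loss": {
        "tf2.1": [0.02849, 0.002013, 0.106169, 0.002202],
        "tf2.3": [0.022246, 0.002309, 0.005034, 0.002559],
    },
}


def relative_diffs(a, b):
    """Element-wise |a - b| / |a|, using the first list as the reference."""
    return [abs(x - y) / abs(x) for x, y in zip(a, b)]


for name, versions in runs.items():
    diffs = relative_diffs(versions["tf2.1"], versions["tf2.3"])
    print(name, ["{:.1%}".format(d) for d in diffs])
```

With these numbers, every train loss value differs by less than 2%, while every val_loss value differs by more than 10%, which quantifies the observation above.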

rak5216 commented 3 years ago

@CarolinaFurtado can you check binary_ce_metric too? It should match the loss on both the train and val sets. We also need to verify that tf 2.3 is repeatable on CPU. There's a chance that val loss is just more volatile than train loss, but ultimately, the bottom line is that tf 2.3 does not reproduce the tf 2.1 results.
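The "binary_ce_metric should match loss" check can be automated on the per-epoch history dict that Keras returns from `model.fit` (`History.history`, keyed by metric name with a `val_` prefix for validation). This is a sketch under the assumption that the loss has no extra terms (e.g., regularization) beyond binary cross-entropy; `check_loss_matches_metric` is a hypothetical helper.

```python
import math


def check_loss_matches_metric(history, metric="binary_ce_metric", rel_tol=1e-6):
    """Return (key, epoch, loss, metric) tuples where the two values disagree.

    `history` is a dict of per-epoch lists, shaped like Keras' History.history.
    Assumes the loss is pure binary cross-entropy (no regularization terms).
    """
    mismatches = []
    for prefix in ("", "val_"):
        losses = history.get(prefix + "loss", [])
        values = history.get(prefix + metric, [])
        for epoch, (loss_value, metric_value) in enumerate(zip(losses, values)):
            if not math.isclose(loss_value, metric_value, rel_tol=rel_tol):
                mismatches.append((prefix + metric, epoch, loss_value, metric_value))
    return mismatches


# Hypothetical toy history: train matches, validation does not.
toy_history = {
    "loss": [0.5, 0.4],
    "binary_ce_metric": [0.5, 0.4],
    "val_loss": [0.6],
    "val_binary_ce_metric": [0.7],
}
print(check_loss_matches_metric(toy_history))
```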

CarolinaFurtado commented 3 years ago

@rak5216, for 2.3: [screenshots]

for 2.1: [screenshots]

tf 2.1 and 2.3 should not match each other here: they start from different pretrained models.

CarolinaFurtado commented 3 years ago

Confirmed that 2.1 and 2.3 give slightly different results, even when run on CPU.


CarolinaFurtado commented 3 years ago

TRAIN

Fixed the metrics mosaic by removing loss from metric_names. Results don't match exactly because we don't have repeatability between 2.1 and 2.3. [screenshots: 2.3, then 2.1]

TRAIN THRESHOLDS

dict_results = dict(zip(metric_names, all_results)) in models.py was zipping two lists of different lengths, which meant that the selected optimizing_result was ultimately wrong. Fix: removed loss from metric_names.

tf2.3 without the modification

wrong! loss + binary_crossentropy + binary_ce_metric (off by one):

{'loss': 0.0010814343113452196,
 'binary_ce_metric': 3.68487532154127e-11,  <- WRONG!!!! these should all be the same, but since we are doing dict(zip(10 names, 9 metrics)), the last name is ignored, and the first 3 values are binary cross entropy
 'class0_f1_score': 0.29115191102027893,
 'class0_binary_accuracy_sm': 0.9995417594909668,
 'binary_crossentropy': 0.0010814343113452196,
 'class0_precision': 0.33629000186920166,
 'class0_binary_cross_entropy': 0.9995417594909668,
 'class0_iou_score': 0.31209734082221985,  <- this is the optimal value. wrong here
 'class0_binary_accuracy_tfkeras': 0.18490245938301086}

tf2.3 with the modification

right! loss + binary_ce_metric:

{'class0_f1_score': 0.31209734082221985,
 'class0_precision': 0.29115191102027893,
 'class0_recall': 0.33629000186920166,
 'loss': 0.0010814343113452196,
 'class0_binary_cross_entropy': 3.68487532154127e-11,
 'class0_binary_accuracy_tfkeras': 0.9995417594909668,
 'binary_ce_metric': 0.0010814343113452196,
 'class0_binary_accuracy_sm': 0.9995417594909668,
 'class0_iou_score': 0.18490245938301086}  <- this is the optimal value. ok here
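The root cause is that `zip` stops at the shorter input, so one extra name silently shifts every value onto the wrong key. A minimal sketch of a guarded replacement for the bare `dict(zip(...))`, with hypothetical names and values chosen only to reproduce the off-by-one pattern; `results_to_dict` is not project code.

```python
def results_to_dict(metric_names, all_results):
    """Pair metric names with evaluation results, refusing silent truncation.

    A bare dict(zip(...)) stops at the shorter list, so an extra name shifts
    every value onto the wrong key without raising.
    """
    if len(metric_names) != len(all_results):
        raise ValueError(
            f"{len(metric_names)} metric names but {len(all_results)} results"
        )
    return dict(zip(metric_names, all_results))


# Hypothetical names/values: one more name than there are results.
names = ["loss", "binary_crossentropy", "binary_ce_metric", "class0_iou_score"]
results = [0.0011, 0.0011, 0.18]

shifted = dict(zip(names, results))
print(shifted)
# {'loss': 0.0011, 'binary_crossentropy': 0.0011, 'binary_ce_metric': 0.18}
```

Here the IoU value lands under `binary_ce_metric` and `class0_iou_score` disappears, the same kind of shift as in the "wrong" output. Calling `results_to_dict(names, results)` raises instead. On Python 3.10+, `zip(..., strict=True)` gives the same protection, but the VMs in this thread run Python 3.6, hence the explicit length check. Deriving the names from the compiled model's `metrics_names` rather than a hand-maintained list is another option worth considering.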

TEST

Same issue: removed loss from metric_names. Results match


CarolinaFurtado commented 3 years ago

warning when creating the VM:

DEPRECATION: Python 3.5 reached the end of its life on September 13th, 2020. Please upgrade your Python as Python 3.5 is no longer maintained. pip 21.0 will drop support for Python 3.5 in January 2021. pip 21.0 will remove support for this functionality.
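A guard at the top of the training entry points would make an end-of-life interpreter fail loudly instead of relying on pip's warning. This is a sketch; the 3.6 minimum is our assumption, matching the Python 3.6 used in the version trials below, and `python_is_supported` is a hypothetical helper.

```python
import sys

# Assumed minimum: Python 3.5 is end-of-life, and the upgraded VMs use 3.6.
MIN_PYTHON = (3, 6)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


# str.format (not an f-string) keeps this line parseable on old interpreters.
if not python_is_supported():
    sys.exit(
        "Python {}.{} is not supported; please use {}.{}+".format(
            *sys.version_info[:2], *MIN_PYTHON
        )
    )
```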

CarolinaFurtado commented 3 years ago

Updating the Ubuntu version, CUDA and cuDNN

current version: Ubuntu 16.04 + CUDA 10.1 + cuDNN 7.6.5 ---- Python 3.5, which is discontinued.

version trial 1: Ubuntu 18.04 + CUDA 10.1 + cuDNN 7.6.5 ---- Python 3.6. It works, but TF does not connect to the GPU.

version trial 2: Ubuntu 18.04 + CUDA 11.0 + cuDNN 8.0 ---- Python 3.6. It works, but TF does not connect to the GPU.

Other people have had similar problems when updating to Ubuntu 18.04: https://github.com/tensorflow/tensorflow/issues/43236
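Each TensorFlow release is built against one specific CUDA/cuDNN pair. The sketch below encodes the Linux GPU entries of TensorFlow's "tested build configurations" table for the versions mentioned in this thread, transcribed by hand, so verify them against the official table. By that table, trial 2 pairs TF 2.3 with the TF 2.4 stack (CUDA 11.0 / cuDNN 8.0). Trial 1 matches, which suggests its GPU problem lies elsewhere. `check_gpu_stack` is a hypothetical helper.

```python
# CUDA/cuDNN versions from TensorFlow's tested build configurations for
# Linux GPU builds (hand-transcribed; check the official table).
TESTED_GPU_BUILDS = {
    "2.1": {"cuda": "10.1", "cudnn": "7.6"},
    "2.3": {"cuda": "10.1", "cudnn": "7.6"},
    "2.4": {"cuda": "11.0", "cudnn": "8.0"},
}


def check_gpu_stack(tf_version, cuda, cudnn):
    """Return a list of mismatches between an installed stack and the table."""
    major_minor = ".".join(tf_version.split(".")[:2])
    expected = TESTED_GPU_BUILDS.get(major_minor)
    if expected is None:
        return ["no entry for TensorFlow " + major_minor]
    problems = []
    if cuda != expected["cuda"]:
        problems.append("CUDA {} (expected {})".format(cuda, expected["cuda"]))
    # Compare cuDNN on major.minor only, e.g. 7.6.5 -> 7.6.
    if ".".join(cudnn.split(".")[:2]) != expected["cudnn"]:
        problems.append("cuDNN {} (expected {})".format(cudnn, expected["cudnn"]))
    return problems


print(check_gpu_stack("2.3.1", "10.1", "7.6.5"))  # trial 1
print(check_gpu_stack("2.3.1", "11.0", "8.0"))    # trial 2
```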

CarolinaFurtado commented 3 years ago

from https://developer.nvidia.com/cuda-10.1-download-archive-update2?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1804&target_type=deblocal

Ubuntu version: 18.04

boot_disk {
  initialize_params {
    image = "projects/ubuntu-os-cloud/global/images/ubuntu-1804-bionic-v20201014"
    size = "${var.hard_drive_size_gp}"
    type = "pd-ssd"
  }
}

cuda and cudnn versions

sudo apt-get update

sudo apt-get install -y build-essential

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-1-local-10.1.243-418.87.00/7fa2af80.pub
sudo apt-get update

sudo apt-get -y install nvidia-driver-418
sudo apt-get -y install cuda-10.1
export PATH=/usr/local/cuda-10.1/bin${PATH:+:${PATH}}

# install cudnn
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb
sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb

wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb
sudo dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb

creating the VM

google_compute_instance.vm[0]: Creating...
google_compute_instance.vm[0]: Still creating... [10s elapsed]
google_compute_instance.vm[0]: Still creating... [20s elapsed]
google_compute_instance.vm[0]: Still creating... [30s elapsed]
google_compute_instance.vm[0]: Provisioning with 'file'...
google_compute_instance.vm[0]: Still creating... [40s elapsed]
google_compute_instance.vm[0]: Still creating... [50s elapsed]
google_compute_instance.vm[0]: Still creating... [1m0s elapsed]
google_compute_instance.vm[0]: Still creating... [1m10s elapsed]
google_compute_instance.vm[0]: Provisioning with 'remote-exec'...
google_compute_instance.vm[0] (remote-exec): Connecting to remote host via SSH...
google_compute_instance.vm[0] (remote-exec):   Host: 34.74.190.126
google_compute_instance.vm[0] (remote-exec):   User: cfurtado
google_compute_instance.vm[0] (remote-exec):   Password: false
google_compute_instance.vm[0] (remote-exec):   Private key: true
google_compute_instance.vm[0] (remote-exec):   Certificate: false
google_compute_instance.vm[0] (remote-exec):   SSH Agent: false
google_compute_instance.vm[0] (remote-exec):   Checking Host Key: false
google_compute_instance.vm[0] (remote-exec): Connected!
google_compute_instance.vm[0] (remote-exec): Running resource creation script... (this may take 10+ minutes)
google_compute_instance.vm[0] (remote-exec): W: GPG error: http://archive.ubuntu.com/ubuntu bionic InRelease: Splitting up /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_bionic_InRelease into data and signature failed
google_compute_instance.vm[0] (remote-exec): E: The repository 'http://archive.ubuntu.com/ubuntu bionic InRelease' is not signed.
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 75%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 100%
google_compute_instance.vm[0]: Still creating... [1m20s elapsed]
google_compute_instance.vm[0]: Still creating... [1m30s elapsed]
google_compute_instance.vm[0] (remote-exec): --2020-11-02 15:04:59--  https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
google_compute_instance.vm[0] (remote-exec): Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
google_compute_instance.vm[0] (remote-exec): Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:443... connected.
google_compute_instance.vm[0] (remote-exec): HTTP request sent, awaiting response... 200 OK
google_compute_instance.vm[0] (remote-exec): Length: 190 [application/octet-stream]
google_compute_instance.vm[0] (remote-exec): Saving to: ‘cuda-ubuntu1804.pin’

google_compute_instance.vm[0] (remote-exec):       cuda-   0%       0  --.-KB/s
google_compute_instance.vm[0] (remote-exec): cuda-ubuntu 100%     190  --.-KB/s    in 0s

google_compute_instance.vm[0] (remote-exec): 2020-11-02 15:04:59 (6.74 MB/s) - ‘cuda-ubuntu1804.pin’ saved [190/190]

google_compute_instance.vm[0] (remote-exec): --2020-11-02 15:04:59--  http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
google_compute_instance.vm[0] (remote-exec): Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
google_compute_instance.vm[0] (remote-exec): Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:80... connected.
google_compute_instance.vm[0] (remote-exec): HTTP request sent, awaiting response... 200 OK
google_compute_instance.vm[0] (remote-exec): Length: 1859785444 (1.7G) [application/x-deb]
google_compute_instance.vm[0] (remote-exec): Saving to: ‘cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb’

google_compute_instance.vm[0] (remote-exec):       cuda-   0%       0  --.-KB/s
google_compute_instance.vm[0] (remote-exec):      cuda-r   2%  37.40M   187MB/s
google_compute_instance.vm[0] (remote-exec):     cuda-re   5%  94.23M   235MB/s
google_compute_instance.vm[0] (remote-exec):    cuda-rep   8% 149.99M   250MB/s
google_compute_instance.vm[0] (remote-exec):   cuda-repo  11% 205.94M   257MB/s
google_compute_instance.vm[0] (remote-exec):  cuda-repo-  14% 259.92M   260MB/s
google_compute_instance.vm[0] (remote-exec): cuda-repo-u  17% 316.89M   264MB/s
google_compute_instance.vm[0] (remote-exec): uda-repo-ub  21% 373.72M   267MB/s
google_compute_instance.vm[0] (remote-exec): da-repo-ubu  24% 429.91M   269MB/s
google_compute_instance.vm[0] (remote-exec): a-repo-ubun  27% 484.08M   269MB/s
google_compute_instance.vm[0] (remote-exec): -repo-ubunt  30% 540.28M   270MB/s
google_compute_instance.vm[0] (remote-exec): repo-ubuntu  33% 596.67M   271MB/s
google_compute_instance.vm[0]: Still creating... [1m40s elapsed]
google_compute_instance.vm[0] (remote-exec): epo-ubuntu1  36% 651.08M   271MB/s
google_compute_instance.vm[0] (remote-exec): po-ubuntu18  39% 703.88M   271MB/s
google_compute_instance.vm[0] (remote-exec): o-ubuntu180  42% 759.34M   271MB/s
google_compute_instance.vm[0] (remote-exec): -ubuntu1804  45% 815.61M   272MB/s    eta 4s
google_compute_instance.vm[0] (remote-exec): ubuntu1804-  49% 871.46M   278MB/s    eta 4s
google_compute_instance.vm[0] (remote-exec): buntu1804-1  52% 927.98M   278MB/s    eta 4s
google_compute_instance.vm[0] (remote-exec): untu1804-10  55% 984.70M   278MB/s    eta 4s
google_compute_instance.vm[0] (remote-exec): ntu1804-10-  58%   1.02G   279MB/s    eta 4s
google_compute_instance.vm[0] (remote-exec): tu1804-10-1  61%   1.07G   279MB/s    eta 2s
google_compute_instance.vm[0] (remote-exec): u1804-10-1-  65%   1.13G   279MB/s    eta 2s
google_compute_instance.vm[0] (remote-exec): 1804-10-1-l  68%   1.18G   279MB/s    eta 2s
google_compute_instance.vm[0] (remote-exec): 804-10-1-lo  71%   1.24G   279MB/s    eta 2s
google_compute_instance.vm[0] (remote-exec): 04-10-1-loc  74%   1.29G   279MB/s    eta 2s
google_compute_instance.vm[0]: Still creating... [1m50s elapsed]
google_compute_instance.vm[0] (remote-exec): 4-10-1-loca  77%   1.35G   280MB/s    eta 1s
google_compute_instance.vm[0] (remote-exec): -10-1-local  81%   1.40G   280MB/s    eta 1s
google_compute_instance.vm[0] (remote-exec): 10-1-local-  84%   1.46G   280MB/s    eta 1s
google_compute_instance.vm[0] (remote-exec): 0-1-local-1  87%   1.51G   281MB/s    eta 1s
google_compute_instance.vm[0] (remote-exec): -1-local-10  90%   1.57G   282MB/s    eta 1s
google_compute_instance.vm[0] (remote-exec): 1-local-10.  93%   1.62G   282MB/s    eta 0s
google_compute_instance.vm[0] (remote-exec): -local-10.1  96%   1.68G   282MB/s    eta 0s
google_compute_instance.vm[0] (remote-exec): cuda-repo-u 100%   1.73G   282MB/s    in 6.4s

google_compute_instance.vm[0] (remote-exec): 2020-11-02 15:05:06 (277 MB/s) - ‘cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb’ saved [1859785444/1859785444]

google_compute_instance.vm[0] (remote-exec): Warning: apt-key output should not be parsed (stdout is not a terminal)
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 20%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 41%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 62%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 83%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 100%
google_compute_instance.vm[0]: Still creating... [3m6s elapsed]
google_compute_instance.vm[0]: Still creating... [3m16s elapsed]
google_compute_instance.vm[0]: Still creating... [3m26s elapsed]
google_compute_instance.vm[0]: Still creating... [3m36s elapsed]
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 25%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 50%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 75%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 100%
google_compute_instance.vm[0]: Still creating... [3m46s elapsed]
google_compute_instance.vm[0]: Still creating... [3m56s elapsed]
google_compute_instance.vm[0]: Still creating... [4m6s elapsed]
google_compute_instance.vm[0]: Still creating... [4m16s elapsed]
google_compute_instance.vm[0]: Still creating... [4m26s elapsed]
google_compute_instance.vm[0]: Still creating... [4m36s elapsed]
google_compute_instance.vm[0]: Still creating... [4m46s elapsed]
google_compute_instance.vm[0]: Still creating... [4m56s elapsed]
google_compute_instance.vm[0]: Still creating... [5m6s elapsed]
google_compute_instance.vm[0]: Still creating... [5m16s elapsed]
google_compute_instance.vm[0]: Still creating... [5m26s elapsed]
google_compute_instance.vm[0]: Still creating... [5m36s elapsed]
google_compute_instance.vm[0]: Still creating... [5m46s elapsed]
google_compute_instance.vm[0]: Still creating... [5m56s elapsed]
google_compute_instance.vm[0]: Still creating... [6m6s elapsed]
google_compute_instance.vm[0] (remote-exec): --2020-11-02 15:09:32--  http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb
google_compute_instance.vm[0] (remote-exec): Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
google_compute_instance.vm[0] (remote-exec): Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:80... connected.
google_compute_instance.vm[0] (remote-exec): HTTP request sent, awaiting response... 200 OK
google_compute_instance.vm[0] (remote-exec): Length: 182313188 (174M) [application/x-deb]
google_compute_instance.vm[0] (remote-exec): Saving to: ‘libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb’

google_compute_instance.vm[0] (remote-exec):       libcu   0%       0  --.-KB/s
google_compute_instance.vm[0] (remote-exec):      libcud  21%  37.99M   190MB/s
google_compute_instance.vm[0] (remote-exec):     libcudn  54%  94.82M   237MB/s
google_compute_instance.vm[0] (remote-exec):    libcudnn  87% 151.90M   253MB/s
google_compute_instance.vm[0] (remote-exec): libcudnn7_7 100% 173.87M   257MB/s    in 0.7s

google_compute_instance.vm[0] (remote-exec): 2020-11-02 15:09:33 (257 MB/s) - ‘libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb’ saved [182313188/182313188]

google_compute_instance.vm[0]: Still creating... [6m16s elapsed]
google_compute_instance.vm[0]: Still creating... [6m26s elapsed]
google_compute_instance.vm[0] (remote-exec): --2020-11-02 15:09:51--  http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb
google_compute_instance.vm[0] (remote-exec): Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
google_compute_instance.vm[0] (remote-exec): Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:80... connected.
google_compute_instance.vm[0] (remote-exec): HTTP request sent, awaiting response... 200 OK
google_compute_instance.vm[0] (remote-exec): Length: 160506208 (153M) [application/x-deb]
google_compute_instance.vm[0] (remote-exec): Saving to: ‘libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb’

google_compute_instance.vm[0] (remote-exec):       libcu   0%       0  --.-KB/s
google_compute_instance.vm[0] (remote-exec):      libcud  23%  35.31M   177MB/s
google_compute_instance.vm[0] (remote-exec):     libcudn  60%  92.68M   232MB/s
google_compute_instance.vm[0] (remote-exec):    libcudnn  97% 149.30M   249MB/s
google_compute_instance.vm[0] (remote-exec): libcudnn7-d 100% 153.07M   250MB/s    in 0.6s

google_compute_instance.vm[0] (remote-exec): 2020-11-02 15:09:52 (250 MB/s) - ‘libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb’ saved [160506208/160506208]

google_compute_instance.vm[0]: Still creating... [6m36s elapsed]
google_compute_instance.vm[0]: Still creating... [6m46s elapsed]
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 13%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 26%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 40%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 53%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 67%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 80%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 94%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 100%
google_compute_instance.vm[0]: Still creating... [7m8s elapsed]
google_compute_instance.vm[0]: Still creating... [7m18s elapsed]
google_compute_instance.vm[0]: Still creating... [7m28s elapsed]
google_compute_instance.vm[0] (remote-exec): WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
google_compute_instance.vm[0] (remote-exec): Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
google_compute_instance.vm[0] (remote-exec): To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
google_compute_instance.vm[0]: Still creating... [7m38s elapsed]
google_compute_instance.vm[0] (remote-exec):   WARNING: The scripts easy_install and easy_install-3.6 are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec): WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
google_compute_instance.vm[0] (remote-exec): Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
google_compute_instance.vm[0] (remote-exec): To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
google_compute_instance.vm[0] (remote-exec): WARNING: Skipping crcmod as it is not installed.
google_compute_instance.vm[0] (remote-exec): WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
google_compute_instance.vm[0] (remote-exec): Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
google_compute_instance.vm[0] (remote-exec): To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
google_compute_instance.vm[0] (remote-exec): WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
google_compute_instance.vm[0] (remote-exec): Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
google_compute_instance.vm[0] (remote-exec): To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
google_compute_instance.vm[0] (remote-exec): WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
google_compute_instance.vm[0] (remote-exec): Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
google_compute_instance.vm[0] (remote-exec): To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
google_compute_instance.vm[0]: Still creating... [7m48s elapsed]
google_compute_instance.vm[0]: Still creating... [7m58s elapsed]
google_compute_instance.vm[0]: Still creating... [8m8s elapsed]
google_compute_instance.vm[0] (remote-exec):   WARNING: The scripts f2py, f2py3 and f2py3.6 are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The script markdown_py is installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The scripts pyrsa-decrypt, pyrsa-encrypt, pyrsa-keygen, pyrsa-priv2pub, pyrsa-sign and pyrsa-verify are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The script google-oauthlib-tool is installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The script tensorboard is installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The scripts estimator_ckpt_converter, saved_model_cli, tensorboard, tf_upgrade_v2, tflite_convert, toco and toco_from_protos are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The scripts imageio_download_bin and imageio_remove_bin are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The scripts lsm2bin and tifffile are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The script skivi is installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The script pygmentize is installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The scripts iptest, iptest3, ipython and ipython3 are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0]: Still creating... [8m18s elapsed]
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The scripts jupyter, jupyter-migrate and jupyter-troubleshoot are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The scripts jupyter-kernel, jupyter-kernelspec and jupyter-run are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The script jupyter-console is installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The script jupyter-trust is installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The script jupyter-nbconvert is installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The scripts jupyter-bundlerextension, jupyter-nbextension, jupyter-notebook and jupyter-serverextension are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec): ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.

google_compute_instance.vm[0] (remote-exec): We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.

google_compute_instance.vm[0] (remote-exec): tensorflow-gpu 2.3.1 requires numpy<1.19.0,>=1.16.0, but you'll have numpy 1.19.3 which is incompatible.
google_compute_instance.vm[0]: Provisioning with 'remote-exec'...
google_compute_instance.vm[0] (remote-exec): Connecting to remote host via SSH...
google_compute_instance.vm[0] (remote-exec):   Host: 34.74.190.126
google_compute_instance.vm[0] (remote-exec):   User: cfurtado
google_compute_instance.vm[0] (remote-exec):   Password: false
google_compute_instance.vm[0] (remote-exec):   Private key: true
google_compute_instance.vm[0] (remote-exec):   Certificate: false
google_compute_instance.vm[0] (remote-exec):   SSH Agent: false
google_compute_instance.vm[0] (remote-exec):   Checking Host Key: false
google_compute_instance.vm[0] (remote-exec): Connected!
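The provisioning log ends with pip reporting that tensorflow-gpu 2.3.1 requires numpy>=1.16.0,<1.19.0 but numpy 1.19.3 was installed. The check pip is performing can be sketched as a plain version-range comparison. This assumes simple numeric versions; the `packaging` library's `SpecifierSet` is the robust choice for real specifiers. Pinning `numpy<1.19.0` in the requirements is one possible fix, but we have not checked how the repository pins it.

```python
def parse_version(text):
    """'1.19.3' -> (1, 19, 3); assumes plain numeric release versions."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed, lower, upper):
    """True if lower <= installed < upper, mirroring '>=lower,<upper'."""
    v = parse_version(installed)
    return parse_version(lower) <= v < parse_version(upper)


# Range required by tensorflow-gpu 2.3.1, as reported in the pip error.
print(satisfies("1.19.3", "1.16.0", "1.19.0"))  # False: the installed numpy
print(satisfies("1.18.5", "1.16.0", "1.19.0"))  # True
```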

Error when running:

Operation completed over 43 objects/273.8 MiB.                                   
2020-11-02 15:34:55.974295: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-11-02 15:34:56.007081: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-11-02 15:34:56.007139: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (cfurtado-necstlab-0): /proc/driver/nvidia/version does not exist
2020-11-02 15:34:56.007862: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-02 15:34:56.037040: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2300000000 Hz
2020-11-02 15:34:56.037485: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4a4ab60 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-02 15:34:56.037552: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Model: "functional_1"
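In this log, libcuda.so.1 loads but `cuInit` fails with `CUDA_ERROR_NO_DEVICE`, and the only XLA device listed is `Host`. So TensorFlow appears to have fallen back to the CPU while still building the model. In long run logs, a small filter can surface those lines. The sketch below copies its marker strings from the log above; they are an assumption, not a complete list of TensorFlow GPU errors.

```python
# Substrings copied from the TensorFlow log above; illustrative, not exhaustive.
GPU_FAILURE_MARKERS = (
    "failed call to cuInit",
    "CUDA_ERROR_NO_DEVICE",
    "kernel driver does not appear to be running",
)


def gpu_failure_lines(log_text):
    """Return the log lines that indicate TensorFlow could not use a GPU."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line for marker in GPU_FAILURE_MARKERS)
    ]


sample_log = """\
2020-11-02 15:34:55.974295: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-11-02 15:34:56.007081: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-11-02 15:34:56.037552: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
"""

for line in gpu_failure_lines(sample_log):
    print(line)
```

From inside the training process, the equivalent check is whether `tf.config.list_physical_devices("GPU")` returns an empty list.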

nvidia-smi cannot reach the driver:

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
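This matches TensorFlow's `/proc/driver/nvidia/version does not exist` message. The CUDA toolkit may be installed while no NVIDIA kernel driver is loaded. A provisioning script could run a check like the sketch below before training. The function name is hypothetical, and it relies on nvidia-smi exiting with a non-zero status when it cannot talk to the driver.

```python
import shutil
import subprocess


def nvidia_driver_reachable(timeout_s=30):
    """Return True if nvidia-smi is on PATH and exits with status 0."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # driver utilities are not installed at all
    try:
        # stdout/stderr=PIPE rather than capture_output=True keeps this
        # compatible with Python 3.6 (the default python3 on Ubuntu 18.04).
        result = subprocess.run(
            [exe], stdout=subprocess.PIPE, stderr=subprocess.PIPE, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return False
    # A non-zero exit, as in the "couldn't communicate with the NVIDIA driver"
    # case above, means nvidia-smi found no usable driver.
    return result.returncode == 0


if __name__ == "__main__":
    print("NVIDIA driver reachable:", nvidia_driver_reachable())
```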

Checking versions:

CUDA version - ok:

$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
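The toolkit reports release 10.1, which is the CUDA version listed in TensorFlow 2.3's tested GPU build configuration. To compare it programmatically rather than by eye, a minimal sketch can pull the release out of the `nvcc --version` text. The function name is illustrative.

```python
import re


def cuda_release(nvcc_output):
    """Extract (major, minor) from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


nvcc_output = """\
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
"""
print(cuda_release(nvcc_output))  # (10, 1)
```

nvcc only reports the toolkit version. It says nothing about whether the driver is loaded, which is why this check passes while nvidia-smi fails.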

Ubuntu version - ok:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.5 LTS
Release:        18.04
Codename:       bionic

Python version:

$ python3 --version
Python 3.6.9
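Python 3.6.9 is the stock python3 on Ubuntu 18.04. You can check whether an interpreter can install TensorFlow 2.3 from pip by comparing it against the supported range. The sketch below assumes that range is 3.5 to 3.8, taken from TensorFlow's tested build configurations; the constant and function names are illustrative.

```python
import sys

# Assumed from tensorflow.org's tested build configurations:
# TensorFlow 2.3 wheels exist for Python 3.5 through 3.8.
TF23_PYTHON_MIN = (3, 5)
TF23_PYTHON_MAX = (3, 8)


def python_supported_by_tf23(version_info=None):
    """Return True if the (major, minor) Python version has TF 2.3 wheels."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return TF23_PYTHON_MIN <= major_minor <= TF23_PYTHON_MAX


print(python_supported_by_tf23((3, 6, 9)))  # True  (Ubuntu 18.04 VM)
print(python_supported_by_tf23((3, 8, 2)))  # True  (Ubuntu 20.04 VM)
print(python_supported_by_tf23((3, 9, 0)))  # False
```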

cuDNN version - ok (7.6.5):

$ cat /usr/include/x86_64-linux-gnu/cudnn_v*.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
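The grep output shows cuDNN 7.6.5. The header's `CUDNN_VERSION` macro packs this into a single integer (7 × 1000 + 6 × 100 + 5 = 7605). The sketch below parses the text printed by the command above, for example as a provisioning sanity check. The function name is illustrative.

```python
import re


def cudnn_version(header_text):
    """Parse CUDNN_MAJOR/MINOR/PATCHLEVEL defines into (major, minor, patch)."""
    values = {}
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+%s\s+(\d+)" % name, header_text)
        if match is None:
            return None  # header text did not contain the expected define
        values[name] = int(match.group(1))
    return values["CUDNN_MAJOR"], values["CUDNN_MINOR"], values["CUDNN_PATCHLEVEL"]


header = """\
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""
major, minor, patch = cudnn_version(header)
print((major, minor, patch))               # (7, 6, 5)
print(major * 1000 + minor * 100 + patch)  # 7605, same formula as CUDNN_VERSION
```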
CarolinaFurtado commented 3 years ago

Following https://medium.com/@stephengregory_69986/installing-cuda-10-1-on-ubuntu-20-04-e562a5e724a0

Ubuntu version - 20.04:

boot_disk {
  initialize_params {
    image = "projects/ubuntu-os-cloud/global/images/ubuntu-2004-focal-v20201028"
    size  = "${var.hard_drive_size_gp}"
    type  = "pd-ssd"
  }
}

CUDA and cuDNN installation:

sudo apt-get update

sudo apt-get install -y build-essential

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-1-local-10.1.243-418.87.00/7fa2af80.pub
sudo apt-get update

sudo apt-get -y install cuda-10.1
export PATH=/usr/local/cuda-10.1/bin${PATH:+:${PATH}}

# install cudnn
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb
sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb

wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb
sudo dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb
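The script installs CUDA 10.1 and cuDNN 7.6.5, which matches the GPU configuration TensorFlow 2.3 was tested against. These requirements change with each TensorFlow release, so a small lookup table can make the expectation explicit. In the sketch below, the table contents come from tensorflow.org's tested build configurations; the names and message format are illustrative.

```python
# Assumed from tensorflow.org's "Tested build configurations":
# TensorFlow 2.3 was built against CUDA 10.1 and cuDNN 7.6.
# Only the TF 2.3 row is included; extend the table for other releases.
TESTED_GPU_CONFIGS = {
    (2, 3): {"cuda": (10, 1), "cudnn": (7, 6)},
}


def check_gpu_stack(tf_version, cuda_version, cudnn_version):
    """Return a list of mismatch messages (empty if the stack matches)."""
    expected = TESTED_GPU_CONFIGS.get(tuple(tf_version[:2]))
    if expected is None:
        return ["no tested configuration recorded for TF %d.%d" % tuple(tf_version[:2])]
    problems = []
    if tuple(cuda_version[:2]) != expected["cuda"]:
        problems.append("CUDA %d.%d, expected %d.%d" % (tuple(cuda_version[:2]) + expected["cuda"]))
    if tuple(cudnn_version[:2]) != expected["cudnn"]:
        problems.append("cuDNN %d.%d, expected %d.%d" % (tuple(cudnn_version[:2]) + expected["cudnn"]))
    return problems


print(check_gpu_stack((2, 3, 1), (10, 1, 243), (7, 6, 5)))  # []
print(check_gpu_stack((2, 3, 1), (11, 0), (8, 0, 4)))
# ['CUDA 11.0, expected 10.1', 'cuDNN 8.0, expected 7.6']
```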

Creating the VM:

google_compute_instance.vm[1]: Creating...
google_compute_instance.vm[1]: Still creating... [10s elapsed]
google_compute_instance.vm[1]: Still creating... [20s elapsed]
google_compute_instance.vm[1]: Still creating... [30s elapsed]
google_compute_instance.vm[1]: Provisioning with 'file'...
google_compute_instance.vm[1]: Still creating... [40s elapsed]
google_compute_instance.vm[1]: Still creating... [50s elapsed]
google_compute_instance.vm[1]: Still creating... [1m0s elapsed]
google_compute_instance.vm[1]: Provisioning with 'remote-exec'...
google_compute_instance.vm[1] (remote-exec): Connecting to remote host via SSH...
google_compute_instance.vm[1] (remote-exec):   Host: 34.74.110.244
google_compute_instance.vm[1] (remote-exec):   User: cfurtado
google_compute_instance.vm[1] (remote-exec):   Password: false
google_compute_instance.vm[1] (remote-exec):   Private key: true
google_compute_instance.vm[1] (remote-exec):   Certificate: false
google_compute_instance.vm[1] (remote-exec):   SSH Agent: false
google_compute_instance.vm[1] (remote-exec):   Checking Host Key: false
google_compute_instance.vm[1] (remote-exec): Connected!
google_compute_instance.vm[1] (remote-exec): Running resource creation script... (this may take 10+ minutes)
google_compute_instance.vm[1]: Still creating... [1m10s elapsed]
google_compute_instance.vm[1] (remote-exec): Extracting templates from packages: 100%
google_compute_instance.vm[1]: Still creating... [1m20s elapsed]
google_compute_instance.vm[1]: Still creating... [1m30s elapsed]
google_compute_instance.vm[1] (remote-exec): --2020-11-02 17:26:59--  https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
google_compute_instance.vm[1] (remote-exec): Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
google_compute_instance.vm[1] (remote-exec): Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:443... connected.
google_compute_instance.vm[1] (remote-exec): HTTP request sent, awaiting response... 200 OK
google_compute_instance.vm[1] (remote-exec): Length: 190 [application/octet-stream]
google_compute_instance.vm[1] (remote-exec): Saving to: ‘cuda-ubuntu1804.pin’

google_compute_instance.vm[1] (remote-exec): cuda-ubuntu 100%     190  --.-KB/s    in 0s

google_compute_instance.vm[1] (remote-exec): 2020-11-02 17:26:59 (7.35 MB/s) - ‘cuda-ubuntu1804.pin’ saved [190/190]

google_compute_instance.vm[1] (remote-exec): --2020-11-02 17:27:00--  http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
google_compute_instance.vm[1] (remote-exec): Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
google_compute_instance.vm[1] (remote-exec): Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:80... connected.
google_compute_instance.vm[1] (remote-exec): HTTP request sent, awaiting response... 200 OK
google_compute_instance.vm[1] (remote-exec): Length: 1859785444 (1.7G) [application/x-deb]
google_compute_instance.vm[1] (remote-exec): Saving to: ‘cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb’

google_compute_instance.vm[1]: Still creating... [1m40s elapsed]
google_compute_instance.vm[1] (remote-exec): cuda-repo-u 100%   1.73G   283MB/s    in 6.3s

google_compute_instance.vm[1] (remote-exec): 2020-11-02 17:27:06 (281 MB/s) - ‘cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb’ saved [1859785444/1859785444]

google_compute_instance.vm[1]: Still creating... [1m50s elapsed]
google_compute_instance.vm[1] (remote-exec): Warning: apt-key output should not be parsed (stdout is not a terminal)
google_compute_instance.vm[1]: Still creating... [2m0s elapsed]
google_compute_instance.vm[1]: Still creating... [2m10s elapsed]
google_compute_instance.vm[1] (remote-exec): Extracting templates from packages: 100%
google_compute_instance.vm[1]: Still creating... [2m20s elapsed]
google_compute_instance.vm[1]: Still creating... [2m30s elapsed]
google_compute_instance.vm[1]: Still creating... [2m40s elapsed]
google_compute_instance.vm[1]: Still creating... [2m50s elapsed]
google_compute_instance.vm[1]: Still creating... [3m0s elapsed]
google_compute_instance.vm[1]: Still creating... [3m10s elapsed]
google_compute_instance.vm[1]: Still creating... [3m20s elapsed]
google_compute_instance.vm[1]: Still creating... [3m30s elapsed]
google_compute_instance.vm[1]: Still creating... [3m40s elapsed]
google_compute_instance.vm[1]: Still creating... [3m50s elapsed]
google_compute_instance.vm[1]: Still creating... [4m0s elapsed]
google_compute_instance.vm[1]: Still creating... [4m10s elapsed]
google_compute_instance.vm[1]: Still creating... [4m20s elapsed]
google_compute_instance.vm[1]: Still creating... [4m30s elapsed]
google_compute_instance.vm[1]: Still creating... [4m40s elapsed]
google_compute_instance.vm[1]: Still creating... [4m50s elapsed]
google_compute_instance.vm[1]: Still creating... [5m0s elapsed]
google_compute_instance.vm[1]: Still creating... [5m10s elapsed]
google_compute_instance.vm[1]: Still creating... [5m20s elapsed]
google_compute_instance.vm[1]: Still creating... [5m30s elapsed]
google_compute_instance.vm[1]: Still creating... [5m40s elapsed]
google_compute_instance.vm[1]: Still creating... [5m50s elapsed]
google_compute_instance.vm[1] (remote-exec): No apport report written because the error message indicates its a followup error from a previous failure.
google_compute_instance.vm[1] (remote-exec): No apport report written because the error message indicates its a followup error from a previous failure.
google_compute_instance.vm[1] (remote-exec): No apport report written because MaxReports is reached already
google_compute_instance.vm[1]: Still creating... [6m0s elapsed]
google_compute_instance.vm[1] (remote-exec): No apport report written because MaxReports is reached already
google_compute_instance.vm[1] (remote-exec): No apport report written because MaxReports is reached already
google_compute_instance.vm[1]: Still creating... [6m10s elapsed]
google_compute_instance.vm[1]: Still creating... [6m20s elapsed]
google_compute_instance.vm[1]: Still creating... [6m30s elapsed]
google_compute_instance.vm[1] (remote-exec): E: Sub-process /usr/bin/dpkg returned an error code (1)
google_compute_instance.vm[1] (remote-exec): --2020-11-02 17:31:55--  http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb
google_compute_instance.vm[1] (remote-exec): Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
google_compute_instance.vm[1] (remote-exec): Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:80... connected.
google_compute_instance.vm[1] (remote-exec): HTTP request sent, awaiting response... 200 OK
google_compute_instance.vm[1] (remote-exec): Length: 182313188 (174M) [application/x-deb]
google_compute_instance.vm[1] (remote-exec): Saving to: ‘libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb’

google_compute_instance.vm[1] (remote-exec): libcudnn7_7 100% 173.87M   255MB/s    in 0.7s

google_compute_instance.vm[1] (remote-exec): 2020-11-02 17:31:56 (255 MB/s) - ‘libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb’ saved [182313188/182313188]

google_compute_instance.vm[1]: Still creating... [6m40s elapsed]
google_compute_instance.vm[1]: Still creating... [6m50s elapsed]
google_compute_instance.vm[1] (remote-exec): --2020-11-02 17:32:14--  http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb
google_compute_instance.vm[1] (remote-exec): Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
google_compute_instance.vm[1] (remote-exec): Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:80... connected.
google_compute_instance.vm[1] (remote-exec): HTTP request sent, awaiting response... 200 OK
google_compute_instance.vm[1] (remote-exec): Length: 160506208 (153M) [application/x-deb]
google_compute_instance.vm[1] (remote-exec): Saving to: ‘libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb’

google_compute_instance.vm[1] (remote-exec): libcudnn7-d 100% 153.07M   253MB/s    in 0.6s

google_compute_instance.vm[1] (remote-exec): 2020-11-02 17:32:15 (253 MB/s) - ‘libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb’ saved [160506208/160506208]

google_compute_instance.vm[1]: Still creating... [7m0s elapsed]
google_compute_instance.vm[1]: Still creating... [7m10s elapsed]
google_compute_instance.vm[1]: Still creating... [7m20s elapsed]
google_compute_instance.vm[1] (remote-exec): Extracting templates from packages: 100%
google_compute_instance.vm[1]: Still creating... [7m30s elapsed]
google_compute_instance.vm[1]: Still creating... [7m40s elapsed]
google_compute_instance.vm[1]: Still creating... [7m50s elapsed]
google_compute_instance.vm[1]: Still creating... [8m0s elapsed]
google_compute_instance.vm[1] (remote-exec): No apport report written because the error message indicates its a followup error from a previous failure.
google_compute_instance.vm[1] (remote-exec): No apport report written because the error message indicates its a followup error from a previous failure.
google_compute_instance.vm[1] (remote-exec): No apport report written because MaxReports is reached already
google_compute_instance.vm[1] (remote-exec): No apport report written because MaxReports is reached already
google_compute_instance.vm[1] (remote-exec): No apport report written because MaxReports is reached already
google_compute_instance.vm[1]: Still creating... [8m10s elapsed]
google_compute_instance.vm[1]: Still creating... [8m20s elapsed]
google_compute_instance.vm[1] (remote-exec): E: Sub-process /usr/bin/dpkg returned an error code (1)
google_compute_instance.vm[1] (remote-exec):   WARNING: The scripts pip, pip3 and pip3.8 are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[1] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[1] (remote-exec): ERROR: launchpadlib 1.10.13 requires testresources, which is not installed.
google_compute_instance.vm[1] (remote-exec):   WARNING: The scripts easy_install and easy_install-3.8 are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[1] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[1] (remote-exec): WARNING: Skipping crcmod as it is not installed.
google_compute_instance.vm[1]: Still creating... [8m30s elapsed]
google_compute_instance.vm[1]: Still creating... [8m40s elapsed]
google_compute_instance.vm[1]: Still creating... [8m50s elapsed]
google_compute_instance.vm[1]: Still creating... [9m0s elapsed]
google_compute_instance.vm[1] (remote-exec):   WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', OSError("(104, 'ECONNRESET')"))': /packages/bc/58/0aa6fb779dc69cfc811df3398fcbeaeefbf18561b6e36b185df0782781cc/absl_py-0.11.0-py3-none-any.whl
google_compute_instance.vm[1]: Still creating... [9m10s elapsed]
google_compute_instance.vm[1]: Still creating... [9m20s elapsed]
google_compute_instance.vm[1]: Still creating... [9m30s elapsed]
google_compute_instance.vm[1]: Still creating... [9m40s elapsed]
google_compute_instance.vm[1]: Still creating... [9m50s elapsed]
google_compute_instance.vm[1]: Still creating... [10m0s elapsed]
google_compute_instance.vm[1]: Still creating... [10m10s elapsed]
google_compute_instance.vm[1]: Still creating... [10m20s elapsed]
google_compute_instance.vm[1]: Still creating... [10m30s elapsed]
google_compute_instance.vm[1]: Still creating... [10m40s elapsed]
google_compute_instance.vm[1]: Still creating... [10m50s elapsed]
google_compute_instance.vm[1]: Still creating... [11m0s elapsed]
google_compute_instance.vm[1]: Still creating... [11m10s elapsed]
google_compute_instance.vm[1]: Still creating... [11m20s elapsed]
google_compute_instance.vm[1]: Still creating... [11m30s elapsed]
google_compute_instance.vm[1]: Still creating... [11m40s elapsed]
google_compute_instance.vm[1]: Still creating... [11m50s elapsed]
google_compute_instance.vm[1]: Still creating... [12m0s elapsed]
google_compute_instance.vm[1]: Still creating... [12m10s elapsed]
google_compute_instance.vm[1]: Still creating... [12m20s elapsed]
google_compute_instance.vm[1]: Still creating... [12m30s elapsed]
google_compute_instance.vm[1]: Still creating... [12m40s elapsed]
google_compute_instance.vm[1]: Still creating... [12m50s elapsed]
google_compute_instance.vm[1]: Still creating... [13m0s elapsed]
google_compute_instance.vm[1]: Still creating... [13m10s elapsed]
google_compute_instance.vm[1]: Still creating... [13m20s elapsed]
google_compute_instance.vm[1]: Still creating... [13m30s elapsed]
google_compute_instance.vm[1]: Still creating... [13m40s elapsed]
google_compute_instance.vm[1]: Still creating... [13m50s elapsed]
google_compute_instance.vm[1]: Still creating... [14m0s elapsed]
google_compute_instance.vm[1]: Still creating... [14m10s elapsed]
google_compute_instance.vm[1]: Still creating... [14m20s elapsed]
google_compute_instance.vm[1]: Still creating... [14m30s elapsed]
google_compute_instance.vm[1]: Still creating... [14m40s elapsed]
google_compute_instance.vm[1]: Still creating... [14m50s elapsed]
google_compute_instance.vm[1]: Still creating... [15m0s elapsed]
google_compute_instance.vm[1]: Still creating... [15m10s elapsed]
google_compute_instance.vm[1]: Still creating... [15m20s elapsed]
google_compute_instance.vm[1]: Still creating... [15m30s elapsed]
google_compute_instance.vm[1]: Still creating... [15m40s elapsed]
google_compute_instance.vm[1]: Still creating... [15m50s elapsed]
google_compute_instance.vm[1]: Still creating... [16m0s elapsed]
google_compute_instance.vm[1]: Still creating... [16m10s elapsed]
google_compute_instance.vm[1]: Still creating... [16m20s elapsed]
google_compute_instance.vm[1]: Still creating... [16m30s elapsed]
....... continues forever
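The real failures in this run are buried among hundreds of progress lines, and the run itself never finishes. They are two `dpkg returned an error code (1)` lines from apt, plus a pip dependency error. The sketch below pulls apt/pip error lines out of a saved Terraform log. It assumes the output has been captured as text; the prefix pattern is copied from the output above and the function name is hypothetical.

```python
import re

# Terraform remote-exec prefix as it appears in the logs above,
# e.g. "google_compute_instance.vm[1] (remote-exec): ".
PREFIX = re.compile(r"^google_compute_instance\.vm\[\d+\] \(remote-exec\):\s*")


def provisioning_errors(log_text):
    """Return remote-exec messages that start with apt's 'E:' or pip's 'ERROR:'."""
    errors = []
    for line in log_text.splitlines():
        message = PREFIX.sub("", line)
        if message.startswith(("E: ", "ERROR: ")):
            errors.append(message)
    return errors


sample = """\
google_compute_instance.vm[1]: Still creating... [6m30s elapsed]
google_compute_instance.vm[1] (remote-exec): E: Sub-process /usr/bin/dpkg returned an error code (1)
google_compute_instance.vm[1] (remote-exec): ERROR: launchpadlib 1.10.13 requires testresources, which is not installed.
google_compute_instance.vm[1] (remote-exec): WARNING: Skipping crcmod as it is not installed.
"""

for message in provisioning_errors(sample):
    print(message)
```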

nvidia-smi cannot reach the driver:

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Checking versions:

CUDA version - ok:

$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

Ubuntu version - ok:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.1 LTS
Release:        20.04
Codename:       focal
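This VM runs Ubuntu 20.04 (focal), but every NVIDIA URL in its setup script points at the `ubuntu1804` repositories. The logs do not show whether that mismatch causes the dpkg failures, but it is cheap to catch before provisioning. The sketch below compares the release reported by `lsb_release -a` with the `ubuntuXXXX` tag in each URL. It assumes NVIDIA's naming of `ubuntu` plus the release without the dot, as in the URLs above; the helper names are illustrative.

```python
import re


def ubuntu_release(lsb_output):
    """Return the 'Release:' value from `lsb_release -a` output, e.g. '20.04'."""
    match = re.search(r"^Release:\s*(\S+)", lsb_output, re.MULTILINE)
    return match.group(1) if match else None


def repo_tag_mismatches(release, urls):
    """Return URLs whose 'ubuntuXXXX' tag differs from the running release."""
    expected = "ubuntu" + release.replace(".", "")  # assumed naming convention
    mismatched = []
    for url in urls:
        tag = re.search(r"ubuntu\d{4}", url)
        if tag and tag.group(0) != expected:
            mismatched.append(url)
    return mismatched


lsb_output = """\
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.1 LTS
Release:        20.04
Codename:       focal
"""
urls = [
    "https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin",
    "http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb",
]
release = ubuntu_release(lsb_output)
print(release)                                  # 20.04
print(len(repo_tag_mismatches(release, urls)))  # 2
```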

Python version:

$ python3 --version
Python 3.8.2

cuDNN version - ok (7.6.5):

$ cat /usr/include/x86_64-linux-gnu/cudnn_v*.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
CarolinaFurtado commented 3 years ago

Following https://www.tensorflow.org/install/gpu

Ubuntu version - 18.04:

boot_disk {
  initialize_params {
    image = "projects/ubuntu-os-cloud/global/images/ubuntu-1804-bionic-v20201014"
    size  = "${var.hard_drive_size_gp}"
    type  = "pd-ssd"
  }
}

CUDA and cuDNN installation:

sudo apt-get update

sudo apt-get install -y build-essential
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo dpkg -i cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update

# Install NVIDIA driver
sudo apt-get -y install --no-install-recommends nvidia-driver-450
# Reboot. Check that GPUs are visible using the command: nvidia-smi

# Install development and runtime libraries (~4GB)
sudo apt-get -y install --no-install-recommends \
    cuda-10-1 \
    libcudnn7=7.6.5.32-1+cuda10.1  \
    libcudnn7-dev=7.6.5.32-1+cuda10.1
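This script installs the NVIDIA driver as its own step (`nvidia-driver-450`) and expects a reboot before `nvidia-smi` is checked. After the reboot, you can compare the reported driver version with the minimum CUDA 10.1 needs. The sketch below makes two assumptions. The 418.39 minimum comes from the driver table in NVIDIA's CUDA Toolkit release notes, and should be verified there. The version string is assumed to come from `nvidia-smi --query-gpu=driver_version --format=csv,noheader`. The helper names are illustrative.

```python
# Assumption: minimum Linux driver for CUDA 10.1 per NVIDIA's release notes
# (">= 418.39"); verify against the current table before relying on it.
MIN_DRIVER_FOR_CUDA_10_1 = (418, 39)


def parse_driver_version(text):
    """Parse a driver version string such as '450.80.02' into (450, 80, 2)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports_cuda_10_1(driver_version_text):
    """True if the installed driver meets the assumed CUDA 10.1 minimum."""
    return parse_driver_version(driver_version_text) >= MIN_DRIVER_FOR_CUDA_10_1


# Example input as nvidia-smi might print it (trailing newline included).
print(driver_supports_cuda_10_1("450.80.02\n"))  # True
print(driver_supports_cuda_10_1("410.48"))       # False
```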

Creating the VM:

google_compute_instance.vm[2]: Creating...
google_compute_instance.vm[2]: Still creating... [10s elapsed]
google_compute_instance.vm[2]: Still creating... [20s elapsed]
google_compute_instance.vm[2]: Still creating... [30s elapsed]
google_compute_instance.vm[2]: Provisioning with 'file'...
google_compute_instance.vm[2]: Still creating... [40s elapsed]
google_compute_instance.vm[2]: Still creating... [50s elapsed]
google_compute_instance.vm[2]: Still creating... [1m0s elapsed]
google_compute_instance.vm[2]: Provisioning with 'remote-exec'...
google_compute_instance.vm[2] (remote-exec): Connecting to remote host via SSH...
google_compute_instance.vm[2] (remote-exec):   Host: 35.231.39.6
google_compute_instance.vm[2] (remote-exec):   User: cfurtado
google_compute_instance.vm[2] (remote-exec):   Password: false
google_compute_instance.vm[2] (remote-exec):   Private key: true
google_compute_instance.vm[2] (remote-exec):   Certificate: false
google_compute_instance.vm[2] (remote-exec):   SSH Agent: false
google_compute_instance.vm[2] (remote-exec):   Checking Host Key: false
google_compute_instance.vm[2] (remote-exec): Connected!
google_compute_instance.vm[2] (remote-exec): Running resource creation script... (this may take 10+ minutes)
google_compute_instance.vm[2]: Still creating... [1m10s elapsed]
google_compute_instance.vm[2] (remote-exec): E: Package 'build-essential' has no installation candidate
google_compute_instance.vm[2] (remote-exec): --2020-11-02 18:12:23--  https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
google_compute_instance.vm[2] (remote-exec): Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
google_compute_instance.vm[2] (remote-exec): Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:443... connected.
google_compute_instance.vm[2] (remote-exec): HTTP request sent, awaiting response... 200 OK
google_compute_instance.vm[2] (remote-exec): Length: 2936 (2.9K) [application/x-deb]
google_compute_instance.vm[2] (remote-exec): Saving to: ‘cuda-repo-ubuntu1804_10.1.243-1_amd64.deb’

google_compute_instance.vm[2] (remote-exec): cuda-repo-u 100%   2.87K  --.-KB/s    in 0s

google_compute_instance.vm[2] (remote-exec): 2020-11-02 18:12:24 (171 MB/s) - ‘cuda-repo-ubuntu1804_10.1.243-1_amd64.deb’ saved [2936/2936]

google_compute_instance.vm[2] (remote-exec): Warning: apt-key output should not be parsed (stdout is not a terminal)
google_compute_instance.vm[2] (remote-exec): gpg: requesting key from 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub'
google_compute_instance.vm[2] (remote-exec): gpg: key F60F4B3D7FA2AF80: public key "cudatools <cudatools@nvidia.com>" imported
google_compute_instance.vm[2] (remote-exec): gpg: Total number processed: 1
google_compute_instance.vm[2] (remote-exec): gpg:               imported: 1
google_compute_instance.vm[2]: Still creating... [1m20s elapsed]
google_compute_instance.vm[2] (remote-exec): --2020-11-02 18:12:30--  http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
google_compute_instance.vm[2] (remote-exec): Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
google_compute_instance.vm[2] (remote-exec): Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:80... connected.
google_compute_instance.vm[2] (remote-exec): HTTP request sent, awaiting response... 200 OK
google_compute_instance.vm[2] (remote-exec): Length: 2926 (2.9K) [application/x-deb]
google_compute_instance.vm[2] (remote-exec): Saving to: ‘nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb’

google_compute_instance.vm[2] (remote-exec): nvidia-mach 100%   2.86K  --.-KB/s    in 0s

google_compute_instance.vm[2] (remote-exec): 2020-11-02 18:12:31 (456 MB/s) - ‘nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb’ saved [2926/2926]

google_compute_instance.vm[2] (remote-exec): WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 100%
google_compute_instance.vm[2]: Still creating... [1m30s elapsed]
google_compute_instance.vm[2]: Still creating... [1m40s elapsed]
google_compute_instance.vm[2]: Still creating... [1m50s elapsed]
google_compute_instance.vm[2]: Still creating... [2m0s elapsed]
google_compute_instance.vm[2]: Still creating... [2m10s elapsed]
google_compute_instance.vm[2]: Still creating... [2m20s elapsed]
google_compute_instance.vm[2]: Still creating... [2m30s elapsed]
google_compute_instance.vm[2]: Still creating... [2m40s elapsed]
google_compute_instance.vm[2]: Still creating... [2m50s elapsed]
google_compute_instance.vm[2]: Still creating... [3m1s elapsed]
google_compute_instance.vm[2]: Still creating... [3m11s elapsed]
google_compute_instance.vm[2]: Still creating... [3m21s elapsed]
google_compute_instance.vm[2]: Still creating... [3m31s elapsed]
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 100%
google_compute_instance.vm[2]: Still creating... [3m41s elapsed]
google_compute_instance.vm[2]: Still creating... [3m51s elapsed]
google_compute_instance.vm[2]: Still creating... [4m1s elapsed]
google_compute_instance.vm[2]: Still creating... [4m11s elapsed]
google_compute_instance.vm[2]: Still creating... [4m21s elapsed]
google_compute_instance.vm[2]: Still creating... [4m31s elapsed]
google_compute_instance.vm[2]: Still creating... [4m41s elapsed]
google_compute_instance.vm[2]: Still creating... [4m51s elapsed]
google_compute_instance.vm[2]: Still creating... [5m1s elapsed]
google_compute_instance.vm[2]: Still creating... [5m11s elapsed]
google_compute_instance.vm[2]: Still creating... [5m21s elapsed]
google_compute_instance.vm[2]: Still creating... [5m31s elapsed]
google_compute_instance.vm[2]: Still creating... [5m41s elapsed]
google_compute_instance.vm[2]: Still creating... [5m51s elapsed]
google_compute_instance.vm[2]: Still creating... [6m1s elapsed]
google_compute_instance.vm[2]: Still creating... [6m11s elapsed]
google_compute_instance.vm[2]: Still creating... [6m21s elapsed]
google_compute_instance.vm[2]: Still creating... [6m31s elapsed]
google_compute_instance.vm[2]: Still creating... [6m41s elapsed]
google_compute_instance.vm[2]: Still creating... [6m51s elapsed]
google_compute_instance.vm[2]: Still creating... [7m1s elapsed]
google_compute_instance.vm[2]: Still creating... [7m11s elapsed]
google_compute_instance.vm[2]: Still creating... [7m21s elapsed]
google_compute_instance.vm[2]: Still creating... [7m31s elapsed]
google_compute_instance.vm[2]: Still creating... [7m41s elapsed]
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 13%
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 26%
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 40%
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 53%
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 66%
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 80%
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 93%
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 100%
google_compute_instance.vm[2]: Still creating... [7m51s elapsed]
google_compute_instance.vm[2]: Still creating... [8m1s elapsed]
google_compute_instance.vm[2]: Still creating... [8m11s elapsed]
google_compute_instance.vm[2]: Still creating... [8m21s elapsed]
google_compute_instance.vm[2] (remote-exec): WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
google_compute_instance.vm[2] (remote-exec): Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
google_compute_instance.vm[2] (remote-exec): To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
google_compute_instance.vm[2] (remote-exec):   WARNING: The scripts easy_install and easy_install-3.6 are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[2] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[2] (remote-exec): WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
google_compute_instance.vm[2] (remote-exec): Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
google_compute_instance.vm[2] (remote-exec): To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
google_compute_instance.vm[2] (remote-exec): WARNING: Skipping crcmod as it is not installed.
google_compute_instance.vm[2] (remote-exec): WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
google_compute_instance.vm[2] (remote-exec): Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
google_compute_instance.vm[2] (remote-exec): To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
google_compute_instance.vm[2] (remote-exec): WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
google_compute_instance.vm[2] (remote-exec): Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
google_compute_instance.vm[2] (remote-exec): To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
google_compute_instance.vm[2] (remote-exec): WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
google_compute_instance.vm[2] (remote-exec): Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
google_compute_instance.vm[2] (remote-exec): To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
google_compute_instance.vm[2]: Still creating... [8m31s elapsed]
google_compute_instance.vm[2]: Still creating... [8m41s elapsed]
google_compute_instance.vm[2]: Still creating... [8m51s elapsed]
google_compute_instance.vm[2]: Still creating... [9m1s elapsed]
google_compute_instance.vm[2]: Still creating... [9m11s elapsed]
google_compute_instance.vm[2]: Still creating... [9m21s elapsed]
google_compute_instance.vm[2]: Still creating... [9m31s elapsed]
google_compute_instance.vm[2]: Still creating... [9m41s elapsed]
google_compute_instance.vm[2]: Still creating... [9m51s elapsed]
google_compute_instance.vm[2]: Still creating... [10m1s elapsed]
google_compute_instance.vm[2]: Still creating... [10m11s elapsed]
google_compute_instance.vm[2]: Still creating... [10m21s elapsed]
google_compute_instance.vm[2]: Still creating... [10m31s elapsed]
google_compute_instance.vm[2]: Still creating... [10m41s elapsed]
google_compute_instance.vm[2]: Still creating... [10m51s elapsed]
google_compute_instance.vm[2]: Still creating... [11m1s elapsed]
google_compute_instance.vm[2]: Still creating... [11m11s elapsed]
google_compute_instance.vm[2]: Still creating... [11m21s elapsed]
google_compute_instance.vm[2]: Still creating... [11m31s elapsed]
google_compute_instance.vm[2]: Still creating... [11m41s elapsed]
google_compute_instance.vm[2]: Still creating... [11m51s elapsed]
google_compute_instance.vm[2]: Still creating... [12m1s elapsed]
google_compute_instance.vm[2]: Still creating... [12m11s elapsed]
google_compute_instance.vm[2]: Still creating... [12m21s elapsed]
google_compute_instance.vm[2]: Still creating... [12m31s elapsed]
google_compute_instance.vm[2]: Still creating... [12m41s elapsed]
google_compute_instance.vm[2]: Still creating... [12m51s elapsed]
google_compute_instance.vm[2]: Still creating... [13m1s elapsed]
google_compute_instance.vm[2]: Still creating... [13m11s elapsed]
google_compute_instance.vm[2]: Still creating... [13m21s elapsed]
google_compute_instance.vm[2]: Still creating... [13m31s elapsed]
google_compute_instance.vm[2]: Still creating... [13m41s elapsed]
google_compute_instance.vm[2]: Still creating... [13m51s elapsed]
google_compute_instance.vm[2]: Still creating... [14m1s elapsed]
google_compute_instance.vm[2]: Still creating... [14m11s elapsed]
google_compute_instance.vm[2]: Still creating... [14m21s elapsed]
google_compute_instance.vm[2]: Still creating... [14m31s elapsed]
google_compute_instance.vm[2]: Still creating... [14m41s elapsed]
google_compute_instance.vm[2]: Still creating... [14m51s elapsed]
google_compute_instance.vm[2]: Still creating... [15m1s elapsed]
google_compute_instance.vm[2]: Still creating... [15m11s elapsed]
google_compute_instance.vm[2]: Still creating... [15m21s elapsed]
google_compute_instance.vm[2]: Still creating... [15m31s elapsed]
google_compute_instance.vm[2]: Still creating... [15m41s elapsed]
google_compute_instance.vm[2]: Still creating... [15m51s elapsed]
google_compute_instance.vm[2]: Still creating... [16m1s elapsed]
google_compute_instance.vm[2]: Still creating... [16m11s elapsed]

The provisioning never finishes.

Error when running: the provisioning gets stuck while installing the Python packages and never completes.
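One way to keep a hung package install from blocking provisioning indefinitely is to run it with a timeout. This is a sketch under assumptions, not the project's provisioning script. It calls pip as `python -m pip`, which the pip warning in the log recommends, and it relies on `subprocess.run(..., timeout=...)`, which kills the child process and raises `subprocess.TimeoutExpired` once the limit is exceeded. The helper names `pip_install_cmd` and `run_with_timeout` and the `requirements.txt` file name are hypothetical.

```python
import subprocess
import sys


def pip_install_cmd(requirements_file):
    """Build a pip command that runs through the current interpreter ("python -m pip")."""
    return [sys.executable, "-m", "pip", "install", "-r", requirements_file]


def run_with_timeout(args, timeout_s):
    """Run a command; return True if it succeeds, False if it exceeds timeout_s.

    A non-zero exit status raises subprocess.CalledProcessError (check=True).
    """
    try:
        subprocess.run(args, check=True, timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising TimeoutExpired.
        return False
```

For example, `run_with_timeout(pip_install_cmd("requirements.txt"), timeout_s=20 * 60)` would give up after 20 minutes (an arbitrary limit) instead of leaving `terraform apply` waiting with no output.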

nvidia-smi reports CUDA 11.1. Is CUDA 11.1 being installed?

$ nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.32.00    Driver Version: 455.32.00    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    29W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
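The "CUDA Version" in the `nvidia-smi` banner is the newest CUDA release that the installed driver (455.32.00 here) supports. It is not the version of the CUDA toolkit on the machine; that comes from `nvcc --version`. A small parser for the banner line could look like this (the function name is hypothetical):

```python
import re


def parse_nvidia_smi_header(text):
    """Return (driver_version, max_cuda_version) from `nvidia-smi` output.

    The "CUDA Version" field is the newest CUDA release the installed driver
    supports; it does not say which CUDA toolkit is installed.
    """
    match = re.search(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", text)
    if match is None:
        raise ValueError("no driver/CUDA version line found")
    return match.group(1), match.group(2)
```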

Checking the installed versions:

CUDA version - ok:

$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
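To compare the toolkit release with the driver's maximum programmatically, here is a sketch with hypothetical helper names. It assumes the usual rule that a driver can run toolkits up to the CUDA version it reports, which is a simplification of NVIDIA's compatibility notes.

```python
import re


def parse_nvcc_release(text):
    """Extract (major, minor) from `nvcc --version` output, e.g. 'release 10.1'."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def driver_supports_toolkit(toolkit, driver_max):
    # Tuples compare element-wise, so (10, 1) <= (11, 1) is True.
    return toolkit <= driver_max
```

With the outputs above, toolkit (10, 1) against driver maximum (11, 1) passes the check.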

Ubuntu version - ok:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.5 LTS
Release:        18.04
Codename:       bionic
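If the same check has to run inside a script (for example right after the VM is provisioned), the `Key: value` lines printed by `lsb_release -a` can be turned into a dictionary. This is an illustrative helper, not part of the repository:

```python
def parse_lsb_release(text):
    """Turn `lsb_release -a` output into a dict such as {'Release': '18.04', ...}."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip lines without a "Key: value" shape (e.g. "No LSB modules are available.").
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info
```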

Python version:

$ python3 --version
Python 3.6.9
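Python 3.6.9 should be within range for TensorFlow 2.3, assuming the Python 3.5 to 3.8 range listed in TensorFlow's tested build configurations (worth re-checking against the official table). A guard like the following, with a hypothetical constant and function name, would catch an unsupported interpreter before training starts:

```python
import sys

# Assumption: TensorFlow 2.3 supports Python 3.5 through 3.8.
TF23_PYTHON_RANGE = ((3, 5), (3, 8))


def python_supported(version_info=sys.version_info, supported=TF23_PYTHON_RANGE):
    """Return True if (major, minor) of version_info lies inside the supported range."""
    low, high = supported
    return low <= tuple(version_info[:2]) <= high
```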

cuDNN version - ok (7.6.5):

$ cat /usr/include/x86_64-linux-gnu/cudnn_v*.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
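The `CUDNN_VERSION` define in that output shows how the header folds the three parts into one number, MAJOR * 1000 + MINOR * 100 + PATCHLEVEL, so 7.6.5 becomes 7605. A sketch that computes the same value from the header text (the helper name is hypothetical):

```python
import re


def cudnn_version_from_header(text):
    """Compute CUDNN_VERSION the same way cudnn.h does: MAJOR*1000 + MINOR*100 + PATCHLEVEL."""
    fields = {}
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define %s (\d+)" % name, text)
        if match is None:
            raise ValueError("missing " + name)
        fields[name] = int(match.group(1))
    return fields["CUDNN_MAJOR"] * 1000 + fields["CUDNN_MINOR"] * 100 + fields["CUDNN_PATCHLEVEL"]
```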
CarolinaFurtado commented 3 years ago

Following https://github.com/tensorflow/tensorflow/issues/43236, use CUDA 10.2.

Ubuntu version - 18.04:

boot_disk {
  initialize_params {
    image = "projects/ubuntu-os-cloud/global/images/ubuntu-1804-bionic-v20201014"
    size  = "${var.hard_drive_size_gp}"
    type  = "pd-ssd"
  }
}
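The image name in that config appears to encode the Ubuntu release (1804), the codename and a `vYYYYMMDD` build stamp; this reading of the naming scheme is an assumption. A hypothetical helper that splits it, which would also accept a 20.04 image name of the same shape when moving to Ubuntu 20.x:

```python
import re


def parse_ubuntu_image(image):
    """Split a GCE Ubuntu image path like '.../ubuntu-1804-bionic-v20201014' into parts."""
    match = re.search(r"ubuntu-(\d{2})(\d{2})-([a-z]+)-v(\d{8})$", image)
    if match is None:
        raise ValueError("unrecognized image name: " + image)
    yy, mm, codename, stamp = match.groups()
    # Ubuntu release numbers are YY.MM, e.g. 18.04.
    return {"release": yy + "." + mm, "codename": codename, "build_date": stamp}
```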

CUDA and cuDNN versions - CUDA 10.2:

sudo apt-get update

sudo apt-get install -y build-essential

# install the CUDA 10.2 toolkit from NVIDIA's local repository package
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-2-local-10.2.89-440.33.01/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

# put the CUDA 10.2 binaries (e.g. nvcc) on PATH
export PATH=/usr/local/cuda-10.2/bin${PATH:+:${PATH}}

# install cudnn
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb
sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb

wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb
sudo dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb
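The toolkit installed above is CUDA 10.2, while both cuDNN packages carry a `+cuda10.1` suffix, which names the CUDA release that cuDNN build targets. If the intent is to pair cuDNN with CUDA 10.2, the suffix may be worth double-checking. A sketch of such a check, with hypothetical function names:

```python
import re


def cudnn_deb_info(filename):
    """Read the cuDNN and target CUDA versions from a libcudnn .deb file name.

    e.g. 'libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb' -> ('7.6.5.32', '10.1')
    """
    match = re.search(r"_(\d+(?:\.\d+)+)-\d+\+cuda(\d+\.\d+)_", filename)
    if match is None:
        raise ValueError("unrecognized cuDNN package name: " + filename)
    return match.group(1), match.group(2)


def matches_toolkit(filename, toolkit_version):
    # The "+cudaX.Y" suffix names the CUDA release this cuDNN build targets.
    return cudnn_deb_info(filename)[1] == toolkit_version
```

For the packages above, `matches_toolkit("libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb", "10.2")` returns False.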

Creating the VM took 33 min:

google_compute_instance.vm[2]: Creating...
google_compute_instance.vm[2]: Still creating... [10s elapsed]
google_compute_instance.vm[2]: Still creating... [20s elapsed]
google_compute_instance.vm[2]: Still creating... [30s elapsed]
google_compute_instance.vm[2]: Provisioning with 'file'...
google_compute_instance.vm[2]: Still creating... [40s elapsed]
google_compute_instance.vm[2]: Still creating... [50s elapsed]
google_compute_instance.vm[2]: Still creating... [1m0s elapsed]
google_compute_instance.vm[2]: Still creating... [1m10s elapsed]
google_compute_instance.vm[2]: Still creating... [1m20s elapsed]
google_compute_instance.vm[2]: Provisioning with 'remote-exec'...
google_compute_instance.vm[2] (remote-exec): Connecting to remote host via SSH...
google_compute_instance.vm[2] (remote-exec):   Host: 35.231.39.6
google_compute_instance.vm[2] (remote-exec):   User: cfurtado
google_compute_instance.vm[2] (remote-exec):   Password: false
google_compute_instance.vm[2] (remote-exec):   Private key: true
google_compute_instance.vm[2] (remote-exec):   Certificate: false
google_compute_instance.vm[2] (remote-exec):   SSH Agent: false
google_compute_instance.vm[2] (remote-exec):   Checking Host Key: false
google_compute_instance.vm[2] (remote-exec): Connected!
google_compute_instance.vm[2] (remote-exec): Running resource creation script... (this may take 10+ minutes)
google_compute_instance.vm[2]: Still creating... [1m30s elapsed]
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 75%
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 100%
google_compute_instance.vm[2]: Still creating... [1m40s elapsed]
google_compute_instance.vm[2]: Still creating... [1m50s elapsed]
google_compute_instance.vm[2] (remote-exec): --2020-11-02 19:26:36--  https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
google_compute_instance.vm[2] (remote-exec): Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
google_compute_instance.vm[2] (remote-exec): Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:443... connected.
google_compute_instance.vm[2] (remote-exec): HTTP request sent, awaiting response... 200 OK
google_compute_instance.vm[2] (remote-exec): Length: 190 [application/octet-stream]
google_compute_instance.vm[2] (remote-exec): Saving to: ‘cuda-ubuntu1804.pin’

google_compute_instance.vm[2] (remote-exec): cuda-ubuntu 100%     190  --.-KB/s    in 0s

google_compute_instance.vm[2] (remote-exec): 2020-11-02 19:26:37 (5.87 MB/s) - ‘cuda-ubuntu1804.pin’ saved [190/190]

google_compute_instance.vm[2] (remote-exec): --2020-11-02 19:26:37--  http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
google_compute_instance.vm[2] (remote-exec): Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
google_compute_instance.vm[2] (remote-exec): Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:80... connected.
google_compute_instance.vm[2] (remote-exec): HTTP request sent, awaiting response... 200 OK
google_compute_instance.vm[2] (remote-exec): Length: 1896270068 (1.8G) [application/x-deb]
google_compute_instance.vm[2] (remote-exec): Saving to: ‘cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb’

google_compute_instance.vm[2] (remote-exec): cuda-repo-u 100%   1.77G   284MB/s    in 6.4s

google_compute_instance.vm[2] (remote-exec): 2020-11-02 19:26:43 (280 MB/s) - ‘cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb’ saved [1896270068/1896270068]

google_compute_instance.vm[2]: Still creating... [2m0s elapsed]
google_compute_instance.vm[2]: Still creating... [2m10s elapsed]
google_compute_instance.vm[2] (remote-exec): Warning: apt-key output should not be parsed (stdout is not a terminal)
google_compute_instance.vm[2]: Still creating... [2m20s elapsed]
google_compute_instance.vm[2]: Still creating... [2m30s elapsed]
google_compute_instance.vm[2]: Still creating... [2m40s elapsed]
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 11%
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 22%
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 33%
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 44%
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 55%
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 66%
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 77%
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 88%
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 100%
google_compute_instance.vm[2]: Still creating... [2m50s elapsed]
google_compute_instance.vm[2]: Still creating... [3m0s elapsed]
google_compute_instance.vm[2]: Still creating... [3m10s elapsed]
google_compute_instance.vm[2]: Still creating... [3m20s elapsed]
google_compute_instance.vm[2]: Still creating... [3m30s elapsed]
google_compute_instance.vm[2]: Still creating... [3m40s elapsed]
google_compute_instance.vm[2]: Still creating... [3m50s elapsed]
google_compute_instance.vm[2]: Still creating... [4m0s elapsed]
google_compute_instance.vm[2]: Still creating... [4m10s elapsed]
google_compute_instance.vm[2]: Still creating... [4m20s elapsed]
google_compute_instance.vm[2]: Still creating... [4m30s elapsed]
google_compute_instance.vm[2]: Still creating... [4m40s elapsed]
google_compute_instance.vm[2]: Still creating... [4m50s elapsed]
google_compute_instance.vm[2]: Still creating... [5m0s elapsed]
google_compute_instance.vm[2]: Still creating... [5m10s elapsed]
google_compute_instance.vm[2]: Still creating... [5m20s elapsed]
google_compute_instance.vm[2]: Still creating... [5m30s elapsed]
google_compute_instance.vm[2]: Still creating... [5m40s elapsed]
google_compute_instance.vm[2]: Still creating... [5m50s elapsed]
google_compute_instance.vm[2]: Still creating... [6m0s elapsed]
google_compute_instance.vm[2]: Still creating... [6m10s elapsed]
google_compute_instance.vm[2]: Still creating... [6m20s elapsed]
google_compute_instance.vm[2]: Still creating... [6m30s elapsed]
google_compute_instance.vm[2]: Still creating... [6m40s elapsed]
google_compute_instance.vm[2]: Still creating... [6m50s elapsed]
google_compute_instance.vm[2]: Still creating... [7m0s elapsed]
google_compute_instance.vm[2] (remote-exec): --2020-11-02 19:31:46--  http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb
google_compute_instance.vm[2] (remote-exec): Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
google_compute_instance.vm[2] (remote-exec): Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:80... connected.
google_compute_instance.vm[2] (remote-exec): HTTP request sent, awaiting response... 200 OK
google_compute_instance.vm[2] (remote-exec): Length: 182313188 (174M) [application/x-deb]
google_compute_instance.vm[2] (remote-exec): Saving to: ‘libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb’

google_compute_instance.vm[2] (remote-exec): libcudnn7_7 100% 173.87M   252MB/s    in 0.7s

google_compute_instance.vm[2] (remote-exec): 2020-11-02 19:31:47 (252 MB/s) - ‘libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb’ saved [182313188/182313188]

google_compute_instance.vm[2]: Still creating... [7m10s elapsed]
google_compute_instance.vm[2] (remote-exec): --2020-11-02 19:32:05--  http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb
google_compute_instance.vm[2] (remote-exec): Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
google_compute_instance.vm[2] (remote-exec): Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:80... connected.
google_compute_instance.vm[2]: Still creating... [7m20s elapsed]
google_compute_instance.vm[2] (remote-exec): HTTP request sent, awaiting response... 200 OK
google_compute_instance.vm[2] (remote-exec): Length: 160506208 (153M) [application/x-deb]
google_compute_instance.vm[2] (remote-exec): Saving to: ‘libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb’

google_compute_instance.vm[2] (remote-exec): libcudnn7-d 100% 153.07M   256MB/s    in 0.6s

google_compute_instance.vm[2] (remote-exec): 2020-11-02 19:32:06 (256 MB/s) - ‘libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb’ saved [160506208/160506208]

google_compute_instance.vm[2]: Still creating... [7m30s elapsed]
google_compute_instance.vm[2]: Still creating... [7m40s elapsed]
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 13%
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 26%
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 40%
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 53%
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 67%
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 80%
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 94%
google_compute_instance.vm[2] (remote-exec): Extracting templates from packages: 100%
google_compute_instance.vm[2]: Still creating... [7m50s elapsed]
google_compute_instance.vm[2]: Still creating... [8m0s elapsed]
google_compute_instance.vm[2]: Still creating... [8m10s elapsed]
google_compute_instance.vm[2]: Still creating... [8m20s elapsed]
google_compute_instance.vm[2] (remote-exec): WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
google_compute_instance.vm[2] (remote-exec): Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
google_compute_instance.vm[2] (remote-exec): To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
google_compute_instance.vm[2] (remote-exec):   WARNING: The scripts easy_install and easy_install-3.6 are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[2] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[2]: Still creating... [8m30s elapsed]
google_compute_instance.vm[2] (remote-exec): WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
google_compute_instance.vm[2] (remote-exec): Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
google_compute_instance.vm[2] (remote-exec): To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
google_compute_instance.vm[2] (remote-exec): WARNING: Skipping crcmod as it is not installed.
google_compute_instance.vm[2] (remote-exec): WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
google_compute_instance.vm[2] (remote-exec): Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
google_compute_instance.vm[2] (remote-exec): To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
google_compute_instance.vm[2] (remote-exec): WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
google_compute_instance.vm[2] (remote-exec): Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
google_compute_instance.vm[2] (remote-exec): To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
google_compute_instance.vm[2] (remote-exec): WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
google_compute_instance.vm[2] (remote-exec): Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
google_compute_instance.vm[2] (remote-exec): To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
google_compute_instance.vm[2]: Still creating... [8m40s elapsed]
google_compute_instance.vm[2]: Still creating... [8m50s elapsed]
google_compute_instance.vm[2]: Still creating... [9m0s elapsed]
google_compute_instance.vm[2]: Still creating... [9m10s elapsed]
google_compute_instance.vm[2]: Still creating... [9m20s elapsed]
google_compute_instance.vm[2]: Still creating... [9m30s elapsed]
google_compute_instance.vm[2]: Still creating... [9m40s elapsed]
google_compute_instance.vm[2]: Still creating... [9m50s elapsed]
google_compute_instance.vm[2]: Still creating... [10m0s elapsed]
google_compute_instance.vm[2]: Still creating... [10m10s elapsed]
google_compute_instance.vm[2]: Still creating... [10m20s elapsed]
google_compute_instance.vm[2]: Still creating... [10m30s elapsed]
google_compute_instance.vm[2]: Still creating... [10m40s elapsed]
google_compute_instance.vm[2]: Still creating... [10m50s elapsed]
google_compute_instance.vm[2]: Still creating... [11m0s elapsed]
google_compute_instance.vm[2]: Still creating... [11m10s elapsed]
google_compute_instance.vm[2]: Still creating... [11m20s elapsed]
google_compute_instance.vm[2]: Still creating... [11m30s elapsed]
google_compute_instance.vm[2]: Still creating... [11m40s elapsed]
google_compute_instance.vm[2]: Still creating... [11m50s elapsed]
google_compute_instance.vm[2]: Still creating... [12m0s elapsed]
google_compute_instance.vm[2]: Still creating... [12m10s elapsed]
google_compute_instance.vm[2]: Still creating... [12m20s elapsed]
google_compute_instance.vm[2]: Still creating... [12m30s elapsed]
google_compute_instance.vm[2]: Still creating... [12m40s elapsed]
google_compute_instance.vm[2]: Still creating... [12m50s elapsed]
google_compute_instance.vm[2]: Still creating... [13m0s elapsed]
google_compute_instance.vm[2]: Still creating... [13m10s elapsed]
google_compute_instance.vm[2]: Still creating... [13m20s elapsed]
google_compute_instance.vm[2]: Still creating... [13m30s elapsed]
google_compute_instance.vm[2]: Still creating... [13m40s elapsed]
google_compute_instance.vm[2]: Still creating... [13m50s elapsed]
google_compute_instance.vm[2]: Still creating... [14m0s elapsed]
google_compute_instance.vm[2]: Still creating... [14m10s elapsed]
google_compute_instance.vm[2]: Still creating... [14m20s elapsed]
google_compute_instance.vm[2]: Still creating... [14m30s elapsed]
google_compute_instance.vm[2]: Still creating... [14m40s elapsed]
google_compute_instance.vm[2]: Still creating... [14m50s elapsed]
google_compute_instance.vm[2]: Still creating... [15m0s elapsed]
google_compute_instance.vm[2]: Still creating... [15m10s elapsed]
google_compute_instance.vm[2]: Still creating... [15m20s elapsed]
google_compute_instance.vm[2]: Still creating... [15m30s elapsed]
google_compute_instance.vm[2]: Still creating... [15m40s elapsed]
google_compute_instance.vm[2]: Still creating... [15m50s elapsed]
google_compute_instance.vm[2]: Still creating... [16m0s elapsed]
google_compute_instance.vm[2]: Still creating... [16m10s elapsed]
google_compute_instance.vm[2]: Still creating... [16m20s elapsed]
google_compute_instance.vm[2]: Still creating... [16m30s elapsed]
google_compute_instance.vm[2]: Still creating... [16m40s elapsed]
google_compute_instance.vm[2]: Still creating... [16m50s elapsed]
google_compute_instance.vm[2]: Still creating... [17m0s elapsed]
google_compute_instance.vm[2]: Still creating... [17m10s elapsed]
google_compute_instance.vm[2]: Still creating... [17m20s elapsed]
google_compute_instance.vm[2]: Still creating... [17m30s elapsed]
google_compute_instance.vm[2]: Still creating... [17m40s elapsed]
google_compute_instance.vm[2]: Still creating... [17m50s elapsed]
google_compute_instance.vm[2]: Still creating... [18m0s elapsed]
google_compute_instance.vm[2]: Still creating... [18m10s elapsed]
google_compute_instance.vm[2]: Still creating... [18m20s elapsed]
google_compute_instance.vm[2]: Still creating... [18m30s elapsed]
google_compute_instance.vm[2]: Still creating... [18m40s elapsed]
google_compute_instance.vm[2]: Still creating... [18m50s elapsed]
google_compute_instance.vm[2]: Still creating... [19m0s elapsed]
google_compute_instance.vm[2]: Still creating... [19m10s elapsed]
google_compute_instance.vm[2]: Still creating... [19m20s elapsed]
google_compute_instance.vm[2]: Still creating... [19m30s elapsed]
google_compute_instance.vm[2]: Still creating... [19m40s elapsed]
google_compute_instance.vm[2]: Still creating... [19m50s elapsed]
google_compute_instance.vm[2]: Still creating... [20m0s elapsed]
google_compute_instance.vm[2]: Still creating... [20m10s elapsed]
google_compute_instance.vm[2]: Still creating... [20m20s elapsed]
google_compute_instance.vm[2]: Still creating... [20m30s elapsed]
google_compute_instance.vm[2]: Still creating... [20m40s elapsed]
google_compute_instance.vm[2]: Still creating... [20m50s elapsed]
google_compute_instance.vm[2]: Still creating... [21m0s elapsed]
google_compute_instance.vm[2]: Still creating... [21m10s elapsed]
google_compute_instance.vm[2]: Still creating... [21m20s elapsed]
google_compute_instance.vm[2]: Still creating... [21m30s elapsed]
google_compute_instance.vm[2]: Still creating... [21m40s elapsed]
google_compute_instance.vm[2]: Still creating... [21m50s elapsed]
google_compute_instance.vm[2]: Still creating... [22m0s elapsed]
google_compute_instance.vm[2]: Still creating... [22m10s elapsed]
google_compute_instance.vm[2]: Still creating... [22m20s elapsed]
google_compute_instance.vm[2]: Still creating... [22m30s elapsed]
google_compute_instance.vm[2]: Still creating... [22m40s elapsed]
google_compute_instance.vm[2]: Still creating... [22m50s elapsed]
google_compute_instance.vm[2]: Still creating... [23m0s elapsed]
google_compute_instance.vm[2]: Still creating... [23m10s elapsed]
google_compute_instance.vm[2]: Still creating... [23m20s elapsed]
google_compute_instance.vm[2]: Still creating... [23m30s elapsed]
google_compute_instance.vm[2]: Still creating... [23m40s elapsed]
google_compute_instance.vm[2]: Still creating... [23m50s elapsed]
google_compute_instance.vm[2]: Still creating... [24m0s elapsed]
google_compute_instance.vm[2]: Still creating... [24m10s elapsed]
google_compute_instance.vm[2]: Still creating... [24m20s elapsed]
google_compute_instance.vm[2]: Still creating... [24m30s elapsed]
google_compute_instance.vm[2]: Still creating... [24m40s elapsed]
google_compute_instance.vm[2]: Still creating... [24m50s elapsed]
google_compute_instance.vm[2]: Still creating... [25m0s elapsed]
google_compute_instance.vm[2]: Still creating... [25m10s elapsed]
google_compute_instance.vm[2]: Still creating... [25m20s elapsed]
google_compute_instance.vm[2]: Still creating... [25m30s elapsed]
google_compute_instance.vm[2]: Still creating... [25m40s elapsed]
google_compute_instance.vm[2]: Still creating... [25m50s elapsed]
google_compute_instance.vm[2]: Still creating... [26m0s elapsed]
google_compute_instance.vm[2]: Still creating... [26m10s elapsed]
google_compute_instance.vm[2]: Still creating... [32m54s elapsed]
google_compute_instance.vm[2]: Still creating... [33m4s elapsed]
google_compute_instance.vm[2]: Still creating... [33m14s elapsed]
google_compute_instance.vm[2] (remote-exec):   WARNING: The scripts f2py, f2py3 and f2py3.6 are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[2] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[2] (remote-exec):   WARNING: The script markdown_py is installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[2] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[2] (remote-exec):   WARNING: The scripts pyrsa-decrypt, pyrsa-encrypt, pyrsa-keygen, pyrsa-priv2pub, pyrsa-sign and pyrsa-verify are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[2] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[2] (remote-exec):   WARNING: The script google-oauthlib-tool is installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[2] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[2] (remote-exec):   WARNING: The script tensorboard is installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[2] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[2]: Still creating... [33m24s elapsed]
google_compute_instance.vm[2] (remote-exec):   WARNING: The scripts estimator_ckpt_converter, saved_model_cli, tensorboard, tf_upgrade_v2, tflite_convert, toco and toco_from_protos are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[2] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[2] (remote-exec):   WARNING: The scripts imageio_download_bin and imageio_remove_bin are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[2] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[2] (remote-exec):   WARNING: The scripts lsm2bin and tifffile are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[2] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[2]: Still creating... [33m34s elapsed]
google_compute_instance.vm[2] (remote-exec):   WARNING: The script skivi is installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[2] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[2] (remote-exec):   WARNING: The script pygmentize is installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[2] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[2] (remote-exec):   WARNING: The scripts iptest, iptest3, ipython and ipython3 are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[2] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[2] (remote-exec):   WARNING: The scripts jupyter, jupyter-migrate and jupyter-troubleshoot are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[2] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[2] (remote-exec):   WARNING: The script jupyter-trust is installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[2] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[2] (remote-exec):   WARNING: The scripts jupyter-kernel, jupyter-kernelspec and jupyter-run are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[2] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[2] (remote-exec):   WARNING: The script jupyter-nbconvert is installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[2] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[2] (remote-exec):   WARNING: The script jupyter-console is installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[2] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[2] (remote-exec):   WARNING: The scripts jupyter-bundlerextension, jupyter-nbextension, jupyter-notebook and jupyter-serverextension are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[2] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[2]: Still creating... [33m44s elapsed]
google_compute_instance.vm[2] (remote-exec): ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.

google_compute_instance.vm[2] (remote-exec): We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.

google_compute_instance.vm[2] (remote-exec): tensorflow-gpu 2.3.1 requires numpy<1.19.0,>=1.16.0, but you'll have numpy 1.19.4 which is incompatible.
google_compute_instance.vm[2]: Provisioning with 'remote-exec'...
google_compute_instance.vm[2] (remote-exec): Connecting to remote host via SSH...
google_compute_instance.vm[2] (remote-exec):   Host: 35.231.39.6
google_compute_instance.vm[2] (remote-exec):   User: cfurtado
google_compute_instance.vm[2] (remote-exec):   Password: false
google_compute_instance.vm[2] (remote-exec):   Private key: true
google_compute_instance.vm[2] (remote-exec):   Certificate: false
google_compute_instance.vm[2] (remote-exec):   SSH Agent: false
google_compute_instance.vm[2] (remote-exec):   Checking Host Key: false
google_compute_instance.vm[2] (remote-exec): Connected!

google_compute_instance.vm[2]: Creation complete after 33m47s [id=projects/necstlab/zones/us-east1-c/instances/cfurtado-necstlab-2]

Warning: Applied changes may be incomplete

The plan was created with the -target option in effect, so some changes
requested in the configuration may have been ignored and the output values may
not be fully updated. Run the following command to verify that no other
changes are pending:
    terraform plan

Note that the -target option is not suitable for routine use, and is provided
only for exceptional situations such as recovering from errors or mistakes, or
when Terraform specifically suggests to use it as part of an error message.

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
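The pip resolver message near the end of this log says tensorflow-gpu 2.3.1 requires numpy<1.19.0,>=1.16.0, but numpy 1.19.4 was installed. Below is a minimal, stdlib-only sketch of checking that kind of constraint during provisioning. It assumes plain numeric release strings (no rc/dev suffixes), and the helper names are made up for illustration. The `packaging` library's `SpecifierSet` handles the general case, and pinning numpy below 1.19 in the requirements file is one way to avoid the conflict.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Turn a plain release string like '1.19.4' into (1, 19, 4), padding to 3 parts."""
    parts = [int(piece) for piece in version.split(".")]
    parts += [0] * (3 - len(parts))  # '1.19' -> (1, 19, 0)
    return tuple(parts)


def satisfies(version: str, lower_inclusive: str, upper_exclusive: str) -> bool:
    """Mimic pip's '>=lower,<upper' check for simple numeric versions (hypothetical helper)."""
    v = parse_release(version)
    return parse_release(lower_inclusive) <= v < parse_release(upper_exclusive)


if __name__ == "__main__":
    # Constraint from the pip message: numpy<1.19.0,>=1.16.0
    # Prints 'numpy 1.18.5: ok' then 'numpy 1.19.4: incompatible'
    for installed in ("1.18.5", "1.19.4"):
        ok = satisfies(installed, "1.16.0", "1.19.0")
        print(f"numpy {installed}: {'ok' if ok else 'incompatible'}")
```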

Error when running: dlerror: libcudart.so.10.1

2020-11-02 20:06:09.697596: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-11-02 20:06:09.697641: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
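The TensorFlow 2.3 pip package is built against CUDA 10.1, so at import time it tries to dlopen libcudart.so.10.1. A rough way to reproduce that check without TensorFlow is to ask the dynamic linker for the same sonames through ctypes. This is only a sketch: the list of sonames to probe is an assumption chosen to compare 10.1 against 10.2, and `probe_libraries` is a hypothetical helper.

```python
import ctypes
from typing import Dict, Iterable


def probe_libraries(sonames: Iterable[str]) -> Dict[str, bool]:
    """Try to dlopen each soname; map it to True if the dynamic linker finds it."""
    results = {}
    for name in sonames:
        try:
            ctypes.CDLL(name)
            results[name] = True
        except OSError:
            # Same failure mode as TensorFlow's "Could not load dynamic library"
            results[name] = False
    return results


if __name__ == "__main__":
    # TensorFlow 2.3 wants the CUDA 10.1 runtime; 10.2 is probed for comparison.
    for soname, found in probe_libraries(["libcudart.so.10.1", "libcudart.so.10.2"]).items():
        print(f"{soname}: {'found' if found else 'NOT found'}")
```

If only the 10.2 entry reports "found", the runtime library TensorFlow is asking for simply is not installed.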

nvidia-smi: OK

$ nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:00:04.0 Off |                    0 |
| N/A   45C    P0    27W / 250W |     10MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
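One caveat when reading this output: the "CUDA Version" in the nvidia-smi header is the newest CUDA version that the 440.33.01 driver supports, not the toolkit that is installed, so it does not say whether libcudart.so.10.1 is present. A sketch that pulls both numbers out of the header line, assuming the header format shown above (`parse_smi_header` is a hypothetical name):

```python
import re
from typing import Optional, Tuple

# Matches the nvidia-smi header, e.g.
# "| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |"
HEADER_RE = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")


def parse_smi_header(text: str) -> Optional[Tuple[str, str]]:
    """Return (driver_version, max_cuda_supported_by_driver), or None if absent."""
    match = HEADER_RE.search(text)
    return (match.group(1), match.group(2)) if match else None


if __name__ == "__main__":
    header = "| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |"
    print(parse_smi_header(header))  # ('440.33.01', '10.2')
```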

Checking the rest of the stack:

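CUDA version: 10.2 (note that the TensorFlow 2.3 pip package expects CUDA 10.1, which matches the missing libcudart.so.10.1 above)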

$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
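nvcc reports the toolkit actually installed under /usr/local/cuda, and here that is 10.2, whereas TensorFlow's tested build configurations list CUDA 10.1 (with cuDNN 7.6) for TF 2.1 through 2.3. A sketch of that comparison; the `EXPECTED_CUDA` table is typed in by hand and only covers the versions discussed in this issue, and the function names are hypothetical.

```python
import re
from typing import Optional

# CUDA versions from TensorFlow's tested build configurations,
# limited to the TF versions discussed in this issue (entered by hand).
EXPECTED_CUDA = {"2.1": "10.1", "2.2": "10.1", "2.3": "10.1"}

RELEASE_RE = re.compile(r"release\s+(\d+\.\d+)")


def nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract 'X.Y' from the 'Cuda compilation tools, release X.Y, ...' line."""
    match = RELEASE_RE.search(nvcc_output)
    return match.group(1) if match else None


def cuda_matches(tf_version: str, nvcc_output: str) -> bool:
    """True only if the installed toolkit equals what this TF minor version expects."""
    major_minor = ".".join(tf_version.split(".")[:2])
    expected = EXPECTED_CUDA.get(major_minor)
    installed = nvcc_release(nvcc_output)
    return expected is not None and installed == expected


if __name__ == "__main__":
    out = "Cuda compilation tools, release 10.2, V10.2.89"
    print(nvcc_release(out))           # 10.2
    print(cuda_matches("2.3.1", out))  # False
```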

Ubuntu version: OK

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.5 LTS
Release:        18.04
Codename:       bionic

Python version:

$ python3 --version
Python 3.6.9
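Python 3.6.9 falls inside the range TensorFlow lists for 2.3.0 (Python 3.5 to 3.8), so Python is not the culprit here; Ubuntu 20.04, as requested in this issue, ships Python 3.8, which is also inside that range. A sketch of the check, with the range entered by hand and a hypothetical function name:

```python
import sys
from typing import Tuple

# Python range listed for TensorFlow 2.3.0 in TensorFlow's tested build
# configurations (3.5 through 3.8), entered by hand.
TF23_PYTHON_MIN = (3, 5)
TF23_PYTHON_MAX = (3, 8)


def python_supported(
    version: Tuple[int, ...],
    lo: Tuple[int, int] = TF23_PYTHON_MIN,
    hi: Tuple[int, int] = TF23_PYTHON_MAX,
) -> bool:
    """True if (major, minor) lies within [lo, hi], inclusive."""
    return lo <= tuple(version[:2]) <= hi


if __name__ == "__main__":
    current = tuple(sys.version_info[:2])
    print(f"Python {current[0]}.{current[1]} supported by TF 2.3: {python_supported(current)}")
```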

cuDNN version: 7.6.5 (OK):

$ cat /usr/include/x86_64-linux-gnu/cudnn_v*.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
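The `CUDNN_VERSION` macro in that header combines the three components as MAJOR * 1000 + MINOR * 100 + PATCHLEVEL, so 7.6.5 encodes as 7 * 1000 + 6 * 100 + 5 = 7605. A sketch that parses the grep output above and recomputes it; it assumes the cuDNN 7 header layout shown here (cuDNN 8 moved these macros to cudnn_version.h), and the function names are hypothetical.

```python
import re
from typing import Dict


def parse_cudnn_macros(header_text: str) -> Dict[str, int]:
    """Collect '#define CUDNN_MAJOR|MINOR|PATCHLEVEL <int>' lines into a dict."""
    pattern = re.compile(r"#define\s+CUDNN_(MAJOR|MINOR|PATCHLEVEL)\s+(\d+)")
    return {name: int(value) for name, value in pattern.findall(header_text)}


def cudnn_version(macros: Dict[str, int]) -> int:
    """Same formula as the CUDNN_VERSION macro in the header."""
    return macros["MAJOR"] * 1000 + macros["MINOR"] * 100 + macros["PATCHLEVEL"]


if __name__ == "__main__":
    text = "#define CUDNN_MAJOR 7\n#define CUDNN_MINOR 6\n#define CUDNN_PATCHLEVEL 5\n"
    macros = parse_cudnn_macros(text)
    print(macros)                 # {'MAJOR': 7, 'MINOR': 6, 'PATCHLEVEL': 5}
    print(cudnn_version(macros))  # 7605
```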
CarolinaFurtado commented 3 years ago

Standard setup, following exactly what was done for TF 2.1, plus a reboot at the end.

Ubuntu version: 18.04

  boot_disk {
    initialize_params {
      image = "projects/ubuntu-os-cloud/global/images/ubuntu-1804-bionic-v20201014"
      size = "${var.hard_drive_size_gp}"
      type = "pd-ssd"
    }
  }

CUDA and cuDNN versions: CUDA 10.2!

sudo apt-get update

sudo apt-get install -y build-essential

# install CUDA 10.1 from NVIDIA's local repo package
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-1-local-10.1.243-418.87.00/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

# put the CUDA 10.1 binaries (nvcc) on PATH
export PATH=/usr/local/cuda-10.1/bin${PATH:+:${PATH}}

# install cuDNN 7.6.5 (runtime and dev packages) built for CUDA 10.1
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb
sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb

wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb
sudo dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb

sudo reboot
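One thing worth confirming after this script runs: PATH is pointed at /usr/local/cuda-10.1/bin, but `apt-get -y install cuda` installs whatever the `cuda` metapackage resolves to, and nvcc under /usr/local/cuda reported 10.2 on the earlier VM. A minimal sketch that resolves where /usr/local/cuda actually points, assuming the usual /usr/local/cuda -> /usr/local/cuda-X.Y symlink layout (`linked_cuda_version` is a hypothetical name). Installing the versioned `cuda-10-1` metapackage instead of `cuda` is the usual way to stay on 10.1.

```python
import os
import re
from typing import Optional


def linked_cuda_version(cuda_root: str = "/usr/local/cuda") -> Optional[str]:
    """Follow cuda_root's symlinks and return 'X.Y' if the target is named 'cuda-X.Y'."""
    target = os.path.realpath(cuda_root)
    match = re.search(r"cuda-(\d+\.\d+)$", os.path.basename(target))
    return match.group(1) if match else None


if __name__ == "__main__":
    version = linked_cuda_version()
    print(f"/usr/local/cuda -> CUDA {version or 'unknown'}")
    if version != "10.1":
        print("Warning: TensorFlow 2.3 expects the CUDA 10.1 runtime")
```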

Creating the VM took about 8 min this time (the previous one took 33m47s)!


google_compute_instance.vm[0]: Creating...
google_compute_instance.vm[0]: Still creating... [10s elapsed]
google_compute_instance.vm[0]: Still creating... [20s elapsed]
google_compute_instance.vm[0]: Still creating... [30s elapsed]
google_compute_instance.vm[0]: Provisioning with 'file'...
google_compute_instance.vm[0]: Still creating... [40s elapsed]
google_compute_instance.vm[0]: Still creating... [50s elapsed]
google_compute_instance.vm[0]: Still creating... [1m0s elapsed]
google_compute_instance.vm[0]: Still creating... [1m10s elapsed]
google_compute_instance.vm[0]: Still creating... [1m20s elapsed]
google_compute_instance.vm[0]: Provisioning with 'remote-exec'...
google_compute_instance.vm[0] (remote-exec): Connecting to remote host via SSH...
google_compute_instance.vm[0] (remote-exec):   Host: 34.74.210.177
google_compute_instance.vm[0] (remote-exec):   User: cfurtado
google_compute_instance.vm[0] (remote-exec):   Password: false
google_compute_instance.vm[0] (remote-exec):   Private key: true
google_compute_instance.vm[0] (remote-exec):   Certificate: false
google_compute_instance.vm[0] (remote-exec):   SSH Agent: false
google_compute_instance.vm[0] (remote-exec):   Checking Host Key: false
google_compute_instance.vm[0] (remote-exec): Connected!
google_compute_instance.vm[0] (remote-exec): Running resource creation script... (this may take 10+ minutes)
google_compute_instance.vm[0]: Still creating... [1m30s elapsed]
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 73%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 100%
google_compute_instance.vm[0]: Still creating... [1m40s elapsed]
google_compute_instance.vm[0]: Still creating... [1m50s elapsed]
google_compute_instance.vm[0] (remote-exec): --2020-11-03 20:04:10--  https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
google_compute_instance.vm[0] (remote-exec): Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
google_compute_instance.vm[0] (remote-exec): Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:443... connected.
google_compute_instance.vm[0] (remote-exec): HTTP request sent, awaiting response... 200 OK
google_compute_instance.vm[0] (remote-exec): Length: 190 [application/octet-stream]
google_compute_instance.vm[0] (remote-exec): Saving to: ‘cuda-ubuntu1804.pin’

google_compute_instance.vm[0] (remote-exec): cuda-ubuntu 100%     190  --.-KB/s    in 0s

google_compute_instance.vm[0] (remote-exec): 2020-11-03 20:04:11 (3.48 MB/s) - ‘cuda-ubuntu1804.pin’ saved [190/190]

google_compute_instance.vm[0] (remote-exec): --2020-11-03 20:04:11--  http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
google_compute_instance.vm[0] (remote-exec): Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
google_compute_instance.vm[0] (remote-exec): Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:80... connected.
google_compute_instance.vm[0] (remote-exec): HTTP request sent, awaiting response... 200 OK
google_compute_instance.vm[0] (remote-exec): Length: 1859785444 (1.7G) [application/x-deb]
google_compute_instance.vm[0] (remote-exec): Saving to: ‘cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb’

google_compute_instance.vm[0]: Still creating... [2m0s elapsed]
google_compute_instance.vm[0] (remote-exec): cuda-repo-u 100%   1.73G   276MB/s    in 6.4s

google_compute_instance.vm[0] (remote-exec): 2020-11-03 20:04:17 (276 MB/s) - ‘cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb’ saved [1859785444/1859785444]

google_compute_instance.vm[0]: Still creating... [2m10s elapsed]
google_compute_instance.vm[0] (remote-exec): Warning: apt-key output should not be parsed (stdout is not a terminal)
google_compute_instance.vm[0]: Still creating... [2m20s elapsed]
google_compute_instance.vm[0]: Still creating... [2m30s elapsed]
google_compute_instance.vm[0]: Still creating... [2m40s elapsed]
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 11%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 22%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 33%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 45%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 56%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 67%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 79%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 90%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 100%
google_compute_instance.vm[0]: Still creating... [2m50s elapsed]
google_compute_instance.vm[0]: Still creating... [3m0s elapsed]
google_compute_instance.vm[0]: Still creating... [3m10s elapsed]
google_compute_instance.vm[0]: Still creating... [3m20s elapsed]
google_compute_instance.vm[0]: Still creating... [3m30s elapsed]
google_compute_instance.vm[0]: Still creating... [3m40s elapsed]
google_compute_instance.vm[0]: Still creating... [3m50s elapsed]
google_compute_instance.vm[0]: Still creating... [4m0s elapsed]
google_compute_instance.vm[0]: Still creating... [4m10s elapsed]
google_compute_instance.vm[0]: Still creating... [4m20s elapsed]
google_compute_instance.vm[0]: Still creating... [4m30s elapsed]
google_compute_instance.vm[0]: Still creating... [4m40s elapsed]
google_compute_instance.vm[0]: Still creating... [4m50s elapsed]
google_compute_instance.vm[0]: Still creating... [5m0s elapsed]
google_compute_instance.vm[0]: Still creating... [5m10s elapsed]
google_compute_instance.vm[0]: Still creating... [5m20s elapsed]
google_compute_instance.vm[0]: Still creating... [5m30s elapsed]
google_compute_instance.vm[0]: Still creating... [5m40s elapsed]
google_compute_instance.vm[0]: Still creating... [5m50s elapsed]
google_compute_instance.vm[0]: Still creating... [6m0s elapsed]
google_compute_instance.vm[0] (remote-exec): --2020-11-03 20:08:26--  http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb
google_compute_instance.vm[0] (remote-exec): Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
google_compute_instance.vm[0] (remote-exec): Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:80... connected.
google_compute_instance.vm[0] (remote-exec): HTTP request sent, awaiting response... 200 OK
google_compute_instance.vm[0] (remote-exec): Length: 182313188 (174M) [application/x-deb]
google_compute_instance.vm[0] (remote-exec): Saving to: ‘libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb’

google_compute_instance.vm[0]: Still creating... [6m10s elapsed]
google_compute_instance.vm[0] (remote-exec): libcudnn7_7 100% 173.87M   255MB/s    in 0.7s

google_compute_instance.vm[0] (remote-exec): 2020-11-03 20:08:27 (255 MB/s) - ‘libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb’ saved [182313188/182313188]

google_compute_instance.vm[0]: Still creating... [6m20s elapsed]
google_compute_instance.vm[0] (remote-exec): --2020-11-03 20:08:45--  http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb
google_compute_instance.vm[0] (remote-exec): Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
google_compute_instance.vm[0] (remote-exec): Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:80... connected.
google_compute_instance.vm[0] (remote-exec): HTTP request sent, awaiting response... 200 OK
google_compute_instance.vm[0] (remote-exec): Length: 160506208 (153M) [application/x-deb]
google_compute_instance.vm[0] (remote-exec): Saving to: ‘libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb’

google_compute_instance.vm[0] (remote-exec): libcudnn7-d 100% 153.07M   246MB/s    in 0.6s

google_compute_instance.vm[0] (remote-exec): 2020-11-03 20:08:45 (246 MB/s) - ‘libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb’ saved [160506208/160506208]

google_compute_instance.vm[0]: Still creating... [6m30s elapsed]
google_compute_instance.vm[0]: Still creating... [6m40s elapsed]
google_compute_instance.vm[0]: Still creating... [6m50s elapsed]
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 13%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 26%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 40%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 53%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 67%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 80%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 94%
google_compute_instance.vm[0] (remote-exec): Extracting templates from packages: 100%
google_compute_instance.vm[0]: Still creating... [7m0s elapsed]
google_compute_instance.vm[0]: Still creating... [7m10s elapsed]
google_compute_instance.vm[0]: Still creating... [7m20s elapsed]
google_compute_instance.vm[0]: Still creating... [7m30s elapsed]
google_compute_instance.vm[0] (remote-exec): WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
google_compute_instance.vm[0] (remote-exec): Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
google_compute_instance.vm[0] (remote-exec): To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
google_compute_instance.vm[0] (remote-exec):   WARNING: The scripts easy_install and easy_install-3.6 are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec): WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
google_compute_instance.vm[0] (remote-exec): Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
google_compute_instance.vm[0] (remote-exec): To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
google_compute_instance.vm[0] (remote-exec): WARNING: Skipping crcmod as it is not installed.
google_compute_instance.vm[0] (remote-exec): WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
google_compute_instance.vm[0] (remote-exec): Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
google_compute_instance.vm[0] (remote-exec): To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
google_compute_instance.vm[0] (remote-exec): WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
google_compute_instance.vm[0] (remote-exec): Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
google_compute_instance.vm[0] (remote-exec): To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
google_compute_instance.vm[0]: Still creating... [7m40s elapsed]
google_compute_instance.vm[0] (remote-exec): WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
google_compute_instance.vm[0] (remote-exec): Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
google_compute_instance.vm[0] (remote-exec): To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
google_compute_instance.vm[0]: Still creating... [7m50s elapsed]
google_compute_instance.vm[0]: Still creating... [8m0s elapsed]
google_compute_instance.vm[0]: Still creating... [8m10s elapsed]
google_compute_instance.vm[0] (remote-exec):   WARNING: The scripts f2py, f2py3 and f2py3.6 are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The script markdown_py is installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The scripts pyrsa-decrypt, pyrsa-encrypt, pyrsa-keygen, pyrsa-priv2pub, pyrsa-sign and pyrsa-verify are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The script google-oauthlib-tool is installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The script tensorboard is installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0]: Still creating... [8m20s elapsed]
google_compute_instance.vm[0] (remote-exec):   WARNING: The scripts estimator_ckpt_converter, saved_model_cli, tensorboard, tf_upgrade_v2, tflite_convert, toco and toco_from_protos are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The scripts lsm2bin and tifffile are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0]: Still creating... [8m30s elapsed]
google_compute_instance.vm[0] (remote-exec):   WARNING: The scripts imageio_download_bin and imageio_remove_bin are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The script skivi is installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The script pygmentize is installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The scripts iptest, iptest3, ipython and ipython3 are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The scripts jupyter, jupyter-migrate and jupyter-troubleshoot are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The scripts jupyter-kernel, jupyter-kernelspec and jupyter-run are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The script jupyter-trust is installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0]: Still creating... [8m40s elapsed]
google_compute_instance.vm[0] (remote-exec):   WARNING: The script jupyter-nbconvert is installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The scripts jupyter-bundlerextension, jupyter-nbextension, jupyter-notebook and jupyter-serverextension are installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec):   WARNING: The script jupyter-console is installed in '/home/cfurtado/.local/bin' which is not on PATH.
google_compute_instance.vm[0] (remote-exec):   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
google_compute_instance.vm[0] (remote-exec): ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.

google_compute_instance.vm[0] (remote-exec): We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.

google_compute_instance.vm[0] (remote-exec): tensorflow-gpu 2.3.1 requires numpy<1.19.0,>=1.16.0, but you'll have numpy 1.19.4 which is incompatible.
google_compute_instance.vm[0]: Provisioning with 'remote-exec'...
google_compute_instance.vm[0] (remote-exec): Connecting to remote host via SSH...
google_compute_instance.vm[0] (remote-exec):   Host: 34.74.210.177
google_compute_instance.vm[0] (remote-exec):   User: cfurtado
google_compute_instance.vm[0] (remote-exec):   Password: false
google_compute_instance.vm[0] (remote-exec):   Private key: true
google_compute_instance.vm[0] (remote-exec):   Certificate: false
google_compute_instance.vm[0] (remote-exec):   SSH Agent: false
google_compute_instance.vm[0] (remote-exec):   Checking Host Key: false
google_compute_instance.vm[0] (remote-exec): Connected!

google_compute_instance.vm[0]: Creation complete after 8m43s [id=projects/necstlab/zones/us-east1-c/instances/cfurtado-necstlab-0]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
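pip complained that `tensorflow-gpu 2.3.1` requires `numpy<1.19.0,>=1.16.0` while numpy 1.19.4 ended up installed, and the provisioning carried on regardless. The sketch below reproduces that range check. The helpers `parse_version` and `satisfies` are hypothetical. They compare plain dotted release numbers only and do not handle pre-release tags such as `rc1`.

```python
def parse_version(text, width=3):
    """Turn a plain dotted release like '1.19.4' into a tuple, padding with zeros."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (width - len(parts))  # '1.19' -> (1, 19, 0)
    return tuple(parts)


def satisfies(installed, lower_inclusive, upper_exclusive):
    """Return True if lower_inclusive <= installed < upper_exclusive (hypothetical helper)."""
    version = parse_version(installed)
    return parse_version(lower_inclusive) <= version < parse_version(upper_exclusive)


# Range taken from the pip message above: numpy<1.19.0,>=1.16.0
print(satisfies("1.19.4", "1.16.0", "1.19.0"))  # False: 1.19.4 is outside the range
print(satisfies("1.18.5", "1.16.0", "1.19.0"))  # True
```

If the `packaging` library is available, `Version("1.19.4") in SpecifierSet("<1.19.0,>=1.16.0")` does the same check with full PEP 440 rules.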

Error when running: dlerror: libcudart.so.10.1

2020-11-02 20:06:09.697596: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-11-02 20:06:09.697641: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
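The warning comes from TensorFlow asking the dynamic loader for the CUDA 10.1 runtime. One rough way to reproduce that lookup outside TensorFlow is to open the same library name with `ctypes`. The helper `can_load` is hypothetical. A failure here generally means the library is either not installed or not on the loader's search path, for example when `LD_LIBRARY_PATH` does not include the CUDA `lib64` directory.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open library_name, else False."""
    try:
        ctypes.CDLL(library_name)
    except OSError as error:
        # ctypes surfaces the loader's error text, similar to the dlerror TensorFlow logs
        print("cannot load {}: {}".format(library_name, error))
        return False
    return True


# Same library name as in the TensorFlow warning above
print(can_load("libcudart.so.10.1"))
```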

nvidia-smi: not ok

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
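This message usually indicates that the NVIDIA kernel driver is not installed or not loaded on the instance. A small sketch for scripting the same check is shown below. The helper `nvidia_smi_status` is hypothetical. It only runs the command and reports whether it exited successfully, and it is written to stay compatible with the VM's Python 3.6.

```python
import subprocess


def nvidia_smi_status():
    """Run nvidia-smi and return (ok, combined stdout/stderr text)."""
    try:
        result = subprocess.run(
            ["nvidia-smi"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,  # text output; works on Python 3.6
        )
    except FileNotFoundError:
        return False, "nvidia-smi is not installed or not on PATH"
    return result.returncode == 0, result.stdout.strip()


ok, message = nvidia_smi_status()
print("nvidia-smi ok" if ok else "nvidia-smi failed: " + message)
```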

Checking the rest of the setup:

CUDA version - ok:

$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
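TensorFlow 2.3's GPU wheels are built against CUDA 10.1 and cuDNN 7.6, which is why it looks for `libcudart.so.10.1` specifically. The sketch below pulls the release number out of `nvcc --version` output so it can be compared with that expectation. `EXPECTED_CUDA` is an assumption taken from TensorFlow's tested build configurations, not from this repo, and `cuda_release` is a hypothetical helper.

```python
import re

EXPECTED_CUDA = "10.1"  # assumption: CUDA release TensorFlow 2.3 wheels are built against

NVCC_OUTPUT = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243"""


def cuda_release(nvcc_output):
    """Return the 'release X.Y' number from nvcc --version output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


release = cuda_release(NVCC_OUTPUT)
print(release, release == EXPECTED_CUDA)  # 10.1 True
```

A matching `nvcc` release does not by itself guarantee that the dynamic loader can find `libcudart.so.10.1`, as the warning above shows.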

Ubuntu version - ok:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.5 LTS
Release:        18.04
Codename:       bionic
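The issue's target was Ubuntu 20.x, while this VM reports 18.04. The sketch below parses `lsb_release -a` output into a dict and compares the major release with that target. `TARGET_MAJOR = 20` comes from the issue description, `parse_lsb` is a hypothetical helper, and the parser assumes the `Key: value` layout shown above.

```python
LSB_OUTPUT = """No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.5 LTS
Release:        18.04
Codename:       bionic"""

TARGET_MAJOR = 20  # from the issue: "ubuntu (and python) 20.x"


def parse_lsb(output):
    """Turn 'Key: value' lines into a dict; lines without ':' are skipped."""
    info = {}
    for line in output.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            info[key.strip()] = value.strip()
    return info


info = parse_lsb(LSB_OUTPUT)
major = int(info["Release"].split(".")[0])
print(info["Description"], major >= TARGET_MAJOR)  # Ubuntu 18.04.5 LTS False
```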

Python version:

$ python3 --version
Python 3.6.9
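TensorFlow 2.3 publishes pip packages for Python 3.5 through 3.8, so 3.6.9 is within range. The issue, however, aimed for the newer Python that ships with Ubuntu 20.04. Below is a sketch of the check. `SUPPORTED` is an assumption based on TensorFlow's published tested build configurations and should be confirmed there. `python_supported` is a hypothetical helper.

```python
import sys

# Assumption: Python versions with TensorFlow 2.3 pip wheels (tested build configurations)
SUPPORTED = ((3, 5), (3, 8))


def python_supported(version_info=sys.version_info, supported=SUPPORTED):
    """Return True if major.minor lies within the inclusive supported range."""
    low, high = supported
    return low <= tuple(version_info[:2]) <= high


print(python_supported((3, 6, 9)))  # True: the VM's Python 3.6.9
print(python_supported((3, 9, 0)))  # False
print(python_supported())           # result for the interpreter running this script
```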

cuDNN version - ok (7.6.5):

$ cat /usr/include/x86_64-linux-gnu/cudnn_v*.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
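The header encodes the version as `CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL`, so 7.6.5 corresponds to 7605. The sketch below reads the three `#define`s from header text and reproduces that arithmetic. `cudnn_version` is a hypothetical helper. cuDNN 8 and later keep these defines in `cudnn_version.h` instead of `cudnn.h`.

```python
import re

HEADER_TEXT = """#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)"""


def cudnn_version(header_text):
    """Return (major, minor, patch, encoded) parsed from cuDNN header text."""
    fields = {}
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define {} (\d+)".format(name), header_text)
        if match is None:
            raise ValueError("{} not found in header".format(name))
        fields[name] = int(match.group(1))
    major = fields["CUDNN_MAJOR"]
    minor = fields["CUDNN_MINOR"]
    patch = fields["CUDNN_PATCHLEVEL"]
    # Same formula as the CUDNN_VERSION macro
    return major, minor, patch, major * 1000 + minor * 100 + patch


print(cudnn_version(HEADER_TEXT))  # (7, 6, 5, 7605)
```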