tensorflow / fold

Deep learning with dynamic computation graphs in TensorFlow
Apache License 2.0

segmentation fault (core dumped) when eval() #30

Open chuneli opened 7 years ago

chuneli commented 7 years ago

Hi,

I installed TensorFlow (Python 2.7, GPU support) on Ubuntu 16.04, and the examples in the TensorFlow tutorial seem to work well. I then installed tensorflow_fold, and it imports successfully. But when the code reaches `scalar_block.eval(42)` in quick.ipynb, I get the message "Segmentation fault (core dumped)".

Any help is appreciated. Thanks in advance.

```
abc@518G1N:~$ su
Password:
root@518G1N:/home/abc# virtualenv TFenv
Running virtualenv with interpreter /usr/bin/python2
New python executable in /home/abc/TFenv/bin/python2
Also creating executable in /home/abc/TFenv/bin/python
Installing setuptools, pkg_resources, pip, wheel...done.
root@518G1N:/home/abc# source TFenv/bin/activate
(TFenv) root@518G1N:/home/abc# pip install --upgrade tensorflow-gpu
Collecting tensorflow-gpu
Using cached tensorflow_gpu-1.0.1-cp27-cp27mu-manylinux1_x86_64.whl
Collecting mock>=2.0.0 (from tensorflow-gpu)
Using cached mock-2.0.0-py2.py3-none-any.whl
Collecting numpy>=1.11.0 (from tensorflow-gpu)
Using cached numpy-1.12.0-cp27-cp27mu-manylinux1_x86_64.whl
Collecting protobuf>=3.1.0 (from tensorflow-gpu)
Using cached protobuf-3.2.0-cp27-cp27mu-manylinux1_x86_64.whl
Requirement already up-to-date: wheel in ./TFenv/lib/python2.7/site-packages (from tensorflow-gpu)
Requirement already up-to-date: six>=1.10.0 in ./TFenv/lib/python2.7/site-packages (from tensorflow-gpu)
Collecting funcsigs>=1; python_version < "3.3" (from mock>=2.0.0->tensorflow-gpu)
Using cached funcsigs-1.0.2-py2.py3-none-any.whl
Collecting pbr>=0.11 (from mock>=2.0.0->tensorflow-gpu)
Using cached pbr-2.0.0-py2.py3-none-any.whl
Requirement already up-to-date: setuptools in ./TFenv/lib/python2.7/site-packages (from protobuf>=3.1.0->tensorflow-gpu)
Requirement already up-to-date: appdirs>=1.4.0 in ./TFenv/lib/python2.7/site-packages (from setuptools->protobuf>=3.1.0->tensorflow-gpu)
Requirement already up-to-date: packaging>=16.8 in ./TFenv/lib/python2.7/site-packages (from setuptools->protobuf>=3.1.0->tensorflow-gpu)
Requirement already up-to-date: pyparsing in ./TFenv/lib/python2.7/site-packages (from packaging>=16.8->setuptools->protobuf>=3.1.0->tensorflow-gpu)
Installing collected packages: funcsigs, pbr, mock, numpy, protobuf, tensorflow-gpu
Successfully installed funcsigs-1.0.2 mock-2.0.0 numpy-1.12.0 pbr-2.0.0 protobuf-3.2.0 tensorflow-gpu-1.0.1
(TFenv) root@518G1N:/home/abc# python /home/abc/TensorFlow1.0/fully_connected_feed.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/tensorflow/mnist/input_data/t10k-labels-idx1-ubyte.gz
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.8475
pciBusID 0000:03:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x438d980
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.8475
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1: Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
Step 0: loss = 2.34 (1.024 sec)
Step 100: loss = 2.13 (0.002 sec)
Step 200: loss = 1.93 (0.002 sec)
Step 300: loss = 1.55 (0.001 sec)
Step 400: loss = 1.35 (0.002 sec)
Step 500: loss = 0.92 (0.002 sec)
Step 600: loss = 0.73 (0.001 sec)
Step 700: loss = 0.71 (0.001 sec)
Step 800: loss = 0.68 (0.002 sec)
Step 900: loss = 0.80 (0.001 sec)
Training Data Eval:
Num examples: 55000 Num correct: 47536 Precision @ 1: 0.8643
Validation Data Eval:
Num examples: 5000 Num correct: 4345 Precision @ 1: 0.8690
Test Data Eval:
Num examples: 10000 Num correct: 8718 Precision @ 1: 0.8718
Step 1000: loss = 0.46 (0.002 sec)
Step 1100: loss = 0.56 (0.057 sec)
Step 1200: loss = 0.50 (0.002 sec)
Step 1300: loss = 0.41 (0.001 sec)
Step 1400: loss = 0.38 (0.001 sec)
Step 1500: loss = 0.41 (0.002 sec)
Step 1600: loss = 0.27 (0.002 sec)
Step 1700: loss = 0.38 (0.002 sec)
Step 1800: loss = 0.26 (0.001 sec)
Step 1900: loss = 0.39 (0.001 sec)
Training Data Eval:
Num examples: 55000 Num correct: 49240 Precision @ 1: 0.8953
Validation Data Eval:
Num examples: 5000 Num correct: 4516 Precision @ 1: 0.9032
Test Data Eval:
Num examples: 10000 Num correct: 9009 Precision @ 1: 0.9009
(TFenv) root@518G1N:/home/abc# pip install https://storage.googleapis.com/tensorflow_fold/tensorflow_fold-0.0.1-cp27-none-linux_x86_64.whl
Collecting tensorflow-fold==0.0.1 from https://storage.googleapis.com/tensorflow_fold/tensorflow_fold-0.0.1-cp27-none-linux_x86_64.whl
Using cached https://storage.googleapis.com/tensorflow_fold/tensorflow_fold-0.0.1-cp27-none-linux_x86_64.whl
Requirement already satisfied: mock>=2.0.0 in ./TFenv/lib/python2.7/site-packages (from tensorflow-fold==0.0.1)
Requirement already satisfied: numpy>=1.11.0 in ./TFenv/lib/python2.7/site-packages (from tensorflow-fold==0.0.1)
Collecting nltk>=3.0.0 (from tensorflow-fold==0.0.1)
Requirement already satisfied: protobuf>=3.1.0 in ./TFenv/lib/python2.7/site-packages (from tensorflow-fold==0.0.1)
Requirement already satisfied: wheel in ./TFenv/lib/python2.7/site-packages (from tensorflow-fold==0.0.1)
Requirement already satisfied: six>=1.10.0 in ./TFenv/lib/python2.7/site-packages (from tensorflow-fold==0.0.1)
Requirement already satisfied: funcsigs>=1; python_version < "3.3" in ./TFenv/lib/python2.7/site-packages (from mock>=2.0.0->tensorflow-fold==0.0.1)
Requirement already satisfied: pbr>=0.11 in ./TFenv/lib/python2.7/site-packages (from mock>=2.0.0->tensorflow-fold==0.0.1)
Requirement already satisfied: setuptools in ./TFenv/lib/python2.7/site-packages (from protobuf>=3.1.0->tensorflow-fold==0.0.1)
Requirement already satisfied: appdirs>=1.4.0 in ./TFenv/lib/python2.7/site-packages (from setuptools->protobuf>=3.1.0->tensorflow-fold==0.0.1)
Requirement already satisfied: packaging>=16.8 in ./TFenv/lib/python2.7/site-packages (from setuptools->protobuf>=3.1.0->tensorflow-fold==0.0.1)
Requirement already satisfied: pyparsing in ./TFenv/lib/python2.7/site-packages (from packaging>=16.8->setuptools->protobuf>=3.1.0->tensorflow-fold==0.0.1)
Installing collected packages: nltk, tensorflow-fold
Successfully installed nltk-3.2.2 tensorflow-fold-0.0.1
(TFenv) root@518G1N:/home/abc# python
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
>>> sess = tf.InteractiveSession()
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.8475
pciBusID 0000:03:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x2176c40
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.8475
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1: Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
>>> import tensorflow_fold as td
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
>>> scalar_block = td.Scalar()
>>> vector3_block = td.Vector(3)
>>> def block_info(block):
...     print("%s: %s -> %s" % (block, block.input_type, block.output_type))
...
>>> block_info(scalar_block)
: PyObjectType() -> TensorType((), 'float32')
>>> block_info(vector3_block)
: PyObjectType() -> TensorType((3,), 'float32')
>>> print scalar_block.eval(42)
Segmentation fault (core dumped)
(TFenv) root@518G1N:/home/abc#
(TFenv) root@518G1N:/home/abc#
(TFenv) root@518G1N:/home/abc# cat /proc/version
Linux version 4.4.0-66-generic (buildd@lgw01-28) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #87-Ubuntu SMP Fri Mar 3 15:29:05 UTC 2017
(TFenv) root@518G1N:/home/abc# nvidia-smi
Fri Mar 10 11:50:33 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:01:00.0     Off |                  N/A |
| 65%   49C    P0    43W / 200W |      0MiB /  8113MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 0000:03:00.0     Off |                  N/A |
|  0%   47C    P0    39W / 200W |      0MiB /  8113MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
(TFenv) root@518G1N:/home/abc# python
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
>>> print tf.__version__
1.0.1
>>> print tf.__path__
['/home/abc/TFenv/local/lib/python2.7/site-packages/tensorflow']
```
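Because the crash takes down the whole interpreter, it can help to run the failing quick.ipynb steps in a child process and inspect how that process exited. The sketch below is a minimal, hypothetical helper (`run_isolated` and `FOLD_REPRO` are illustrative names, not part of TensorFlow or Fold) and assumes Python 3.7+ on a POSIX system. The reporter used Python 2.7, where `faulthandler` is only available as a separate backport package. On POSIX, a child killed by signal N reports a return code of -N, so a segfault shows up as `-signal.SIGSEGV`. Starting the child with `-X faulthandler` makes CPython print the Python-level traceback to stderr when the fatal signal arrives.

```python
import signal
import subprocess
import sys

# The quick.ipynb steps that crash in this report, ported to Python 3 print().
FOLD_REPRO = """
import tensorflow as tf
sess = tf.InteractiveSession()
import tensorflow_fold as td
scalar_block = td.Scalar()
print(scalar_block.eval(42))
"""


def run_isolated(code, timeout=600):
    """Run `code` in a fresh interpreter and report how it exited.

    Returns (returncode, segfaulted, stderr). The child is started with
    -X faulthandler, so CPython dumps a Python traceback to stderr when it
    receives a fatal signal such as SIGSEGV.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # On POSIX, a child terminated by signal N has returncode == -N.
    segfaulted = proc.returncode == -signal.SIGSEGV
    return proc.returncode, segfaulted, proc.stderr
```

In the affected virtualenv, `run_isolated(FOLD_REPRO)` should report `segfaulted=True` if the crash reproduces. The returned stderr then shows the last Python line reached before the native crash, which helps confirm whether it happens inside `eval()` rather than at import or session creation.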

silky commented 7 years ago

Yes, I have the same problem with Python 3.5.

silky commented 7 years ago

Even when building from source.

moshelooks commented 7 years ago

chuneli: sorry to hear that. My suspicion is that the TF team broke ABI compatibility. Would you mind trying an rc0 wheel to see if that works? For GPU and Python 2.7 you can get it with:

`pip install https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.0.0rc0-cp27-none-linux_x86_64.whl`

Make sure you uninstall tensorflow and tensorflow-gpu first, or use a different virtualenv.

Noon: what platform are you on? Please note that fold is currently only known to work on Linux, not Windows or OSX.
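The advice in this comment can be turned into a quick preflight check before importing Fold. The sketch below is stdlib-only and hypothetical: `preflight` and `installed_distributions` are not Fold or TensorFlow APIs. The only TensorFlow version it treats as expected is `1.0.0rc0`, the wheel suggested here. Any other version only produces a warning, because the thread attributes the crash to a *suspected* ABI mismatch, not a confirmed one. It also flags non-Linux platforms and environments where both `tensorflow` and `tensorflow-gpu` are installed. Reading installed packages assumes Python 3.8+ (`importlib.metadata`).

```python
from importlib import metadata  # stdlib in Python 3.8+

# Assumption: the TF build suggested in this thread as a workaround for the
# suspected ABI break; it is not an official compatibility guarantee.
SUGGESTED_TF = "1.0.0rc0"
TF_DISTS = ("tensorflow", "tensorflow-gpu")


def preflight(platform, installed):
    """Return a list of human-readable warnings (empty list = nothing flagged).

    platform:  a sys.platform value such as "linux" or "darwin".
    installed: dict mapping normalized distribution name -> version string.
    """
    warnings = []
    # Fold was only known to work on Linux at the time of this thread.
    if not platform.startswith("linux"):
        warnings.append("fold is only known to work on Linux, got %r" % platform)
    present = [name for name in TF_DISTS if name in installed]
    # The suggestion was to uninstall both packages before installing the
    # rc0 wheel (or to use a fresh virtualenv), so both present is flagged.
    if len(present) > 1:
        warnings.append("both tensorflow and tensorflow-gpu are installed")
    for name in present:
        if installed[name] != SUGGESTED_TF:
            warnings.append("%s %s differs from the suggested %s"
                            % (name, installed[name], SUGGESTED_TF))
    return warnings


def installed_distributions():
    """Map normalized distribution names to versions for this environment."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            found[name.lower().replace("_", "-")] = dist.version
    return found
```

For example, `preflight(sys.platform, installed_distributions())` on Linux with only tensorflow-gpu 1.0.1 installed would return a single warning about that version. The check is deliberately advisory. It only encodes what this thread reports and cannot prove a given build is ABI-compatible with the Fold wheel.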


silky commented 7 years ago

I'm on Linux; I'll try with TF 1.0.0 instead of 1.0.1.

silky commented 7 years ago

I got it to run on Linux with Python 2.7 and TensorFlow 1.0.0rc0.

chuneli commented 7 years ago

moshelooks: Yes, with TF 1.0.0rc0, tensorflow-fold works now. Thank you!

coopie commented 7 years ago

I have it working with Python 3.6 and TensorFlow 1.0.0 (Ubuntu 16.04). 1.0.1 caused segfaults in eval() for me too.

WenchenLi commented 7 years ago

With Python 2.7 and TensorFlow 1.0.1 (Ubuntu 16.04), I get the segfault on eval() too.

liruoteng commented 7 years ago

I also have the same problem with Ubuntu 14.04, TensorFlow 1.0.1, and Python 2.7.

Jongchan commented 7 years ago

I faced the same problem when using TensorFlow 1.0.0, but after switching to 1.0.0-rc0-gpu installed with Docker, the segfault is gone (now I have other issues, but at least the segfault is gone). I am using Python 2.7, Ubuntu 14.04, and TF 1.0.0-rc0.

amanjhunjhunwala commented 7 years ago

Same on Ubuntu 16 LTS with tensorflow-gpu 1.2.1!

denismakogon commented 7 years ago

Indeed, I just spent hours trying to build an appropriate image to get rid of the segmentation fault and found this thread. I tried the following combination:

scipy==0.19.1
h5py==2.7.1
numpy==1.13.1
tensorflow==1.0.0
tflearn==0.3.2

This works for sure. My previous attempt installed all the latest dependencies (especially tensorflow 1.3.0):

bleach==1.5.0
emorecognition==0.0.2
h5py==2.7.1
html5lib==0.9999999
Markdown==2.6.9
nose==1.3.7
numpy==1.13.1
olefile==0.44
pandas==0.20.3
Pillow==4.2.1
protobuf==3.4.0
python-dateutil==2.6.1
pytz==2017.2
scipy==0.19.1
six==1.10.0
tensorflow==1.3.0
tensorflow-tensorboard==0.1.6
tflearn==0.3.2
Werkzeug==0.12.2

That set was causing a segfault (signal 11). Now I'm able to build a fine-grained Docker image using jjanzic/docker-python3-opencv as the base.

Unfortunately, I have no clue what was going on before the segfault killed the Docker container =/
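To see exactly what differs between the working image and the crashing one, the two pin lists in this comment can be compared mechanically. This is a minimal stdlib sketch, assuming plain `name==version` lines with optional `#` comments. It does not handle extras, environment markers, or full PEP 503 name normalization, only lowercasing. `parse_pins` and `diff_pins` are illustrative names.

```python
# The two pin sets from this comment, copied verbatim.
WORKING = """
scipy==0.19.1
h5py==2.7.1
numpy==1.13.1
tensorflow==1.0.0
tflearn==0.3.2
"""

CRASHING = """
bleach==1.5.0
emorecognition==0.0.2
h5py==2.7.1
html5lib==0.9999999
Markdown==2.6.9
nose==1.3.7
numpy==1.13.1
olefile==0.44
pandas==0.20.3
Pillow==4.2.1
protobuf==3.4.0
python-dateutil==2.6.1
pytz==2017.2
scipy==0.19.1
six==1.10.0
tensorflow==1.3.0
tensorflow-tensorboard==0.1.6
tflearn==0.3.2
Werkzeug==0.12.2
"""


def parse_pins(text):
    """Parse `name==version` lines into {lowercased name: version}."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError("not an exact pin: %r" % line)
        pins[name.strip().lower()] = version.strip()
    return pins


def diff_pins(old, new):
    """Return (changed, added, removed) between two pin dicts."""
    shared = old.keys() & new.keys()
    changed = {n: (old[n], new[n]) for n in sorted(shared) if old[n] != new[n]}
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    return changed, added, removed
```

For these two lists, `diff_pins(parse_pins(WORKING), parse_pins(CRASHING))` reports tensorflow (1.0.0 to 1.3.0) as the only version change among shared packages, no removals, and 14 packages present only in the crashing image (protobuf 3.4.0 among them). That is consistent with the thread's focus on the TensorFlow version, but the comparison alone does not rule out the extra packages.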

denismakogon commented 7 years ago

This issue is still valid for tflearn: https://github.com/tflearn/tflearn/issues/905