Closed · benoistlaurent closed this 3 years ago
Did you solve this?
I have the same problem (also using fit_generator), but it happens during the epoch itself, consistently within the first one to five epochs. It turns out versions before 2.0.9 are fine; only 2.0.9 shows this behaviour. Running on TensorFlow 1.0.1.
Training model
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
Epoch 1/20
7882/7884 [============================>.] - ETA: 0s - loss: 0.1092 - categorical_accuracy: 0.9744
Epoch 00001: val_categorical_accuracy improved from -inf to 0.99166, saving model to /home/xxx/code
7884/7884 [==============================] - 410s 52ms/step - loss: 0.1092 - categorical_accuracy: 0.9744 - val_loss: 0.0319 - val_categorical_accuracy: 0.9917
Epoch 2/20
7882/7884 [============================>.] - ETA: 0s - loss: 0.0438 - categorical_accuracy: 0.9893
Epoch 00002: val_categorical_accuracy improved from 0.99166 to 0.99559, saving model to /home/xxx/c
7884/7884 [==============================] - 410s 52ms/step - loss: 0.0438 - categorical_accuracy: 0.9893 - val_loss: 0.0151 - val_categorical_accuracy: 0.9956
Epoch 3/20
5925/7884 [=====================>........] - ETA: 1:38 - loss: 0.0342 - categorical_accuracy: 0.9917
Segmentation fault (core dumped)
I still got the problem on this version of the script (using tensorflow==1.3.0 and Keras==2.0.8).
The solution I ended up with was to stop using model.fit_generator and replace it with model.train_on_batch (see this example).
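A minimal sketch of that workaround, assuming the model is compiled with a single loss and no extra metrics, and that the generator yields (x_batch, y_batch) tuples; train_gen, val_x, val_y, and the step counts are placeholders, not names from the original example:

import numpy as np

epochs = 20
steps_per_epoch = 100  # placeholder values

for epoch in range(epochs):
    losses = []
    for _ in range(steps_per_epoch):
        # pull one batch on the main thread and train on it directly
        x_batch, y_batch = next(train_gen)
        losses.append(model.train_on_batch(x_batch, y_batch))
    # evaluate once per epoch instead of passing validation_data
    val_loss = model.evaluate(val_x, val_y, verbose=0)
    print("epoch %d: loss=%.4f, val_loss=%.4f"
          % (epoch + 1, float(np.mean(losses)), float(val_loss)))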
I just had the exact same problem. I use fit_generator and the generator is reading data from a file.
The first epoch ends with a segmentation fault:
Epoch 1/35
2694/2695 [============================>.] - ETA: 0s - loss: 0.2397 - acc: 0.9052 - jaccard_coef: 0.4410 - jaccard_coef_int: 0.5470
Segmentation fault (core dumped)
It doesn't happen when I remove the validation_data.
This is with Keras 1.2.2 and TensorFlow 1.4.0.
Same problem. Keras 2.0.8, TensorFlow 1.3.0. Training data: 1048 samples, test data: 259 samples, image shape: 120x120x1.
Same problem. @fchollet is there any way this could be fixed? I'm happy to help provide debug info and potentially contribute as needed.
Using fit_generator, with generator arguments for both training and validation data. Both train and validation generators read from the same HDF5 file (using pandas).
I've tried it with my full dataset (237K rows) and a sample subset of the full dataset (1000 rows), both with ~1K columns, and in both cases the segmentation fault happens right after the first epoch finishes. Like others, if I remove the validation data it doesn't occur. I'm using a train/test split of 85/15 and a batch size of 64 for both the full and sample datasets (so I'm only reading 64 rows from the HDF5 file at any given time, in the generator). Output from top confirms that I'm not running out of memory.
Versions: Keras 2.1.6, TensorFlow 1.8.0.
Unlike @dgorissen, I'm experiencing this issue on 2.0.8 as well as 2.1.6.
I believe I actually just figured out what was causing my personal issue. I'm not sure if this will apply to others, but in my generator I was using pd.read_hdf to read subsets of an HDF5 file into memory, and the problem is that read_hdf is not thread-safe, even for reading (the documentation is currently not clear about this). I solved the problem by passing workers=0 to fit_generator, so that the generator is executed on the main thread.
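A minimal sketch of that fix; train_gen, val_gen, and the step counts are hypothetical placeholders, and the lock-based variant is an alternative I would expect to work rather than something tested in this thread:

import threading
import pandas as pd

# Option 1: workers=0 runs both generators on the main thread, so the
# non-thread-safe pd.read_hdf calls can never overlap
history = model.fit_generator(
    train_gen, steps_per_epoch=train_steps,
    validation_data=val_gen, validation_steps=val_steps,
    epochs=10, workers=0)

# Option 2: keep the worker threads but serialize every read_hdf call
hdf_lock = threading.Lock()

def locked_read_hdf(path, key, start, stop):
    with hdf_lock:
        return pd.read_hdf(path, key, start=start, stop=stop)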
I had the same problem with the following code:

import numpy as np
from keras.models import Sequential
from keras.layers import (Activation, Conv3D, Dense, Dropout,
                          GlobalAveragePooling3D, MaxPooling3D)
from keras.optimizers import Adam

def init_model():
    model = Sequential()
    model.add(Conv3D(4, kernel_size=(3, 3, 3), input_shape=(None, None, None, 1), padding='same'))
    model.add(Activation('relu'))
    model.add(MaxPooling3D(pool_size=(3, 3, 3), padding='same'))
    model.add(Dropout(0.25))
    model.add(GlobalAveragePooling3D())
    model.add(Dense(32, activation='sigmoid'))
    model.add(Dropout(0.5))
    # note: softmax over a single unit always outputs 1.0; with an mse
    # loss a 'linear' activation is presumably what was intended
    model.add(Dense(1, activation='linear'))
    model.compile(loss='mse', optimizer=Adam())
    model.summary()
    return model

def input_generator(metas):
    while True:
        meta_sample = metas.sample(frac=1)
        # add a channel axis, then a batch axis, to the loaded video volume
        yield (np.expand_dims(np.expand_dims(load_from_meta(meta_sample), axis=4), axis=0),
               [meta_sample.DMOS])

video_metadatas = get_datas().iloc[0]
model = init_model()
hist = model.fit_generator(generator=input_generator(video_metadatas),
                           epochs=1, steps_per_epoch=1,
                           use_multiprocessing=False, workers=0)

load_from_meta() loads videos using an ffmpeg wrapper.
I fixed the issue with workers=0. Actually, it does not work every time:
$ for k in 1 2 3 4 5 6 7 8 9 10; do python3 3D_CNN.py; done
Using TensorFlow backend.
Segmentation fault (core dumped)
Using TensorFlow backend.
Segmentation fault (core dumped)
Using TensorFlow backend.
Using TensorFlow backend.
Segmentation fault (core dumped)
Using TensorFlow backend.
Using TensorFlow backend.
Using TensorFlow backend.
Segmentation fault (core dumped)
Using TensorFlow backend.
Using TensorFlow backend.
Segmentation fault (core dumped)
Using TensorFlow backend.
Segmentation fault (core dumped)
I experience the same problem (segmentation fault during the first epochs when using fit_generator).
The segmentation fault occurs when I run fit_generator on CPUs with a batch size of 40. It does not occur when I run the same example (see below) on a GPU (GTX 1080 Ti), or when running on a CPU with a batch size of 10. I was able to reproduce the segmentation faults on two Linux machines.
4/10 [===========>..................] - ETA: 10:59 - loss: 0.4438
Segmentation fault (core dumped)
Here is a small standalone script that produces the segmentation fault (when using batch_size = 40 and run on CPUs): https://gist.github.com/gschramm/e6db1f7333b50bca10c38243efec0925
Any idea what is going wrong?
I am running:
Hi, I just hit this symptom in my Docker environment with Keras 2.2.4 and TensorFlow 1.12 (GPU).
For me, the issue disappeared when I changed the TensorFlow image to 1.13-gpu-py3.
I'm not sure it is solved completely, but I'm writing down my environment for future visitors...
Linux 4.15.0-51-generic
Docker 18.09.6 (API 1.39)
nvidia-docker2 (9.2)
Mon Jun 17 17:23:44 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:01:00.0 Off | N/A |
| 40% 39C P8 21W / 250W | 403MiB / 10989MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 208... Off | 00000000:02:00.0 Off | N/A |
| 37% 34C P8 2W / 250W | 1MiB / 10989MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1114 G /usr/lib/xorg/Xorg 224MiB |
| 0 2511 G compiz 79MiB |
| 0 10650 G ...-token=9E760AB97E59CC5C02D0AFC5D37FE54E 98MiB |
+-----------------------------------------------------------------------------+
1.12-gpu-py3 >> 1.13-gpu-py3 (solved)

FROM tensorflow/tensorflow:1.13.1-gpu-py3 as ship
LABEL maintainer="luncliff@gmail.com"
RUN pip install -qqq --upgrade pip && pip install -qqq keras
RUN pip install -qqq pillow
# ...
For larger datasets with Keras multithreading, users need to adopt a thread-safe generator to deal with the issue; a minimal sketch is given after the links below. There is a brief introduction by Anand Chitipothu, as well as an explanation of composed functions by Mathieu Larose. The thread-safe method has been adopted in the Faster R-CNN library by Ross Girshick (rbg) and Kaiming He.
thread-safe code: http://anandology.com/blog/using-iterators-and-generators/
composition of functions: https://mathieularose.com/function-composition-in-python/
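A minimal sketch of that thread-safe generator pattern, following the iterator wrapper described in the first link; the batch generator at the end is a placeholder for whatever actually produces the data:

import threading

class ThreadSafeIter:
    """Serialize next() calls with a lock so the wrapped generator can be
    consumed safely from several Keras worker threads."""
    def __init__(self, it):
        self.it = it
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self.lock:
            return next(self.it)

def threadsafe_generator(f):
    """Decorator wrapping a generator function with ThreadSafeIter."""
    def g(*args, **kwargs):
        return ThreadSafeIter(f(*args, **kwargs))
    return g

@threadsafe_generator
def batch_generator(x, y, batch_size):
    # placeholder: x and y are in-memory arrays here
    while True:
        for start in range(0, len(x), batch_size):
            yield x[start:start + batch_size], y[start:start + batch_size]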
Hi,
I use model.fit_generator to handle a large dataset. I want to read data by batch from a source file, which I did successfully using a CSV file. When I use the pandas.read_hdf function instead, Keras's fit_generator ends up with a segmentation fault. I already noticed that if I do not use validation_data I don't get the segmentation fault, but I don't understand why. Here is a link to the small example I'm running: wine-example
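For concreteness, a hedged sketch of the two generator variants; the file names, HDF5 key, and 'target' label column are placeholders, not the actual wine example:

import pandas as pd

def csv_batch_generator(path, batch_size):
    # the CSV version works: pandas reads the file in fixed-size chunks
    while True:
        for chunk in pd.read_csv(path, chunksize=batch_size):
            yield chunk.drop('target', axis=1).values, chunk['target'].values

def hdf_batch_generator(path, key, n_rows, batch_size):
    # the HDF5 version segfaults once validation_data is also a generator,
    # apparently because Keras then calls pd.read_hdf from several threads
    while True:
        for start in range(0, n_rows, batch_size):
            chunk = pd.read_hdf(path, key, start=start,
                                stop=min(start + batch_size, n_rows))
            yield chunk.drop('target', axis=1).values, chunk['target'].values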
Any help would be very much appreciated.
Cheers, Ben