suprateembanerjee opened this issue 3 years ago
Hi, thank you @suprateem48 for bringing this up. I'm actually facing the same issue. Please find my stack trace for reference below. Advice from anybody on how to solve this issue is highly appreciated - thank you in advance!
existing dataset files found -> loading....
Loading labels: 100%|██████████| 16551/16551 [00:03<00:00, 4725.84it/s]
Loading image IDs: 100%|██████████| 16551/16551 [00:01<00:00, 9289.97it/s]
Loading evaluation-neutrality annotations: 100%|██████████| 16551/16551 [00:02<00:00, 7384.34it/s]
Loading labels: 100%|██████████| 4952/4952 [00:01<00:00, 4734.14it/s]
Loading image IDs: 100%|██████████| 4952/4952 [00:00<00:00, 9295.89it/s]
Loading evaluation-neutrality annotations: 100%|██████████| 4952/4952 [00:00<00:00, 7491.51it/s]
Number of images in the training dataset:   16551
Number of images in the validation dataset: 4952
2021-05-03 07:25:05.965847: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Epoch 1/120
Epoch 00001: LearningRateScheduler reducing learning rate to 0.001.
Traceback (most recent call last):
File "D:\projects\python\ssd_test\ssd_test.py", line 288, in
C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\keras\engine\training.py:805 train_function *
return step_function(self, iterator)
C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\keras\engine\training.py:795 step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:1259 run
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2730 call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:3417 _call_for_each_replica
return fn(*args, **kwargs)
C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\keras\engine\training.py:788 run_step **
outputs = model.train_step(data)
C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\keras\engine\training.py:754 train_step
y_pred = self(x, training=True)
C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\keras\engine\base_layer.py:998 __call__
input_spec.assert_input_compatibility(self.input_spec, inputs, self.name)
C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\keras\engine\input_spec.py:207 assert_input_compatibility
' input tensors. Inputs received: ' + str(inputs))
ValueError: Layer model expects 1 input(s), but it received 2 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, None, None, None) dtype=uint8>, <tf.Tensor 'IteratorGetNext:1' shape=(None, None, None) dtype=float32>]
Hello. I had the same problem and racked my brain over it for days.
Finally, I used
validation_data=tuple(val_generator),
instead of
validation_data=val_generator,
and the error went away.
But I ran out of memory (Google Colab free tier) and am looking for another environment.
By the way, in my case, the command
history = model.fit_generator(generator=train_generator,
doesn't work anymore; I have to use
history = model.fit(train_generator,
@bfhaha I tried your fix, but it did not solve the issue for me. Running the code
initial_epoch = 0
final_epoch = 120
steps_per_epoch = 1000
history = model.fit(x=train_generator,
                    steps_per_epoch=steps_per_epoch,
                    epochs=final_epoch,
                    callbacks=callbacks,
                    validation_data=tuple(val_generator),
                    validation_steps=ceil(val_dataset_size/batch_size),
                    initial_epoch=initial_epoch)
still results in
ValueError Traceback (most recent call last)
@suprateem48 Sorry, I really don't know where the problem is.
If I were you, I would try the commands
print(val_generator) # It is supposed to be <generator object DataGenerator.generate at 0x7f33b9691a50>
and
print(tuple(val_generator)) # It is supposed to be ()
after defining val_generator to observe the difference.
@bfhaha Weird thing: I managed to reproduce your memory issue even on my 8GB RTX 2070 Super, but this error appears only the first time the kernel runs model.fit(). Every subsequent time model.fit() is rerun on the same kernel, it throws the old tuple-related error.
@bfhaha thanks for your fix. I've tried it as well. Same here: memory error (32GB RAM Predator, GeForce 1070). I gave it a second try with a reduced dataset of just 8 images, but got the same result. I know it doesn't really help, but I just wanted to share the information...
Hello. I tried to run the notebook on a Google Compute Engine instance (E2 series, e2-highmem-16, 16 vCPU, 128 GB memory, 80 GB disk). It also crashed... I was running ssd7_training.ipynb, not ssd300.
I rented a Google Compute Engine instance (N2 series, custom 8 vCPU, 640 GB memory, 200 GB disk) yesterday, and it showed the following message after running for one hour.
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-24-fb2ae06e0e3c> in <module>
9 epochs=final_epoch,
10 callbacks=callbacks,
---> 11 validation_data=tuple(val_generator),
12 validation_steps=ceil(val_dataset_size/batch_size),
13 initial_epoch=initial_epoch)
~/data_generator/object_detection_2d_data_generator.py in generate(self, batch_size, shuffle, transformations, label_encoder, returns, keep_images_without_gt, degenerate_box_handling)
1149 batch_y_encoded, batch_matched_anchors = label_encoder(batch_y, diagnostics=True)
1150 else:
-> 1151 batch_y_encoded = label_encoder(batch_y, diagnostics=False)
1152 batch_matched_anchors = None
1153
~/ssd_encoder_decoder/ssd_input_encoder.py in __call__(self, ground_truth_labels, diagnostics)
311 ##################################################################################
312
--> 313 y_encoded = self.generate_encoding_template(batch_size=batch_size, diagnostics=False)
314
315 ##################################################################################
~/ssd_encoder_decoder/ssd_input_encoder.py in generate_encoding_template(self, batch_size, diagnostics)
604 # shape as the SSD model output tensor. The content of this tensor is irrelevant, we'll just use
605 # `boxes_tensor` a second time.
--> 606 y_encoding_template = np.concatenate((classes_tensor, boxes_tensor, boxes_tensor, variances_tensor), axis=2)
607
608 if diagnostics:
<__array_function__ internals> in concatenate(*args, **kwargs)
MemoryError: Unable to allocate 25.7 MiB for an array with shape (16, 11692, 18) and data type float64
@bfhaha Yes, this is the exact memory-related issue I faced as well.
@suprateem48 Have you tried this solution? https://stackoverflow.com/questions/57507832/unable-to-allocate-array-with-shape-and-data-type I don't have enough money to rent a VM instance to test it again.
Same issue right here, will update if I can find something.
Edit: It seems like val_generator never stops yielding batches (calling next(val_generator) is infinite)? Not quite sure why, but calling tuple() on an infinite generator will cause a memory error.
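A quick way to see this (a hedged check, using the notebook's val_generator name) is to pull a few batches by hand; the repo's generate() loops forever instead of raising StopIteration, which is why tuple(val_generator) keeps allocating until memory runs out:
from itertools import islice
# Pull only five batches; without islice this loop would never end.
for i, batch in enumerate(islice(val_generator, 5)):
    print(i, type(batch), len(batch))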
By the way, in my case, the command
history = model.fit_generator(generator=train_generator,
doesn't work anymore, I have to use
history = model.fit(train_generator,
@bfhaha could you show me how you used model.fit instead of model.fit_generator? :)
@JuliusJacobitz Sorry, I don't understand what you mean by "how" I used model.fit.
It showed the following message when I was using
history = model.fit_generator(generator=train_generator,
UserWarning: Model.fit_generator is deprecated and will be removed in a future version. Please use Model.fit, which supports generators.
So I just changed model.fit_generator to model.fit.
The example here shows that we don't have to use generator= if we use model.fit.
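To make the switch concrete, a minimal sketch (parameter names as in the notebook; this is just my illustration, not the full call):
# Deprecated in TF 2.x:
# history = model.fit_generator(generator=train_generator, ...)
# fit() accepts the generator positionally, no generator= keyword needed:
history = model.fit(train_generator,
                    steps_per_epoch=steps_per_epoch,
                    epochs=final_epoch,
                    callbacks=callbacks)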
Hi, I struggled with the same issue.
The reason was that the data generator returned [batch_X, batch_y_encoded] as a list. I changed DataGenerator.generate() in object_detection_2d_data_generator.py to the following: instead of yielding a list, it yields the two values batch_X and batch_y_encoded as a tuple. It's a primitive solution, but it works fine.
#########################################################################################
# Compose the output.
#########################################################################################
ret = []
if 'processed_images' in returns: ret.append(batch_X)
if 'encoded_labels' in returns: ret.append(batch_y_encoded)
if 'matched_anchors' in returns: ret.append(batch_matched_anchors)
if 'processed_labels' in returns: ret.append(batch_y)
if 'filenames' in returns: ret.append(batch_filenames)
if 'image_ids' in returns: ret.append(batch_image_ids)
if 'evaluation-neutral' in returns: ret.append(batch_eval_neutral)
if 'inverse_transform' in returns: ret.append(batch_inverse_transforms)
if 'original_images' in returns: ret.append(batch_original_images)
if 'original_labels' in returns: ret.append(batch_original_labels)
yield batch_X, batch_y_encoded # do not yield ret
FYI: Here is my model.fit.
history = model.fit(train_generator,
steps_per_epoch=ceil(train_dataset_size/batch_size),
epochs=final_epoch,
callbacks=callbacks,
validation_data=val_generator,
validation_steps=ceil(val_dataset_size/batch_size),
initial_epoch=initial_epoch,
verbose=1)
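For context, here is a minimal reproduction of the underlying behaviour (my own toy example, not code from this repo): tf.keras unpacks a yielded tuple as (inputs, targets), whereas a yielded list is treated as the inputs alone, so a single-input model sees two input tensors and raises exactly the error above.
import numpy as np
import tensorflow as tf

# Toy single-input model, unrelated to SSD, just to show the unpacking rule.
toy_model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
toy_model.compile(optimizer="sgd", loss="mse")

def gen(as_tuple):
    while True:
        x = np.random.rand(8, 4).astype("float32")
        y = np.random.rand(8, 1).astype("float32")
        if as_tuple:
            yield x, y      # tuple -> unpacked as (inputs, targets): works
        else:
            yield [x, y]    # list -> treated as two inputs: ValueError

toy_model.fit(gen(as_tuple=True), steps_per_epoch=2, epochs=1)     # trains fine
# toy_model.fit(gen(as_tuple=False), steps_per_epoch=2, epochs=1)  # "expects 1 input(s), but it received 2 input tensors"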
@pirolone888 Thanks. But it showed UnboundLocalError: local variable 'batch_X' referenced before assignment when I was trying your method.
Hello, I also converted the code from TensorFlow 1.x to TensorFlow 2.4. I fixed the problem you are having by changing the DataGenerator in object_detection_2d_data_generator.py as follows:
ret = []
if 'processed_images' in returns: ret.append(batch_X)
if 'encoded_labels' in returns: ret.append(batch_y_encoded)
if 'matched_anchors' in returns: ret.append(batch_matched_anchors)
if 'processed_labels' in returns: ret.append(batch_y)
if 'filenames' in returns: ret.append(batch_filenames)
if 'image_ids' in returns: ret.append(batch_image_ids)
if 'evaluation-neutral' in returns: ret.append(batch_eval_neutral)
if 'inverse_transform' in returns: ret.append(batch_inverse_transforms)
if 'original_images' in returns: ret.append(batch_original_images)
if 'original_labels' in returns: ret.append(batch_original_labels)
yield tuple(ret)
I simply changed yield ret to yield tuple(ret).
@daviddanialy Thanks. So I just place the following code at the end of the generate function in object_detection_2d_data_generator.py? (It is indented by eight spaces there.)
ret = []
if 'processed_images' in returns: ret.append(batch_X)
if 'encoded_labels' in returns: ret.append(batch_y_encoded)
if 'matched_anchors' in returns: ret.append(batch_matched_anchors)
if 'processed_labels' in returns: ret.append(batch_y)
if 'filenames' in returns: ret.append(batch_filenames)
if 'image_ids' in returns: ret.append(batch_image_ids)
if 'evaluation-neutral' in returns: ret.append(batch_eval_neutral)
if 'inverse_transform' in returns: ret.append(batch_inverse_transforms)
if 'original_images' in returns: ret.append(batch_original_images)
if 'original_labels' in returns: ret.append(batch_original_labels)
yield tuple(ret)
It still showed the original error message (Layer model expects 1 input(s), but it received 2 input tensors...).
I have already given up on this project and am trying matterport's Mask R-CNN for object detection instead.
That code is already in the generate function; you just change yield ret to yield tuple(ret). I may have to switch to a different repo as well, because I'm having issues with the predictions never exceeding the confidence threshold.
@daviddanialy Thanks. It doesn't work for me.
Any solutions? Here is my code:
# TODO: Set the epochs to train for.
# If you're resuming a previous training, set `initial_epoch` and `final_epoch` accordingly.
initial_epoch = 0
final_epoch = 20
steps_per_epoch = 1000
history = model.fit(train_generator,
steps_per_epoch=steps_per_epoch,
epochs=final_epoch,
callbacks=callbacks,
validation_data=val_generator,
validation_steps=ceil(val_dataset_size/batch_size),
initial_epoch=initial_epoch)
ret = []
if 'processed_images' in returns: ret.append(batch_X)
if 'encoded_labels' in returns: ret.append(batch_y_encoded)
if 'matched_anchors' in returns: ret.append(batch_matched_anchors)
if 'processed_labels' in returns: ret.append(batch_y)
if 'filenames' in returns: ret.append(batch_filenames)
if 'image_ids' in returns: ret.append(batch_image_ids)
if 'evaluation-neutral' in returns: ret.append(batch_eval_neutral)
if 'inverse_transform' in returns: ret.append(batch_inverse_transforms)
if 'original_images' in returns: ret.append(batch_original_images)
if 'original_labels' in returns: ret.append(batch_original_labels)
yield ret
I have tried ret, tuple(ret), and [ret], and I still get the following:
ValueError: Layer model expects 1 input(s), but it received 2 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, None, None, None) dtype=uint8>, <tf.Tensor 'IteratorGetNext:1' shape=(None, None, None) dtype=float32>]
Hello. I had the same problem and racked my brain over it for days. Finally, I used
validation_data=tuple(val_generator),
instead of
validation_data=val_generator,
and the error went away. But I ran out of memory (Google Colab free tier) and am looking for another environment. By the way, in my case, the command
history = model.fit_generator(generator=train_generator,
doesn't work anymore; I have to use
history = model.fit(train_generator,
Tried that. Doesn't seem to work for me...
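If anyone else is stuck here, one hedged debugging step (assuming the generators are built as in the notebook and generate() has been patched to yield a tuple) is to inspect a single batch before calling fit():
batch = next(train_generator)
print(type(batch), len(batch))          # expect: <class 'tuple'> 2
print(type(batch[0]), batch[0].shape)   # processed images, e.g. (batch_size, H, W, 3)
print(type(batch[1]), batch[1].shape)   # encoded labels, e.g. (batch_size, n_boxes, n_classes + 12)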
I solved the same issue for model.predict() for the case where the model has 2 inputs.
My generator output was:
return (a, b)
I changed it to:
return ((a, b), None)
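A self-contained illustration of that change (a toy two-input model of my own, not this repo's code): wrapping the two inputs in their own tuple and passing None as the target keeps Keras from unpacking the pair as (x, y).
import numpy as np
import tensorflow as tf

# Toy model with two inputs.
in_a = tf.keras.Input(shape=(4,))
in_b = tf.keras.Input(shape=(4,))
out = tf.keras.layers.Dense(1)(tf.keras.layers.Concatenate()([in_a, in_b]))
two_input_model = tf.keras.Model([in_a, in_b], out)

def predict_gen(n_batches=3):
    for _ in range(n_batches):
        a = np.random.rand(8, 4).astype("float32")
        b = np.random.rand(8, 4).astype("float32")
        # yield a, b        # would be unpacked as (x=a, y=b), so the model only sees one input
        yield (a, b), None  # x is the pair of inputs, y is ignored by predict()

preds = two_input_model.predict(predict_gen(), steps=3)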
Hello! Since model.fit and model.fit_generator are essentially wrappers around a training loop repeated over several epochs, I abandoned them and instead used a customized for loop, enumerating the generated dataset (I mostly use PyTorch, so a for loop is more convenient for me).
[Figure 1: code screenshot]
Above, I am using a customized dataset generator (I modified this Python code: https://github.com/wjddyd66/Tensorflow2.0/blob/master/SSD/voc_data.py), and I feed the generated dataset into the training code below:
This way I was able to fix the problem above. Hope it helps you guys!
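A rough sketch of that idea (my own reconstruction, not the code from the screenshot; it assumes the generator yields (batch_X, batch_y_encoded) tuples and that model, ssd_loss, train_generator, steps_per_epoch, initial_epoch, and final_epoch are the objects defined earlier in the notebook):
import tensorflow as tf

optimizer = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9)

@tf.function
def train_step(images, labels):
    # One optimization step: forward pass, SSD loss, backprop, weight update.
    with tf.GradientTape() as tape:
        y_pred = model(images, training=True)
        loss = ssd_loss.compute_loss(labels, y_pred)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

for epoch in range(initial_epoch, final_epoch):
    for step in range(steps_per_epoch):
        batch_X, batch_y_encoded = next(train_generator)
        loss = train_step(tf.convert_to_tensor(batch_X, tf.float32),
                          tf.convert_to_tensor(batch_y_encoded, tf.float32))
    print(f"Epoch {epoch + 1}: last batch loss = {float(loss):.4f}")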
TensorFlow v2 (latest), Keras (latest), ssd300_training.ipynb
I have managed to convert most of the v1 code to v2 and successfully run it. I have made changes to all the Python files as necessary, too. However, this issue occurs on the line
history = model.fit_generator(generator=train_generator,
                              steps_per_epoch=steps_per_epoch,
                              epochs=final_epoch,
                              callbacks=callbacks,
                              validation_data=val_generator,
                              validation_steps=ceil(val_dataset_size/batch_size),
                              initial_epoch=initial_epoch)
Entire error:
Epoch 1/120
Epoch 00001: LearningRateScheduler reducing learning rate to 0.001.
ValueError Traceback (most recent call last)