anklebreaker closed this issue 4 years ago
I haven't tested tiny training yet, haha. Anyway, can you share your test code?
import os

import numpy as np
import tensorflow
from tensorflow.keras import optimizers

# YOLOv4 is this project's model class (its import was not shown in the original post).


class SaveCallback(tensorflow.keras.callbacks.Callback):
    """Save weights whenever val_loss improves on the best value seen so far."""

    def __init__(self):
        super().__init__()
        self.best_loss = np.inf
        self.filedir = "models/"
        self.model = yolo.model
        self.trial = trial  # module-level `trial`; must be defined before instantiation

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}  # Keras may pass logs=None
        if 'val_loss' in logs:
            if logs['val_loss'] < self.best_loss:
                savepath = self.filedir + 'trial' + str(self.trial) + '-epoch' + str(epoch + 1)
                print('\n' + 'Val_loss improved from ' + str(self.best_loss) + ' to ' + str(logs['val_loss']) + '. Saving model to ' + savepath + '/' + 'ckpt...')
                self.best_loss = logs['val_loss']
                if not os.path.exists(savepath):
                    os.makedirs(savepath)
                yolo.model.save_weights(savepath + '/' + str(self.trial) + '-epoch' + str(epoch + 1) + 'ckpt')
            else:
                print('\n' + 'Val_loss did not improve from ' + str(self.best_loss))
yolo = YOLOv4(tiny=True)
yolo.input_size = 352
yolo.batch_size = 128
yolo.subdivision = 2
yolo.channel_input = 4
# Anchor values rounded to the nearest integer
yolo.anchors = np.round(np.array([
    16.30463356, 39.65267033, 23.3268787, 57.3856978,
    33.07815072, 79.4291659, 45.28507059, 109.08966657,
    65.79054054, 146.20500288, 99.24201597, 208.32784431,
])).astype(np.int32)
yolo.classes = {0: "relevant_person"}
eval = False
trial = 2
epochs = 1500
yolo.make_model()
# Resume from the checkpoint saved at epoch 200
yolo.model.load_weights('models/trial2-epoch200/2-epoch200ckpt')
train_data = yolo.load_dataset('traintext.txt')
val_data = yolo.load_dataset('valtext.txt', training=False)
lr = 1.
optimizer = optimizers.Adadelta(learning_rate=lr)
yolo.compile(optimizer=optimizer, loss_iou_type="ciou")
if not eval:
    csvfile = "models/log-" + str(trial) + ".csv"
    csvlog = tensorflow.keras.callbacks.CSVLogger(csvfile, separator=',', append=True)
    yolo.model.fit(
        train_data,
        epochs=epochs,
        verbose=1,
        callbacks=[SaveCallback(), csvlog],
        batch_size=yolo.batch_size // yolo.subdivision,
        steps_per_epoch=yolo.subdivision,
        validation_data=val_data,
        validation_steps=1000 // (yolo.batch_size // yolo.subdivision),
        validation_freq=50,
        initial_epoch=200
    )
I made a custom callback to save the model just to test whether something was wrong with the ModelCheckpoint callback.
Please share the inference code too. I'm testing the TPU right now, so once that's done, I'll test your code.
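import cv2
import matplotlib.pyplot as plt

# Stack the 3-channel image and the single-channel (grayscale) court section into a 4-channel input
pred_im = np.dstack((
    cv2.imread("ImageSections/2967_340_1583547791_FishEye_24732_0.jpg"),
    cv2.imread("CourtSections/2967_340_1583547791_FishEye_24732_0.jpg", 0),
))
bboxes = yolo.predict(pred_im)
print(bboxes)
yolo.draw_bboxes(pred_im, bboxes)
plt.imshow(pred_im)
plt.show()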
I modified the code for a 4-channel input. Inference runs when eval above is True.
Unrelated, but I saw that in the Dataset class, under the next() method, the counter was updated and reset outside the for loop. Not sure if that's intentional or a bug, but I checked that the same image was being sent batch_size times per batch. I changed it as below, with the counter update and reset inside the loop, so that each batch is made of different images.
if self.batch_size > 1:
    batch_x = []
    #batch_y_s = []
    batch_y_l = []
    batch_y_m = []
    for _ in range(self.batch_size):
        x, y = self.preprocess_dataset(self.dataset[self.count])
        batch_x.append(x)
        #batch_y_s.append(y[0])
        batch_y_m.append(y[0])
        batch_y_l.append(y[1])
        self.count += 1
        if self.count == len(self.dataset):
            np.random.shuffle(self.dataset)
            self.count = 0
    batch_x = np.concatenate(batch_x, axis=0)
    #batch_y_s = np.concatenate(batch_y_s, axis=0)
    batch_y_m = np.concatenate(batch_y_m, axis=0)
    batch_y_l = np.concatenate(batch_y_l, axis=0)
    batch_y = (batch_y_m, batch_y_l)
else:
    batch_x, batch_y = self.preprocess_dataset(self.dataset[self.count])
    self.count += 1
    if self.count == len(self.dataset):
        np.random.shuffle(self.dataset)
        self.count = 0
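To illustrate the difference described above, here is a minimal standalone sketch. ToyDataset and its advance_inside_loop flag are hypothetical stand-ins, not the project's actual Dataset class, and the items are plain integers instead of images. It only shows how the position of the counter update changes what ends up in a batch.

```python
import random


class ToyDataset:
    """Hypothetical stand-in for the Dataset class; items are plain integers."""

    def __init__(self, items, batch_size, advance_inside_loop):
        self.dataset = list(items)
        self.batch_size = batch_size
        self.advance_inside_loop = advance_inside_loop
        self.count = 0

    def _advance(self):
        # Move to the next sample; reshuffle and wrap around at the end.
        self.count += 1
        if self.count == len(self.dataset):
            random.shuffle(self.dataset)
            self.count = 0

    def __next__(self):
        batch = []
        for _ in range(self.batch_size):
            batch.append(self.dataset[self.count])
            if self.advance_inside_loop:
                self._advance()
        if not self.advance_inside_loop:
            # Counter updated once per batch, after the loop.
            self._advance()
        return batch


outside = ToyDataset(range(10), batch_size=4, advance_inside_loop=False)
inside = ToyDataset(range(10), batch_size=4, advance_inside_loop=True)
batch_outside = next(outside)
batch_inside = next(inside)
print(batch_outside)  # [0, 0, 0, 0]: the same item repeated batch_size times
print(batch_inside)   # [0, 1, 2, 3]: batch_size different items
```

With the update outside the loop, every slot in the batch reads the same index, which matches the repeated-image behavior I observed.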
Can you send me a PR with the following changes?
commit: 7fc91f630f1f0
@anklebreaker yolo.predict(frame) predicts only one image.
Oh, 4-channel.
It seems to be a problem of training, not a problem of saving and loading.
Thanks for the update. What makes you say the problem is with the training? Training appeared fine; I only had issues when resuming training or testing a prediction in a different session.
I just trained with custom data, and the result came out well... Maybe it was a setup error.
I couldn't find a clear answer. So I implemented the yolo.save_weights() function.
After training, save weights using yolo.save_weights("custom.weights", weights_type="yolo").
Then, when you want to load, load it using yolo.load_weights("custom.weights", weights_type="yolo").
It will be released in the v0.19.0 version. Ref: 35f1d22618, 79daece4538e1
Yeah, it might be some system or version issue. I'll try out the new functions. Thanks for looking into it!
Hey, thanks so much for creating this project! It was easy to use, and I was able to customize it to my needs.
I've been trying to train tiny YOLOv4 on a custom single class. Currently, I'm getting pretty good results in terms of validation loss, and the model converges well. However, if the training session terminates and I either resume training or predict from a checkpoint (saved either manually with the save_weights function or by the ModelCheckpoint callback), I get wildly different results. The loss jumps from 2 to over 200, as if the weights were untrained, and model predictions return mostly zeros.
At first, I suspected that my preprocessing was causing this, but after double-checking and searching online, it seems that there might be an issue with how Keras and/or TensorFlow saves the network. Model.save() unfortunately doesn't work at all.
This post has a long thread of others who seem to have a similar issue, and it appears the solution is architecture-specific. From what I gathered, possible causes include UpSampling layers or Lambda layers in loops. Here's another post describing the solution as fixing a random seed.
For reference, I'm using yolo.model.load_weights() and yolo.model.save_weights().
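One way to narrow down whether the weights really change between sessions is to fingerprint them before saving and again after loading. The sketch below is only an illustration: weights_fingerprint is a hypothetical helper, not part of this project or of Keras. It assumes each weight array exposes a tobytes() method, as the NumPy arrays returned by Keras model.get_weights() do. The demo data uses the standard library's array type in place of real weights.

```python
import hashlib
from array import array


def weights_fingerprint(weights):
    """Return a SHA-256 hex digest over a sequence of weight arrays.

    Each element only needs a tobytes() method, which NumPy arrays
    (e.g. the list returned by Keras model.get_weights()) provide.
    """
    digest = hashlib.sha256()
    for w in weights:
        # Include the shape when available so reshaped data hashes differently.
        digest.update(repr(getattr(w, "shape", None)).encode())
        digest.update(w.tobytes())
    return digest.hexdigest()


# Stand-in data using the standard library's array type instead of real weights.
saved = [array("f", [0.1, 0.2]), array("f", [0.3])]
reloaded_ok = [array("f", [0.1, 0.2]), array("f", [0.3])]
reloaded_bad = [array("f", [0.0, 0.0]), array("f", [0.3])]

same = weights_fingerprint(saved) == weights_fingerprint(reloaded_ok)
different = weights_fingerprint(saved) != weights_fingerprint(reloaded_bad)
print(same, different)
```

With the real model, the idea would be to call it on yolo.model.get_weights() right before yolo.model.save_weights() and again right after yolo.model.load_weights() in the new session. Matching digests would suggest the layer weights survived the round trip and the cause lies elsewhere. Note that get_weights() does not cover optimizer state.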