_call_train_loop_hooks() cause error

jantic / DeOldify

A Deep Learning based project for colorizing and restoring old images (and video!)

MIT License

_call_train_loop_hooks() cause error #74

Closed miaoqiz closed 5 years ago

miaoqiz commented 5 years ago

Hi,

The "_call_train_loop_hooks" function causes an error when running more than one epoch. It complains about an "out-of-bounds" index in "callback.py".

I am not sure whether this matters for training. It seems to be related to visualizing the graph. Can I comment it out?

Thanks!

jantic commented 5 years ago

I need more information (a stack trace, perhaps). This error is new to me. Multiple epochs are nothing new in training, so I don't think that in itself is what triggers the issue.

jantic commented 5 years ago

Also, are you trying to do anything custom?

miaoqiz commented 5 years ago

Not really. I just run multiple epochs.

Here is the error. I suspect it has something to do with the data iterator. What does "_call_train_loop_hooks()" do exactly?

Traceback (most recent call last):
  line 242, in _call_train_loop_hooks
    hook_result = hook(gresult, cresult)
  line 70, in train_loop_hook
    tbwriter=self.tbwriter)
  line 138, in output_image_gen_visuals
    self._output_visuals(ds=md.val_ds, model=model, iter_count=iter_count, tbwriter=tbwriter, validation=True)
  line 147, in _output_visuals
    image_sets = ModelImageSet.get_list_from_model(ds=ds, model=model, idxs=idxs)
  line 35, in get_list_from_model
    x,y=ds[idx]
  line 168, in __getitem__
    return self.get1item(idx)
  line 161, in get1item
    x,y = self.get_x(idx),self.get_y(idx)
  line 13, in get_x
    x = super().get_x(i)
  line 245, in get_x
    def get_x(self, i): return open_image(os.path.join(self.path, self.fnames[i]))
IndexError: index 0 is out of bounds for axis 0 with size 0

Thanks!

jantic commented 5 years ago

This might help actually, though it's a bit hard to read. It looks like you may not have images in the expected directory for ImageNet. Double check where it's looking for them. Starting from the folder you're executing from, this would be data/imagenet/ILSVRC/Data/CLS-LOC/train, i.e. relative to the Jupyter notebooks.
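
A quick way to test this is to count the image files under the directory before training. The sketch below is only an illustration, not part of DeOldify: the helper name count_images and the list of extensions are assumptions, and the path is the one quoted above. Note that the trace goes through md.val_ds, so it is the file list of the validation set that ends up with size 0.

```python
from pathlib import Path

# Assumed image extensions (hypothetical choice); adjust to match your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images(root):
    """Count image files found recursively under root (0 if root is missing)."""
    root = Path(root)
    if not root.is_dir():
        return 0
    return sum(
        1
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


if __name__ == "__main__":
    # Path quoted in this thread, relative to the directory you launch from.
    train_dir = Path("data/imagenet/ILSVRC/Data/CLS-LOC/train")
    n = count_images(train_dir)
    print(f"{train_dir.resolve()}: {n} image files")
    if n == 0:
        print("No images found; an empty file list would explain 'size 0'.")
```

If the count is zero, or the resolved path is not the one you expected, the data directory setting is the first thing to look at.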

miaoqiz commented 5 years ago

Hi,

Thanks for the quick response! I actually changed the data directory.

The training went on nicely until epoch #33, where the error occurred. If I changed the number of epochs from "50" to "100", the training would stop at epoch #10. That is why I said it may have something to do with the batch number and the number of epochs. I could be wrong though.

Thanks!

jantic commented 5 years ago

OK, that's helpful information. I've never done that many epochs with the training. I'll keep this open for now, but this may very well be a non-issue with the FastAI v1 upgrade.

miaoqiz commented 5 years ago

Thanks!

Can I comment out "_call_train_loop_hooks()" for now? Will it affect the training itself, e.g. forward propagation?

jantic commented 5 years ago

You should be able to comment out _call_train_loop_hooks() if you're not concerned about Tensorboard functionality.
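
If you would rather keep the hooks than comment the call out, one option is to catch errors around each hook so that a failing visualization does not stop training. This is a minimal sketch under assumptions: call_hooks_safely is a hypothetical helper, not DeOldify's code, and the only thing taken from the trace above is that hooks are called as hook(gresult, cresult).

```python
import logging

logger = logging.getLogger(__name__)


def call_hooks_safely(hooks, *args):
    """Run each hook; log and skip any that raises so the caller can continue.

    Hypothetical helper for illustration only (not part of DeOldify).
    Returns one result per hook, with None for hooks that failed.
    """
    results = []
    for hook in hooks:
        try:
            results.append(hook(*args))
        except Exception:
            # Log the full traceback, then carry on with the next hook.
            logger.exception("Hook %r failed; skipping", hook)
            results.append(None)
    return results
```

Swallowing exceptions like this can hide a real problem, such as an empty dataset, so it is a stopgap rather than a fix.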

miaoqiz commented 5 years ago

Thanks!

miaoqiz commented 5 years ago

Hi,

To resume training from your pre-trained model, I can just load the pre-trained "gen_192" model in "ColorizeTraining.py", is that correct?

Without the pre-trained "_critic_192" model, it may be hard to reach the same level of result, as the discriminator that starts from nothing needs to catch up with the generator.

Thanks!

jantic commented 5 years ago

Yeah, you can't (easily) get the critic caught up to work the way you want here. That'll change with the next update to DeOldify, but unfortunately, for now the saved generator is basically -just- good for visualization.