miaoqiz closed this issue 5 years ago
I need more information (a stack trace, perhaps). This error is new to me. Running multiple epochs is nothing new in training, so I don't think that alone is what triggers the issue.
Also, are you trying to do anything custom?
Not really. Just run multiple epochs.
Here is the error. I suspect it has something to do with the data iterator. What exactly does "_call_train_loop_hooks()" do?
Thanks!
This might actually help, though it's a bit hard to read. It looks like you may not have images in the expected directory for ImageNet. Double-check where it's looking for them. Starting from the folder you're executing from, this would be data/imagenet/ILSVRC/Data/CLS-LOC/train, i.e. the path is relative to the Jupyter notebooks.
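A quick way to check this is to count the image files under that directory from the same working directory as the notebooks. The following is only a minimal sketch: the helper name `count_images` and the set of file extensions are assumptions for illustration, not part of DeOldify or fastai.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(root):
    """Return the number of image files under root, or -1 if root is not a directory."""
    root = Path(root)
    if not root.is_dir():
        return -1
    # Walk all subfolders; compare suffixes case-insensitively (ImageNet uses .JPEG).
    return sum(
        1
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


if __name__ == "__main__":
    # Path from the comment above, resolved relative to the current working directory.
    train_dir = Path("data/imagenet/ILSVRC/Data/CLS-LOC/train")
    print(train_dir.resolve(), count_images(train_dir))
```

A result of -1 means the directory was not found from where the script was run, and 0 means it exists but holds no files with the assumed extensions.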
Hi,
Thanks for the quick response! I actually changed the data directory.
Training ran fine until epoch #33, where the error occurred. When I changed the number of epochs from "50" to "100", training stopped at epoch #10 instead. That is why I said the problem may be related to the batch number and the number of epochs. I could be wrong, though.
Thanks!
OK, that's helpful information. I've never run that many epochs in training. I'll keep this open for now, but this may very well be a non-issue with the FastAI v1 upgrade.
Thanks!
Can I comment out "_call_train_loop_hooks()" for now? Will that affect training itself, e.g. forward propagation?
You should be able to comment out _call_train_loop_hooks() if you're not concerned about Tensorboard functionality.
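If you would rather keep the Tensorboard output for the epochs where it works, an alternative to commenting the call out is to wrap it so that an index error is logged and skipped. This is only a sketch under the assumption that the failure is an IndexError raised inside the hook; `call_hook_safely` is a hypothetical helper, not part of DeOldify or fastai, and it hides the underlying bug rather than fixing it.

```python
import logging


def call_hook_safely(hook, *args, **kwargs):
    """Run an optional visualization hook; return False if it raised IndexError.

    Hypothetical helper: only IndexError is caught, so any other failure
    still surfaces and training code is not silently affected.
    """
    try:
        hook(*args, **kwargs)
        return True
    except IndexError as exc:
        logging.warning("Skipping visualization hook: %s", exc)
        return False
```

In the training loop, the direct call to the hook would then be replaced by passing the hook function to this wrapper.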
Thanks!
Hi,
To resume training from your pre-trained model, I can just load the pre-trained "gen_192" model in "ColorizeTraining.py", is that correct?
Without the pre-trained "_critic_192" model, it may be hard to reach the same quality of results, since a discriminator that starts from scratch needs to catch up with the generator.
Thanks!
Yeah, you can't (easily) get the critic caught up to work the way you want here. That will change with the next update to DeOldify, but unfortunately, for now, the saved generator is basically -just- good for visualization.
Hi,
The "_call_train_loop_hooks" function causes an error when running more than one epoch. It complains about an "out-of-bound" index in "callback.py".
I am not sure whether this matters for training. It seems to be related to visualizing the graph. Can I comment it out?
Thanks!