qqwweee / keras-yolo3

A Keras implementation of YOLOv3 (Tensorflow backend)
MIT License

Train on Google Cloud ML #200

Open deencat opened 6 years ago

deencat commented 6 years ago

I keep getting a ResourceExhausted error when running train.py, and I have been trying to fix it for days without luck. I am running this on my laptop with Windows 10 and a 2 GB 1050 Ti.

Is there a way to train on Google Cloud ML instead? Can you please give me some direction on what to change in the code to support this? Thanks a lot.

jyqian-aibee commented 6 years ago

I guess you can just open a Google Compute Engine instance, install the dependencies, clone this repo, and start training. After all, 2 GB of graphics memory is too small.

deencat commented 6 years ago

Thanks for your suggestion. I got the same error on my desktop with Ubuntu 16.04 and a 6 GB 1060. It looks like training with the first 249 layers frozen takes up most of the GPU RAM, and when it comes to training the unfrozen layers, no more RAM can be allocated.

No matter what batch_size I set in train.py, this issue happens; it is just a matter of time.

Can anyone please help me with this? I have been working on this error for days.

jyqian-aibee commented 6 years ago

Have you tried shrinking your input size? That could possibly help when you unfreeze all the layers.
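
For instance, here is a rough sketch of picking a smaller input size. It assumes the training script takes a (height, width) input shape whose sides must be multiples of 32, since YOLOv3 downsamples by a factor of 32. The helper name `shrink_input_shape` and the 0.75 factor are hypothetical, chosen only for illustration; they are not part of this repo.

```python
def shrink_input_shape(shape, factor=0.75, stride=32):
    """Scale (height, width) down and snap each side to a multiple of `stride`.

    Hypothetical helper for illustration; not part of keras-yolo3.
    """
    if not 0 < factor <= 1:
        raise ValueError("factor must be in (0, 1]")

    def snap(side):
        # Round down to the nearest multiple of `stride`,
        # but never go below a single stride.
        return max(stride, int(side * factor) // stride * stride)

    height, width = shape
    return (snap(height), snap(width))


# Assumed starting point: a 416x416 input, a common YOLOv3 default.
input_shape = shrink_input_shape((416, 416), factor=0.75)
print(input_shape)
```

The resulting tuple would replace the input shape used for training. Activation memory grows roughly with the number of input pixels, so a smaller input should leave more headroom once all layers are unfrozen, at some cost in accuracy on small objects.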