scaelles / OSVOS-TensorFlow

One-Shot Video Object Segmentation
http://vision.ee.ethz.ch/~cvlsegmentation/osvos/
GNU General Public License v3.0

More frames for training: Resource exhausted: OOM when allocating tensor with shape[3,3,512,512] Started loading files... #13

Closed. Jingyao12 closed this issue 6 years ago

Jingyao12 commented 6 years ago

Hi Caelles, I tried your code and everything goes well with the one-shot osvos_demo.py. However, when I try to use more annotated frames by changing

train_imgs = [os.path.join('DAVIS', 'JPEGImages', '480p', seq_name, '00000.jpg')+' '+ os.path.join('DAVIS', 'Annotations', '480p', seq_name, '00000.png')]

to

train_imgs = [os.path.join('DAVIS', 'JPEGImages', '480p', seq_name, '00000.jpg')+' '+ os.path.join('DAVIS', 'Annotations', '480p', seq_name, '00000.png'), os.path.join('DAVIS', 'JPEGImages', '480p', seq_name, '00005.jpg')+' '+ os.path.join('DAVIS', 'Annotations', '480p', seq_name, '00005.png')]

the log shows "OOM when allocating tensor with shape[3,3,512,512]".
I am using 400 GB of regular RAM.

Do you have any suggestions for training with multiple annotated frames, such as the 2 and 4 annotated frames reported in your paper?

Thank you!
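The two-frame list above can also be built with a small helper instead of being written out by hand. This is only a sketch: `build_train_imgs` and its parameters are hypothetical names, not part of the OSVOS code, and it assumes the DAVIS layout and the "image path, space, annotation path" string format shown in the post.

```python
import os


def build_train_imgs(seq_name, frame_ids, root='DAVIS', resolution='480p'):
    """Return one 'image_path annotation_path' string per annotated frame.

    Hypothetical helper: mirrors the train_imgs format used in the post.
    """
    pairs = []
    for frame_id in frame_ids:
        stem = '{:05d}'.format(frame_id)  # 5 -> '00005'
        img = os.path.join(root, 'JPEGImages', resolution, seq_name, stem + '.jpg')
        ann = os.path.join(root, 'Annotations', resolution, seq_name, stem + '.png')
        pairs.append(img + ' ' + ann)
    return pairs


# Two annotated frames (0 and 5), as in the post; 'blackswan' is just an example sequence.
train_imgs = build_train_imgs('blackswan', [0, 5])
for entry in train_imgs:
    print(entry)
```

Passing `[0]` reproduces the one-shot setting, and a longer list of frame indices gives the multi-frame setting without duplicating the path expressions.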

scaelles commented 6 years ago

Hello, the list you propose is the correct way to train with multiple annotated frames. I tried to reproduce your error, but I couldn't. If I use those two images for training, I don't get any error. I use a computer with 256 GB of RAM, a Titan X Maxwell, and tensorflow-gpu 1.4. Which hardware are you using? Which TensorFlow version? Best
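As a rough sanity check on the error message, the size of the tensor named in the log can be worked out by hand. The sketch below assumes 32-bit floats (4 bytes per element), which the thread does not state; `tensor_bytes` is a hypothetical helper name.

```python
from functools import reduce
from operator import mul


def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed for a dense tensor of the given shape (float32 assumed)."""
    return reduce(mul, shape, 1) * bytes_per_element


size = tensor_bytes([3, 3, 512, 512])
print(size)          # size in bytes
print(size / 2**20)  # size in MiB
```

Under that assumption the tensor is only a few MiB, so a failed allocation of this size suggests the memory pool was already nearly full when the request arrived, rather than that this tensor is large. When TensorFlow runs on a GPU, this kind of error usually concerns device memory, so the amount of host RAM may not be the relevant figure.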

Jingyao12 commented 6 years ago

Hi, thank you for the reply! I ran it on an HPC cluster, and it looks normal now. I guess there were some problems on the HPC system yesterday. I also have another question: is the training loss supposed to decrease? I found that the training loss oscillates instead of decreasing. Do you know why it behaves like this? Thank you! Best

scaelles commented 6 years ago

Hello, if you download the pretrained models for each sequence, you can find the training evolution for the fine-tuning of each sequence: just run TensorBoard in the model folder. For example, the training of blackswan looks like this: [image] So the loss decreases a lot at the beginning and then continues decreasing, but slowly. Best
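A loss that jumps around from step to step can still have a downward trend, which is easier to see after smoothing. The sketch below applies an exponential moving average, similar in spirit to the smoothing slider in TensorBoard. The loss values are made up for illustration and are not taken from any OSVOS run; `ema` and `alpha` are hypothetical names.

```python
def ema(values, alpha=0.6):
    """Exponential moving average; higher alpha means stronger smoothing."""
    if not values:
        return []
    smoothed = []
    running = values[0]  # start from the first observation
    for v in values:
        running = alpha * running + (1 - alpha) * v
        smoothed.append(running)
    return smoothed


# Synthetic, illustrative loss values: noisy step to step, decreasing overall.
raw_loss = [1.00, 0.70, 0.85, 0.55, 0.65, 0.45, 0.52, 0.40]
for raw, smooth in zip(raw_loss, ema(raw_loss)):
    print('{:.2f} -> {:.3f}'.format(raw, smooth))
```

If the smoothed curve also fails to go down over many iterations, the oscillation is probably more than per-step noise.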

Jingyao12 commented 6 years ago

Hi, Thank you for the clear explanation. Best