zackchase / mxnet-the-straight-dope

An interactive book on deep learning. Much easy, so MXNet. Wow. [Straight Dope is growing up] → Much of this content has been incorporated into the new Dive into Deep Learning book, available at https://d2l.ai/.
Apache License 2.0

Notebook for object detection in Chapter 8 has some errors #319

Open ashishkumar-rambhatla opened 6 years ago

ashishkumar-rambhatla commented 6 years ago

Hey!

I'm trying to implement the SSD algorithm detailed in the computer vision chapter of your blog. Unfortunately, I couldn't get the output even with the pre-trained weights. Everything works fine up to the "start training" step, but once I execute that step it takes ages even with an NVIDIA GeForce 940M GPU. I have tried loading pre-trained weights, but it still isn't producing the expected results. I suspect a flaw in the code. Can you please find and rectify it?

stephenmbull commented 6 years ago

Hi,

What results are you getting? Perhaps you could provide the output here. Just out of curiosity, have you tried running the same code on CPU only? I can only run on CPU on my laptop, and other than taking a while to run (about 10 minutes for me), I see results similar to the expected output in the tutorial.

Edit: Just to clarify, you are referring to Chapter 8 SSD: Single Shot MultiBox Detector tutorial, correct?

-Stephen

ashishkumar-rambhatla commented 6 years ago

Hi,

I haven't tried running it on a CPU; I'll do that and let you know the result. I'm actually not getting any errors, it's just that the code stays in the loop forever (around 4 hours), so I terminated it manually. Just out of curiosity, is that 10 minutes with the pre-trained weights or from scratch? If it's from scratch, would it really take just half an hour to train?

ashishkumar-rambhatla commented 6 years ago

I have tried to execute the same with gpu and it worked, thanks for that. But when I try to execute the same code with gpu, my kernel dies automatically after a few minutes. Small commands like mx.nd.ones((2,3), mx.gpu()) give the expected results. Why is the GPU failing to execute the training chunk while the CPU did it in a matter of 10 minutes? Can you please spot the possible error that I might have overlooked?

ashishkumar-rambhatla commented 6 years ago

I'm sorry, in the first line it was CPU, not GPU (it worked on the CPU but failed on the GPU). By failing I mean it automatically kills the kernel, which forces a kernel restart, and I had to execute the whole notebook from the start.
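A small sanity check like the `mx.nd.ones((2,3), mx.gpu())` call above can be wrapped in a helper so GPU failures surface as a clean boolean instead of a dead kernel. This is a hedged sketch, not code from the tutorial; `gpu_available` is a hypothetical name, and the `asnumpy()` call is there because MXNet executes lazily, so errors on the GPU only surface when a result is copied back.

```python
def gpu_available():
    """Return True if MXNet can allocate and compute on a GPU.

    Hypothetical helper, not part of the tutorial. Returns False
    if mxnet is not installed, no CUDA device is present, or the
    GPU computation fails for any other reason.
    """
    try:
        import mxnet as mx
        # asnumpy() forces a synchronous device-to-host copy, so any
        # lazy GPU-side error is raised here instead of later.
        x = mx.nd.ones((2, 3), ctx=mx.gpu())
        return x.asnumpy().shape == (2, 3)
    except Exception:
        return False
```

Note that a tiny allocation succeeding does not rule out the GPU running out of memory during training, which is one common reason a kernel dies only once the training loop starts.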

stephenmbull commented 6 years ago

10 minutes was pre-trained. It seems to run okay for me with full training too, but I've never run it to completion; it would take longer than I'm willing to wait! I do see the epoch output at each iteration, though, and I've let it run for just short of an hour without issue before killing it. I wish I had a GPU to test on, but oh well. Hopefully someone else can speak to that.

Edit: It just occurred to me that I'm running the code standalone at a Linux terminal. I've not been running any of the tutorial code through Jupyter. You mentioned a notebook in your last post, so presumably you are running the Jupyter-based examples.
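One way to reproduce the standalone setup described above is to convert the notebook to a plain script with `jupyter nbconvert` and run it outside Jupyter, so a crash kills the process visibly instead of silently restarting a kernel. This is a hedged sketch: the notebook filename below is a placeholder, not the tutorial's actual filename, and the block skips gracefully when jupyter or the file is absent.

```shell
#!/bin/sh
# Placeholder filename -- substitute the actual SSD tutorial notebook.
NB=ssd-tutorial.ipynb

if command -v jupyter >/dev/null 2>&1 && [ -f "$NB" ]; then
    # Convert the notebook to a .py script, then run it as a plain process.
    jupyter nbconvert --to script "$NB"
    python "${NB%.ipynb}.py"
else
    echo "skipping: jupyter or $NB not available"
fi
```

Running at a terminal also means any CUDA error message is printed to stderr rather than being swallowed by a kernel restart.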