microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

Examples Video Conv3D_UCF11 #1628

Closed pmfcdb closed 7 years ago

pmfcdb commented 7 years ago

Hi,

Using CNTK-2-0-beta11-0-Windows-64bit-GPU on Windows 10 with Anaconda.


I was trying to run the video example Conv3D_UCF11, but I got an error:

(C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py35) C:\Users\CVIG\Documents\CNTK-2-0-beta11-0-Windows-64bit-GPU\cntk\Examples\Video\GettingStarted\Python>python Conv3D_UCF11.py
Traceback (most recent call last):
  File "Conv3D_UCF11.py", line 246, in <module>
    conv3d_ucf11(train_reader, test_reader)
  File "Conv3D_UCF11.py", line 197, in conv3d_ucf11
    mm_schedule = momentum_as_time_constant_schedule(momentum_time_constant, epoch_size=epoch_size)
  File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py35\lib\site-packages\cntk\utils\swig_helper.py", line 62, in wrapper
    result = f(*args, **kwds)
  File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py35\lib\site-packages\cntk\learner.py", line 311, in momentum_as_time_constant_schedule
    raise ValueError('when providing the schedule as a number,'
ValueError: when providing the schedule as a number, epoch_size is ignored


I installed imageio from pip, because I had an error when trying to install from conda:

conda install -c pyzo imageio
Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata .........
Solving package specifications: ....

The following specifications were found to be in conflict:

Thanks,

Paulo

ottolu commented 7 years ago

I think you could use pip to install that package; it works on my side.

pmfcdb commented 7 years ago

I already did that:

(C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py35) C:\Users\pbrito\Documents\CNTK-2-0-beta11-0-Windows-64bit-GPU\cntk\Examples\Video\DataSets\UCF11>pip install imageio
Requirement already satisfied: imageio in c:\local\anaconda3-4.1.1-windows-x86_64\envs\cntk-py35\lib\site-packages
Requirement already satisfied: pillow in c:\local\anaconda3-4.1.1-windows-x86_64\envs\cntk-py35\lib\site-packages (from imageio)
Requirement already satisfied: numpy in c:\local\anaconda3-4.1.1-windows-x86_64\envs\cntk-py35\lib\site-packages (from imageio)

Are you using Anaconda3 Python 3.5?

ottolu commented 7 years ago

Sorry, my bad, I missed some messages. ;-)

According to the error message:

raise ValueError('when providing the schedule as a number,'
ValueError: when providing the schedule as a number, epoch_size is ignored

Have you tried removing epoch_size=epoch_size from the call to momentum_as_time_constant_schedule?

The Python API has changed slightly during the 2.0 beta iterations, and some of the examples may not have been updated yet.
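
For reference, here is a sketch of the two call forms that the error message implies; the list values below are made up purely for illustration and are not from the example:

# 1. A single constant time constant -- epoch_size must not be passed:
mm_schedule = momentum_as_time_constant_schedule(momentum_time_constant)

# 2. A list of values -- epoch_size tells the learner how many samples each
#    entry of the list covers (the numbers here are illustrative only):
mm_schedule = momentum_as_time_constant_schedule([4096, 4096, 8192],
                                                 epoch_size=epoch_size)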

pmfcdb commented 7 years ago

Hello,

That worked, I changed this line in CNTK-2-0-beta11-0-Windows-64bit-GPU\cntk\Examples\Video\GettingStarted\Python\Conv3D_UCF11.py

That worked. I changed this line in CNTK-2-0-beta11-0-Windows-64bit-GPU\cntk\Examples\Video\GettingStarted\Python\Conv3D_UCF11.py:

mm_schedule = momentum_as_time_constant_schedule(momentum_time_constant, epoch_size=epoch_size)

to

mm_schedule = momentum_as_time_constant_schedule(momentum_time_constant)

And I got this result after some time:

...
Finished Epoch[30 of 300]: [Training] loss = 1.638720 * 1291, metric = 56.7% * 1291 2262.641s ( 0.6 samples per second);

Final Results: Minibatch[1-154]: errs = 75.82% * 306

Is this correct? Can I visualize the results in any way?

Thanks, Paulo

ottolu commented 7 years ago

That's great. The loss and validation error seem to make sense. I think you could plot the loss curve to check whether the training is healthy, as in the sketch below.
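
For example, a minimal sketch that parses a saved console log and plots the training loss. It assumes the "Finished Epoch" output above was redirected to a file called training.log; that file name is just picked for illustration:

import re
import matplotlib.pyplot as plt

# Pull the per-epoch training loss out of lines like
# "Finished Epoch[1 of 300]: [Training] loss = 2.393231 * 1291, ..."
pattern = re.compile(r"Finished Epoch\[(\d+) of \d+\]: \[Training\] loss = ([\d.]+)")
losses = []
with open("training.log") as f:
    for line in f:
        m = pattern.search(line)
        if m:
            losses.append(float(m.group(2)))

plt.plot(range(1, len(losses) + 1), losses, marker="o")
plt.xlabel("epoch")
plt.ylabel("training loss")
plt.title("Conv3D_UCF11 training loss")
plt.show()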

CNTK also supports using TensorBoard to visualize the training: https://github.com/Microsoft/CNTK/wiki/Using-TensorBoard-for-Visualization
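
A rough sketch of the approach described on that wiki page, assuming CNTK 2.0's TensorBoardProgressWriter (in the betas the class may live under cntk.utils rather than cntk.logging). Here z, ce, pe and learner stand in for the example's model, loss, metric and learner objects; the actual variable names in Conv3D_UCF11.py may differ:

import cntk as C
from cntk.logging import TensorBoardProgressWriter  # cntk.utils in older betas

# Log loss/metric every 10 minibatches to a directory TensorBoard can read.
tensorboard_writer = TensorBoardProgressWriter(freq=10,
                                               log_dir="tensorboard_logs",
                                               model=z)

# Hand the writer to the trainer alongside the usual arguments.
trainer = C.Trainer(z, (ce, pe), learner, [tensorboard_writer])

# During or after training, inspect the curves with:
#   tensorboard --logdir=tensorboard_logs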

pmfcdb commented 7 years ago

This is the complete output from the console:

Training 15986507 parameters in 20 parameter tensors.

Finished Epoch[1 of 300]: [Training] loss = 2.393231 * 1291, metric = 88.1% * 1291 531.965s ( 2.4 samples per second);
Finished Epoch[2 of 300]: [Training] loss = 2.375974 * 1291, metric = 85.5% * 1291 434.337s ( 3.0 samples per second);
Finished Epoch[3 of 300]: [Training] loss = 2.377306 * 1291, metric = 87.5% * 1291 496.812s ( 2.6 samples per second);
Finished Epoch[4 of 300]: [Training] loss = 2.283117 * 1291, metric = 82.2% * 1291 594.714s ( 2.2 samples per second);
Finished Epoch[5 of 300]: [Training] loss = 2.236443 * 1291, metric = 77.1% * 1291 598.138s ( 2.2 samples per second);
Finished Epoch[6 of 300]: [Training] loss = 2.194415 * 1291, metric = 76.8% * 1291 609.350s ( 2.1 samples per second);
Finished Epoch[7 of 300]: [Training] loss = 2.162938 * 1291, metric = 74.8% * 1291 602.613s ( 2.1 samples per second);
Finished Epoch[8 of 300]: [Training] loss = 2.136575 * 1291, metric = 75.6% * 1291 618.451s ( 2.1 samples per second);
Finished Epoch[9 of 300]: [Training] loss = 2.115717 * 1291, metric = 74.4% * 1291 588.179s ( 2.2 samples per second);
Finished Epoch[10 of 300]: [Training] loss = 2.083642 * 1291, metric = 75.1% * 1291 624.458s ( 2.1 samples per second);
Finished Epoch[11 of 300]: [Training] loss = 2.116440 * 1291, metric = 73.8% * 1291 569.907s ( 2.3 samples per second);
Finished Epoch[12 of 300]: [Training] loss = 2.096908 * 1291, metric = 74.1% * 1291 418.765s ( 3.1 samples per second);
Finished Epoch[13 of 300]: [Training] loss = 2.092687 * 1291, metric = 73.2% * 1291 491.512s ( 2.6 samples per second);
Finished Epoch[14 of 300]: [Training] loss = 2.046194 * 1291, metric = 72.1% * 1291 563.343s ( 2.3 samples per second);
Finished Epoch[15 of 300]: [Training] loss = 2.046541 * 1291, metric = 73.0% * 1291 2261.651s ( 0.6 samples per second);
Finished Epoch[16 of 300]: [Training] loss = 1.963759 * 1291, metric = 70.6% * 1291 2261.378s ( 0.6 samples per second);
Finished Epoch[17 of 300]: [Training] loss = 1.933344 * 1291, metric = 69.3% * 1291 2262.893s ( 0.6 samples per second);
Finished Epoch[18 of 300]: [Training] loss = 1.855419 * 1291, metric = 66.5% * 1291 2263.440s ( 0.6 samples per second);
Finished Epoch[19 of 300]: [Training] loss = 1.827450 * 1291, metric = 64.0% * 1291 2263.456s ( 0.6 samples per second);
Finished Epoch[20 of 300]: [Training] loss = 1.793972 * 1291, metric = 62.7% * 1291 2262.862s ( 0.6 samples per second);
Finished Epoch[21 of 300]: [Training] loss = 1.772261 * 1291, metric = 62.7% * 1291 2262.878s ( 0.6 samples per second);
Finished Epoch[22 of 300]: [Training] loss = 1.955709 * 1291, metric = 65.2% * 1291 2262.347s ( 0.6 samples per second);
Finished Epoch[23 of 300]: [Training] loss = 1.856119 * 1291, metric = 63.5% * 1291 2263.159s ( 0.6 samples per second);
Finished Epoch[24 of 300]: [Training] loss = 1.775823 * 1291, metric = 64.1% * 1291 2263.159s ( 0.6 samples per second);
Finished Epoch[25 of 300]: [Training] loss = 1.742140 * 1291, metric = 61.4% * 1291 2262.724s ( 0.6 samples per second);
Finished Epoch[26 of 300]: [Training] loss = 1.739192 * 1291, metric = 60.7% * 1291 2263.267s ( 0.6 samples per second);
Finished Epoch[27 of 300]: [Training] loss = 1.689121 * 1291, metric = 59.3% * 1291 2262.362s ( 0.6 samples per second);
Finished Epoch[28 of 300]: [Training] loss = 1.678783 * 1291, metric = 58.7% * 1291 2263.220s ( 0.6 samples per second);
Finished Epoch[29 of 300]: [Training] loss = 1.653415 * 1291, metric = 57.4% * 1291 2262.628s ( 0.6 samples per second);
Finished Epoch[30 of 300]: [Training] loss = 1.638720 * 1291, metric = 56.7% * 1291 2262.641s ( 0.6 samples per second);

Final Results: Minibatch[1-154]: errs = 75.82% * 306

My question is: how can I evaluate a new video for actions? Where is the model saved, and how do I evaluate with it?

Thanks, Paulo

ottolu commented 7 years ago

Please check this out: https://github.com/Microsoft/CNTK/wiki/Evaluate-a-saved-convolutional-network

pmfcdb commented 7 years ago

Thanks, I got some results, but because errs = 75.82% is high, there are a lot of bad classifications. What I did was: in Conv3D_UCF11.py I added z.save_model("dnn_video_actions.model") at line 239, then trained the neural network.

After that I created a file Eval_conv3d.py to evaluate the model. I also took 2 snapshots from a training video,

\cntk\Examples\Video\DataSets\UCF11\action_youtube_naudio\trampoline_jumping\v_jumping_02\v_jumping_02_01.avi

and resized them to 112x112.


from cntk.ops.functions import load_model
from PIL import Image
import numpy as np

# load the trained model
z = load_model("conv3d.dnn")

# read each snapshot, convert RGB -> BGR and move channels to the front (CHW)
rgb_image1 = np.asarray(Image.open("jumping0201_00.jpg"), dtype=np.float32)
bgr_image1 = rgb_image1[..., [2, 1, 0]]
x1 = np.rollaxis(bgr_image1, 2)

rgb_image3 = np.asarray(Image.open("jumping0201_08.jpg"), dtype=np.float32)
bgr_image3 = rgb_image3[..., [2, 1, 0]]
x3 = np.rollaxis(bgr_image3, 2)

# build a 16-frame clip: 15 copies of the first frame plus the second one
y = [x1, x1, x1, x1, x1, x1, x1, x1, x1, x1, x1, x1, x1, x1, x1, x3]

# note: the stacked array is (16, 3, 112, 112); the .shape assignment below only
# reinterprets the buffer as (3, 16, 112, 112) instead of transposing it
pic = np.ascontiguousarray(y)
pic.shape = (3, 16, 112, 112)
predictions = np.squeeze(z.eval({z.arguments[0]: [pic]}))
top_class = np.argmax(predictions)

print("Predictions: ", predictions)
print("Top Class: ", top_class)


The result was:

Predictions: [-11.28403473 -0.26260215 19.61235046 5.70545959 -69.05567932 -7.38749313 27.68449402 22.25476074 51.31705856 -25.28030777 -0.65711987]
Top Class: 8
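
Those large positive and negative values suggest the network returns raw scores rather than probabilities. As a side note, a small sketch that converts the scores printed above into class probabilities with a numerically stable softmax, which is easier to read:

import numpy as np

scores = np.array([-11.28403473, -0.26260215, 19.61235046, 5.70545959,
                   -69.05567932, -7.38749313, 27.68449402, 22.25476074,
                   51.31705856, -25.28030777, -0.65711987])

# subtract the max before exponentiating to avoid overflow
probs = np.exp(scores - scores.max())
probs /= probs.sum()

print("Class probabilities:", np.round(probs, 4))
print("Top class:", int(np.argmax(probs)))  # index 8, matching the result above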

How can I improve the training? 75% error is too high!

ebarsoumMS commented 7 years ago

UCF11 is a very basic example that shows how to implement a 3D CNN in CNTK. We will be adding more advanced examples that should match state-of-the-art results.

Here are some ideas for improvement:

ottolu commented 7 years ago

Hi @pmfcdb, I suggest you have a look at the original paper if you want a video classification model for production.

To train a usable C3D network, you usually need to pretrain it on a large video dataset like Sports-1M first, then fine-tune it on your customized dataset, like UCF101, which might be much smaller. The data augmentation methods Emad mentioned above should also help. A rough sketch of the fine-tuning pattern is below.
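
For illustration only, here is a sketch modeled on CNTK's transfer-learning recipe. The file name "pretrained_c3d.model" and the node names "input" and "fc6" are hypothetical and depend entirely on how the pretrained graph was built; module paths also shifted slightly during the 2.0 betas:

import cntk as C
from cntk.ops.functions import load_model, CloneMethod
from cntk.logging.graph import find_by_name  # cntk.graph in older betas

num_classes = 11  # UCF11 has 11 action classes

# Load a network pretrained on a large dataset such as Sports-1M.
base_model = load_model("pretrained_c3d.model")

# Locate the input and the last layer to keep, by node name.
feature_node = find_by_name(base_model, "input")
last_node = find_by_name(base_model, "fc6")

# Clone everything up to fc6 with frozen weights, swapping the old input
# for a placeholder so new data can be fed in.
frozen = C.combine([last_node.owner]).clone(
    CloneMethod.freeze, {feature_node: C.placeholder(name="features")})

# Attach a fresh, trainable classification head for the new dataset.
video_input = C.input_variable((3, 16, 112, 112), name="features")
z = C.layers.Dense(num_classes, activation=None)(frozen(video_input))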

Thanks.

ajaysharmahcl commented 7 years ago

Hi pmfcdb,

How did you get the model you used to evaluate the video file output? I see a model path in conv3d_ucf11.py, but there is no model actually saved to that path. I got the same result as you, but I am now struggling with how to test a video file. Please help.

Has anyone done video recognition using this CNTK example? Please share the correct way of doing it, as it's a bit confusing.