Can you please clear my doubts, Naisy?

PiyalGeorge commented 5 years ago

Hi, it's awesome work you have done; I'm really impressed. I'm also trying to do these things and get better results. I hope you can help me with a few simple doubts.

I'm trying to do vehicle detection with a Jetson TX2 and TensorRT. I trained Caffe's mobilenet-ssd model on a custom dataset and run it with an IPlugin prototxt file (to run on the board). Things are running, but my problems are: (1) The accuracy is not good. I trained with Berkeley's vehicle dataset, about 56000 images and 156000 iterations. Things are getting detected, but they are not detected consistently from frame to frame, so the detections flicker. (2) Far away things (distant objects) are not detected.

I hope you can suggest a good solution. I heard Faster R-CNN can detect distant things much better, but I don't know whether it will run on the Jetson TX2 at real-time FPS. If mobilenet-ssd is not good at this, or if there is any kind of change I need to make (other than changing the Jetson TX2 board), please tell me.

naisy commented 5 years ago

Hi @PiyalGeorge,

This is a guess, as I have never tried Berkeley's vehicle dataset.

1-1. Preprocessing is needed.
1-2. More training steps are needed.
2-1. Parameter optimization is needed.
2-2. Change to Faster R-CNN.

About 1-1. I assume the input at prediction time comes from a camera or video. The training data should be as similar as possible to the data seen at prediction time. For example, normalize Berkeley's vehicle dataset before training on it, and normalize the camera and video frames in the same way before prediction. I think this is the same as for any CNN training.
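The point about matching preprocessing can be sketched as one function that both the training pipeline and the camera loop call. This is a minimal illustration, not code from either repo: the names `preprocess` and `normalize_pixel` are hypothetical, and the mean and divisor of 127.5 are assumed values that should be replaced with whatever the `transform_param` in your own train.prototxt specifies.

```python
# Assumed normalization constants; check them against your own train.prototxt.
MEAN = 127.5
STD = 127.5


def normalize_pixel(value, mean=MEAN, std=STD):
    """Map a 0-255 channel value to roughly the range [-1, 1]."""
    return (value - mean) / std


def preprocess(image):
    """Normalize an image given as rows of pixels, each pixel a sequence
    of 0-255 channel values. Use this same function for training images
    and for camera or video frames."""
    return [[[normalize_pixel(c) for c in pixel] for pixel in row]
            for row in image]


# The same pixel values must produce the same network input in both paths.
training_image = [[(0, 127.5, 255)]]
camera_frame = [[(0, 127.5, 255)]]
assert preprocess(training_image) == preprocess(camera_frame)
```

The design choice is simply to have a single source of truth: if the training and prediction paths each carry their own copy of the constants, they tend to drift apart.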

About 1-2. The precision problem may be solved by increasing the number of training steps. It is an easy method, but it is very time-consuming, and it is unclear whether the results will change dramatically.

About 2-1. SSD is not well suited to small object detection. This is because it gains speed by giving up the processing of the small regions that exist in large numbers. Even so, if you want to improve it, you may be able to do so by optimizing parameters, such as changing the input size to 500x500. The following is face detection, but you can see how good his model is by looking at the football videos. https://github.com/yeephycho/tensorflow-face-detection Since it was interesting, I tried writing a version based on realtime_object_detection. https://github.com/naisy/realtime_face_detection Unfortunately, I do not know of any parameters other than 500x500.
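To make the 300 versus 500 trade-off concrete, here is a rough arithmetic sketch. It assumes the whole frame is squashed to the network input size, and that compute cost grows roughly with the number of input pixels, which is only an approximation. The 1280-pixel frame width and the 40-pixel car are hypothetical numbers chosen for illustration.

```python
def resized_width(object_px, frame_width, input_size):
    """Width in pixels of an object after the frame is resized to input_size."""
    return object_px * input_size / frame_width


def relative_pixel_count(new_size, old_size):
    """How many times more input pixels a square input of new_size has
    compared with old_size (a rough proxy for extra compute)."""
    return (new_size / old_size) ** 2


FRAME_WIDTH = 1280  # hypothetical camera frame width
CAR_WIDTH = 40      # hypothetical width of a distant car in the frame

for size in (300, 500):
    width = resized_width(CAR_WIDTH, FRAME_WIDTH, size)
    print(f"input {size}: distant car is about {width:.1f} px wide")

print(f"pixel count ratio 500 vs 300: {relative_pixel_count(500, 300):.2f}")
```

Under these assumptions the distant car gains only a few pixels of width, while the network has to process close to three times as many input pixels, which is why a larger input may help small objects but is expected to cost FPS.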

About 2-2. SSD is not good at detecting small objects and requires a large number of training steps. Faster R-CNN converges faster and can also detect small objects. However, the TX2 may run out of memory.

PiyalGeorge commented 5 years ago

Thanks @naisy, thanks a lot... I will look into the above repos.

PiyalGeorge commented 5 years ago

Hey @naisy, I looked into the repos you mentioned above. As I said before, I trained a caffemodel for vehicle detection with mobilenet-ssd, following https://github.com/chuanqi305/MobileNet-SSD. I'm planning to train again. Do you have any suggestions from your experience for increasing the accuracy of the model, by making changes in train.prototxt or solver.prototxt, or by using a new vehicle dataset? Any idea at all that would increase the accuracy and reduce the flickering?

naisy commented 5 years ago

Hi @PiyalGeorge,

I do not know if it will work, but what if you try changing input_size from 300 to 500? https://github.com/chuanqi305/MobileNet-SSD/blob/master/gen.py#L703

PiyalGeorge commented 5 years ago

@naisy Thanks, I'll try that. I'll also crop the images to squares first, hoping that resizing then won't distort them. Thanks.
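The cropping idea can be sketched as follows. This is a minimal illustration under the assumption of a simple center crop; the function names are hypothetical. Note that the bounding-box annotations have to be shifted and clipped along with the image, and objects that fall entirely outside the crop are lost.

```python
def center_square_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


def shift_and_clip_box(bbox, crop_box):
    """Translate an (xmin, ymin, xmax, ymax) annotation into crop
    coordinates. Returns None if the object lies fully outside the crop."""
    xmin, ymin, xmax, ymax = bbox
    left, top, right, bottom = crop_box
    new_box = (
        max(xmin, left) - left,
        max(ymin, top) - top,
        min(xmax, right) - left,
        min(ymax, bottom) - top,
    )
    if new_box[0] >= new_box[2] or new_box[1] >= new_box[3]:
        return None
    return new_box


crop_box = center_square_crop_box(1280, 720)
print(crop_box)
print(shift_and_clip_box((300, 100, 400, 200), crop_box))
print(shift_and_clip_box((0, 0, 100, 100), crop_box))
```

A center crop keeps the aspect ratio intact when the image is later resized to a square input, at the price of discarding vehicles near the left and right edges of a wide frame.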

PiyalGeorge commented 5 years ago

Hey @naisy, I'm going to train for vehicle detection. As you suggested above, I will try 500 this time. I'm going to train 6 different classes. Roughly how many images of each class do I need to include, and how many iterations should I run? (I'm not asking for exact values here, only approximate ones.) I want detection to be good for all classes, which is why I'm asking. I have also heard that the FPS will drop if we change from 300 to 500. Please help.

naisy commented 5 years ago

Hi @PiyalGeorge,

6 classes is fine.
Each class should have 100 or more objects. In other words, I think a class will be recognized if there are 100 examples of it. As a guide, if you have 1000 objects in a class, I think you can detect them better.
The same amount of data is desirable for all classes. Ex: class1 has 10000 and class2 has 500 is bad. class1 has 1000 and class2 has 500 is a little bad. class1 has 500 and class2 has 500 is good. Although reducing the amount of data may seem wasteful, I think the balance will be better.
SSD500 is slower than SSD300. However, since there are few output classes, I think the FPS will not be too bad.
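The class-balance guideline above can be checked with a few lines of code. This is a simplified sketch: it treats every annotated object as an independent `(object_id, class_name)` sample, which is an assumption, since in a real detection dataset one image usually contains objects of several classes. The function names are hypothetical.

```python
import random
from collections import Counter


def count_objects(samples):
    """Count annotated objects per class."""
    return Counter(class_name for _, class_name in samples)


def imbalance_ratio(counts):
    """Largest class count divided by smallest class count (1.0 is balanced)."""
    return max(counts.values()) / min(counts.values())


def balance_by_downsampling(samples, seed=0):
    """Randomly drop objects so every class keeps as many as the smallest one."""
    rng = random.Random(seed)
    by_class = {}
    for sample in samples:
        by_class.setdefault(sample[1], []).append(sample)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))
    return balanced


samples = [(f"car{i}", "car") for i in range(1000)]
samples += [(f"bus{i}", "bus") for i in range(500)]

print(count_objects(samples), imbalance_ratio(count_objects(samples)))
balanced = balance_by_downsampling(samples)
print(count_objects(balanced), imbalance_ratio(count_objects(balanced)))
```

Downsampling is only one way to even out the classes; collecting more examples of the rare classes avoids throwing data away.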

PiyalGeorge commented 5 years ago

@naisy, great, thanks. Thanks a lot. Two more things.

1) Currently I'm trying to do vehicle detection using a Jetson TX2 board and TensorRT (that's the aim). We are not committed to one particular CNN model such as mobilenet-ssd. We can perform vehicle detection with any CNN model; the only conditions are that it should be accurate, should detect distant as well as closer objects without flickering, and should run at real-time FPS (and the board should be the Jetson TX2). We've tried many methods, and the best results we are getting are still from mobilenet-ssd. I understand that not all CNN models run at the same FPS and accuracy on the board. So the next time we do something like this, what is the first thing we should take into consideration: our aim, the neural network, or the board? (For example, should we buy the board based on the aim and the neural network, or should we decide on the neural network after the board is bought?) Please also give the specific reason, because I've been dying to figure this out.

2) I'm actually new to the machine learning field. Previously I worked with Python and Django, and I'm working on this deep learning project with only that Python knowledge. How can I become an expert like you in deep learning? Which new languages or technologies do I need to learn to achieve this? Do I need a degree for this, or can I learn from online tutorials? If so, which is the best tutorial to follow? I'm asking you because I don't have anyone to guide me. I know you are really helpful, but every time I come here to ask, it is more of a burden for you.

I hope you can understand the situation of a beginner in this field. I hope you will help.

naisy commented 5 years ago

Hi @PiyalGeorge,

On the TX2, only SSD can do object detection at 30 FPS. YOLO should be able to as well, but I have never seen a good working example.

MobileNet is a model that researchers have already tuned as far as possible for mobile devices. It is known that tuning it further, for example with quantization, will not work.

If the requirements are the TX2 and 30 FPS, SSD MobileNet is the only choice. (SSD MobileNet also runs at only about 10 FPS if you do not tune the execution code.) The latest TensorRT/C++ also seems to achieve this speed. I have not used it yet, so I do not have accurate information.

To detect small objects, you need to select Faster R-CNN. But Faster R-CNN is very slow. Even on a desktop PC it runs at 10-20 FPS. However, if you want the best object detection accuracy, you will select Faster R-CNN. This model is too large for the TX2.

In other words, if you are limited to the TX2, you will choose SSD MobileNet. However, if you increase the accuracy, you cannot achieve 30 FPS. There is also the option of dropping SSD and using "selective search". Perhaps this is better for you.

Because Faster R-CNN runs at only 10-20 FPS even on a desktop PC, this seems to be a problem that changing the board will not solve. Nevertheless, the Jetson Xavier has more memory than the TX2, so I think it can be considered as a candidate. The TX2 is about 1/5 the speed of a GTX 1050, while the Xavier is about 1/3.
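The figures above can be combined into a back-of-the-envelope estimate. This sketch assumes that the desktop PC in question has roughly a GTX 1050-class GPU and that FPS scales linearly with the relative speeds quoted, neither of which is confirmed here; it also ignores the memory limit on the TX2. The function name is hypothetical.

```python
# Figures quoted in this thread; treat them as rough estimates.
DESKTOP_FASTER_RCNN_FPS = (10.0, 20.0)
RELATIVE_SPEED = {"TX2": 1 / 5, "Xavier": 1 / 3}  # relative to a GTX 1050


def estimate_fps(desktop_range, relative_speed):
    """Scale a (low, high) desktop FPS range by a board's relative speed."""
    low, high = desktop_range
    return low * relative_speed, high * relative_speed


for board, speed in RELATIVE_SPEED.items():
    low, high = estimate_fps(DESKTOP_FASTER_RCNN_FPS, speed)
    print(f"{board}: roughly {low:.1f} to {high:.1f} FPS for Faster R-CNN")
```

Under these assumptions neither board comes close to 30 FPS with Faster R-CNN, which matches the conclusion that changing the board alone does not solve the problem.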

It has not been two years since I started writing Python. Knowledge learned at university, such as various programming languages, software engineering, UNIX, and algorithms, is useful, but I think that the most important thing in deep learning is the training data rather than a degree. Last year I wrote a model from scratch and made the training data for it. I learned that the accuracy differs decisively depending on how good or bad the data is, rather than on the model. The accuracy difference between models is only about 10%. The difference in accuracy due to the data, however, is obvious, and if you want to improve accuracy by more than 10%, the only way is to improve the data.

I do not think you need to create the model from scratch, but creating the training data yourself will be a very good study.

PiyalGeorge commented 5 years ago

@naisy, thanks. Thank you so much for being such a helpful friend. :grinning: :grinning:

naisy commented 5 years ago

Hi @PiyalGeorge,

You are welcome.

naisy / realtime_object_detection

Can you please clear my doubts, Naisy? #58