hprop / mcv-m5

Master in Computer Vision - M5 Visual recognition

Feedback week 4 #2

Open lluisgomez opened 7 years ago

lluisgomez commented 7 years ago

I've been taking a look at your article in Overleaf, particularly at the sections related to these weeks' assignment. I think it is well written, although I'm not convinced by the overall structure. You present three separate "chapters" on classification, detection, and segmentation, each with its own state of the art, methodology, and results sections. In my personal opinion, the paper would be better structured with a single set of those sections for the whole project: start with the state of the art for classification, detection, and segmentation, then explain your methodology for the three problems, and finally show the results.

Other than that I have some minor comments:

I think you would do well to simplify the presentation of results into a single table for each dataset, showing the final F-score on the test set for each method so they can be compared easily. It is not necessary to show training and validation results; we asked you to calculate those in order to understand some particularities of the models/datasets, but for the final report showing only test results is enough.

The same can be said of slides 17 and 18 in your presentation. We want the audience to get an idea of what is happening at first glance. Either show only F-score values, or provide precision/recall/F-score only for the test set. Use different tables for different datasets and mark the best results in bold.

Adding a figure with qualitative results will help the reader/listener understand how your networks perform on the object detection task (where they perform well, where they fail, etc.).
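For instance, such a figure can be produced by drawing the predicted boxes on a few test images. A minimal sketch (assuming detections come as (x_min, y_min, x_max, y_max, label, score) tuples in pixel coordinates; all names here are illustrative, not tied to your code):

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches

def plot_detections(image_path, detections, out_path):
    """Draw predicted boxes on a test image and save the figure.

    `detections` is assumed to be a list of
    (x_min, y_min, x_max, y_max, label, score) tuples in pixel coordinates.
    """
    img = plt.imread(image_path)
    fig, ax = plt.subplots(1)
    ax.imshow(img)
    for x_min, y_min, x_max, y_max, label, score in detections:
        rect = patches.Rectangle((x_min, y_min), x_max - x_min, y_max - y_min,
                                 fill=False, edgecolor='lime', linewidth=2)
        ax.add_patch(rect)
        ax.text(x_min, y_min - 2, '%s %.2f' % (label, score),
                color='lime', fontsize=8)
    ax.axis('off')
    fig.savefig(out_path, bbox_inches='tight')
    plt.close(fig)
```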

Typo in section 3.3 of the article: "valid" -> "validation".

I've also taken a look at your integration of the SSD code and I've seen you have completed the task. I particularly appreciate your initiative in implementing the functionality to create the priors that are applied to each output layer, instead of just using the pre-computed priors from an external "pkl" file. However, your modifications to the number of priors do not allow you to reuse the pre-trained weights of the original repository, right? This probably explains why your F-score results are worse than with the solution we will provide: 0.90 (with pre-trained weights) vs 0.71 (training from scratch). Please consider adding results with pre-trained weights for the final presentation and explaining the differences.
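For reference, generating the priors for one output layer usually follows the formulas of the original SSD paper. A generic sketch (not your exact implementation; the function and parameter names are assumptions):

```python
import numpy as np

def create_priors(fmap_size, scale, next_scale, aspect_ratios):
    """Generic SSD prior (default box) generation for one output layer.

    Returns boxes as (x_min, y_min, x_max, y_max) in relative coordinates,
    following the SSD paper: one box per aspect ratio at this layer's scale,
    plus an extra box at scale sqrt(s_k * s_{k+1}) with aspect ratio 1.
    """
    boxes = []
    step = 1.0 / fmap_size
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx = (j + 0.5) * step
            cy = (i + 0.5) * step
            for ar in aspect_ratios:
                w = scale * np.sqrt(ar)
                h = scale / np.sqrt(ar)
                boxes.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
            s_prime = np.sqrt(scale * next_scale)
            boxes.append([cx - s_prime / 2, cy - s_prime / 2,
                          cx + s_prime / 2, cy + s_prime / 2])
    return np.clip(np.array(boxes), 0.0, 1.0)
```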

Regarding the use of different pre-processing techniques to boost the performance of YOLO, I think this is the wrong strategy. Since the best results with YOLO are obtained when the model is initialized with pre-trained weights, using a different pre-processing does not make sense unless you do the pre-training with the same pre-processing, right?
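In other words, whatever pre-processing was used to obtain the pre-trained weights should be reproduced at fine-tuning time. A minimal sketch, assuming the common convention of resizing and scaling pixels to [0, 1] (the exact recipe depends on how the weights were produced):

```python
import cv2
import numpy as np

def yolo_preprocess(image, target_size=(320, 320)):
    """Match the pre-processing used when the YOLO weights were pre-trained.

    This assumes plain resizing plus scaling of pixel values to [0, 1]; the
    actual recipe must mirror whatever was applied during pre-training
    (mean subtraction, BGR/RGB channel order, etc.).
    """
    resized = cv2.resize(image, target_size)
    return resized.astype(np.float32) / 255.0
```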

All in all, since you have completed all the tasks for these two weeks, I guess you will probably get the maximum mark.

hprop commented 7 years ago

First of all, we appreciate the tips about the report structure and agree that it is slightly redundant; we will restructure it as soon as possible. The same goes for the training and validation tables, which we will remove from the final report and the slides, since the figures already convey most of the points detailed in the text. Moreover, we will add some figures showing test results in order to make the documents more comprehensive.

Regarding our SSD implementation, you are right that after modifying the priors we could not reuse the weights from rykov8's repo. Moreover, we changed the topology of the base VGG16 model a little: for example, unlike rykov8's implementation, we did not use a final GlobalAveragePooling layer (nor is it used in the original SSD paper). So there are several points that prevent us from using those weights.

But in line with your suggestion, for the final presentation we are planning to use some pre-trained base models. Instead of loading pre-trained weights for all the SSD layers, we could load them only for the base model, the way it is done in the SSD paper. That would allow us to keep our priors (which we think are more suitable for the TT100K and Udacity datasets) and reuse our work from the first week (object recognition): we could use our VGG16, ResNet or DenseNet models pre-trained on TT100K. We think this would be an interesting experiment.
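A minimal sketch of how this could be done in Keras (`build_ssd300` and the weights file are placeholder names standing for our SSD constructor and the week-1 VGG16 weights):

```python
# Hypothetical sketch (Keras): `build_ssd300` stands for our SSD model
# constructor and the weights file for the VGG16 classifier trained on TT100K.
ssd_model = build_ssd300(input_shape=(300, 300, 3), num_classes=n_classes)

# by_name=True loads weights only into layers whose names match those stored
# in the file, so the base VGG16 layers receive the pre-trained weights while
# the SSD-specific layers keep their random initialization.
ssd_model.load_weights('vgg16_tt100k.h5', by_name=True)
```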

On the other hand, we are not sure how to interpret your suggestion about showing qualitative figures. Do you mean we should show image examples with true and false positives?

Regarding the YOLO preprocessing issue, we had not realized that and it makes complete sense. We are therefore considering an extra experiment using the preprocessing applied to the pre-trained model.

Finally, thanks for the positive feedback; we will let you know if anything else comes up on these points.