sezan92 / sezan92.github.io


RCNN blog #27

sezan92 opened 1 year ago

sezan92 commented 1 year ago

Objective

This issue tracks work on the RCNN blog post.

Tasks

sezan92 commented 1 year ago

Topics

- [ ] What I have learnt

sezan92 commented 1 year ago

Why I started working on RCNN

I have been working as an AI engineer for 5 years. Most of my projects are related to computer vision, and almost all of them involve object detection. I have collected data for, trained, fine-tuned, and deployed object detection models many times. But I never got the chance to implement an object detection model from scratch, because doing so is not efficient when open-source implementations are already available!

Is it necessary to implement it from scratch?

I cannot speak for others, but for me, implementing a model or algorithm from scratch helps me understand it far better than just reading about it. For production, we do not need to implement it from scratch; we can reuse existing implementations and libraries. But for my own learning, I find that implementing an architecture myself is the best way to understand it.

Why an old algorithm like RCNN?

For several reasons,

What kind of skills might it show?

If I had implemented a state-of-the-art (SOTA) algorithm, it might have shown my skills with the latest methods. I get that. But because I have limited time alongside full-time work and other commitments (family, learning other topics in AI, sports, etc.), I needed to start with an old and simple one. All in all, this implementation has taught me several things.

TODO

sezan92 commented 1 year ago

What was my expected outcome

My expected outcome was to implement an object detection model from scratch, nothing else. If someone reuses it (which is highly unlikely), fine! But I got more than that.

What outcomes I gained

What I have learnt

TODO

sezan92 commented 1 year ago

Update 2023/03/06


TODO

sezan92 commented 1 year ago

Update 2023/03/07

sezan92 commented 1 year ago

TODO 2023/03/07

sezan92 commented 1 year ago

Update 2023/03/08

TODO 2023/03/08

sezan92 commented 1 year ago

Update 2023/03/09

RCNN description

The RCNN model is not an end-to-end model, i.e., we cannot feed the annotated dataset in at one end and expect the model to figure out the rest. Instead, training is a multi-step process, described below.

Dataset Preparation

For dataset preparation, we extract candidate regions with selective search and then keep the regions whose IoU with a ground-truth box is greater than a certain threshold (here $0.6$) as positive images. Here, "positive" means the cropped region belongs to a certain object class. [add flowchart of region extraction]

Region extraction -> measure IoU -> if IoU > upper threshold -> positive for a class
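IoU is the area of the intersection of two boxes divided by the area of their union. A minimal sketch of that measure, assuming boxes are given as corner coordinates `(x_min, y_min, x_max, y_max)` (the format VOC annotations use) and treating coordinates as continuous. The name `get_iou` follows the pseudocode below; the implementation in the repository may differ.

```python
def get_iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the intersection rectangle.
    x_left = max(box_a[0], box_b[0])
    y_top = max(box_a[1], box_b[1])
    x_right = min(box_a[2], box_b[2])
    y_bottom = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes get an empty intersection.
    intersection = max(0.0, x_right - x_left) * max(0.0, y_bottom - y_top)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Two 10 x 10 boxes overlapping in a 5 x 5 square: 25 / 175, about 0.143.
print(get_iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

If regions come as `(x, y, w, h)`, as OpenCV's selective search returns them, convert them to corner form first.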

One region may overlap with multiple ground-truth boxes. For example, in a picture containing both a dog and a cat, a region may overlap both the dog and the cat. In that case, we take the box with the maximum IoU and assign its label to the region.

[ Add flowchart for data preparation]

Pseudocode

for image, bboxes, labels in dataset
    regions <- selective_search(image)
    for region in regions
        max_region_iou <- 0
        max_region_label <- none
        for bbox, label in zip(bboxes, labels)
            region_iou <- get_iou(region, bbox)
            if region_iou > max_region_iou
                max_region_iou <- region_iou
                max_region_label <- label
        if max_region_iou > upper_iou_threshold
            save_the_region_in_the_respective_dir(region, max_region_label)
        elif max_region_iou < lower_iou_threshold
            save_the_region_as_background(region)

The code is inefficient. I hope to optimize it later.
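The labeling rule inside the loop can be written as a small, testable Python function. This is a sketch, not the code in `prepare_data.py`: it takes the IoUs of one region against every ground-truth box in the image and returns the class label, `"background"`, or `None` for regions that fall between the two thresholds and are skipped. The upper threshold of 0.6 comes from the text above; the lower default of 0.3 is my assumption (it is the value the R-CNN paper uses for negatives when training its SVMs). The function name `assign_region_label` is hypothetical.

```python
from typing import Hashable, Optional, Sequence

BACKGROUND = "background"


def assign_region_label(
    ious: Sequence[float],
    labels: Sequence[Hashable],
    upper_iou_thresh: float = 0.6,
    lower_iou_thresh: float = 0.3,  # assumed value, not stated in the post
) -> Optional[Hashable]:
    """Label one region from its IoUs with every ground-truth box in the image."""
    if len(ious) == 0:
        return BACKGROUND  # no objects in the image, so the region is background
    # Keep the ground-truth box with the highest overlap ("max region IoU").
    best = max(range(len(ious)), key=lambda i: ious[i])
    max_region_iou = ious[best]
    if max_region_iou > upper_iou_thresh:
        return labels[best]
    if max_region_iou < lower_iou_thresh:
        return BACKGROUND
    return None  # ambiguous overlap: neither a clear object nor clear background


# A region overlapping a cat slightly and a dog strongly.
print(assign_region_label([0.1, 0.7], ["cat", "dog"]))  # dog
```

Calling it once per selective-search region with `ious = [get_iou(region, bbox) for bbox in bboxes]` gives the same decision as the pseudocode, and returning `None` makes the ignored IoU band explicit instead of silently dropping those regions.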

Command

python3 /src/prepare_data.py {voc2007,voc2012}  --ss_method SS_METHOD --num_rects NUM_RECTS --output OUTPUTDIRECTORY --data_batch_size DATABATCHSIZE --split {train,test,validation} --upper_iou_thresh UPPER_IOU --lower_iou_thresh LOWER_IOU --minimum_bg MINIMUM_SIZE_OF_BACKGROUND_IMAGE
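To make the usage line easier to read, here is one way the same interface could be declared with `argparse`. It is hypothetical: only the flag names and choices come from the command above; the types (integer counts and sizes, float thresholds) are my assumptions, not read from `prepare_data.py`.

```python
import argparse


def build_parser():
    # Hypothetical mirror of the prepare_data.py usage line; types are assumptions.
    parser = argparse.ArgumentParser(description="Extract and label selective-search regions.")
    parser.add_argument("dataset", choices=["voc2007", "voc2012"])
    parser.add_argument("--ss_method", type=str, help="selective search method")
    parser.add_argument("--num_rects", type=int, help="number of region proposals")
    parser.add_argument("--output", type=str, help="output directory")
    parser.add_argument("--data_batch_size", type=int)
    parser.add_argument("--split", choices=["train", "test", "validation"])
    parser.add_argument("--upper_iou_thresh", type=float)
    parser.add_argument("--lower_iou_thresh", type=float)
    parser.add_argument("--minimum_bg", type=int, help="minimum size of a background image")
    return parser


args = build_parser().parse_args(["voc2012", "--split", "train", "--upper_iou_thresh", "0.6"])
print(args.dataset, args.split, args.upper_iou_thresh)
```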

TODO

sezan92 commented 1 year ago

Update 2023/03/10

TODO

if revision done

sezan92 commented 1 year ago

Update 2023/03/13

Model

In the original RCNN paper, the authors used AlexNet as the CNN model because it was the state-of-the-art model at the time. I used VGG16 simply because it was very easy to use in TensorFlow. Also, in the original paper, features extracted from the CNN were fed into per-class linear SVMs, a choice made empirically. I used softmax instead because, again, it seemed easier. In short, the differences can be summarized as follows:

Original paper

image -> AlexNet -> features -> SVM -> result

My implementation

image -> VGG16 -> features -> softmax -> result
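To make the SVM versus softmax difference concrete: both heads score each class with a linear function of the CNN features; what changes is the training loss. The R-CNN paper trains one binary linear SVM per class (one-vs-rest, with hard negative mining, which is omitted here), while a softmax head is trained with cross-entropy over all classes at once. This NumPy sketch compares the two losses on the same made-up class scores; the function names are my own.

```python
import numpy as np


def one_vs_rest_hinge_loss(scores, label):
    """Sum of per-class binary hinge losses, as with one linear SVM per class."""
    targets = -np.ones_like(scores)  # every class is a negative ...
    targets[label] = 1.0             # ... except the true one
    return float(np.maximum(0.0, 1.0 - targets * scores).sum())


def softmax_cross_entropy(scores, label):
    """Cross-entropy of the softmax distribution over all classes."""
    shifted = scores - scores.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])


scores = np.array([2.0, 0.5, -1.0])  # class scores from a linear layer on features
print(one_vs_rest_hinge_loss(scores, 0))  # 1.5
print(softmax_cross_entropy(scores, 0))   # about 0.241
```

The hinge loss stops penalizing a class once its score is past the margin, whereas cross-entropy keeps pushing the true class's probability toward 1.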

Dataset

The original paper trained the model on VOC2007 and VOC2012 [confirm it]. I started training on VOC2012, but the evaluation metrics were very poor at first, for several reasons (I will explain the challenges later). After some time, I realized from the confusion matrix (add confusion matrix) that the model was working well only on pictures of dogs and cats. So, for simplicity, I decided to work only on those two classes; later I may increase the complexity.
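A confusion matrix makes this kind of diagnosis straightforward: per-class recall (the diagonal divided by the row totals) shows which classes the model actually gets right. A minimal NumPy sketch, assuming integer class ids for the true and predicted labels; the helper names are hypothetical and not from the repository.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    # np.add.at accumulates correctly even when the same (true, pred) pair repeats.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def per_class_recall(cm):
    """Fraction of each true class predicted correctly (0 for classes with no samples)."""
    totals = cm.sum(axis=1)
    return np.divide(np.diag(cm), totals, out=np.zeros(len(cm)), where=totals > 0)


cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 0, 2], num_classes=3)
print(cm)
print(per_class_recall(cm))
```

A class whose recall stays near zero while others are high is a sign that the model only works on a subset of classes.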

Steps

TODO

sezan92 commented 1 year ago

Update 2023/03/15

TODO

sezan92 commented 1 year ago

Update 2023/03/27

Training model

sezan92 commented 1 year ago

Update 2023/04/04

Training model

The paper used AlexNet; these days, there are far better models. I chose VGG16 because it is fairly easy to use. In addition to the VGG16 model, I added some augmentation layers: random flip, random translation, random rotation, and random contrast. At the end of the model, I used a 4096-dimensional dense layer with a `relu` activation function, followed by `softmax` for classification.

Image -> Augmentation layers -> VGG16 model without classification layer -> flattening + dropout -> relu layer -> classification layer 

[Add block diagram]
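A minimal `tf.keras` sketch of the pipeline above, assuming raw RGB inputs in [0, 255] and my own choices for the augmentation strengths and the dropout rate (the post does not state them). Applying VGG16's `preprocess_input` after augmentation is also an assumption, as is leaving the VGG16 base trainable; the function name `build_rcnn_classifier` is hypothetical.

```python
import tensorflow as tf


def build_rcnn_classifier(num_classes, input_shape=(224, 224, 3), weights="imagenet", dropout_rate=0.5):
    """Augmentation -> VGG16 (no top) -> flatten + dropout -> 4096 ReLU -> softmax."""
    # Augmentation layers listed in the post; the factors are assumed values.
    augmentation = tf.keras.Sequential(
        [
            tf.keras.layers.RandomFlip("horizontal"),
            tf.keras.layers.RandomTranslation(height_factor=0.1, width_factor=0.1),
            tf.keras.layers.RandomRotation(0.1),
            tf.keras.layers.RandomContrast(0.2),
        ],
        name="augmentation",
    )
    # VGG16 convolutional base without its fully connected classification layers.
    backbone = tf.keras.applications.VGG16(include_top=False, weights=weights, input_shape=input_shape)

    inputs = tf.keras.Input(shape=input_shape)
    x = augmentation(inputs)  # the random layers only change images in training mode
    x = tf.keras.applications.vgg16.preprocess_input(x)  # assumed preprocessing step
    x = backbone(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dropout(dropout_rate)(x)
    x = tf.keras.layers.Dense(4096, activation="relu")(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs, name="vgg16_rcnn")
```

For the cats, dogs, and background setup, `num_classes=3`. Passing `weights=None` skips the ImageNet download, which is handy for a quick shape check; for real training you would keep the pretrained weights.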

TODO

sezan92 commented 1 year ago

Update 2023/04/08

Block diagram

TODO

sezan92 commented 1 year ago

Update 2023/04/12

Block diagram

[Figure: VGG16RCNN block diagram (draw.io)]

sezan92 commented 1 year ago

Update 2023/04/12

Initial results and challenges faced

After training on VOC2012, I got results like those in https://github.com/sezan92/ComputerVision/issues/85#issuecomment-1328206603

If you look at the confusion matrix closely, most objects were not classified correctly, and the predictions were biased toward certain classes. This seemed problematic, so I tried to debug the issue. To make things easier, I chose only two classes, cats and dogs, plus the background class. From visual inspection (I cannot provide statistics yet; I should collect them), many background (BG) images had odd sizes and shapes, for example 10 x 100 or 1 x 10. At test time, that might not be the case.
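One plausible way to drop such odd crops before they are saved is to reject regions that are too small or too elongated. The limits below (a 32-pixel minimum side and a 3:1 maximum aspect ratio) are illustrative assumptions, and `is_reasonable_crop` is a hypothetical helper; the `--minimum_bg` option in the data preparation command may work differently.

```python
def is_reasonable_crop(width, height, min_side=32, max_aspect_ratio=3.0):
    """Return True if a crop is large enough and not too elongated (assumed limits)."""
    shorter, longer = min(width, height), max(width, height)
    if shorter < min_side:
        return False  # slivers such as 1 x 10 or 10 x 100 are rejected here
    return longer / shorter <= max_aspect_ratio


for w, h in [(10, 100), (1, 10), (64, 64), (40, 200)]:
    print((w, h), is_reasonable_crop(w, h))
```

Applying the same filter to object crops as well as background crops may keep the classifier from learning crop shape as a shortcut for the background class.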

Another problem seemed to be that, because only one IoU threshold was used, many regions with a fairly high IoU (like 0.45) were labeled as background. To make sure background images were really background

TODO

sezan92 commented 1 year ago

Update 2023/04/17

TODO

sezan92 commented 1 year ago

Update 2023/04/19

TODO

sezan92 commented 1 year ago

Update 2023/04/25

sezan92 commented 1 year ago

Update 2023/04/27