sezan92 commented 1 year ago

Objective

This issue tracks the work on the RCNN blog post.

Tasks

TBD

sezan92 commented 1 year ago

Topics

[x] Why I started working on RCNN and why now?
[x] What was my expected outcome?

~~- [ ] What I have learnt~~

[x] Start working on code from the flowchart
- [x] code for the data preparation script
- [x] code for training the model
[x] Describe the flowcharts
[x] Describe the data
[x] What problems/challenges I faced
[x] How I solved the challenges
[x] Why I later chose only 2 classes
[ ] Result

sezan92 commented 1 year ago

Why started working on RCNN

I have been working as an AI engineer for 5 years. Most of my projects are related to computer vision, and almost all of them are on object detection. I have collected data for, trained, fine-tuned, and deployed object detection models many times. But I never got a chance to implement an object detection model from scratch, because implementing one from scratch is not efficient when open source implementations are already available!

Is it necessary to implement?

I do not know about others, but for me, implementing a model or algorithm from scratch helps me understand it far better than just reading about it from other sources! For production, we do not need to implement it from scratch; we can reuse available implementations and libraries. But for myself, I find implementing it the best way to understand an architecture.

Why an old algorithm like RCNN?

For several reasons:

It is the least complex of the CNN-based object detection models.
CNN-based object detection started with it.
I plan to refactor the implementation and scale it up to modern, state-of-the-art (SOTA) models.

What kind of skills might it show?

If I had implemented a SOTA algorithm, it might have shown my skills with the latest algorithms. I get that. But because I have limited time alongside full-time work and other things (family, learning other topics in AI, sports, etc.), I needed to start with an old and easy one. All in all, this implementation has taught me several things:

Chunking big projects into smaller pieces
Using the GitHub Projects feature and agile methodology for big projects (more on that later)
Gritting my teeth and pushing on when problems appear

TODO

[x] Revise and check if this is okay or extra needed

sezan92 commented 1 year ago

What was my expected outcome

My expected outcome was to implement an object detection model from scratch, nothing else. If someone reuses it (which is highly unlikely), then fine! But I got more than that.

What outcomes I gained

[x] Structuring a big project into small subtasks
[x] Updating the plan along the way
[x] When implementing a model, trying it first on toy, simple data
[x] Implementing a model from a paper
[x] Structuring a project
[x] Specifically, the Kanban board from GitHub. It helped me a lot. For example
[x] Replanning the project when the results are not good
[x] Debugging a project based on results

What I have learnt

Same as above.

TODO

[x] Check and revise this section
[x] Add details about the structuring of this project.

sezan92 commented 1 year ago

Update 2023/03/06

[x] Initial flow chart

[Image: initial flowchart]

TODO

[x] check and revise
[x] If correct, then make it on the PC

sezan92 commented 1 year ago

Update 2023/03/07

[x] The actual system in the paper
[x] We had to make some changes for the initial implementation
[x] We did not use an SVM; we directly used softmax
[x] We only trained on dogs and cats, for simplicity and because there is plenty of data for them
[x] No evaluation yet

sezan92 commented 1 year ago

TODO 2023/03/07

[x] Make the flowchart on the PC

sezan92 commented 1 year ago

Update 2023/03/08

[x] initial version of flowchart https://drive.google.com/file/d/1-ZNG2CTiiWa9GyJMNew1A5S5WBrA1ohH/view?usp=sharing

TODO 2023/03/08

[x] revise and check if something can be updated

sezan92 commented 1 year ago

Update 2023/03/09

RCNN description

The RCNN model is not an end-to-end model; that is, we cannot feed the annotated dataset in at one end and expect the model to figure out the rest. Instead, training is a multi-step process. The steps are described below.

Dataset Preparation

For dataset preparation, we extract regions with selective search and keep the regions whose IoU (intersection over union) with a ground-truth box is greater than a certain threshold (here $0.6$) as positive images. Here, positive means that the image belongs to a certain class. [add flowchart of region extraction]

Region extraction -> measure IoU -> if IoU > upper threshold -> positive for a class
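IoU is the area of the intersection of two boxes divided by the area of their union. A minimal sketch of the `get_iou` step from the pseudocode below, assuming boxes are `(x1, y1, x2, y2)` corner coordinates (the box format is an assumption, not taken from the author's script):

```python
def get_iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2).

    Assumes x2 > x1 and y2 > y1 (pixel coordinates, top-left origin).
    """
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Zero-area intersection if the boxes do not overlap
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# A region overlapping a ground-truth box by a quarter of its own area
region = (0, 0, 10, 10)
gt_box = (5, 5, 15, 15)
print(round(get_iou(region, gt_box), 3))  # 0.143, well below a 0.6 threshold
```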

A region may overlap with boxes of multiple classes. For example, in a picture with both a dog and a cat, a region may overlap with both the dog box and the cat box. In that case, we assign the region the class of the box with the maximum IoU.

[ Add flowchart for data preparation]

Pseudocode

for image, bboxes, labels in dataset
    regions <- selective_search(image)
    for region in regions
        max_region_iou <- 0
        max_region_label <- None
        for bbox, label in zip(bboxes, labels)
            region_iou <- get_iou(region, bbox)
            if region_iou > max_region_iou
                max_region_iou <- region_iou
                max_region_label <- label
        if max_region_iou > upper_iou_threshold
            save_the_region_in_the_respective_dir(region, max_region_label)
        elif max_region_iou < lower_iou_threshold
            save_the_region_as_background(region)

The code is inefficient. I hope to optimize it later.
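The labeling logic in the pseudocode above can be sketched as runnable Python. This is a hypothetical illustration, not the author's `prepare_data.py`: selective search is left out and the region proposals are passed in as a list of boxes, boxes are assumed to be `(x1, y1, x2, y2)` tuples, and instead of saving crops to directories the function `label_regions` (a made-up name) returns the label assigned to each region.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed format


def get_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_regions(
    regions: Sequence[Box],
    bboxes: Sequence[Box],
    labels: Sequence[str],
    upper_iou_threshold: float,
    lower_iou_threshold: float,
) -> List[Tuple[Box, Optional[str]]]:
    """Label each region with the class of its best-matching ground-truth box.

    The label is a class name, "background", or None when the maximum IoU
    falls between the two thresholds (such regions are not saved).
    """
    results = []
    for region in regions:
        max_region_iou, max_region_label = 0.0, None
        # Keep the ground-truth box with the highest overlap
        for bbox, label in zip(bboxes, labels):
            region_iou = get_iou(region, bbox)
            if region_iou > max_region_iou:
                max_region_iou, max_region_label = region_iou, label
        if max_region_iou > upper_iou_threshold:
            results.append((region, max_region_label))
        elif max_region_iou < lower_iou_threshold:
            results.append((region, "background"))
        else:
            results.append((region, None))  # ambiguous overlap: discarded
    return results


bboxes = [(10, 10, 60, 60), (50, 50, 100, 100)]
labels = ["dog", "cat"]
regions = [(12, 12, 60, 60), (0, 0, 5, 5), (20, 20, 70, 70)]
# 0.3 is an illustrative lower threshold; 0.6 matches the upper threshold above
print([label for _, label in label_regions(regions, bboxes, labels, 0.6, 0.3)])
# ['dog', 'background', None]
```

The loop over every region and every box is still O(regions x boxes) per image; vectorizing the IoU computation (for example with NumPy) is one way such code could be optimized.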

Command

python3 /src/prepare_data.py {voc2007,voc2012}  --ss_method SS_METHOD --num_rects NUM_RECTS --output OUTPUTDIRECTORY --data_batch_size DATABATCHSIZE --split {train,test,validation} --upper_iou_thresh UPPER_IOU --lower_iou_thresh LOWER_IOU --minimum_bg MINIMUM_SIZE_OF_BACKGROUND_IMAGE

TODO

[x] Describe the selective search and IoU threshold process
[x] Describe the different scenarios: if an image belongs to 2 classes based on IoU, what would happen
[x] Share the code block
[x] Share the command

sezan92 commented 1 year ago

Update 2023/03/10

[x] updated comment https://github.com/sezan92/sezan92.github.io/issues/27#issuecomment-1461192742

TODO

[x] check and revise

If revision is done:

[x] Write up about the model: the model chosen in the paper and my model
[x] The last layer

sezan92 commented 1 year ago

Update 2023/03/13

Model

In the original RCNN paper, they used AlexNet as the CNN model, because it was the state-of-the-art model at the time. I used the VGG16 model, for the sole reason that it was very easy to use in TensorFlow. Also, in the original paper, they extracted features from the CNN and fed them into class-specific linear SVMs, a choice that was made empirically. I used softmax instead because, again, it seemed easier. In short, we can summarize the difference as follows:

Original paper:

image -> AlexNet -> features -> SVM -> result

My implementation:

image -> VGG16 -> features -> softmax -> result

Dataset

The original paper trained the model on VOC2007 and VOC2012 [confirm it]. I started training on VOC2012, but the evaluation metrics were very poor at the beginning, for several reasons (I explain the challenges I faced later). After some time, I realized from the confusion matrix [add confusion matrix] that the model was working well only on pictures of dogs and cats. So, for simplicity, I decided to work only on those two classes out of all the classes. Later, I may increase the complexity.

Steps

First, I extracted all regions.
I separated the images of backgrounds, dogs, and cats.
Then I relabeled them: 0 for dog, 1 for cat, and 2 for background.
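The relabeling step amounts to a small lookup table. A minimal sketch with a hypothetical helper `relabel`, assuming each extracted crop is tracked as an `(image_path, class_name)` pair using VOC class names; crops of any other class are dropped, since only dogs, cats, and background are kept:

```python
# Hypothetical mapping from the kept class names to the new integer ids
NEW_LABELS = {"dog": 0, "cat": 1, "background": 2}


def relabel(samples):
    """Keep only dog, cat, and background crops and map them to integer ids.

    samples: iterable of (image_path, class_name) pairs; any other VOC
    class (e.g. "person") is dropped.
    """
    return [(path, NEW_LABELS[name]) for path, name in samples if name in NEW_LABELS]


print(relabel([("a.jpg", "dog"), ("b.jpg", "person"), ("c.jpg", "background"), ("d.jpg", "cat")]))
# [('a.jpg', 0), ('c.jpg', 2), ('d.jpg', 1)]
```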

TODO

[x] check and revise

sezan92 commented 1 year ago

Update 2023/03/15

[x] command for preparing data

python3 /src/rcnn/prepare_data.py DATA --ss_method {fast,quality} --num_rects NUM_OF_RECTS --output OUTPUT_DIRECTORY --data_batch_size DATA_BATCH_SIZE --upper_iou_thresh UPPER_IOU_THRESHOLD --lower_iou_thresh LOWER_IOU_THRESHOLD --minimum_bg_size MINIMUM_BACKGROUND_SIZE --split {train,test,validation}

TODO

[x] command for training models
[ ] Pseudocode for training models
[ ] Think about introducing the commands in the relevant sections

sezan92 commented 1 year ago

Update 2023/03/27

Training model

[x] command for training model

python3 /src/train.py --train_dir TRAIN_DIR_PATH --valid_dir VALID_DIR_PATH --batch_size BATCH_SIZE --learning_rate LEARNING_RATE --output MODEL_TARGET_DIR --num_classes NUMBER_OF_CLASSES --bg_class BACKGROUND_CLASS_ID

Training the model is as simple as training a CNN classifier: we feed in the region images, stored in one directory per class.
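Feeding one directory per class into Keras can be sketched with `tf.keras.utils.image_dataset_from_directory`, which infers integer labels from the subdirectory names in alphanumeric order. This is an illustration, not the author's `train.py`: the function name `make_datasets` and the default 224 x 224 image size are assumptions, and folders named `0/`, `1/`, `2/` are assumed so that the inferred labels match the dog/cat/background ids.

```python
import tensorflow as tf


def make_datasets(train_dir, valid_dir, batch_size, image_size=(224, 224)):
    """Build labeled tf.data pipelines from one-subdirectory-per-class folders.

    Labels are integers assigned in alphanumeric order of the subdirectory
    names, so folders named 0/, 1/, 2/ map to labels 0, 1, 2.
    """
    train_ds = tf.keras.utils.image_dataset_from_directory(
        train_dir,
        label_mode="int",       # integer labels, for sparse categorical cross-entropy
        image_size=image_size,  # every crop is resized to the network input size
        batch_size=batch_size,
        shuffle=True,
    )
    valid_ds = tf.keras.utils.image_dataset_from_directory(
        valid_dir,
        label_mode="int",
        image_size=image_size,
        batch_size=batch_size,
        shuffle=False,          # keep validation order stable for evaluation
    )
    return train_ds, valid_ds
```

Because labels follow the sorted folder names, the names need some care once there are more than ten classes (for example, "10" sorts before "2").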

TODO

[x] describe training model
[ ] describe datagen from directory

sezan92 commented 1 year ago

Update 2023/04/04

Training model

In the paper, they selected the AlexNet model; these days, there are far better models. I chose VGG16 because it is fairly easy to use. In addition to the VGG16 model, I added some augmentation layers: random flip, random translation, random rotation, and random contrast. At the end of the model, I used a 4096-dimensional dense layer with the `relu` activation function and used `softmax` for classification.

Image -> Augmentation layers -> VGG16 model without classification layer -> flattening + dropout -> relu layer -> classification layer
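The pipeline above can be sketched with `tf.keras` as follows. The augmentation strengths (0.1), the horizontal flip mode, the dropout rate of 0.5, and the function name `build_rcnn_classifier` are assumptions, not values from the author's code; VGG16's own input preprocessing (`tf.keras.applications.vgg16.preprocess_input`) is also omitted for brevity.

```python
import tensorflow as tf


def build_rcnn_classifier(num_classes=3, input_shape=(224, 224, 3), weights="imagenet", dropout_rate=0.5):
    """Image -> augmentation -> VGG16 backbone -> flatten + dropout -> 4096 relu -> softmax."""
    inputs = tf.keras.Input(shape=input_shape)
    # Augmentation layers are only active during training (model.fit)
    x = tf.keras.layers.RandomFlip("horizontal")(inputs)
    x = tf.keras.layers.RandomTranslation(0.1, 0.1)(x)
    x = tf.keras.layers.RandomRotation(0.1)(x)
    x = tf.keras.layers.RandomContrast(0.1)(x)
    # VGG16 without its fully connected classification head
    backbone = tf.keras.applications.VGG16(include_top=False, weights=weights, input_shape=input_shape)
    x = backbone(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dropout(dropout_rate)(x)
    x = tf.keras.layers.Dense(4096, activation="relu")(x)
    # One probability per class: dog, cat, background
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```

With integer labels, such a model would be compiled with a loss like `sparse_categorical_crossentropy` before calling `model.fit`. At 224 x 224 input, flattening the 7 x 7 x 512 VGG16 feature map into the 4096-unit dense layer adds about 103 million parameters, so this head is by far the largest part of the classifier.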

[Add block diagram]

TODO

[x] Draw the block diagram for model

sezan92 commented 1 year ago

Update 2023/04/08

Block diagram

[x] https://drive.google.com/file/d/1N-mDVBdz9paAQ--t2linqGRecyhesfkk/view?usp=sharing

TODO

Need to recheck

sezan92 commented 1 year ago

Update 2023/04/12

Block diagram

[Image: VGG16RCNN block diagram (draw.io)]

sezan92 commented 1 year ago

Update 2023/04/12

Initial Result, challenges faced

After training on VOC2012, I got results like those in https://github.com/sezan92/ComputerVision/issues/85#issuecomment-1328206603

If you check the confusion matrix carefully, you can see that most objects were not classified correctly! The model was biased toward certain classes, which seemed problematic. So I tried to debug the issue. To make things easier, I chose only two classes, cats and dogs, plus the background. From visual inspection (I cannot provide statistics yet; I should collect them), many background-class images seemed to have weird sizes and shapes, for example 10 x 100 or 1 x 10. But such shapes might not appear in the test case.

So I introduced a minimum image size (128 x 128) for background images, which helped me get realistic images.

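Another problem was that, with a single IoU threshold, many regions whose IoU was just below the threshold (like 0.45), and which therefore overlapped substantially with an object, were selected as background. To make sure that background images were really background, I introduced a lower IoU threshold and an upper IoU threshold. Regions with an IoU above the upper threshold are saved as positives for their class, regions with an IoU below the lower threshold are saved as background, and regions in between are discarded.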
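The two background filters described above (minimum size and lower IoU threshold) can be combined into a single check. A minimal sketch, assuming corner-format `(x1, y1, x2, y2)` boxes and a minimum size that applies to both width and height; `keep_as_background` is a hypothetical name and the lower threshold of 0.3 in the usage lines is only illustrative:

```python
def keep_as_background(region, max_region_iou, lower_iou_threshold, minimum_bg_size=128):
    """Decide whether a proposed region should be saved as a background image.

    region: (x1, y1, x2, y2) box, an assumed format.
    max_region_iou: highest IoU of the region with any ground-truth box.
    """
    x1, y1, x2, y2 = region
    width, height = x2 - x1, y2 - y1
    # Reject regions that overlap an object too much to count as background
    if max_region_iou >= lower_iou_threshold:
        return False
    # Reject thin or tiny slivers such as 1 x 10 or 10 x 100 crops
    return width >= minimum_bg_size and height >= minimum_bg_size


print(keep_as_background((0, 0, 200, 150), 0.10, lower_iou_threshold=0.3))  # True
print(keep_as_background((0, 0, 10, 100), 0.10, lower_iou_threshold=0.3))   # False: too thin
print(keep_as_background((0, 0, 200, 200), 0.45, lower_iou_threshold=0.3))  # False: IoU too high
```

With a single threshold of, say, 0.5, the last region (IoU 0.45) would have been saved as background; the separate lower threshold rejects it.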

TODO

[x] Revise and update the reason for choosing two classes and for introducing the lower and upper IoU thresholds.

sezan92 commented 1 year ago

Update 2023/04/17

[x] revised and updated https://github.com/sezan92/sezan92.github.io/issues/27#issuecomment-1505295049

TODO

[x] check the comment and write the conclusion

sezan92 commented 1 year ago

Update 2023/04/19

Revised the comments; they look good. I need to make a script to generate results and run the evaluation.

TODO

[x] start working on https://github.com/sezan92/ComputerVision/issues/124

sezan92 commented 1 year ago

Update 2023/04/25

[x] https://github.com/sezan92/ComputerVision/issues/124#issuecomment-1521144776

sezan92 commented 1 year ago

Update 2023/04/27

[x] https://github.com/sezan92/ComputerVision/issues/124#issuecomment-1524784417

sezan92 / sezan92.github.io

RCNN blog #27
