Adding new MobileNet tutorial

Mruzik1 commented 1 year ago

Added a new notebook on MobileNetV2 SSD Lite training.

tersekmatija commented 1 year ago

I already reviewed it offline. Can you review it please, @N950?

N950 commented 1 year ago

@Mruzik1 Sharing some remarks after running the example

move grouped_boxes to CUDA before calling box_utils.assign_priors so that all inputs are on the same device
minor change to the tqdm dataloader loop: add the total arg so the progress bar works
specify opset_version for the torch export and version for the blobconverter export
specify blobconverter params:
- shaves
- optimizer_params: mean_values, scale_values, reverse_input_channels
- you can specify an output_dir to avoid the default .cache path
I can't run the "Run the blob on DepthAI" section in a clean env; I get "Third party libraries failed to import: No module named 'depthai_helpers.app_manager'".
- I will review more tomorrow; if this is due to breaking changes in the current main branch, please let me know
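A minimal sketch of how the blobconverter parameters listed above could be assembled. The flag names (`--mean_values`, `--scale_values`, `--reverse_input_channels`) are OpenVINO Model Optimizer options; the numeric values, the shave count, the version string, the output directory, and the helper name `build_optimizer_params` are placeholders chosen for illustration, not values taken from the notebook.

```python
def build_optimizer_params(mean_values, scale_values, reverse_input_channels=True):
    """Build Model Optimizer flags as a list of strings (hypothetical helper)."""
    def fmt(values):
        # Model Optimizer expects per-channel values as "[a,b,c]" without spaces.
        return "[" + ",".join(str(v) for v in values) + "]"

    params = [
        f"--mean_values={fmt(mean_values)}",
        f"--scale_values={fmt(scale_values)}",
    ]
    if reverse_input_channels:
        # Swaps the channel order (e.g. BGR camera frames for an RGB-trained model).
        params.append("--reverse_input_channels")
    return params


# Placeholder preprocessing values: use whatever normalization the training loop applied.
optimizer_params = build_optimizer_params([127.5, 127.5, 127.5], [127.5, 127.5, 127.5])

# Assumed keyword arguments for the blob export; every value here is a placeholder.
export_kwargs = {
    "shaves": 6,
    "version": "2022.1",
    "optimizer_params": optimizer_params,
    "output_dir": "blobs",  # avoids the default cache path
}

# With blobconverter installed, the call would look roughly like this
# (assuming from_onnx accepts these keyword arguments):
# blob_path = blobconverter.from_onnx(model="model.onnx", data_type="FP16", **export_kwargs)
```

Pinning these values explicitly keeps the exported blob reproducible across blobconverter releases, instead of relying on whatever defaults are current.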

Mruzik1 commented 1 year ago

I've made all the requested changes in the code. I'm now looking into the blob-running problem and will let you know when it's fixed.

Mruzik1 commented 1 year ago

@N950 For now I just added git checkout [working commit id] to the tutorial. I tested it, and with that checkout everything should work now.

N950 commented 1 year ago

@Mruzik1 Thank you. I reran the example and it now works on device.

Training:
- train losses are logged per batch; they should be logged per epoch
- train metrics are missing
- adding overlays of val bbox predictions to TensorBoard would be nice, but only for a small percentage of images per epoch, to keep the loop fast and the log file size reasonable
- did you run some complete training sessions on a practical sized dataset to make sure all is good and we can reach good val mAP ?
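To illustrate the per-epoch logging point, here is a small sketch that accumulates batch losses and produces one size-weighted average per epoch. The class name `EpochAverage` and the sample numbers are made up for illustration; the notebook's actual training loop may be organized differently.

```python
class EpochAverage:
    """Accumulates per-batch losses and returns one weighted mean per epoch."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, batch_loss, batch_size):
        # Weight by batch size so a smaller final batch does not skew the mean.
        self.total += batch_loss * batch_size
        self.count += batch_size

    def compute(self):
        return self.total / self.count if self.count else 0.0


# Simulated (mean batch loss, batch size) pairs for one epoch.
batches = [(1.0, 4), (0.5, 4), (2.0, 2)]

meter = EpochAverage()
for batch_loss, batch_size in batches:
    meter.update(batch_loss, batch_size)

epoch_loss = meter.compute()

# In the training loop this single value would be logged once per epoch, e.g.
# writer.add_scalar("train/loss", epoch_loss, epoch) with a TensorBoard SummaryWriter.
```

The same meter can be reused for each loss component (for example, regression and classification loss) by creating one instance per component and resetting them at the start of every epoch.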

Mruzik1 commented 1 year ago

Ok, I'll do it today.

> train metrics are missing

Should I add only the precision and recall? Or maybe some other?

> did you run some complete training sessions on a practical sized dataset to make sure all is good and we can reach good val mAP ?

No, just on the validation coco subset for a few epochs. But I can use the full dataset and see how it goes

N950 commented 1 year ago

> Should I add only the precision and recall? Or maybe some other?

Yeah sure, maybe IoU as well. Ideally, whatever standard detection metric the user is looking for is already there.
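For reference, a sketch of the three metrics mentioned here (IoU, precision, recall) for a single image and a single class. It assumes boxes in (x1, y1, x2, y2) format, predictions already sorted by descending confidence, and greedy one-to-one matching at an IoU threshold of 0.5. These are common conventions, not necessarily the ones the notebook uses.

```python
def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, gts, iou_threshold=0.5):
    """Greedy matching: each ground-truth box can be matched at most once."""
    matched = set()
    tp = 0
    for p in preds:
        best_iou, best_j = 0.0, None
        for j, g in enumerate(gts):
            if j in matched:
                continue
            iou = box_iou(p, g)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou >= iou_threshold:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp  # predictions with no matching ground truth
    fn = len(gts) - tp    # ground-truth boxes that were never matched
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    return precision, recall


preds = [(0, 0, 2, 2), (10, 10, 12, 12)]
gts = [(0, 0, 2, 2), (5, 5, 7, 7), (20, 20, 22, 22)]
precision, recall = precision_recall(preds, gts)
```

In this example one of two predictions matches a ground-truth box, and one of three ground-truth boxes is found.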

> No, just on the validation coco subset for a few epochs. But I can use the full dataset and see how it goes

No need to use the full train split; let's go with 40k images for training.
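One way to pick such a subset reproducibly is to draw a fixed-seed random sample of indices. This is only a sketch: the function name `sample_subset_indices` is hypothetical, and the dataset size used below is the COCO train2017 image count, which may differ from the split used in the notebook.

```python
import random


def sample_subset_indices(dataset_size, subset_size, seed=0):
    """Return a sorted, reproducible random sample of dataset indices."""
    if subset_size > dataset_size:
        raise ValueError("subset_size cannot exceed dataset_size")
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    return sorted(rng.sample(range(dataset_size), subset_size))


# Assumed dataset size (COCO train2017); replace with len(your_dataset).
DATASET_SIZE = 118287
indices = sample_subset_indices(DATASET_SIZE, 40000, seed=0)

# With PyTorch, the indices could then be wrapped as
# torch.utils.data.Subset(full_dataset, indices).
```

Fixing the seed means every rerun of the tutorial trains on the same 40k images, which makes mAP numbers comparable between runs.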

Mruzik1 commented 1 year ago

Sorry for the delay. So far I have tested training on 30% (~35k images) of the COCO training split. The mAP is growing little by little, but I believe it will do better with a bigger set and fewer classes. I will modify the training loop later and let you know when it's done.

Mruzik1 commented 1 year ago

@N950 I think everything is done.

N950 commented 1 year ago

@Mruzik1 Thanks for the updates

add weight saving
- currently you're exporting with the latest weights
- save both best.pth and latest.pth
logs are still lacking: only losses are logged for training and only mAP is logged for validation
we still need at least one practical training run to validate our tutorial
- you can pick as much as you think is needed from COCO; limiting the number of classes is OK
- show that reaching a good mAP is possible using our tutorial
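The best/latest saving logic could look roughly like the sketch below. To keep it self-contained, `save_state` uses pickle as a stand-in for `torch.save(model.state_dict(), path)`; the class name `CheckpointSaver` and the mAP values are illustrative only.

```python
import os
import pickle
import tempfile


def save_state(state, path):
    # Stand-in for torch.save(model.state_dict(), path) in the real notebook.
    with open(path, "wb") as f:
        pickle.dump(state, f)


class CheckpointSaver:
    """Always writes latest.pth; writes best.pth only when validation mAP improves."""

    def __init__(self, out_dir):
        self.out_dir = out_dir
        self.best_map = float("-inf")
        os.makedirs(out_dir, exist_ok=True)

    def step(self, state, val_map):
        save_state(state, os.path.join(self.out_dir, "latest.pth"))
        is_best = val_map > self.best_map
        if is_best:
            self.best_map = val_map
            save_state(state, os.path.join(self.out_dir, "best.pth"))
        return is_best


out_dir = tempfile.mkdtemp()
saver = CheckpointSaver(out_dir)

# Simulated validation mAP over three epochs: the second epoch is the best one.
history = [saver.step({"epoch": epoch}, val_map)
           for epoch, val_map in enumerate([0.10, 0.25, 0.20])]
```

The export step can then load best.pth rather than whatever weights the last epoch happened to produce.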

Mruzik1 commented 1 year ago

I decided to change the dataset to potentially reach a higher mAP. It now uses the VOC2012 validation set for training, with only 3 classes (e.g. person, vehicle, animal) and 5k samples. I trained the net for a few epochs, and mAP is now noticeably higher, although still not good: mAP@50 is just 0.022 after 20 epochs. Fortunately, training didn't take long because of the small number of samples. Maybe it's fine as just a demo?
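As an illustration of collapsing the 20 VOC classes into three groups, here is a sketch. The exact grouping is an assumption (the comment above only gives person, vehicle, animal as examples), and the group ids are arbitrary; shift them by one if index 0 is reserved for background.

```python
# Assumed grouping of Pascal VOC class names; classes not listed are dropped.
VOC_TO_GROUP = {
    "person": "person",
    "aeroplane": "vehicle", "bicycle": "vehicle", "boat": "vehicle",
    "bus": "vehicle", "car": "vehicle", "motorbike": "vehicle", "train": "vehicle",
    "bird": "animal", "cat": "animal", "cow": "animal",
    "dog": "animal", "horse": "animal", "sheep": "animal",
}
GROUP_IDS = {"person": 0, "vehicle": 1, "animal": 2}


def remap_annotations(annotations):
    """Map (class_name, box) pairs to (group_id, box), dropping ungrouped classes."""
    remapped = []
    for class_name, box in annotations:
        group = VOC_TO_GROUP.get(class_name)
        if group is None:
            continue  # e.g. "sofa" or "bottle" belong to none of the three groups
        remapped.append((GROUP_IDS[group], box))
    return remapped


example = [
    ("train", (10, 20, 200, 120)),
    ("sofa", (0, 0, 50, 50)),
    ("dog", (30, 40, 90, 100)),
    ("person", (5, 5, 60, 180)),
]
remapped = remap_annotations(example)
```

Note that a grouping like this puts visually very different objects (a plane and a bicycle, for instance) under one label.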

I also now write all displayed metrics to TensorBoard. I still need to add weight saving. I'll make some commits and ping you when done.

P.S. I have a few ideas for improving mAP, but not much time to try them, since I will be on vacation starting next week (4.09). The mAP is probably low because instances within a class are not similar enough (a vehicle can be a plane, a train, etc.), so the model needs more epochs and more samples to train properly.

Mruzik1 commented 1 year ago

@N950 Done

luxonis / depthai-ml-training

Adding new MobileNet tutorial #58