niciBume / Cat_Prey_Analyzer

Cat Prey Image-Classification with deep learning
MIT License
142 stars · 22 forks

Modify for TensorFlow Lite #18

Open Xento opened 3 years ago

Xento commented 3 years ago

Would it be possible to modify it for TensorFlow Lite? This should be much faster on an RPi

niciBume commented 3 years ago

I suppose you'd want to quantise it and run it on an accelerator such as the Coral?

If so, you need to be aware of the network degradation from quantising full precision 32-bit down to 8-bit in the case of the Coral. I have no idea how the models will perform, and seeing how the NNs already struggle at full precision, I am not quite sure how beneficial this might be.

Further, you'd have to be on the lookout for whether the TF Lite converter supports all operations implemented in the existing NNs. For example, GlobalMaxPooling, used in the MobileNet-based NNs of this project, is not supported (last checked October 2020), so one would have to retrain the models without this operation, as I unfortunately did not design the models with quantisation in mind. But it would be very interesting to know!

TLDR: Technically yes. But consider the performance degradation due to quantisation, and one would need to ensure the convertibility of the utilised NN operations.
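
For illustration, a minimal sketch of such a convertibility check, assuming a saved Keras model at a hypothetical path; the convert step is where an unsupported op (like the GlobalMaxPooling mentioned above) would fail, and the int8 settings match what the Coral requires:

```python
import tensorflow as tf

# Hypothetical path; substitute your own trained Keras model.
model = tf.keras.models.load_model("prey_classifier.h5")

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Full-integer quantisation, as required by the Coral Edge TPU.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

def representative_dataset():
    # Real sample images belong here; random data only exercises the
    # converter, it does not calibrate the quantisation ranges well.
    for _ in range(100):
        yield [tf.random.uniform((1, 224, 224, 3), dtype=tf.float32)]

converter.representative_dataset = representative_dataset

try:
    tflite_model = converter.convert()  # raises if an op is unsupported
except Exception as e:
    print(f"Conversion failed, likely an unsupported op: {e}")
else:
    with open("prey_classifier_int8.tflite", "wb") as f:
        f.write(tflite_model)
```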

Xento commented 3 years ago

I thought it would be a bit faster. I'm just experimenting with this project, as my cats often bring things in and the cat flap is already smart. Maybe I could reduce the detection steps, as I placed the camera right in the tunnel to the cat flap, so it always records the face. I would build a trigger with a PIR sensor or something like it, so I only have to start the recognition when a cat is in the tunnel.

niciBume commented 3 years ago

To my knowledge, the faster execution depends on quite a few factors. In terms of memory efficiency, yes, tflite will be much better. But inference itself depends on how the tflite runtime actually optimises the execution and whether your processor supports the instructions of the optimised execution. So imo tflite might benefit you in terms of memory and slightly in inference time, but if you already perform all these steps, you might as well run it on the Coral, which will boost your inference time by a large margin!

With the tunnel system, you can obviously reduce the model complexity drastically compared to my solution. I tested the PIR approach as well, but found it to have a lot of false positives and switched completely to vision. However, in the tunnel things might work significantly better, because it is a more controlled environment.

So if you'd ask me for advice: use your tunnel approach, and check that the PIR option does not generate too many false positives. Use a simple binary classifier with a MobileNetV2 architecture and perform binary classification: Prey/No_Prey. Make sure you only use tflite-compatible operations and check that the model performance degradation is not too bad at 8-bit quantisation. Then offload the inference to the Coral accelerator and you'll have a blazing fast system. In a different project I achieved ~100 fps (on an i5 NUC though), so I'd expect the RPi to achieve ~50 fps.
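
To make that concrete, a minimal sketch of such a Prey/No_Prey classifier; the input size is an assumption, and GlobalAveragePooling2D stands in for the unsupported GlobalMaxPooling mentioned earlier:

```python
import tensorflow as tf

IMG_SIZE = 224  # hypothetical; match your camera crops

# ImageNet-pretrained backbone; inputs should be scaled to [-1, 1],
# e.g. via tf.keras.applications.mobilenet_v2.preprocess_input.
base = tf.keras.applications.MobileNetV2(
    input_shape=(IMG_SIZE, IMG_SIZE, 3),
    include_top=False,
    weights="imagenet",
)
base.trainable = False  # transfer learning: freeze the backbone first

model = tf.keras.Sequential([
    base,
    # GlobalAveragePooling2D instead of GlobalMaxPooling, to stay
    # within tflite-convertible operations.
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # Prey / No_Prey
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
model.summary()
```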

Sounds cool, keep me updated :)

Xento commented 3 years ago

I'm new to building ML things. So maybe I will stick to your solution and put a camera to monitor the way to the flap. Then there would be more time to check the image.

Are the other parts of your work (classifying the images / training the models etc.) public, too?

niciBume commented 3 years ago

Welcome to the ML world :) Your tunnel approach would make it easier to train your own model, if you wanted to build one.

The training procedure for the individual models is very standard, so I didn't upload it. I'd suggest you have a look at https://www.tensorflow.org/guide/keras/transfer_learning and transfer learn on my models :)
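
As a rough sketch of what that could look like (the filename is hypothetical, and the pattern follows the linked guide: freeze the pretrained layers, retrain only the head on your own data):

```python
import tensorflow as tf

# Hypothetical filename for an already-trained prey model.
model = tf.keras.models.load_model("prey_model.h5")

for layer in model.layers[:-1]:
    layer.trainable = False  # keep the learned features intact

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),  # low LR for fine-tuning
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# train_ds / val_ds would be your own Clean/Prey datasets, e.g. built
# with tf.keras.preprocessing.image_dataset_from_directory(...).
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```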

Xento commented 3 years ago

I have been experimenting with different cameras. It is very difficult to get a clear image which can be analyzed.

Today I installed a Raspberry Pi Zero with a NoIR cam and MJPEG streamer, which streams to a Raspberry Pi with TF Lite and ssd_mobilenet_v3_small_coco_2020_01_14. With TF Lite I can get about 8 fps on an RPi 4 at 2 GHz. The cat/dog detection works quite well.
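
Roughly, the detection loop looks like this with the TFLite interpreter; the model filename here is an assumption, and the output tensor ordering (boxes, classes, scores) is the typical SSD postprocess layout, so verify it against your own export:

```python
import numpy as np
import tensorflow as tf  # or: from tflite_runtime.interpreter import Interpreter

interpreter = tf.lite.Interpreter(model_path="ssd_mobilenet_v3_small.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# One frame from the MJPEG stream, resized to the model's input size.
_, height, width, _ = input_details[0]["shape"]
frame = np.zeros((1, height, width, 3), dtype=input_details[0]["dtype"])  # placeholder

interpreter.set_tensor(input_details[0]["index"], frame)
interpreter.invoke()

# Typical SSD postprocess outputs; indices may differ per model export.
boxes = interpreter.get_tensor(output_details[0]["index"])
classes = interpreter.get_tensor(output_details[1]["index"])
scores = interpreter.get_tensor(output_details[2]["index"])
print(scores[0][:5])
```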

Can you provide the images and the code you used to train the prey detection?

niciBume commented 3 years ago

The training code is quite messy and requires adjustments by hand. But if you really want it, I can share it from my drive if you send me your mail.

Xento commented 3 years ago

That would be great. My email address is .....

Xento commented 3 years ago

Thanks for sharing. I would be glad to know how you trained the networks, especially the prey detection. I think maybe I could use only this detection, or use it in combination with the face detection.

Now, with the NoIR Raspberry Pi Zero W cam, I can get better images like the ones attached below. Mostly our black cat brings prey inside, sometimes up to 3 times a night :-(

I tried my first training runs with the code from https://keras.io/examples/vision/image_classification_from_scratch/ but instead of Cat/Dog I used Clean/Prey. I have to try it with the new images.

[Attached images: 2021_08_13_00-02-27-708255, 2021_08_23_06-20-13-223815, 2021_08_23_02-13-42-569505, 2021_08_23_06-46-04-492283]

niciBume commented 3 years ago

Wow, your cat is a true data gold mine! I suppose with this setup, i.e. camera perspective, you should be able to solely classify the images as prey/clean, just like you proposed.

I sent you the script by mail. But I would not treat it as a gold standard...

Xento commented 3 years ago

I took the images, added some of mine, and let it train. I modified your script to automatically decide which images are used for training/validation/testing. I set it to 10% of the training data; when I increase this, it says there is not enough data for training. It reached about 94% accuracy, but I didn't have time to change the script to load the model and test my prey images.

niciBume commented 3 years ago

If I remember correctly, the training script needs the validation and training data to be exact multiples of the batch size... i.e. the default batch size is 32, so if you have 66 images you have to delete 2 to arrive at 2*32 = 64... It sucks, I know...
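
A trivial sketch of working out how many images to delete per split:

```python
BATCH_SIZE = 32

def n_to_drop(n_images: int, batch_size: int = BATCH_SIZE) -> int:
    """Images to delete so the set becomes an exact multiple of the batch size."""
    return n_images % batch_size

print(n_to_drop(66))  # 2, leaving 64 = 2 * 32
```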

Sounds promising! But remember that accuracy is a deceiving metric on an imbalanced set... I'd rely on F1 or precision & recall etc. :)
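
A small illustration of why, using scikit-learn on a made-up imbalanced set where a model always predicts "clean":

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Toy imbalanced set: 90 clean (0), 10 prey (1) images.
y_true = np.array([0] * 90 + [1] * 10)
y_pred = np.zeros(100, dtype=int)  # always predicts "clean"

print((y_true == y_pred).mean())                         # accuracy 0.90, looks great
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0
print(recall_score(y_true, y_pred))                      # 0.0 -> misses every prey image
print(f1_score(y_true, y_pred))                          # 0.0
```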

Xento commented 3 years ago

Did you train the prey classifier directly on the full images, or did you do some cropping first? I modified your cascade to output the cat-face and snout images to different folders. Sometimes the snout model crops too much of the image, so maybe it's better to use the cat-face images for training. Currently I have about 130 images each of clean and prey.

This is what I have changed to generate the data. I added rescaling of the images before training, as the images are rescaled later, too.

```python
import tensorflow as tf

# Hypothetical values; adjust to your own data layout.
TARGET_SIZE = 224
BATCH_SIZE = 32
train_dir = "data/train"  # one subfolder per class: clean/ and prey/

train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.2,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest',
    rescale=1. / 255,
    validation_split=0.1)

# only rescaling; the validation_split must match the one above,
# otherwise the training and validation subsets overlap
test_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1. / 255,
    validation_split=0.1)

# this is a generator that will read pictures found in train_dir
# and indefinitely generate batches of augmented image data
train_generator = train_datagen.flow_from_directory(
    train_dir,  # this is the target directory
    shuffle=True,
    #save_to_dir=augment_dir,
    #save_prefix='hoi',
    target_size=(TARGET_SIZE, TARGET_SIZE),
    batch_size=BATCH_SIZE,
    class_mode='binary',  # binary labels for binary_crossentropy loss
    subset="training")

# this is a similar generator, for validation data
validation_generator = test_datagen.flow_from_directory(
    train_dir,
    target_size=(TARGET_SIZE, TARGET_SIZE),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset="validation")
```

niciBume commented 3 years ago

I believe that I tried both... it might be stated somewhere in my report. But if I recall correctly, it did not make much of a difference. However, training the model imbalanced, i.e. with more clean images plus a class weight, improved precision a bit.
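
As a rough sketch of that class-weight idea, reusing the generators from the previous comment and assuming a compiled `model`; the counts are made up:

```python
# Hypothetical counts: deliberately more clean images than prey.
n_clean, n_prey = 300, 130
n_total = n_clean + n_prey

# Standard balanced weighting: the rarer class gets a larger loss weight.
# flow_from_directory assigns indices alphabetically: clean=0, prey=1.
class_weight = {
    0: n_total / (2.0 * n_clean),  # clean
    1: n_total / (2.0 * n_prey),   # prey
}

model.fit(
    train_generator,
    validation_data=validation_generator,
    epochs=20,
    class_weight=class_weight,  # Keras scales the per-sample loss by class
)
```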

And yes rescaling makes sense!

Xento commented 3 years ago

I uploaded my images ;-) I'm still experimenting with how to take the best pictures. At the moment I get a lot of false positive cat detections, so I have to delete lots of images. I'm thinking about using an ultrasonic sensor with my Raspberry Pi Zero, so it could activate the stream when something comes near.
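
A minimal sketch of that trigger with gpiozero, assuming an HC-SR04-style sensor on hypothetical GPIO pins; the start/stop hooks are placeholders for whatever controls the stream:

```python
from signal import pause

from gpiozero import DistanceSensor

# Hypothetical wiring; threshold_distance is in metres.
sensor = DistanceSensor(echo=24, trigger=23, max_distance=1.0,
                        threshold_distance=0.3)

def start_stream():
    print("Something within 30 cm: start the camera stream")

def stop_stream():
    print("Tunnel clear: stop the stream")

sensor.when_in_range = start_stream    # fires when closer than the threshold
sensor.when_out_of_range = stop_stream
pause()  # keep the script alive waiting for sensor events
```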

niciBume commented 3 years ago

Sure, the sensor would reduce some workload. I tried that in the beginning as well, but I experienced a lot of FPs with the sensor too, so I went for a purely image-based solution.

Regarding how to handle FPs:

Regarding train/validation split:

Cheers

martin31821 commented 2 months ago

Hi @niciBume,

is it still possible to retrieve the training data from you? I'd like to train my own model for an RK3588-based board which has a built-in NPU, as an alternative to the Coral accelerator mentioned above. My camera setup is pretty similar to the one used by @Xento: the camera is mounted in the tunnel, so I get clear pictures, including IR for nighttime.

niciBume commented 2 months ago

Hey @martin31821 yeah if you post your gmail address I can add you to the google drive.

martin31821 commented 2 months ago

> Hey @martin31821 yeah if you post your gmail address I can add you to the google drive.

--- is my google account.