Xento opened this issue 3 years ago
I suppose you'd want to quantise it and run it on an accelerator such as the Coral?
If so, you need to be aware of the network degradation from quantising full-precision 32-bit weights down to 8-bit in the case of the Coral. I have no idea how the models will perform, and seeing how the NNs already struggle at full precision, I am not quite sure how beneficial this would be.
Further, you'd have to check whether the TF Lite converter supports all operations used in the existing NNs. For example, GlobalMaxPooling, used in the MobileNet-based NNs of this project, is not supported (last checked October 2020), so one would have to retrain the models without this operation; unfortunately I did not design the models with quantisation in mind. But it would be very interesting to know!
TLDR: Technically yes, but consider the performance degradation due to quantisation, and one would need to ensure the convertibility of the NN operations used.
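Roughly, the conversion-plus-quantisation step being discussed looks like this. A tiny stand-in model is used instead of the project's real NNs, and the names (`IMG_SIZE`, `build_toy_model`) are illustrative; calibration would use real camera images rather than random data:

```python
import numpy as np
import tensorflow as tf

IMG_SIZE = 96  # hypothetical input resolution

def build_toy_model():
    # Tiny stand-in for the project's MobileNet-based NNs. Note the head uses
    # GlobalAveragePooling2D, which converts cleanly, instead of GlobalMaxPooling.
    inputs = tf.keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
    x = tf.keras.layers.Conv2D(8, 3, activation="relu")(inputs)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)

def representative_data():
    # The converter calibrates activation ranges on a handful of samples;
    # random data here only demonstrates the plumbing.
    for _ in range(10):
        yield [np.random.rand(1, IMG_SIZE, IMG_SIZE, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(build_toy_model())
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
# Force full int8 kernels, as the Coral Edge TPU requires. Conversion fails
# here if the model contains an unsupported op -- the exact check described above.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_bytes = converter.convert()
```

Comparing the quantised model's precision/recall against the float model on a held-out set is then the way to measure the 8-bit degradation.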
I thought it would be a bit faster. I'm just experimenting with this project, as my cats often bring things in and the cat flap is already smart. Maybe I could reduce the detection steps, since I placed the camera right in the tunnel to the cat flap, so it always records the face. I would build a trigger with a PIR sensor or something similar, so I only have to start the recognition when a cat is in the tunnel.
To my knowledge, the speedup depends on quite a few factors. In terms of memory efficiency, yes, TF Lite will be much better. Inference time itself depends on how the TF Lite runtime actually optimises the execution and whether your processor supports the instructions of the optimised execution. So IMO TF Lite might benefit you in memory and slightly in inference time, but if you already perform all these steps, you might as well run it on the Coral, which will boost your inference speed by a large margin!
With the tunnel system, you can obviously reduce the model complexity drastically compared to my solution. I tested the PIR approach as well, but found it to have a lot of false positives and switched completely to vision. However, in the tunnel things might work significantly better, because it is a more controlled environment.
So if you'd ask me for advice: use your tunnel approach and check that the PIR option does not generate too many false positives. Build a simple binary classifier with a MobileNetV2 architecture and perform binary classification: Prey/No_Prey. Make sure you only use TF-Lite-compatible operations and check that model performance does not degrade too badly under 8-bit quantisation. Then offload the inference to the Coral accelerator and you'll have a blazing fast system. In a different project I achieved ~100 fps (on an i5 NUC, though), so I'd expect the RPi to achieve ~50 fps.
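A minimal sketch of that classifier, under a few assumptions of mine: the slimmest MobileNetV2 variant (alpha=0.35), a 96x96 input, and a head that sticks to TFLite/Edge-TPU-friendly ops (global *average* pooling instead of the unsupported GlobalMaxPooling). `weights=None` keeps the sketch self-contained; in practice you'd start from pretrained weights:

```python
import tensorflow as tf

# Smallest MobileNetV2 (width multiplier 0.35) on a small input: plenty for
# a controlled tunnel scene and cheap to quantise and run on the Coral.
backbone = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), alpha=0.35, include_top=False, weights=None)

x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
prey_prob = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # P(prey)
model = tf.keras.Model(backbone.input, prey_prob)

# Track precision/recall rather than accuracy -- more honest on imbalanced data.
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])
```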
Sounds cool, keep me updated :)
I'm new to building ML things, so maybe I will stick to your solution and put up a camera to monitor the way to the flap. Then there would be more time to check the image.
Are the other parts of your work (classifying the images/training the models etc.) public, too?
Welcome to the ML world :) Your tunnel approach would be easier to train, if you'd want to build your own model.
The training procedure of the individual models is very standard, so I didn't upload it. I'd suggest you have a look at https://www.tensorflow.org/guide/keras/transfer_learning and transfer-learn on my models :)
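The recipe from that guide boils down to something like the following for a Prey/No_Prey head (a sketch only; `weights=None` keeps it self-contained, whereas in practice you would load pretrained weights, e.g. `weights="imagenet"`, or start from an existing model):

```python
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights=None)
base.trainable = False  # phase 1: freeze the backbone, train only the new head

inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)  # keep BatchNorm statistics frozen
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])

# ...fit on your data, then optionally unfreeze and fine-tune end to end:
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),  # much lower LR
              loss="binary_crossentropy", metrics=["accuracy"])
```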
I have been experimenting with different cameras. It is very difficult to get a clear image which can be analyzed.
Today I installed a Raspberry Pi Zero with a NoIR cam and MJPEG streamer, which streams to a Raspberry Pi with TF Lite and ssd_mobilenet_v3_small_coco_2020_01_14. With TF Lite I get about 8 fps on an RPi 4 at 2 GHz. The cat/dog detection works quite well.
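For reference, pushing one frame through a TFLite model with the Python Interpreter looks roughly like this (a sketch of mine, not the exact setup above; a detector like that SSD actually has several output tensors, this just returns the first):

```python
import numpy as np
import tensorflow as tf

def run_tflite(model_content, frame):
    """Run one camera frame through a TFLite model and return the first output."""
    # NOTE: building the Interpreter per call is wasteful; in a real loop you
    # would create it once and only set_tensor/invoke per frame.
    interpreter = tf.lite.Interpreter(model_content=model_content)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    # Resize and cast the frame to whatever the model expects.
    h, w = inp["shape"][1], inp["shape"][2]
    x = tf.image.resize(frame, (h, w))[tf.newaxis].numpy().astype(inp["dtype"])
    interpreter.set_tensor(inp["index"], x)
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])
```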
Can you provide the images and the code you used to train the prey detection?
The training code is quite messy and requires manual adjustments. But if you really want it, I can share it via my drive if you send me your mail.
That would be great. My email address is .....
Thanks for sharing. I would be glad to know how you trained the networks, especially the prey detection. I think maybe I could use this detection alone or in combination with the face detection.
Now, with the NoIR Raspi Zero W cam, I can get better images like this. Mostly our black cat brings prey inside, sometimes up to 3 times a night :-(
I tried my first trainings with the code from https://keras.io/examples/vision/image_classification_from_scratch/ but instead of Cat/Dog I used Clean/Prey. I have to try it with the new images.
Wow, your cat is a true data gold mine! I suppose with this setup, i.e. camera perspective, you should be able to solely classify the images as prey/clean, just as you proposed.
I sent you the script by mail. But I would not treat it as a gold standard...
I took the images, added some of mine, and let it train. I modified your script to automatically decide which images are used for training/validation/testing. I set it to 10% of the training data; when I increase this, it says there is not enough data for training. It reached about 94% accuracy, but I didn't have time to change the script to load the model and test my prey images.
If I remember correctly, the training script needs the validation and training data to be exact multiples of the batch size... i.e. the default batch size is 32, so if you have 66 images you have to delete 2 to arrive at 2*32... It sucks, I know...
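The arithmetic for that constraint, as a tiny helper (the function name is made up):

```python
def images_to_drop(n_images, batch_size=32):
    """How many images to delete so n_images becomes an exact multiple of batch_size."""
    return n_images % batch_size

# 66 images at the default batch size of 32: drop 2, leaving 64 = 2 * 32.
```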
Sounds promising! But remember that accuracy is a deceiving metric on an imbalanced set... I'd rely on F1 or precision & recall instead :)
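To see why accuracy deceives: with, say, 94 clean and 6 prey images, a model that always predicts "clean" scores 94% accuracy while catching zero prey. The suggested metrics expose that (plain Python, counts are illustrative):

```python
def precision_recall_f1(y_true, y_pred):
    """Binary metrics with label 1 = prey (the positive class)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# The always-"clean" classifier on 94 clean + 6 prey images:
# accuracy would read 94%, but precision, recall and F1 are all zero.
always_clean = precision_recall_f1([0] * 94 + [1] * 6, [0] * 100)
```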
Did you train the prey classifier directly on the full images, or did you do some cropping before? I modified your cascade to output the cat-face and snout images to different folders. Sometimes the snout model crops too much of the image, so maybe it's better to use the cat-face images for training. Currently I have about 130 images each of clean and prey.
This is what I changed to generate the data. I added rescaling of the images before training, since the later images are rescaled, too.
```python
train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.2,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest',
    rescale=1. / 255,
    validation_split=0.1)

# only rescaling; validation_split must match the one above,
# otherwise the "training" and "validation" subsets overlap
test_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1. / 255,
    validation_split=0.1)

# this is a generator that will read pictures found in train_dir
# and indefinitely generate batches of augmented image data
train_generator = train_datagen.flow_from_directory(
    train_dir,  # this is the target directory
    shuffle=True,
    # save_to_dir=augument_dir,
    # save_prefix='hoi',
    target_size=(TARGET_SIZE, TARGET_SIZE),
    batch_size=BATCH_SIZE,
    class_mode='binary',  # binary_crossentropy loss needs binary labels
    subset="training")

# this is a similar generator, for validation data
validation_generator = test_datagen.flow_from_directory(
    train_dir,
    target_size=(TARGET_SIZE, TARGET_SIZE),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset="validation")
```
I believe I tried both... it might be stated somewhere in my report. But if I recall correctly, it did not make much of a difference. However, training the model imbalanced, i.e. with more clean images plus a class weight, improved precision a bit.
And yes, rescaling makes sense!
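A sketch of that class-weighting trick, with made-up counts and a toy stand-in model (the real one being the MobileNet-based classifier): inverse-frequency weights make each misclassified prey image cost more in the loss, so the minority class is not ignored.

```python
import numpy as np
import tensorflow as tf

n_clean, n_prey = 200, 50  # hypothetical imbalanced dataset
total = n_clean + n_prey
# Standard inverse-frequency weighting (label 0 = clean, 1 = prey):
class_weight = {0: total / (2 * n_clean),   # 0.625
                1: total / (2 * n_prey)}    # 2.5 -> prey errors cost 4x more

inputs = tf.keras.Input(shape=(4,))  # toy features instead of images
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(inputs)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")

x = np.random.rand(total, 4).astype("float32")
y = np.array([0] * n_clean + [1] * n_prey)
model.fit(x, y, epochs=1, batch_size=32, class_weight=class_weight, verbose=0)
```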
I uploaded my images ;-) I'm still experimenting with how to take the best pictures. At the moment I get many false-positive cat detections, so I have to delete lots of images. I'm thinking about using an ultrasonic sensor with my Raspberry Pi Zero, so it could activate the stream when something comes near.
How should I handle false positives? Does it help to duplicate the false-positive images in the image set before training?
Because I don't have much time at the moment to take care of the images, I changed your training script to automatically split the images into training and validation sets. Would it increase the accuracy if I had two different directories for training and validation? Which images should go in the validation folder and which in the training folder? Is it OK if they are the same?
Sure, the sensor would reduce some workload. I tried that in the beginning as well, but I experienced a lot of FPs with the sensor too, so I went for a purely image-based solution.
Regarding how to handle FPs:
Regarding train/validation split:
Cheers
Hi @niciBume,
is it still possible to retrieve the training data from you? I'd like to train my own model for an RK3588-based board with a built-in NPU, as an alternative to the Coral accelerator mentioned above. My camera setup is pretty similar to the one used by @Xento: the camera is mounted in the tunnel, so I get clear pictures, including IR at nighttime.
Hey @martin31821 yeah if you post your gmail address I can add you to the google drive.
--- is my google account.
Would it be possible to modify it for TensorFlow Lite? This should be much faster on an RPi.