(Closed: vladmandic closed this issue 3 years ago)
Hi Vlad, sorry for the late reply.
> model works in `tfjs` in nodejs and browser using `webgl` like a charm using `tfjs 2.6.0`!
This is awesome!
> checkpoint is the training version and references python variables used in model definition
The default checkpoint and tensorflow saved models (provided in the latest versions) are the inference checkpoints. If possible, can you expand on this?
> any chance you can also do a compiled version?
Can you explain a little bit what a compiled version is (or share any links/docs)? I might be able to help.
> - model is very picky about input image resolution
> - performance is pretty low compared to any other object detection model out there, by 2-5x? any thoughts?
> - model is very memory hungry - it can easily eat up 2gb of gpu memory to process an image with 1k resolution
The provided model is trained on images with min_side=800 and max_side=1333 (side can be width or height). Since my focus was more on generating the data part, I used the defaults of keras-retinanet.
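That min_side/max_side rule can be sketched as follows (a paraphrase of keras-retinanet's resize logic, not the library's actual code):

```python
# Paraphrase of keras-retinanet's resize rule (illustrative, not the
# library's code): scale the shorter side up to min_side, but cap the
# longer side at max_side.
def compute_resize_scale(height, width, min_side=800, max_side=1333):
    smallest, largest = min(height, width), max(height, width)
    scale = min_side / smallest
    if largest * scale > max_side:  # long side would exceed the cap
        scale = max_side / largest
    return scale

# 500x1000: 800/500 = 1.6 would push the long side to 1600 > 1333,
# so the cap wins and the scale becomes 1333/1000 = 1.333
print(compute_resize_scale(500, 1000))  # 1.333
```

This explains the resolution sensitivity: images far outside this range end up scaled well away from what the anchors were tuned for.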
Training a much smaller and faster object detector (yolov4 tiny, ultralytics yolov5 small) on this data is on my to do list (may be in 5-6 weeks depending on my other responsibilities).
thanks for the comments
> Can you explain a little bit what a compiled version is (or share any links/docs)? I might be able to help.
since you're loading the model using `tf.contrib.predictor`, i assume it's created using the `estimator` class? and i have zero experience using `estimator` as it's before my time (the entire `contrib` namespace is obsolete in tensorflow v2 and i've only been using tf for the past few months). but from what i see, your saved model is just definitions, with all the trained data in the checkpoint, stored in variables (inside `variables/variables.data-00000-of-00001`).
goal is to get to a static `saved_model.pb` as a single file that contains all pretrained weights as constants. no clue how.
maybe this? https://www.tensorflow.org/api_docs/python/tf/saved_model/load has a chapter on estimators.
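For reference, loading and inspecting a SavedModel's serving signatures from TF2 looks roughly like this (a tiny `tf.Module` stands in for the real estimator export; directory and names are illustrative):

```python
# Sketch: inspect a SavedModel's serving signatures from TF2.
# A tiny tf.Module stands in for the real estimator export.
import tempfile

import tensorflow as tf

class Demo(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([None, 3], tf.float32)])
    def serve(self, x):
        return {"out": x * 2.0}

demo = Demo()
export_dir = tempfile.mkdtemp()
tf.saved_model.save(demo, export_dir,
                    signatures={"serving_default": demo.serve})

loaded = tf.saved_model.load(export_dir)
infer = loaded.signatures["serving_default"]
print(sorted(infer.structured_outputs))  # ['out']
```

Estimator exports typically expose a `serving_default` signature the same way, so this is a quick way to see what inputs/outputs the saved graph expects.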
again, i'm just guessing since i've never worked with `estimator` or `predictor`. all i know is that the resulting checkpoints contain variable references - that is good for training, but less than ideal for running inference in production.
> The provided model is trained on images with min_side=800 and max_side=1333 (side can be width or height).
that explains my findings :)
> Training a much smaller and faster object detector (yolov4 tiny, ultralytics yolov5 small) on this data is on my to do list (may be in 5-6 weeks depending on my other responsibilities).
nice!
perhaps you'd want to take a look at CenterNet?
It's not as small as YoloV4-Tiny, but it's damn fast (by far the fastest of all non-trivial models) and very flexible
it's becoming my go-to for any kind of object detection tasks
btw, i've created a simple gist that uses `tfjs-node` to showcase the nudenet model (both `saved_model` and `graph_model`, as well as quick blurring of nude parts):
https://gist.github.com/vladmandic/f79c80f83a35d01d9e2df072cf426254
> since you're loading the model using `tf.contrib.predictor`, i assume it's created using the `estimator` class?
The saved model was created from the keras checkpoint at https://github.com/notAI-tech/NudeNet/releases/download/v0/detector_v2_default_checkpoint
Although I haven't tried it out, this repo shows how to convert the checkpoint to tfjs format: https://github.com/faustomorales/retinanetjs
You might also be able to export the keras checkpoint to a single saved_model.pb.
> perhaps you'd want to take a look at CenterNet?
This is interesting. There also seems to be a CenterNet implementation that will work with my existing training scripts with minimal changes (https://github.com/xuannianz/keras-CenterNet). Are there any other implementations of CenterNet you recommend?
> The saved model was created from the keras checkpoint at https://github.com/notAI-tech/NudeNet/releases/download/v0/detector_v2_default_checkpoint
> Although I haven't tried it out, this repo shows how to convert the checkpoint to tfjs format: https://github.com/faustomorales/retinanetjs
that procedure creates a layers model with a fixed input size - good for classification models, not so good for detection models
maybe section "converting a training model to inference model" from https://github.com/fizyr/keras-retinanet can be used?
that script works with the keras model format (h5), but it should be ok to switch to the saved_model format (pb)
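For context, keras-retinanet's training-to-inference conversion mainly appends box decoding and non-maximum suppression (NMS) to the network. A minimal pure-Python NMS, just to illustrate the kind of post-processing the inference graph adds (not the library's implementation):

```python
# Minimal NMS sketch: keep the highest-scoring boxes, dropping any box
# that overlaps an already-kept box too much. Boxes are (x1, y1, x2, y2).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2] - the overlapping lower-score box is dropped
```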
> This is interesting. There also seems to be a CenterNet implementation that will work with my existing training scripts with minimal changes (https://github.com/xuannianz/keras-CenterNet).
> Are there any other implementations of CenterNet you recommend?
i've been using the tensorflow-ported version https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md
specifically centernet on a resnet50v2 backbone - it's almost as good as the resnet101v1 backbone, but smaller and faster
key difference is that it's tpu optimized, although there is still one compatibility issue https://github.com/tensorflow/tfjs/issues/4133 and it requires tfjs 2.6.0 due to a variable-shape matmul implementation not present in earlier versions
> maybe section "converting a training model to inference model" from https://github.com/fizyr/keras-retinanet can be used?
The checkpoint at https://github.com/notAI-tech/NudeNet/releases/download/v0/detector_v2_default_checkpoint is exported to the inference format.
Thanks! I will use this.
hmm, i don't understand, i'll dig more.
what i'm talking about is the output of the converter: when converting your `saved_model` to a tfjs `graph_model`, it lists this:
```
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'Variable:0' shape=(9, 4) dtype=float32_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'Variable_1:0' shape=(9, 4) dtype=float32_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'Variable_2:0' shape=(9, 4) dtype=float32_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'Variable_3:0' shape=(9, 4) dtype=float32_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'Variable_4:0' shape=(9, 4) dtype=float32_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'Variable:0' shape=(9, 4) dtype=float32_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'Variable_1:0' shape=(9, 4) dtype=float32_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'Variable_2:0' shape=(9, 4) dtype=float32_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
...
```
(ignore the incorrect variable names - it's an open issue with the converter that it mangles them, as well as node names)
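The fix those warnings suggest can be sketched as follows; only the toggle itself is shown, since the actual re-export would depend on the original estimator's tags and paths:

```python
# Sketch of the warnings' suggested fix: enable TF2-style resource
# variables before rebuilding/re-exporting the SavedModel, so the graph
# no longer contains old-style reference variables. The re-export of the
# original estimator SavedModel is omitted (tags/paths depend on the
# original export).
import tensorflow.compat.v1 as tf

tf.enable_resource_variables()  # must run before the graph is (re)built
print(tf.resource_variables_enabled())  # True
```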
re: training vs inference model - it might be as simple as running a `freeze` before saving the model?
this is useful: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md and https://towardsdatascience.com/freezing-a-keras-model-c2e26cb84a38
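A minimal sketch of that freezing step, using TF2's `convert_variables_to_constants_v2` on a tiny stand-in model (the real detector would be loaded from its SavedModel instead; all names here are illustrative):

```python
# Sketch: "freezing" folds a model's variables into graph constants,
# producing a single self-contained graph.
import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import (
    convert_variables_to_constants_v2,
)

# tiny stand-in model; the real detector would be loaded from its SavedModel
model = tf.keras.Sequential([tf.keras.layers.Dense(4)])

func = tf.function(lambda x: model(x)).get_concrete_function(
    tf.TensorSpec([None, 8], tf.float32))
frozen = convert_variables_to_constants_v2(func)

# after freezing, no variable ops remain; weights are graph constants
var_ops = [n.op for n in frozen.graph.as_graph_def().node
           if n.op in ("VarHandleOp", "ReadVariableOp")]
print(var_ops)  # []
```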
fyi, with a few quick questions:
i've downloaded the checkpoint as noted in `detector.py` and converted it to tfjs `graph_model` format using `tensorflowjs_converter --strip_debug_ops=* --control_flow_v2=* --quantize_float16=* saved/ f16/` (quantized to float16 to reduce size by half)
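The size reduction from `--quantize_float16` is just the 4-byte to 2-byte storage change per weight, at a small precision cost; for example:

```python
# Why float16 quantization halves the download: each stored weight drops
# from 4 bytes (float32) to 2 bytes (float16).
import numpy as np

weights32 = np.random.default_rng(0).standard_normal(1000).astype(np.float32)
weights16 = weights32.astype(np.float16)  # what the quantized model stores

print(weights32.nbytes, weights16.nbytes)  # 4000 2000
# round-trip error is tiny relative to typical weight magnitudes
print(float(np.abs(weights32 - weights16.astype(np.float32)).max()) < 0.01)  # True
```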
model works in `tfjs` in nodejs and browser using `webgl` like a charm using `tfjs 2.6.0`!

few comments:
any chance you can also do a compiled version? it should significantly help with size and speed. i can probably do it as well, but i'd think you'd want to release a compiled version for usage and only use the dev version for training. any thoughts on that?

seems like i get best results if i resize the image before inference to a range around 800-1000px. anything smaller than 700px and it misses things badly, and anything bigger than 1100px gets a lot of false positives and unfortunately quickly leads to out-of-memory situations due to the generally bad behavior of browser garbage collection of `webgl` objects.

this is by far the most advanced nsfw model i've seen - if it weren't for a few issues (performance, memory, resolution sensitivity), it would be perfect!
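One way to implement the resize-before-inference heuristic described above (pure bookkeeping; the actual resampling would be done by whatever image library is in use, and the 800-1000px window is just the sweet spot reported here):

```python
# Sketch of the pre-inference resize heuristic: scale the image so its
# longer side lands inside a target window (~800-1000px), leaving
# already-in-range images untouched.
def target_size(width, height, lo=800, hi=1000):
    longest = max(width, height)
    if lo <= longest <= hi:
        return width, height  # already in the sweet spot
    scale = (hi if longest > hi else lo) / longest
    return round(width * scale), round(height * scale)

print(target_size(3000, 2000))  # (1000, 667) - shrunk to the window
print(target_size(640, 480))    # (800, 600)  - upscaled to the window
```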