Explainable output model

simonfreeman80 commented 4 years ago

Hi @vulcan25, I'm looking for a possible method to integrate an explainable output for the Keras-YOLO model: basically, alongside the bounding boxes, the "activation" areas which fire the CNN. I was looking at this: https://github.com/jacobgil/keras-grad-cam

Just wanted to share this with you. I'll start doing some experiments at the weekend... Take care. Thx

vulcan25 commented 4 years ago

Reader of the future, skip this post, as it's a waste of time

Alright, I've had a quick go at making this work in an independent container, before I decide how to plug this into this repo.

This Dockerfile should work:

FROM python:stretch
RUN apt-get update && apt-get install -y libgtk2.0-dev
RUN pip install -U pip && pip install keras tensorflow opencv-python Pillow
WORKDIR /code

Build this with something like:

docker build -t grad-light .

Then on the host system, clone that repo:

git clone https://github.com/jacobgil/keras-grad-cam
cd keras-grad-cam

Next run it with:

docker run -v`pwd`/:/code -it grad-light /bin/bash

Then inside the image run:

python grad-cam.py examples/cat_dog.png

On the first run, this should start downloading a whole load of models to the /root/.keras folder.

In another terminal, get the container name with docker ps, in my case vigilant_black, and copy this heavy folder out of the container:

docker cp vigilant_black:/root/.keras ./keras

I moved this into the same folder as the Dockerfile, and just bundled these into the image, by appending this line to the Dockerfile:

COPY ./keras /root/.keras

then rebuild it with the same build command as above.

Re-run the container with the modified image, executing the script:

docker run -v`pwd`/:/code -it grad-light python /code/grad-cam.py /code/examples/cat_dog.png

I moved from a t2.medium to a t2.large instance with 8GB instead of 4GB RAM as I was seeing things like:

2020-01-30 14:29:36.247970: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 411041792 exceeds 10% of system memory.
[snip]
ValueError: Attempt to convert a value (None) with an unsupported type (<class 'NoneType'>) to a Tensor.

However, even after upgrading the memory, the command still gives:

Using TensorFlow backend.
2020-01-30 14:39:25.070274: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-01-30 14:39:25.070393: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-01-30 14:39:25.070531: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2020-01-30 14:39:25.789400: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-01-30 14:39:25.789452: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
2020-01-30 14:39:25.789489: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (660430b6e788): /proc/driver/nvidia/version does not exist
2020-01-30 14:39:25.789698: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-01-30 14:39:25.796517: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300030000 Hz
2020-01-30 14:39:25.796854: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x562ecc67b200 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-01-30 14:39:25.796888: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Predicted class:
boxer (n02108089) with probability 0.42
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 9946, in square
    tld.op_callbacks, x)
tensorflow.python.eager.core._FallbackException: This function does not handle the case of the path where all inputs are not already EagerTensors.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "grad-cam.py", line 135, in <module>
    cam, heatmap = grad_cam(model, preprocessed_input, predicted_class, "block5_conv3")
  File "grad-cam.py", line 99, in grad_cam
    grads = normalize(K.gradients(loss, conv_output)[0])
  File "grad-cam.py", line 22, in normalize
    return x / (K.sqrt(K.mean(K.square(x))) + 1e-5)
  File "/usr/local/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 1888, in square
    return tf.square(x)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 9951, in square
    x, name=name, ctx=_ctx)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 9985, in square_eager_fallback
    _attr_T, (x,) = _execute.args_to_matching_eager([x], ctx)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/eager/execute.py", line 263, in args_to_matching_eager
    t, dtype, preferred_dtype=default_dtype, ctx=ctx))
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1314, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py", line 317, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py", line 258, in constant
    allow_broadcast=True)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py", line 266, in _constant_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py", line 96, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Attempt to convert a value (None) with an unsupported type (<class 'NoneType'>) to a Tensor.

That's all for now....

vulcan25 commented 4 years ago

This repo claims to fix this: https://github.com/PowerOfCreation/keras-grad-cam

However, I now see:

 docker run -v`pwd`/:/code -it grad-light /bin/bash
root@dbc0260362f4:/code# python grad-cam.py 
.git/        LICENSE      README.md    examples/    grad-cam.py  
root@dbc0260362f4:/code# python grad-cam.py examples/cat_dog.png 
Using TensorFlow backend.
2020-01-30 15:10:45.866169: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-01-30 15:10:45.866292: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-01-30 15:10:45.866317: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2020-01-30 15:10:46.592240: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-01-30 15:10:46.592289: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
2020-01-30 15:10:46.592319: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (dbc0260362f4): /proc/driver/nvidia/version does not exist
2020-01-30 15:10:46.592524: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-01-30 15:10:46.599601: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300055000 Hz
2020-01-30 15:10:46.599917: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55c9e55ee120 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-01-30 15:10:46.599945: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Predicted class:
boxer (n02108089) with probability 0.42
Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544 
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
predictions (Dense)          (None, 1000)              4097000   
_________________________________________________________________
lambda_1 (Lambda)            (None, 1000)              0         
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________
Traceback (most recent call last):
  File "grad-cam.py", line 137, in <module>
    cam, heatmap = grad_cam(model, preprocessed_input, predicted_class, "block5_conv3")
  File "grad-cam.py", line 101, in grad_cam
    grads = normalize(_compute_gradients(loss, [conv_output])[0])
  File "grad-cam.py", line 90, in _compute_gradients
    grads = tf.gradients(tensor, var_list)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/ops/gradients_impl.py", line 274, in gradients_v2
    unconnected_gradients)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/ops/gradients_util.py", line 491, in _GradientsHelper
    raise RuntimeError("tf.gradients is not supported when eager execution "
RuntimeError: tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.

simonfreeman80 commented 4 years ago

Hi, did you try adding this at the beginning: tf.compat.v1.disable_eager_execution()

vulcan25 commented 4 years ago

I'll try this shortly and update the thread.

In the meantime, can you explain to me what the overall usage of this part would be like?

With the current (YOLOv3) conversion we do something like:

Input: Image
Output: text list of objects, text list of bounding boxes, and modified image with bounding boxes overlaid.

I'm trying to figure out what this new method does exactly. Take this example from the README:

[image: example from the README]

Would the equivalent conversion for this be:

Input: Leftmost image, and the string "boxer"
Output: Centre and right images.

Or am I misunderstanding?

What is this actually supposed to do, strictly from a user's perspective: What are you feeding the API and what do you want back?

simonfreeman80 commented 4 years ago

Hi @vulcan25, the idea is to have another API method that the user can call to collect the output of the explainable Keras model, mostly for debugging, to gain insight into model performance. Imagine you have trained a custom model with a few classes, and two classes are very similar (car models, watches, etc.). If you can cross-check which areas of the image's detail the model is looking at, you can try to train the model better (tailor the bounding-box labelling to specific areas, augment the training data).

From a usage perspective, it would be desirable to have a separate endpoint to call, which responds with the computed heatmap images stored in the Redis DB.

Example:

First call to the main API, getting the response {'id': '/view/0e595889-1aa5-44b3-a8ee-27efedaed82e', 'info': {'object_string': 'handbag:1, sports ball:1, person:4, ', 'objects': ['sports ball', 'handbag', 'person', 'person', 'person', 'person'], 'scored_objects': [{'object': 'sports ball', 'score': '0.57', 'heatmap': '/explain/27efedaed82e'}, {'object': 'handbag', 'score': '0.30', 'heatmap': '/explain/37efedaed82e'}, {'object': 'person', 'score': '0.99', 'heatmap': '/explain/47efedaed82e'}, {'object': 'person', 'score': '0.99', 'heatmap': '/explain/57efedaed82e'}, {'object': 'person', 'score': '1.00', 'heatmap': '/explain/67efedaed82e'}, {'object': 'person', 'score': '1.00', 'heatmap': '/explain/77efedaed82e'}], 'success': True}}
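A minimal sketch of how a client might consume a response shaped like the one proposed above, for example to pull out the heatmap paths of low-confidence detections for debugging. The function name heatmaps_below and the 0.5 threshold are made up for illustration; only the field names come from the example response.

```python
# Response shaped like the proposal above (shortened to two objects).
response = {
    'id': '/view/0e595889-1aa5-44b3-a8ee-27efedaed82e',
    'info': {
        'scored_objects': [
            {'object': 'sports ball', 'score': '0.57', 'heatmap': '/explain/27efedaed82e'},
            {'object': 'handbag', 'score': '0.30', 'heatmap': '/explain/37efedaed82e'},
        ],
        'success': True,
    },
}


def heatmaps_below(response, threshold):
    """Return (object, score, heatmap path) for detections scoring under threshold."""
    rows = []
    for item in response['info']['scored_objects']:
        score = float(item['score'])  # scores arrive as strings in the example
        if score < threshold:
            rows.append((item['object'], score, item['heatmap']))
    return rows


# Hypothetical threshold: anything under 0.5 is worth a closer look.
low_confidence = heatmaps_below(response, 0.5)
print(low_confidence)
```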

Thx

vulcan25 commented 4 years ago

{'object': 'handbag', 'score': '0.30', 'heatmap': '/explain/37efedaed82e'},

Okay, and in that case /explain/37efedaed82e is just another endpoint with an image download?

I'm really struggling to get this running in docker though, even just the bare example:

python grad-cam.py examples/cat_dog.png

Have you had any success making this work in Docker?

simonfreeman80 commented 4 years ago

Exactly, /explain/37efedaed82e will be another image to download. I will test on Docker and let you know ASAP this weekend.
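A minimal sketch of that lookup, assuming a plain dict as a stand-in for Redis and a random 12-character ID. save_heatmap and load_heatmap are hypothetical names, not functions from the repo.

```python
import uuid

# Hypothetical in-memory stand-in for the Redis store mentioned above.
heatmap_store = {}


def save_heatmap(png_bytes):
    """Store heatmap bytes and return the path a client would request."""
    heatmap_id = uuid.uuid4().hex[:12]
    heatmap_store[heatmap_id] = png_bytes
    return "/explain/" + heatmap_id


def load_heatmap(path):
    """Return the stored bytes for an /explain/<id> path, or None if unknown."""
    prefix = "/explain/"
    if not path.startswith(prefix):
        return None
    return heatmap_store.get(path[len(prefix):])


fake_png = b"\x89PNG fake image bytes"  # placeholder, not a real image
path = save_heatmap(fake_png)
print(path)
print(load_heatmap(path) == fake_png)
```

The main response would then carry the returned path in each object's 'heatmap' field, and the /explain endpoint would only need the lookup half.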

simonfreeman80 commented 4 years ago

@vulcan25 I think this can help: https://github.com/ryoasu/grad-cam?files=1 There's a Dockerfile as well.

vulcan25 commented 4 years ago

This was the one! I've got that running and converting images.

Give me a few days to tie this into our repo. I plan to do some slight restructuring first.

vulcan25 commented 4 years ago

@simonfreeman80 See this commit for the 'restructuring' I mentioned: https://github.com/vulcan25/image_processor/commit/b2396811c006bf449287de91a77ddba2bb318459

It's probably best to familiarise yourself with that (although it shouldn't really make a difference to you). If you have any queries on it, please log a separate issue.

grad-cam integration is looking good so stay tuned for a further update.

simonfreeman80 commented 4 years ago

Hi @vulcan25, I'm unable to use Docker, so I'm trying to run it in my local Anaconda env (CPU, no CUDA). I'm getting this warning: W tensorflow/core/framework/allocator.cc:124] Allocation of 411041792 exceeds 10% of system memory. The heatmap is created, but I don't get any predicted class output.

I'm still trying to figure out how to use our original model (YOLOv3). That's the scope, because there's no point in using the VGG16 one.

I'll try to port the code into the simple image detection code. I'll keep you posted.

vulcan25 commented 4 years ago

Lots of the stuff in that repo seemed to relate to config loading, so I forked it and made some changes: https://github.com/vulcan25/grad-cam

You should be able to install with:

pip install https://github.com/vulcan25/grad-cam/archive/v0.0.2.tar.gz

Or clone it, then from a python interpreter in the grad-cam folder:

from grad_cam import keras_grad_cam

See the bottom of grad_cam/gc.py for the example usage:

    with open('hydrant.jpg', 'rb') as f:
        res = keras_grad_cam(f)

Save the generated images with:

    for d in res:
        # One file per result, named "<model_name>-<layer>.png"
        filename = d['model_name'] + '-' + d['layer'] + '.png'
        with open(filename, 'wb') as f:
            f.write(d['file'])

Notice that at the top of this file I set some params:

PARAMS = './example/model/vgg16_weights_tf_dim_ordering_tf_kernels.h5'
MODEL_SOURCE_PATH='./example/src/vgg16.py'
MODEL_SOURCE_DEFINITION='vgg16'
LAYERS = ['block5_conv3','block4_conv3']
ARGS = [
        [224,224], # image size
        3, # channel
        1000 # classes
       ]

I'm not sure if you could swap these with the path to your weights.h5. I'm not sure what the other variables would have to be in that case, or what example/src/vgg16.py would/should contain in your case. This is where my understanding stops a bit.

Perhaps this usage and both of those functions are relevant. Is that what's referred to as the model definition?

However, here's the info I get back from the keras_grad_cam function's return value, as well as the two generated heatmap images:

The heatmap value for each looks like:

>>> res[0]['heatmap']
array([[0.        , 0.        , 0.        , ..., 0.21066986, 0.21066986,
        0.21066986],
       [0.        , 0.        , 0.        , ..., 0.21066986, 0.21066986,
        0.21066986],
       [0.        , 0.        , 0.        , ..., 0.21066986, 0.21066986,
        0.21066986],
       ...,
       [0.17665634, 0.17665634, 0.17665634, ..., 0.4368378 , 0.4368378 ,
        0.4368378 ],
       [0.17665634, 0.17665634, 0.17665634, ..., 0.4368378 , 0.4368378 ,
        0.4368378 ],
       [0.17665634, 0.17665634, 0.17665634, ..., 0.4368378 , 0.4368378 ,
        0.4368378 ]], dtype=float32)
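
For context, a heatmap like this is what the Grad-CAM recipe produces: average the gradients of the class score over each feature map to get one weight per channel, take the weighted sum of the feature maps, apply ReLU and scale to [0, 1]. A minimal NumPy sketch on a toy 4x4x2 input (for VGG16's block5_conv3 the real shape would be 14x14x512). The function name and the toy arrays are made up for illustration, and the resize to the input image size is left out.

```python
import numpy as np


def grad_cam_heatmap(conv_output, grads):
    """Grad-CAM for one image; conv_output and grads both have shape (H, W, C)."""
    weights = grads.mean(axis=(0, 1))                          # one weight per channel
    cam = np.tensordot(conv_output, weights, axes=([2], [0]))  # weighted sum -> (H, W)
    cam = np.maximum(cam, 0.0)                                 # ReLU: keep positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam                     # scale to [0, 1]


# Toy feature maps: channel 0 has one hot spot, channel 1 is uniform.
conv_output = np.zeros((4, 4, 2), dtype=np.float32)
conv_output[1, 2, 0] = 2.0
conv_output[..., 1] = 1.0

# Toy gradients: channel 0 supports the class, channel 1 counts against it.
grads = np.zeros((4, 4, 2), dtype=np.float32)
grads[..., 0] = 1.0
grads[..., 1] = -0.5

heatmap = grad_cam_heatmap(conv_output, grads)
print(heatmap)
```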

The cam value for each looks like:

>>> res[0]['cam']
array([[[ 97,  39,  47],
        [ 95,  33,  32],
        [ 97,  35,  32],
        ...,
        [187,  82,  30],
        [236, 142,  82],
        [230, 114,  64]],

       [[102,  43,  49],
        [ 91,  29,  28],
        [102,  41,  38],
        ...,
        [163,  71,  20],
        [235, 141,  82],
        [232, 116,  65]],

       [[ 96,  35,  40],
        [ 94,  32,  31],
        [ 94,  32,  29],
        ...,
        [157,  71,  26],
        [239, 145,  87],
        [232, 116,  65]],

       ...,

       [[158,  70,  55],
        [171,  82,  67],
        [155,  59,  46],
        ...,
        [112, 145,  46],
        [129, 156,  62],
        [156, 184,  87]],

       [[169,  80,  65],
        [161,  69,  55],
        [155,  61,  48],
        ...,
        [140, 183,  92],
        [124, 161,  72],
        [144, 182,  92]],

       [[166,  75,  61],
        [164,  71,  57],
        [159,  68,  54],
        ...,
        [139, 178,  96],
        [137, 180,  95],
        [135, 177,  94]]], dtype=uint8)
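
The cam array is the heatmap blended over the input image. A rough sketch of that step in plain NumPy, assuming a simple red tint with a fixed 50/50 blend. The actual script may well use an OpenCV colormap instead, so treat overlay_heatmap as a hypothetical helper.

```python
import numpy as np


def overlay_heatmap(image, heatmap, alpha=0.5):
    """Blend a [0, 1] heatmap of shape (H, W) into an RGB uint8 image as a red tint."""
    colored = np.zeros_like(image, dtype=np.float32)
    colored[..., 0] = 255.0 * heatmap  # the red channel carries the heatmap
    blended = (1.0 - alpha) * image.astype(np.float32) + alpha * colored
    return np.clip(blended, 0, 255).astype(np.uint8)


image = np.full((4, 4, 3), 100, dtype=np.uint8)  # flat grey test image
heatmap = np.zeros((4, 4), dtype=np.float32)
heatmap[1, 2] = 1.0                              # a single "hot" pixel
cam = overlay_heatmap(image, heatmap)
print(cam.dtype, cam.shape)
print(cam[1, 2], cam[0, 0])
```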

I'm also seeing this output in the terminal, so I'm not sure if I need to capture that and return it also:

Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________
image class: 576
+==========================+
+==========================+

W tensorflow/core/framework/allocator.cc:124] Allocation of 411041792 exceeds 10% of system memory.

I had this problem, so moved from a 4GB RAM system to 8GB. What spec is your machine?
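
As a side note on that warning, the reported size appears to match the weight matrix of VGG16's fc1 layer exactly. A quick check, using the layer shapes from the model summary earlier in the thread and assuming 4-byte float32 weights:

```python
# Shapes taken from the model summary earlier in the thread.
flatten_units = 25088   # output size of the flatten layer
fc1_units = 4096        # output size of the fc1 Dense layer
bytes_per_float32 = 4   # assumption: weights are stored as float32

fc1_kernel_params = flatten_units * fc1_units
fc1_kernel_bytes = fc1_kernel_params * bytes_per_float32

print(fc1_kernel_params)  # 102760448 (fc1's 102764544 params minus its 4096 biases)
print(fc1_kernel_bytes)   # 411041792, the size reported in the warning
```

If that reading is right, the message is about allocating the largest VGG16 layer, and it is logged at W (warning) level rather than as an error.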

vulcan25 commented 4 years ago

I'm still trying to figure out how to use our original model (YOLOv3). That's the scope, because there's no point in using the VGG16 one.

I'm really stuck on this part. Hopefully you can make some discovery here. See this commit on the load-yolo3-model branch which was a rough attempt: https://github.com/vulcan25/grad-cam/commit/448a4d0d30c137997519cc15f72af6850865ad5c

This has led me to errors like:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 52, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
AttributeError: 'list' object has no attribute 'argmax'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/grad-cam/grad_cam/gc.py", line 73, in keras_grad_cam
    results, model_name = __keras_grad_cam(input_image)
  File "/grad-cam/grad_cam/gc.py", line 62, in __keras_grad_cam
    k_util.show_predicted_class(model, [input_image], image_to_arr)
  File "/grad-cam/keras_pkg/util.py", line 58, in show_predicted_class
    np.argmax(predicted_class))
  File "/usr/local/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 1004, in argmax
    return _wrapfunc(a, 'argmax', axis=axis, out=out)
  File "/usr/local/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 62, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "/usr/local/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 42, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
  File "/usr/local/lib/python3.7/site-packages/numpy/core/numeric.py", line 492, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: could not broadcast input array from shape (20,50,255) into shape (1)

I'm starting to think this first model isn't compatible with the Grad-CAM stuff, but maybe I'm not fully understanding how the Keras stuff works. Hopefully the info I've put here gets you closer to a solution. I may need to invest some time in other projects, but will keep an eye on this for any developments. If you can make the Grad-CAM script work with your model, I'd be more than happy to tie this into the main repo. Good luck!
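
One possible reading of the traceback above: show_predicted_class expects a single vector of class probabilities, while the YOLOv3 model seems to return a list of raw detection grids (the 255 in the reported shape would fit 3 * (5 + 80), i.e. 3 anchors, 5 box values and 80 COCO classes). A minimal sketch of reducing such outputs to one top class score, using random arrays in place of real predictions. The grid sizes, the channel layout and the sigmoid activations are all assumptions, not something confirmed in this thread, and best_detection is a hypothetical helper.

```python
import numpy as np

NUM_ANCHORS = 3   # assumption: 3 anchors per output scale
NUM_CLASSES = 80  # assumption: COCO classes, so 3 * (5 + 80) = 255 channels


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def best_detection(outputs):
    """Return (score, scale, row, col, anchor, class_id) for the top-scoring cell."""
    best = None
    for scale, out in enumerate(outputs):
        grid = out[0]                                   # drop batch dim -> (H, W, 255)
        h, w, _ = grid.shape
        cells = grid.reshape(h, w, NUM_ANCHORS, 5 + NUM_CLASSES)
        objectness = sigmoid(cells[..., 4:5])           # (H, W, anchors, 1)
        class_probs = sigmoid(cells[..., 5:])           # (H, W, anchors, classes)
        scores = objectness * class_probs
        row, col, anchor, class_id = np.unravel_index(int(np.argmax(scores)), scores.shape)
        score = float(scores[row, col, anchor, class_id])
        if best is None or score > best[0]:
            best = (score, scale, int(row), int(col), int(anchor), int(class_id))
    return best


# Random stand-ins for model.predict(...) at three scales, with one planted detection.
rng = np.random.default_rng(0)
fake_outputs = [rng.normal(size=(1, s, s, 255)).astype(np.float32) for s in (13, 26, 52)]
fake_outputs[1][0, 3, 4, 1 * 85 + 4] = 20.0      # anchor 1: objectness logit
fake_outputs[1][0, 3, 4, 1 * 85 + 5 + 7] = 20.0  # anchor 1: class 7 logit

result = best_detection(fake_outputs)
print(result)
```

Grad-CAM needs a single scalar to differentiate, so something like this score (rather than np.argmax over the whole output list) would presumably be the target.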

simonfreeman80 commented 4 years ago

Hi @vulcan25, thanks so much for your help and effort. I'll surely dive in and run some exhaustive tests of Grad-CAM on YOLOv3. Apparently it's not that easy! I'll keep you posted. Take care.

vulcan25 / image_processor

Explainable output model #7