Your Core ML model should not include the loss function: the loss is only used during training, and training is not possible with Core ML.
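For context, these CRNN training scripts typically build two Keras models that share the same layers: a prediction model ending at the softmax, and a training model that bolts a loss-computing Lambda layer on top. A minimal toy sketch of that pattern (all names and shapes here are illustrative, not from the actual code):

from keras.layers import Input, Dense, Lambda
from keras.models import Model
from keras import backend as K

# Toy example: `predictor` ends at the softmax and is what you would
# convert; `trainer` adds a loss-computing Lambda layer on top and is
# only ever used with fit()/fit_generator().
x = Input(shape=(10,), name='image')
probs = Dense(5, activation='softmax')(x)
predictor = Model(inputs=x, outputs=probs)

y_true = Input(shape=(5,), name='labels')
loss = Lambda(lambda args: K.expand_dims(K.categorical_crossentropy(args[0], args[1])),
              output_shape=(1,), name='loss')([y_true, probs])
trainer = Model(inputs=[x, y_true], outputs=loss)
trainer.compile(loss={'loss': lambda yt, yp: yp}, optimizer='sgd')

Core ML only ever runs the prediction half, which is why the Lambda loss layer has no place in the converted model.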
Thank you so much for replying :) Do you mean that we cannot convert any model with a custom loss to Core ML? If so, do you have any idea how I can convert the OCR code to some other model format that iOS understands? Most of the OCR Keras code I have explored uses a Lambda layer for the loss.
Thanks again for taking the time
this is my code:
import coremltools
from coremltools.proto import NeuralNetwork_pb2

def convert_lambda(layer):
    # Only convert this Lambda layer if it wraps our CTC loss function.
    if layer.function == ctc_lambda_func:
        params = NeuralNetwork_pb2.CustomLayerParams()
        # The name of the Swift or Obj-C class that implements this layer.
        params.className = "x"
        # The description is shown in Xcode's mlmodel viewer.
        params.description = "A fancy new loss"
        return params
    else:
        return None
print("\nConverting the model:")
# Convert the model to Core ML.
coreml_model = coremltools.converters.keras.convert(
model,
# 'weightswithoutstnlrchangedbackend.best.hdf5',
input_names="image",
image_input_names="image",
output_names="output",
add_custom_layers=True,
custom_conversion_functions={"Lambda": convert_lambda},
)
and this is the error:
Converting the model:
Traceback (most recent call last):
  File "/home/sgnbx/Downloads/projects/CRNN-with-STN-master/CRNN_with_STN.py", line 201, in <module>
    custom_conversion_functions={"Lambda": convert_lambda},
  File "/home/sgnbx/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/coremltools/converters/keras/_keras_converter.py", line 760, in convert
    custom_conversion_functions=custom_conversion_functions)
  File "/home/sgnbx/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/coremltools/converters/keras/_keras_converter.py", line 556, in convertToSpec
    custom_objects=custom_objects)
  File "/home/sgnbx/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/coremltools/converters/keras/_keras2_converter.py", line 255, in _convert
    if input_names[idx] in input_name_shape_dict:
IndexError: list index out of range
Input name length mismatch
The easiest thing to do is to remove this lambda layer from the Keras model first.
Isn't it obligatory to have the Lambda layer here, since the model needs to use a custom version of the loss rather than one of the losses already built into Keras? I have looked at a couple of codebases with the CRNN architecture, and all had the same custom loss.
The Keras model only needs to have that loss for training, not for making predictions.
Can you please link me to a blog post or some other source? I honestly can't follow what you're saying, as I only use the custom loss during training, then save the model, then use the saved model to convert to Core ML. Which part am I missing? Sorry for the many questions and for taking your time.
After training, remove the lambda layer from the Keras model and save this model. Then use coremltools to convert that Keras model (without the lambda layer) to Core ML.
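In CRNN training scripts like this one, that split usually already exists: a base_model ending at the softmax and a model that adds the CTC Lambda on top, with all layers (and therefore all weights) shared between them. A rough sketch of the post-training step, assuming that setup (filename is illustrative):

# After model.fit_generator(...) has finished, the trained weights live in
# layers shared with base_model, so saving base_model captures the trained
# network without the CTC Lambda layer.
base_model.save('crnn_base.h5')

# Later, reload it for conversion; no custom_objects are needed because the
# Lambda layer (and its ctc_lambda_func) is not part of this model.
from keras.models import load_model
inference_model = load_model('crnn_base.h5')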
I still can't follow your approach. Do you mind having a quick look at the code and telling me which part I have to change? (My confusion is this: I can save the trained model, but how can I save a model without training it, and what would be saved in that model, in terms of weights or other parameters?) I am really sorry for taking your time, and much appreciated. (If you know of a link about your approach, I will be happy to explore it and do it myself, but unfortunately I can't follow your explanation.)
from coremltools.proto import NeuralNetwork_pb2
from keras import backend, optimizers
from keras.callbacks import *
from keras.layers import *
from keras.models import *
from keras.optimizers import SGD
from keras.utils import *
from keras.callbacks import ModelCheckpoint
from keras.callbacks import TensorBoard
import numpy as np
# from STN.spatial_transformer import SpatialTransformer
from Batch_Generator import img_gen, img_gen_val
from config import learning_rate, load_model_path, width, height, characters, label_len, label_classes, \
    cp_save_path, base_model_path, tb_log_dir
# functions
class Evaluate(Callback):
    def on_epoch_end(self, epoch, logs=None):
        acc = evaluate(base_model)
        print('')
        print('acc:' + str(acc) + "%")

evaluator = Evaluate()
def evaluate(input_model):
    correct_prediction = 0
    generator = img_gen_val()
    x_test, y_test = next(generator)
    # print(" ")
    y_pred = input_model.predict(x_test)
    shape = y_pred[:, 2:, :].shape
    ctc_decode = backend.ctc_decode(y_pred[:, 2:, :], input_length=np.ones(shape[0]) * shape[1])[0][0]
    out = backend.get_value(ctc_decode)[:, :label_len]
    for m in range(1000):
        result_str = ''.join([characters[k] for k in out[m]])
        result_str = result_str.replace('-', '')
        if result_str == y_test[m]:
            correct_prediction += 1
            # print(m)
        else:
            print(result_str, y_test[m])
    return correct_prediction * 1.0 / 10
def ctc_lambda_func(args):
    iy_pred, ilabels, iinput_length, ilabel_length = args
    # the 2 is critical here since the first couple outputs of the RNN
    # tend to be garbage:
    iy_pred = iy_pred[:, 2:, :]  # no such influence
    return backend.ctc_batch_cost(ilabels, iy_pred, iinput_length, ilabel_length)
# initial bias_initializer
def loc_net(input_shape):
    b = np.zeros((2, 3), dtype='float32')
    b[0, 0] = 1
    b[1, 1] = 1
    w = np.zeros((64, 6), dtype='float32')
    weights = [w, b.flatten()]
    loc_input = Input(input_shape)
    loc_conv_1 = Conv2D(16, (5, 5), padding='same', activation='relu')(loc_input)
    loc_conv_2 = Conv2D(32, (5, 5), padding='same', activation='relu')(loc_conv_1)
    loc_fla = Flatten()(loc_conv_2)
    loc_fc_1 = Dense(64, activation='relu')(loc_fla)
    loc_fc_2 = Dense(6, weights=weights)(loc_fc_1)
    output = Model(inputs=loc_input, outputs=loc_fc_2)
    return output
# build model
inputShape = Input((width, height, 3))  # based on the TensorFlow backend
conv_1 = Conv2D(64, (3, 3), activation='relu', padding='same')(inputShape)
batchnorm_1 = BatchNormalization()(conv_1)
conv_2 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv_1)
conv_3 = Conv2D(256, (3, 3), activation='relu', padding='same')(conv_2)
batchnorm_3 = BatchNormalization()(conv_3)
pool_3 = MaxPooling2D(pool_size=(2, 2))(batchnorm_3)
conv_4 = Conv2D(256, (3, 3), activation='relu', padding='same')(pool_3)
conv_5 = Conv2D(512, (3, 3), activation='relu', padding='same')(conv_4)
batchnorm_5 = BatchNormalization()(conv_5)
pool_5 = MaxPooling2D(pool_size=(2, 2))(batchnorm_5)
conv_6 = Conv2D(512, (3, 3), activation='relu', padding='same')(pool_5)
conv_7 = Conv2D(512, (3, 3), activation='relu', padding='same')(conv_6)
batchnorm_7 = BatchNormalization()(conv_7)
bn_shape = batchnorm_7.get_shape()  # (?, 50, 12, 256)
'''----------------------STN-------------------------'''
# you can run the model without this STN part by commenting out the STN lines and connecting batchnorm_7 to x_reshape,
# which may give you higher accuracy
# stn_input_shape = batchnorm_7.get_shape()
# loc_input_shape = (stn_input_shape[1].value, stn_input_shape[2].value, stn_input_shape[3].value)
# stn = SpatialTransformer(localization_net=loc_net(loc_input_shape),
#                          output_size=(loc_input_shape[0], loc_input_shape[1]))(batchnorm_7)
'''----------------------STN-------------------------'''
print(bn_shape) # (?, 50, 7, 512)
# reshape to (batch_size, width, height*dim)
# x_reshape = Reshape(target_shape=(int(bn_shape[1]), int(bn_shape[2] * bn_shape[3])))(stn_7)
x_reshape = Reshape(target_shape=(int(bn_shape[1]), int(bn_shape[2] * bn_shape[3])))(batchnorm_7)
fc_1 = Dense(128, activation='relu')(x_reshape) # (?, 50, 128)
print(x_reshape.get_shape()) # (?, 50, 3584)
print(fc_1.get_shape()) # (?, 50, 128)
rnn_1 = LSTM(128, kernel_initializer="he_normal", return_sequences=True)(fc_1)
rnn_1b = LSTM(128, kernel_initializer="he_normal", go_backwards=True, return_sequences=True)(fc_1)
rnn1_merged = add([rnn_1, rnn_1b])
rnn_2 = LSTM(128, kernel_initializer="he_normal", return_sequences=True)(rnn1_merged)
rnn_2b = LSTM(128, kernel_initializer="he_normal", go_backwards=True, return_sequences=True)(rnn1_merged)
rnn2_merged = concatenate([rnn_2, rnn_2b])
drop_1 = Dropout(0.25)(rnn2_merged)
fc_2 = Dense(label_classes, kernel_initializer='he_normal', activation='softmax')(drop_1)
# model setting
base_model = Model(inputs=inputShape, outputs=fc_2)  # the model for predicting
labels = Input(name='the_labels', shape=[label_len], dtype='float32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')
loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([fc_2, labels, input_length, label_length])
model = Model(inputs=[inputShape, labels, input_length, label_length], outputs=[loss_out])  # the model for training
# clipnorm seems to speed up convergence
sgd = SGD(lr=learning_rate, decay=1e-6, momentum=0.9, nesterov=True, clipnorm=5)
adam = optimizers.Adam()
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer=sgd)
model.summary()  # print a summary representation of the model
# plot_model(model, to_file='CRNN_with_STN.png', show_shapes=True)  # save an image of the model architecture
cp_save_path = "weightswithoutstnlrchanged.best.hdf5"
checkpoint = ModelCheckpoint(cp_save_path, monitor='loss', verbose=1, save_best_only=True, mode='min')
# if you want to load trained model weights, just fill in load_model_path in config.py and it will automatically be used for the
# new training. if you want to train a new model, just set load_model_path = ''.
# if len(load_model_path) > 5:
#     savedmodel = load_model(load_model_path, custom_objects={"bknd": backend, 'loss': loss_out})
# try your own fit_generator() settings, you may get a better result
model.fit_generator(img_gen(input_shape=bn_shape), steps_per_epoch=100, epochs=1, verbose=1,
                    callbacks=[evaluator,
                               checkpoint,
                               TensorBoard(log_dir=tb_log_dir)])
base_model.save(base_model_path)
import coremltools

def convert_lambda(layer):
    # Only convert this Lambda layer if it wraps our CTC loss function.
    if layer.function == ctc_lambda_func:
        params = NeuralNetwork_pb2.CustomLayerParams()
        # The name of the Swift or Obj-C class that implements this layer.
        params.className = "x"
        # The description is shown in Xcode's mlmodel viewer.
        params.description = "A fancy new loss"
        return params
    else:
        return None
print("\nConverting the model:")
# Convert the model to Core ML.
coreml_model = coremltools.converters.keras.convert(
model,
# 'weightswithoutstnlrchangedbackend.best.hdf5',
input_names="image",
image_input_names="image",
output_names="output",
add_custom_layers=True,
custom_conversion_functions={"Lambda": convert_lambda},
You should convert base_model instead of model. That’s all.
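Concretely, something like this should be all the conversion needs; with the Lambda layer gone, add_custom_layers and custom_conversion_functions can be dropped (the output filename is illustrative):

import coremltools

# Convert the prediction model (no CTC Lambda layer), not the training model.
coreml_model = coremltools.converters.keras.convert(
    base_model,
    input_names="image",
    image_input_names="image",
    output_names="output",
)
coreml_model.save("OCR.mlmodel")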
Hi,
Thank you so much for sharing this useful code with us. I am trying to convert Keras code to Core ML. This is the source code I am working on: https://github.com/sbillburg/CRNN-with-STN. It is an OCR model written in Keras with a CRNN architecture, and its loss is a custom layer built with a Lambda layer. I tried your code on it, but it keeps raising an error.
I was wondering whether you have a workaround for the loss function, or any simpler approach.
Thanks for taking the time.