marl / openl3

OpenL3: Open-source deep audio and image embeddings
MIT License

Refactor code and models to support TF 2.x and tf.keras #46

Closed auroracramer closed 3 years ago

auroracramer commented 4 years ago

At some point in the somewhat near future, we should establish support for TF 2.x and tf.keras. The main reasons for this are:

A priori, it seems like the main things to do are:

The main concern that comes to mind is the regression tests. We have already seen that tensorflow > 1.13 causes the regression tests to fail. I imagine that this will only worsen as we introduce not only a new major release of TF, but also a divergence between Keras and tf.keras. @justinsalamon, what are your thoughts?

justinsalamon commented 4 years ago

Generally I totally support this move. My main concern, as you've identified, is the move from keras to tf.keras. As an example, I believe @jongwook tried to migrate CREPE from Keras to tf.keras and gave up on it because of incompatibilities and changes to the behavior of the model.

If we can make all of these changes while maintaining model performance (and hence passing regression tests), I think it's a no-brainer. If the tests fail, then it'd probably be worth re-training the openl3 models against the new model implementation, ensuring the resulting embeddings are equally competitive on at least one downstream task (e.g. US8K), and then releasing that.

Thoughts?

justinsalamon commented 4 years ago

p.s - we might be able to get away without re-training the models if we can at least show that the weights we have right now, coupled with the new implementation, still produce comparable downstream results. Then we'd just have to update the regression data for the regression tests.

So I guess my proposed steps would be:

  1. Update to tf.keras - if regression tests pass, we're done (a rough sketch of such a check follows this list).
  2. If the regression tests fail, evaluate the resulting embeddings on downstream tasks. If they are equally competitive, update the regression data using the new implementation.
  3. If the modified embeddings aren't competitive, then we'd have to re-train against the new implementation.
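
For concreteness, here is a rough sketch of what the step-1 check could look like (the fixture file names and tolerances are hypothetical; openl3's actual regression tests live in its test suite):

import numpy as np
import soundfile as sf
import openl3

# Hypothetical regression fixture: embeddings previously computed for a
# known input file with the old Keras implementation.
expected = np.load("regression_embeddings.npz")["embedding"]

audio, sr = sf.read("regression_input.wav")
emb, ts = openl3.get_audio_embedding(audio, sr, content_type="env",
                                     input_repr="mel128", embedding_size=512)

# If the tf.keras port reproduces the old implementation, the embeddings
# should match to within floating-point noise.
assert np.allclose(emb, expected, rtol=1e-5, atol=1e-5)

If that assertion fails, step 2's downstream evaluation kicks in.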

jongwook commented 4 years ago

> As an example, I believe @jongwook tried to migrate CREPE from Keras to tf.keras and gave up on it because of incompatibilities and changes to the behavior of the model.

This happened once when tf.keras was at a very early stage of development, but CREPE is currently using tf.keras without causing disaster AFAIK. (And yes, I am responsible for not writing real regression tests there.)

justinsalamon commented 4 years ago

Ah cool, thanks for the info @jongwook. So it sounds like we should be able to at least give it a try, but yeah, we gotta be solid on performance and regression.

auroracramer commented 4 years ago

@justinsalamon That plan sounds good to me. I'll see about looking into this in the near future.

thias42 commented 4 years ago

Hi, first of all thank you for sharing the code and model! I had a go at the TF2 support (branch tf2 in my fork) and it turns out that the regression tests fail after switching to tf.keras. So the new embeddings would need to be evaluated on any downstream tasks.

gmos commented 4 years ago

Hi, we are working with a sound recognizer currently based on vggish. Our recognizer runs on an edge device and stretches the small CPUs to the limit: too much heat and too much power usage. So we want to try out tensorflow-lite, perhaps in combination with a Coral co-processor. Unfortunately that is only supported on TF2. For vggish there are now some TF2 ports available, and we have an evaluation of the port on our roadmap.

We would also very much like to evaluate openl3 for our application, but not if it would lock us to TF 1.x again. The last post in this issue was almost 4 months ago. What are your current plans/goals regarding the TF2 port?

justinsalamon commented 4 years ago

@gmos we definitely plan to add TF 2.x support, but since this is a "side gig" for all of us, it is hard to provide concrete plans/estimates. We'll do our best to discuss in the near future and come up with an estimate, which we'll post here. Thanks!

fluffynukeit commented 3 years ago

@gmos When you say it's "only supported on TF2", do you mean vggish, tensorflow-lite, or the coral coprocessor? I am considering using the coral coprocessor to run the openl3 model on TF 1.x. Is that possible?

gmos commented 3 years ago

Nope. Last time I looked at it, Coral required TF-Lite, which only exists on TF2.


fluffynukeit commented 3 years ago

Edit: I have updated this post to reflect a correct method for compiling openl3 to tflite format, and will continue to update it if I discover better ways to do it.

I was able to compile and run the openl3 model using tflite with tensorflow 1.x on google colab. Here is the code to compile to tf-lite with %tensorflow_version 1.x. I compiled and ran it in the CPU-only colab environment with no GPU acceleration; I assume GPU would also work, but it is not necessary. I have also included an excerpt of the printout showing the openl3 step time and TFLite interpreter invocation time, as well as the first 10 of 512 embedding values for each. They agree well.

The key resource for getting this to work was this Xilinx note about compiling a tensorflow 1.x model to .pb format for use in their Vitis AI accelerator. Once in .pb format, the TFLite compilation went smoothly, unlike my many, many other attempts. It's tough to navigate the docs when trying to parse keras vs tensorflow, tensorflow 1 vs 2, various keras and tensorflow save formats, weights vs global variables (which was particularly troublesome with batch normalization layers), etc.

https://github.com/Xilinx/Vitis-AI-Tutorials/tree/536b147e2820ff8ec16b11a85eff0f0c87f0a7da

import tensorflow as tf
import openl3
import keras

# This code does not require colab GPU backend to work.  CPU only is fine.

# Steps modeled from resource below
# https://github.com/Xilinx/Vitis-AI-Tutorials/tree/536b147e2820ff8ec16b11a85eff0f0c87f0a7da

ctype = "env"
itype = "mel128"
esize = 512 # I'm only interested in 512 model, so I don't change this
bn = f"ol3_{ctype}_{itype}"
dn = f"{bn}_export"
keras.backend.clear_session()

print("Disabling learning phases and building model")
keras.backend.set_learning_phase(0)

# load weights & architecture into new model
loaded_model = openl3.models.load_audio_embedding_model(
      content_type=ctype, embedding_size=esize, input_repr=itype)

print('Keras model information:')
print(' Input names :', loaded_model.inputs)
print(' Output names:', loaded_model.outputs)
print('-------------------------------------')

# fetch the tensorflow session using the Keras backend
tf_session = keras.backend.get_session()

print("Saving")
!mkdir -p "{dn}"
# write out tensorflow checkpoint & meta graph.
# NOTE: the tf.global_variables() is important to preserve correct behavior of
# batch normalization layers.
# https://stackoverflow.com/questions/45800871/tensorflow-save-restore-batch-norm
saver = tf.compat.v1.train.Saver(tf.global_variables())
save_path = saver.save(tf_session,f"{dn}/{bn}.tf_ckpt")

! freeze_graph \
    --input_meta_graph  "{dn}/{bn}.tf_ckpt.meta" \
    --input_checkpoint  "{dn}/{bn}.tf_ckpt" \
    --output_graph      "{dn}/{bn}.pb" \
    --output_node_names "flatten_1/Reshape" \
    --input_binary      true

! tflite_convert \
    --output_file="{dn}/{bn}.tflite" \
    --graph_def_file="{dn}/{bn}.pb" \
    --input_arrays=input_1 \
    --output_arrays=flatten_1/Reshape
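
Not shown above, but relevant to the Coral discussion earlier in this thread: the Edge TPU compiler only accepts fully-integer-quantized models, so the float conversion above would need an additional quantization pass. A sketch under the assumption of TF 1.15's Python converter API (untested; the representative dataset below uses random placeholder audio and should be replaced with real 1-second windows):

import numpy as np
import tensorflow as tf

# Convert the frozen graph produced above, with post-training quantization.
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file=f"{dn}/{bn}.pb",
    input_arrays=["input_1"],
    output_arrays=["flatten_1/Reshape"])
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Full-integer quantization needs representative inputs to calibrate ranges.
def representative_dataset():
    for _ in range(100):
        # Placeholder: substitute real audio windows shaped like the input.
        yield [np.random.uniform(-1, 1, (1, 1, 48000)).astype(np.float32)]

converter.representative_dataset = representative_dataset

with open(f"{dn}/{bn}_quant.tflite", "wb") as f:
    f.write(converter.convert())

Quantization will perturb the embedding values, so the resulting model would need the same downstream re-validation discussed above.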

Here is the code to execute the openl3 and tflite models in colab, also in a CPU-only session. I use a single 1-second audio file of exactly 48000 samples because I'm interested in running as audio is collected. I run each version 5 times with the same data to see the associated step times for each model.

import numpy as np
import tensorflow as tf
import soundfile as sf
import time
import openl3

# This code does not require colab GPU backend to work.  CPU only is fine.

ctype = "env"
itype = "mel128"
esize = 512 # I'm only interested in 512 model, so I don't change this
bn = f"ol3_{ctype}_{itype}"
dn = f"{bn}_export"

# Note: the input audio must be exactly 48000 samples (1 second at 48 kHz)!
audio, sr = sf.read("1second.wav")
x = audio.reshape(1, 1, 48000)

# Check the openl3 output values (first 10 of 512)
model = openl3.models.load_audio_embedding_model(
      content_type=ctype, embedding_size=esize, input_repr=itype)

for i in range(5):
  output_ol3, _ = openl3.get_audio_embedding(audio, sr, model=model, center=False)
  print(f"OL Model, {i}: ", output_ol3[0][0:10])

# Test the first 10 values from TFLite interpreter file
# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path=f"{dn}/{bn}.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

input_shape = input_details[0]['shape']  # expected: (1, 1, 48000)
tflite_input = x.astype('float32')

for i in range(5):
  t = time.time()
  interpreter.set_tensor(input_details[0]['index'], tflite_input)
  interpreter.invoke()

  # The function `get_tensor()` returns a copy of the tensor data.
  # Use `tensor()` in order to get a pointer to the tensor.
  output_data = interpreter.get_tensor(output_details[0]['index'])
  print(f"Time {i}: {(time.time() - t)*1000} msec")
  print(f"TFLite {i} ", output_data[0][0:10])

The resulting output of running both models is below. The openl3 and TFLite versions agree on the results, and the TFLite version is indeed a little faster.

1/1 [==============================] - 1s 682ms/step
OL Model, 0:  [1.3157089 1.2120367 2.8306766 2.2643678 2.492464  1.2830704 2.141046
 1.5203106 1.1770811 3.5058348]
1/1 [==============================] - 0s 300ms/step
OL Model, 1:  [1.3157089 1.2120367 2.8306766 2.2643678 2.492464  1.2830704 2.141046
 1.5203106 1.1770811 3.5058348]
1/1 [==============================] - 0s 293ms/step
OL Model, 2:  [1.3157089 1.2120367 2.8306766 2.2643678 2.492464  1.2830704 2.141046
 1.5203106 1.1770811 3.5058348]
1/1 [==============================] - 0s 292ms/step
OL Model, 3:  [1.3157089 1.2120367 2.8306766 2.2643678 2.492464  1.2830704 2.141046
 1.5203106 1.1770811 3.5058348]
1/1 [==============================] - 0s 300ms/step
OL Model, 4:  [1.3157089 1.2120367 2.8306766 2.2643678 2.492464  1.2830704 2.141046
 1.5203106 1.1770811 3.5058348]
Time 0: 350.56400299072266 msec
TFLite 0  [1.3157148 1.2120365 2.830676  2.2643683 2.492464  1.2830614 2.1410458
 1.5203104 1.1770768 3.5058343]
Time 1: 253.8745403289795 msec
TFLite 1  [1.3157148 1.2120365 2.830676  2.2643683 2.492464  1.2830614 2.1410458
 1.5203104 1.1770768 3.5058343]
Time 2: 232.6219081878662 msec
TFLite 2  [1.3157148 1.2120365 2.830676  2.2643683 2.492464  1.2830614 2.1410458
 1.5203104 1.1770768 3.5058343]
Time 3: 242.37751960754395 msec
TFLite 3  [1.3157148 1.2120365 2.830676  2.2643683 2.492464  1.2830614 2.1410458
 1.5203104 1.1770768 3.5058343]
Time 4: 231.9028377532959 msec
TFLite 4  [1.3157148 1.2120365 2.830676  2.2643683 2.492464  1.2830614 2.1410458
 1.5203104 1.1770768 3.5058343]
fluffynukeit commented 3 years ago

I have edited my previous comment to include code that successfully compiles openl3 to TFLite in tensorflow 1.x on google colab. The TFLite model is faster and produces audio embeddings that agree with vanilla openl3.

justinsalamon commented 3 years ago

Thanks @fluffynukeit !

We're hoping to port openl3 to TF 2.x; it's just a question of when we're able to allocate the cycles to it.

fluffynukeit commented 3 years ago

@justinsalamon Thank you for providing openl3! It saved me so, so much time. I don't have proper ML hardware or ecosystem knowledge, so having models that largely "just work" out of the box for audio classification is great. I threw a random forest at the embeddings, as suggested in another issue here, and was able to iterate on tuning it instead of messing with the NN models.
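
For anyone wanting to reproduce that pattern, here is a minimal sketch (file names and labels are hypothetical) of fitting a scikit-learn random forest on mean-pooled openl3 embeddings:

import numpy as np
import soundfile as sf
import openl3
from sklearn.ensemble import RandomForestClassifier

model = openl3.models.load_audio_embedding_model(
    content_type="env", input_repr="mel128", embedding_size=512)

def embed(path):
    # One mean-pooled embedding vector per clip.
    audio, sr = sf.read(path)
    emb, _ = openl3.get_audio_embedding(audio, sr, model=model, center=False)
    return emb.mean(axis=0)

# Hypothetical labeled clips.
paths = ["dog1.wav", "dog2.wav", "siren1.wav", "siren2.wav"]
labels = ["dog", "dog", "siren", "siren"]
X = np.stack([embed(p) for p in paths])

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.predict(X[:1]))  # sanity check on a training example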

jonnor commented 3 years ago

Since merge request #62 was merged in May this year, it looks like TF2 is supported. So maybe this issue can be closed once 0.4.0 is released?

auroracramer commented 3 years ago

Closed by #62