tensorflow / models

Models and examples built with TensorFlow

Request to upgrade /research/global_objectives to TF 2.x #8062

Open wingman-jr-addon opened 4 years ago

wingman-jr-addon commented 4 years ago


Describe the problem

I would like to request that the excellent loss functions in loss_layers be upgraded to TF 2.x, perhaps even with appropriate Keras wrappers and promotion to mainline TF. I recently discovered this research work and tried using it in my TF 2.x project, but was not sure how to upgrade the usage of the model variables (although I did get roc_auc_loss up and running). If these loss functions are stable, adding them to the main repository would be an important contribution to the machine learning community: it still seems to be a relatively common belief that one cannot optimize toward precision/recall-related objectives, and this would help dispel that notion.

Thank you to @mackeya-google and the original authors for putting this out there!

Source code / logs

Relevant section of upgrade log:

    281:17: ERROR: Using member tf.contrib.framework.model_variable in deprecated module tf.contrib. tf.contrib.framework.model_variable cannot be converted automatically. tf.contrib will not be distributed with TensorFlow 2.0, please consider an alternative in non-contrib TensorFlow, a community-maintained repository such as tensorflow/addons, or fork the required code.

ogencoglu commented 4 years ago

+1

vnicula commented 4 years ago

I can confirm that the only unconverted construct is contrib.framework.model_variable; roc_auc_loss has no trainable variables so it doesn't use it and it seems to work out of the box after conversion.
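For reference, a minimal sketch of calling the converted roc_auc_loss directly (this assumes loss_layers.py has already been run through the converter; the tensors here are illustrative):

    import tensorflow as tf
    from global_objectives import loss_layers  # the converted module

    # Labels and logits shaped [batch_size, num_labels].
    labels = tf.constant([[1.0], [0.0], [1.0], [0.0]])
    logits = tf.constant([[2.3], [-1.1], [0.4], [0.7]])

    # roc_auc_loss returns a (loss, other_outputs) tuple.
    loss, _ = loss_layers.roc_auc_loss(labels, logits)
    print(float(tf.reduce_mean(loss)))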

Here is the relevant output from the conversion script:

    TensorFlow 2.0 Upgrade Script

    Converted 1 files
    Detected 4 issues that require attention

    File: loss_layers.py
    loss_layers.py:270:28: ERROR: Using member tf.contrib.framework.model_variable in deprecated module tf.contrib. tf.contrib.framework.model_variable cannot be converted automatically. tf.contrib will not be distributed with TensorFlow 2.0, please consider an alternative in non-contrib TensorFlow, a community-maintained repository such as tensorflow/addons, or fork the required code.
    loss_layers.py:282:17: ERROR: Using member tf.contrib.framework.model_variable in deprecated module tf.contrib. tf.contrib.framework.model_variable cannot be converted automatically. tf.contrib will not be distributed with TensorFlow 2.0, please consider an alternative in non-contrib TensorFlow, a community-maintained repository such as tensorflow/addons, or fork the required code.
    loss_layers.py:475:13: ERROR: Using member tf.contrib.framework.model_variable in deprecated module tf.contrib. tf.contrib.framework.model_variable cannot be converted automatically. tf.contrib will not be distributed with TensorFlow 2.0, please consider an alternative in non-contrib TensorFlow, a community-maintained repository such as tensorflow/addons, or fork the required code.
    loss_layers.py:1175:20: ERROR: Using member tf.contrib.framework.model_variable in deprecated module tf.contrib. tf.contrib.framework.model_variable cannot be converted automatically. tf.contrib will not be distributed with TensorFlow 2.0, please consider an alternative in non-contrib TensorFlow, a community-maintained repository such as tensorflow/addons, or fork the required code.
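For anyone reproducing this: the report above comes from the tf_upgrade_v2 converter that ships with TensorFlow. A typical single-file invocation (file names here are illustrative) looks like:

    tf_upgrade_v2 --infile loss_layers.py --outfile loss_layers_v2.py --reportfile report.txt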

wingman-jr-addon commented 4 years ago

I'd like to reiterate: I think these loss functions are critical. When training a binary classifier, these would often be a natural first choice over the current built-in stock losses. It's not clear to me that any built-in loss performs a similar function, nor that newer research has superseded these efforts (although other losses with similar goals have been proposed). To that end, I think implementations that work with the current major version of TF (and Keras) are important, too.

Unfortunately, I'm not sure how one would go about converting the model variable code. Can anyone suggest a reasonable upgrade strategy for this thorny part? Thanks!

ksricharank commented 3 years ago

I made an attempt at addressing this by replacing tf.contrib.framework.model_variable with tf.Variable, and have working code now.

For example, I modified

    dual_variable = tf.contrib.framework.model_variable(
        name=name,
        shape=shape,
        dtype=dtype,
        initializer=initializer,
        collections=collections,
        trainable=trainable)

in loss_layers.py (line 830) to

    init = tf.constant_initializer(1.0)
    dual_variable = tf.Variable(
        name=name,
        shape=shape,
        dtype=dtype,
        initial_value=lambda: init(shape=shape, dtype=dtype),
        trainable=trainable)

I made similar modifications to the other places where tf.contrib.framework.model_variable appears. I am happy to share the modified loss_layers.py and util.py files, but wanted to first check whether the changes I made above are valid.
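A quick way to sanity-check a replacement like that in isolation (the argument values below are illustrative stand-ins for the call site in loss_layers.py). Note that the old collections argument has no direct tf.Variable equivalent, which is the main behavioral difference:

    import tensorflow as tf

    # Illustrative stand-ins for the arguments used in loss_layers.py.
    name, shape, dtype, trainable = "dual_variable", (1,), tf.float32, True

    init = tf.constant_initializer(1.0)
    dual_variable = tf.Variable(
        name=name,
        shape=shape,
        dtype=dtype,
        initial_value=lambda: init(shape=shape, dtype=dtype),
        trainable=trainable)

    print(dual_variable.numpy(), dual_variable.trainable)  # [1.] True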

van-flow commented 3 years ago

I was able to make this work.

1) Convert loss_layers.py and util.py to TF 2.0. You can use https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/guide/upgrade.ipynb

2) Edit these files and change all instances of "tf.contrib.framework.model_variable" to "tf.compat.v1.get_variable", as in the sketch below.
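Concretely, step 2 amounts to a substitution like the following at each call site (a sketch; the keyword arguments mirror the original model_variable call):

    # Before (TF 1.x contrib):
    # dual_variable = tf.contrib.framework.model_variable(
    #     name=name, shape=shape, dtype=dtype, initializer=initializer,
    #     collections=collections, trainable=trainable)

    # After (TF 2.x compat shim, same keyword arguments):
    dual_variable = tf.compat.v1.get_variable(
        name=name,
        shape=shape,
        dtype=dtype,
        initializer=initializer,
        collections=collections,
        trainable=trainable)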

Define loss functions as:


import tensorflow as tf
from tensorflow.keras import backend as K

from global_objectives import loss_layers
from global_objectives import util

from global_objectives.loss_layers import precision_at_recall_loss
from global_objectives.loss_layers import roc_auc_loss
from global_objectives.loss_layers import precision_recall_auc_loss
from global_objectives.loss_layers import recall_at_precision_loss

class precision_at_recall(tf.keras.losses.Loss):
    def __init__(self, target_recall=0.1, name="precision_at_recall"):
        super().__init__(name=name)
        self.target_recall = target_recall

    def call(self, y_true, y_pred):
        y_true = K.reshape(y_true, (-1, 1)) 
        y_pred = K.reshape(y_pred, (-1, 1))  
        return precision_at_recall_loss(y_true, y_pred, self.target_recall)[0]

Then use:

model.compile(optimizer='adam',
              loss=precision_at_recall(target_recall=TR),
              steps_per_execution=N,
              metrics=[Metric],
              run_eagerly=True)

That's it.

I have not had any luck getting better results with these losses; convergence is extremely slow/poor. If anyone has ideas on what optimizer, dual_rate_factor, etc. work best, please share. Hope this helps.
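For anyone experimenting along these lines: the underlying precision_at_recall_loss exposes the dual_rate_factor mentioned above as a keyword argument, so the Keras wrapper can pass it through. A sketch (default values illustrative):

    class precision_at_recall(tf.keras.losses.Loss):
        def __init__(self, target_recall=0.1, dual_rate_factor=0.1,
                     name="precision_at_recall"):
            super().__init__(name=name)
            self.target_recall = target_recall
            # Step size used for the internal dual (Lagrange) variables.
            self.dual_rate_factor = dual_rate_factor

        def call(self, y_true, y_pred):
            y_true = K.reshape(y_true, (-1, 1))
            y_pred = K.reshape(y_pred, (-1, 1))
            return precision_at_recall_loss(
                y_true, y_pred, self.target_recall,
                dual_rate_factor=self.dual_rate_factor)[0]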

wingman-jr-addon commented 3 years ago

Great! Getting it running is the first step. I had good success with the one that didn't require the special porting, roc_auc_loss. As a baseline comparison, have you tried that one and did it work any better than the newly ported ones?

van-flow commented 3 years ago

> Great! Getting it running is the first step. I had good success with the one that didn't require the special porting, roc_auc_loss. As a baseline comparison, have you tried that one and did it work any better than the newly ported ones?

Yes, I have. They all run now, but the precision/recall either do not improve or jump dramatically from epoch to epoch. Did you run roc_auc_loss() with default settings, and what optimizer did you use with it?

chronicom commented 3 years ago

> I made an attempt at addressing this by replacing tf.contrib.framework.model_variable with tf.Variable, and have working code now.
>
> For example, I modified
>
>     dual_variable = tf.contrib.framework.model_variable(
>         name=name, shape=shape, dtype=dtype, initializer=initializer,
>         collections=collections, trainable=trainable)
>
> in loss_layers.py (line 830) to
>
>     init = tf.constant_initializer(1.0)
>     dual_variable = tf.Variable(
>         name=name, shape=shape, dtype=dtype,
>         initial_value=lambda: init(shape=shape, dtype=dtype),
>         trainable=trainable)
>
> I made similar modifications to the other places where tf.contrib.framework.model_variable appears. I am happy to share the modified loss_layers.py and util.py files, but wanted to first check whether the changes I made above are valid.

Hi, would you mind sharing the working code?

wingman-jr-addon commented 3 years ago

@van-flow I train a CNN classifier with some special sauce, and expose the feature layer as an output. Then, as a post-processing step, I take the feature layer's output and feed it into a tiny net to create a binary classifier; that's where I use roc_auc_loss.

Here's a sketch.

    from tensorflow.keras.layers import Input, Dense, Dropout
    from tensorflow.keras.models import Model
    from tensorflow.keras.optimizers import Adam

    input = Input((1280,)) # Features - 1280 for MobileNetV2, 1792 for EfficientNet B4
    node = Dense(50, activation='sigmoid')(input)
    node = Dropout(0.4)(node)
    node = Dense(10, activation='sigmoid')(node)
    node = Dense(1, activation='sigmoid')(node)
    post_model = Model(inputs=input, outputs=node)

    post_model.compile(
        optimizer=Adam(lr=0.0001), # 0.0001 works well
        loss=roc_auc_loss,
        metrics=['accuracy']
        )

And with a little glue around roc_auc_loss:

    def roc_auc_loss(y_true, y_pred):
        # The loss expects explicit [batch_size, num_labels] shapes, so reshape
        # to a fixed batch size.
        y_true = tf.keras.backend.reshape(y_true, (VALIDATION_SIZE, 1))
        y_pred = tf.keras.backend.reshape(y_pred, (VALIDATION_SIZE, 1))
        # Monkey-patch the label-count helper to treat this as single-label.
        util.get_num_labels = lambda labels: 1
        return loss_layers.roc_auc_loss(y_true, y_pred)[0]
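Note that the fixed reshape above means every batch fed to this loss must contain exactly VALIDATION_SIZE examples, e.g. (a hypothetical call; features and labels are the precomputed feature vectors and binary targets described above):

    post_model.fit(features, labels, batch_size=VALIDATION_SIZE, epochs=20)
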
satyajeet1997 commented 2 years ago

Hi, I was trying to check the efficiency of the loss function (defined by @van-flow) on a NN model I created. I have converted the existing code to TF 2 and also replaced tf.contrib.framework.model_variable with tf.compat.v1.get_variable. Here is a sketch:

import tensorflow as tf
from tensorflow.keras import backend as K

from loss import precision_at_recall_loss
from loss import roc_auc_loss
from loss import precision_recall_auc_loss
from loss import recall_at_precision_loss

class precision_at_recall(tf.keras.losses.Loss):
    def __init__(self, target_recall=0.1, name="precision_at_recall"):
        super().__init__(name=name)
        self.target_recall = target_recall

    def call(self, y_true, y_pred):
        y_true = K.reshape(y_true, (-1, 1))
        y_pred = K.reshape(y_pred, (-1, 1))
        return precision_at_recall_loss(y_true, y_pred, self.target_recall)[0]

model.compile(optimizer='adam',
              loss=precision_at_recall,
              metrics=['accuracy'],
              run_eagerly=True)
out = model.fit(train_x_sc, train_y, validation_data=(test_x_sc, test_y),
                epochs=20, batch_size=10)

I am getting this error: "Cannot convert '' to EagerTensor of dtype float". Can anyone help me out with this one?

temiwale88 commented 2 years ago

Can anyone please post their updated / upgraded script? I've upgraded mine with tf_upgrade_v2 but still needed to make some changes. It will help others. Thanks! :)

P.S. If you're getting this error: ValueError: tf.function only supports singleton tf.Variables created on the first call. Make sure the tf.Variable is only created once or created outside tf.function

then you need to change the wrapping functions into a class, so you can initialize your tf.Variables or tf.compat.v1.get_variable handles to None and create them only on the first call, as in the sketch below. Follow the recommended steps in these docs: https://www.tensorflow.org/guide/function#creating_tfvariables & https://bit.ly/3wpEURT
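A minimal sketch of that pattern, following the linked guide (the class and variable names here are hypothetical):

    import tensorflow as tf

    class LossState(tf.Module):
        """Holds the dual variable so it is created once, not on every trace."""
        def __init__(self):
            self.dual_variable = None

        @tf.function
        def __call__(self, x):
            if self.dual_variable is None:
                # Created only on the first call, as tf.function requires.
                self.dual_variable = tf.Variable(1.0)
            return self.dual_variable * x

    state = LossState()
    print(state(tf.constant(2.0)).numpy())  # 2.0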

temiwale88 commented 2 years ago

> I am happy to share the modified loss_layers.py and util.py files, but wanted to first check whether the changes I made above are valid.

Yes, please share :-D

rydeveraumn commented 1 year ago

Has there been any more movement on this issue? Also, I am interested in porting this to PyTorch. Does anyone know if that work has been started or done?

rydeveraumn commented 1 year ago

I also agree with @wingman-jr-addon that this could be potentially game-changing. I wonder why it is no longer being maintained. If the work here could be extended to cover different use cases, I don't see why ML/DL practitioners would not use this type of tool on a regular basis.

wingman-jr-addon commented 1 year ago

@rydeveraumn I was poking around a bit tonight and found this: https://github.com/iridiumblue/roc-star - might be of interest.