Closed by SSaishruthi 4 years ago
Clarification: Do we need to hold state information for this?
@SSaishruthi Yes, we'll need a running sum of the hamming loss and a count that is incremented every time update_state is called. result can then return the average hamming loss.
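For illustration, the running-sum-plus-count bookkeeping described above can be sketched framework-free. The class name RunningHammingLoss and its exact shape are hypothetical; it only mirrors the update_state/result contract being discussed:

```python
class RunningHammingLoss:
    """Toy stateful metric: a running sum of per-sample losses plus a count."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update_state(self, batch_losses):
        # batch_losses: per-sample hamming losses for one batch
        self.total += sum(batch_losses)
        self.count += len(batch_losses)

    def result(self):
        # average hamming loss over all samples seen so far
        return self.total / self.count if self.count else 0.0


m = RunningHammingLoss()
m.update_state([1.0, 1.0, 1.0])
m.update_state([0.0, 0.0])
print(m.result())  # 0.6
```

Calling result after each batch keeps returning the average over everything seen so far, which is exactly the behavior a wrapper like MeanMetricWrapper provides for free.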
@Squadrick Perfect, thanks for the clarification. Will submit a PR soon.
@SSaishruthi Use MeanMetricWrapper. Keras already has a class that wraps a stateless metric function and handles the aggregation.
@Squadrick Thanks again for the links. Will keep you posted about the updates.
@Squadrick I tried wrapping the hamming metrics. Observations: keeping an epoch count and dividing the running total by it did not provide the desired result, so I added a count variable holding the number of data points in a particular epoch, and that worked fine.
Reference: https://colab.research.google.com/drive/1Msuv5xUu7lu5wDH1ei-VOPB-UnBolDfB#scrollTo=UKTf8PxceWDH
I am not able to import MeanMetricWrapper, so I used Mean instead.
If this is fine, I will create a PR with all supporting scripts.
@seanpmorgan @facaiy @WindQAQ
We can't import MeanMetricWrapper as tf.keras.metrics.MeanMetricWrapper, but it can be imported as tf.python.keras.metrics.MeanMetricWrapper. Is the latter fine, or should I open a PR against TF master to tf_export the API for MeanMetricWrapper (here)? Exposing MeanMetricWrapper would make the implementation much cleaner.
def hamming_loss(y_true, y_pred, mode='multiclass'):
    if mode not in ['multiclass', 'multilabel']:
        raise ValueError('mode must be either multiclass or multilabel')
    if mode == 'multiclass':
        # a sample scores 0 when the one-hot prediction overlaps the label
        nonzero = tf.cast(tf.math.count_nonzero(y_true * y_pred, axis=-1),
                          tf.float32)
        return 1.0 - nonzero
    else:
        # fraction of label positions that differ
        nonzero = tf.cast(tf.math.count_nonzero(y_true - y_pred, axis=-1),
                          tf.float32)
        return nonzero / y_true.get_shape()[-1]

class HammingLoss(tf.python.keras.metrics.MeanMetricWrapper):
    def __init__(self, name='hamming_loss', dtype=None, mode='multiclass'):
        super(HammingLoss, self).__init__(
            hamming_loss, name, dtype=dtype, mode=mode)
@Squadrick tf.python is not a public API, so we should avoid it. You can bring this up in this issue: https://github.com/tensorflow/tensorflow/issues/28601 to see what tf-core devs recommend. The fix may be exposing the API as public, or just copying it statically into Addons.
@seanpmorgan @Squadrick
Should I proceed with Mean until we get a response on this?
@seanpmorgan @Squadrick
Are we going to have a version of MeanMetricWrapper in addons?
I've copied the implementation from core TF to TFA: #316. Once that's merged, @SSaishruthi can proceed with the implementation.
Looks like the PR got merged. I will start working on that.
@Squadrick @facaiy Getting this error when trying to import tensorflow addons in colab.
Any comment on how to get rid of this?
NotFoundError: libtensorflow_framework.so.2: cannot open shared object file: No such file or directory
@SSaishruthi Could you link the colab notebook? Be sure to run !pip install tensorflow==2.0.0-beta1 first. This error likely means that you're running tf2-alpha or tf1.x.
@seanpmorgan
Colab link: https://colab.research.google.com/drive/1Msuv5xUu7lu5wDH1ei-VOPB-UnBolDfB
Using tf2-beta1
Could you try resetting the runtime and running the cells in order again? I just created a copy and it's working: https://colab.research.google.com/drive/1wKDdQCirA4LEHdx4bgkQHP1YZZSAT-5I
@seanpmorgan Thanks
I had only been resetting the current runtime. After resetting all the runtimes, it worked.
I am trying to import MeanMetricWrapper and am not able to; only CohenKappa is available. Please view the same notebook for reference. Not sure if I need to build from source. Should I do anything from my side?
@seanpmorgan
from tensorflow_addons.metrics.utils import MeanMetricWrapper should work. If you're talking about a colab notebook, you may have to use !pip install tfa-nightly if it was added after the 0.4 release.
@Squadrick
I am trying to wrap hamming loss using MeanMetricWrapper as per the suggestion, and I have some clarifications about it.
Taking Mean over the total value was not yielding a proper result.
Using the mean method: https://colab.research.google.com/drive/1Msuv5xUu7lu5wDH1ei-VOPB-UnBolDfB#scrollTo=UKTf8PxceWDH
As you can see in the notebook, the results do not match.
Whereas, if I hold the state of the number of records in every epoch, I get the expected results.
Holding state: https://colab.research.google.com/drive/1Msuv5xUu7lu5wDH1ei-VOPB-UnBolDfB#scrollTo=bGBO5unx33xS
I am not sure if I am missing anything here. Can I use the regular method of subclassing Metric instead? Please suggest.
Also, for a hamming distance metric, I think it is fine to have a plain function like the one below, just like euclidean. If this is fine, I can create a separate PR for it. This can be used as an alternate distance metric.
def hamming_distance(actuals, predictions):
    # boolean mask of positions where the two label vectors disagree
    result = tf.not_equal(actuals, predictions)
    not_eq = tf.reduce_sum(tf.cast(result, tf.float32))
    # fraction of mismatched positions
    ham_distance = tf.math.divide_no_nan(not_eq, len(result))
    return ham_distance
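As a framework-free sanity check of the fraction-of-mismatches definition, here is a plain-Python sketch of the same semantics (the helper name hamming_distance_py is made up for this illustration, not part of any library):

```python
def hamming_distance_py(actuals, predictions):
    # count positions where the two label vectors disagree
    mismatches = sum(1 for a, p in zip(actuals, predictions) if a != p)
    # normalize by vector length: a distance in [0, 1]
    return mismatches / len(actuals)


print(hamming_distance_py([1, 0, 1, 1], [0, 0, 1, 0]))  # 0.5
```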
def hamming_loss(y_true, y_pred, mode='multiclass'):
    if mode not in ['multiclass', 'multilabel']:
        raise ValueError('mode must be either multiclass or multilabel')
    if mode == 'multiclass':
        # a sample scores 0 when the one-hot prediction overlaps the label
        nonzero = tf.cast(tf.math.count_nonzero(y_true * y_pred, axis=-1),
                          tf.float32)
        return 1.0 - nonzero
    else:
        # fraction of label positions that differ
        nonzero = tf.cast(tf.math.count_nonzero(y_true - y_pred, axis=-1),
                          tf.float32)
        return nonzero / y_true.get_shape()[-1]

class HammingLoss(tf.python.keras.metrics.MeanMetricWrapper):
    def __init__(self, name='hamming_loss', dtype=None, mode='multiclass'):
        super(HammingLoss, self).__init__(
            hamming_loss, name, dtype=dtype, mode=mode)
This works for me. The idea is to have hamming_loss calculate the loss for each sample in the batch separately, and let MeanMetricWrapper do the aggregation.
So:
actuals = tf.constant([[0, 1, 0, 0], [0, 1, 0, 0], [1, 0, 0, 0]],
                      dtype=tf.int32)
predictions = tf.constant([[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0]],
                          dtype=tf.int32)

print(hamming_loss(actuals, predictions, mode='multiclass').numpy())  # prints [1. 1. 1.]

hamm = HammingLoss(mode='multiclass')
hamm.update_state(actuals, predictions)
print(hamm.result().numpy())  # prints 1.0
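The same per-sample-then-average contract can be checked in plain Python. This is only an illustration of the semantics (the function name per_sample_hamming_loss is made up here), not the TF implementation:

```python
def per_sample_hamming_loss(y_true, y_pred, mode='multiclass'):
    losses = []
    for t, p in zip(y_true, y_pred):
        if mode == 'multiclass':
            # 0 when the one-hot rows agree on the active class, else 1
            overlap = sum(1 for a, b in zip(t, p) if a * b != 0)
            losses.append(1.0 - overlap)
        else:
            # fraction of label positions that differ
            diff = sum(1 for a, b in zip(t, p) if a != b)
            losses.append(diff / len(t))
    return losses


actuals = [[0, 1, 0, 0], [0, 1, 0, 0], [1, 0, 0, 0]]
preds = [[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0]]
losses = per_sample_hamming_loss(actuals, preds)
print(losses)                     # [1.0, 1.0, 1.0]
print(sum(losses) / len(losses))  # 1.0
```

Averaging the per-sample values reproduces the single scalar that the wrapped metric reports after update_state.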
@Squadrick Thanks for the clarification. Got the idea now. Will create a PR
@Squadrick Also, can we have hamming distance separately as a distance metric?
@SSaishruthi You can call the file hamming.py or hamming_metrics.py and add hamming_distance, hamming_loss, and HammingLoss (as a tf.keras.metrics.Metric).
@Squadrick How would this look as a loss function instead of a metric?
@rjurney The only problem I see is that tf.count_nonzero is non-differentiable, which could be solved by rewriting it with a close approximation, resulting in:
from tensorflow.keras import backend as K

def hamming_loss(y_true, y_pred):
    diff = tf.cast(y_true - y_pred, dtype=tf.float32)
    # counting non-zeros in a differentiable way
    epsilon = K.epsilon()
    nonzero = tf.reduce_sum(tf.math.abs(diff / (tf.math.abs(diff) + epsilon)))
    return tf.reduce_mean(nonzero / K.int_shape(y_pred)[-1])
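The soft-count trick x / (|x| + epsilon) used above approximates the 0/1 indicator of "x is nonzero" while staying differentiable. It can be checked numerically in plain Python (the epsilon value here is just a stand-in for K.epsilon()):

```python
EPSILON = 1e-7  # stand-in for K.epsilon()


def soft_count_nonzero(xs):
    # |x| / (|x| + eps) is ~1 for any nonzero x and exactly 0 for x == 0,
    # so the sum approximates the count of nonzero entries
    return sum(abs(x) / (abs(x) + EPSILON) for x in xs)


diff = [1.0, -1.0, 0.0, 0.0]  # e.g. y_true - y_pred for one sample
approx = soft_count_nonzero(diff)
print(round(approx, 5))  # 2.0, close to the true nonzero count
```

The approximation error shrinks as epsilon decreases, at the cost of steeper gradients near zero.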
@seanpmorgan why closed?
Hamming loss was merged in https://github.com/tensorflow/addons/pull/342.
Cool!
System information
Describe the feature and the current behavior/state: Hamming score is of great interest in multilabel classification.
Will this change the current api? How? Yes, it will add a new feature.
Who will benefit from this feature? Anyone working with multilabel classification.
Any other info: Initial colab notebook: https://colab.research.google.com/drive/1Msuv5xUu7lu5wDH1ei-VOPB-UnBolDfB