Thanks @bionicles! So happy to accept PRs for activations such as swish, lisht, etc. I'm less sold on the value of aliasing tf.math.sin and the other built-in ops. Is the rationale just that users may not know they can use these ops as activations?
I guess one nice behavior is being able to reference activations as strings rather than functions, which is mostly a convenience but still useful for reducing boilerplate when doing hyperparameter tuning.
@seanpmorgan @kyleabeauchamp Updated the code. Yeah, for our architecture search project it's handy to just use strings, but yes, we can pass those functions directly.
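For concreteness, a minimal sketch of the two usage modes being weighed here (nothing addons-specific):

import tensorflow as tf

# built-in activations can be referenced by string...
by_string = tf.keras.layers.Dense(64, activation="relu")

# ...while any callable, including raw ops like tf.math.sin, can be passed directly
by_callable = tf.keras.layers.Dense(64, activation=tf.math.sin)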
import tensorflow as tf

K = tf.keras
B, L = K.backend, K.layers

LOWER_ASYMPTOTE = 0.
UPPER_ASYMPTOTE_AKA_CARRYING_CAPACITY = 1.
GROWTH_RATE = 1.
LOCATION_OF_MAX_GROWTH = 1.
START_TIME = 0.
COEFFICIENT_OF_EXPONENTIAL_TERM = 1.
IS_RELATED_TO_VALUE_Y_ZERO = 1.
IS_ADDED_TO_EXPONENTIAL_TERM = 1.


def generalized_logistic(
        x,
        a=LOWER_ASYMPTOTE,
        k=UPPER_ASYMPTOTE_AKA_CARRYING_CAPACITY,
        b=GROWTH_RATE,
        q=IS_RELATED_TO_VALUE_Y_ZERO,
        c=IS_ADDED_TO_EXPONENTIAL_TERM,
        m=START_TIME,
        v=LOCATION_OF_MAX_GROWTH,
):
    """Richards' generalized logistic: a + (k - a) / (c + q * e^(-b(x - m)))^(1/v)."""
    numerator = k - a
    exponential_term = B.exp(-b * (x - m))
    # per the standard formula, the whole (c + q * e^...) term is raised to 1/v
    denominator = (c + q * exponential_term) ** (1. / v)
    return a + numerator / denominator
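Quick sanity check: with the defaults above (a=0 and b=k=q=c=v=1, m=0), the expression collapses to the ordinary sigmoid 1/(1 + e^(-x)):

import numpy as np

x = tf.linspace(-5., 5., 11)
# all-default generalized logistic should match tf.sigmoid elementwise
np.testing.assert_allclose(
    generalized_logistic(x).numpy(), tf.sigmoid(x).numpy(), rtol=1e-5)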
class Logistic(L.Layer):
    """Generalized logistic activation with all shape parameters trainable."""

    def __init__(self):
        super(Logistic, self).__init__()

    def build(self, input_shape):
        self.lower_asymptote = tf.Variable(0., trainable=True)
        self.upper_asymptote_aka_carrying_capacity = tf.Variable(1., trainable=True)
        self.growth_rate = tf.Variable(1., trainable=True)
        self.is_related_to_value_y_zero = tf.Variable(1., trainable=True)
        self.is_added_to_exponential_term = tf.Variable(1., trainable=True)
        self.start_time = tf.Variable(1., trainable=True)
        self.location_of_max_growth = tf.Variable(1., trainable=True)

    def call(self, x):
        return generalized_logistic(
            x,
            a=self.lower_asymptote,
            k=self.upper_asymptote_aka_carrying_capacity,
            b=self.growth_rate,
            q=self.is_related_to_value_y_zero,
            c=self.is_added_to_exponential_term,
            m=self.start_time,
            v=self.location_of_max_growth)
def mish(x):
    """
    Mish: A Self Regularized Non-Monotonic Neural Activation Function
    https://arxiv.org/abs/1908.08681v1
    """
    return x * B.tanh(B.softplus(x))
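Since the thread keeps coming back to string lookup: one way to get that convenience for a custom activation like mish is tf.keras's custom-object registry (a sketch, not an addons API):

# register mish under a string name so layers can take activation="mish"
tf.keras.utils.get_custom_objects()["mish"] = mish
layer = tf.keras.layers.Dense(8, activation="mish")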
@seanpmorgan Please assign rrelu to me. Also, it seems swish has already been implemented in the tensorflow.nn module:
https://github.com/tensorflow/tensorflow/issues/32783
import tensorflow as tf
from math import pi

B = tf.keras.backend
SQRT_2_D_PI = B.sqrt(2 / tf.convert_to_tensor(pi))


@tf.function
def gelu(x):
    # tanh approximation of GELU (Hendrycks & Gimpel)
    right = B.tanh(SQRT_2_D_PI * (x + 0.044715 * B.pow(x, 3)))
    return 0.5 * x * (1 + right)
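As a spot-check, the tanh form above tracks the exact GELU, 0.5 * x * (1 + erf(x / sqrt(2))), to within roughly 1e-3:

x = tf.linspace(-3., 3., 7)
exact = 0.5 * x * (1. + tf.math.erf(x / tf.sqrt(2.)))
tf.debugging.assert_near(gelu(x), exact, atol=1e-3)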
Here are parametric linear, polynomial, and parametric swish activations (the swish tends to blow up and produce NaNs, though):
import tensorflow as tf
from nature import L1L2  # regularizer from the poster's own project

L = tf.keras.layers


class Linear(L.Layer):
    """y = mx + b
    broadcast scalar weight and bias to all inputs (trainable)
    """

    def __init__(self):
        super().__init__()
        self.m = self.add_weight(
            initializer=tf.keras.initializers.ones(),
            regularizer=L1L2(), trainable=True)
        self.b = self.add_weight(
            initializer="glorot_normal",
            regularizer=L1L2(), trainable=True)

    @tf.function
    def call(self, x):
        return self.m * x + self.b
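Note that `nature` is not a public package; to run these snippets self-contained, a stand-in L1L2 (an assumption, not the original) can wrap the built-in Keras regularizer:

# hypothetical stand-in for nature.L1L2
def L1L2(l1=0.01, l2=0.01):
    return tf.keras.regularizers.l1_l2(l1=l1, l2=l2)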
import tensorflow as tf
from nature import L1L2

init = tf.keras.initializers.TruncatedNormal


class Polynomial(tf.keras.layers.Layer):
    """Trainable polynomial: y = sum of c_p * x^p for p in 0..power-1."""

    def __init__(self, power=4):
        super().__init__()
        self.powers = []
        for p in range(power):
            coefficient = self.add_weight(
                initializer=init(), trainable=True, regularizer=L1L2())
            # expose each coefficient as a numbered attribute on the layer
            super().__setattr__(f"{p}", coefficient)
            self.powers.append((coefficient, p))
        self.built = True

    @tf.function
    def call(self, x):
        y = 0.
        for coefficient, power in self.powers:
            y = y + coefficient * tf.math.pow(x, power)
        return y
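Used on its own (with the L1L2 stand-in above), the layer is elementwise and shape-preserving:

poly = Polynomial(power=4)  # computes c0 + c1*x + c2*x^2 + c3*x^3
y = poly(tf.constant([[0.5, -1.0]]))
print(y.shape)  # (1, 2)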
import tensorflow as tf
from nature import Polynomial, Logistic, Linear

L = tf.keras.layers


class PSwish(L.Layer):
    """Parametric swish: gate(x) * logistic(x), with a linear or polynomial gate."""

    def __init__(self, layer_fn=Linear):
        super().__init__()
        self.multiply = L.Multiply()
        self.logistic = Logistic()
        self.linear_or_polynomial = layer_fn()
        self.built = True

    @tf.function
    def call(self, x):
        one = self.linear_or_polynomial(x)
        two = self.logistic(x)
        return self.multiply([one, two])


def PolySwish():
    return PSwish(layer_fn=Polynomial)
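A quick usage sketch (depends on the Linear, Polynomial, and Logistic classes above):

x = tf.random.normal((2, 4))
print(PSwish()(x).shape)     # linear gate: (m*x + b) * logistic(x)
print(PolySwish()(x).shape)  # polynomial gate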
Also, here's the logistic map, which is (if you believe Wikipedia) a simple function on the "Edge of Chaos":
The relative simplicity of the logistic map makes it a widely used point of entry into a consideration of the concept of chaos.[1] A rough description of chaos is that chaotic systems exhibit a great sensitivity to initial conditions—a property of the logistic map for most values of r between about 3.57 and 4 (as noted above).[2] A common source of such sensitivity to initial conditions is that the map represents a repeated folding and stretching of the space on which it is defined. In the case of the logistic map, the quadratic difference equation describing it may be thought of as a stretching-and-folding operation on the interval (0,1).[9] https://en.wikipedia.org/wiki/Logistic_map
import tensorflow as tf

K, L = tf.keras, tf.keras.layers


class LogisticMap(L.Layer):
    """Applies x -> r * x * (1 - x) after min-max normalizing x into [0, 1]."""

    def __init__(self):
        super().__init__()
        # sample r once, inside the chaotic band of the logistic map
        self.r = tf.random.uniform((), minval=3.57, maxval=4.)
        self.built = True

    @tf.function
    def call(self, x):
        x_min = tf.math.reduce_min(x)
        x = (x - x_min) / (tf.math.reduce_max(x) - x_min)
        return self.r * x * (1. - x)
We could also re-sample r on each call of the function:

@tf.function
def logistic_map(x):
    r = tf.random.uniform((), minval=3.57, maxval=4.)
    x_min = tf.math.reduce_min(x)
    x = (x - x_min) / (tf.math.reduce_max(x) - x_min)
    return r * x * (1. - x)
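To make the "sensitivity to initial conditions" claim concrete, a small sketch iterating the map from two starting points that differ by 1e-6 (r = 3.9 sits in the chaotic band):

import tensorflow as tf

r = 3.9
a, b = tf.constant(0.5), tf.constant(0.5 + 1e-6)
for step in range(40):
    a, b = r * a * (1. - a), r * b * (1. - b)
    if step % 10 == 9:
        # the 1e-6 gap grows by orders of magnitude within a few dozen steps
        print(step + 1, abs((a - b).numpy()))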
We already have a C++/CUDA kernel for the gelu activation, which is much faster than composing pure Python ops: https://github.com/tensorflow/addons/blob/master/tensorflow_addons/activations/gelu.py
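For reference, the addons op can be called directly (assumes tensorflow_addons is installed):

import tensorflow as tf
import tensorflow_addons as tfa

x = tf.constant([-1., 0., 1.])
y = tfa.activations.gelu(x)  # uses the fused C++/CUDA kernel when available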
@bionicles Thank you very much for all of these. I think a lot of these are now implemented or under review (gelu, mish, softshrink, hardshrink, rrelu, lisht, sparsemax, tanhshrink).
However, this issue format makes it very difficult for us to evaluate specific activations and determine who will be working on them. For that reason I'm going to close this issue... but feel free to open a single issue per missing activation that you'd like to propose. Just a note: I don't think we'll be accepting any of the aliased activations (like tf.sin). IMO if you're building architecture search you can quickly create a dictionary if you want string shortcuts.
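For reference, that dictionary is only a few lines (names here are illustrative, not an addons API):

import tensorflow as tf

# illustrative string-shortcut registry for architecture search
ACTIVATIONS = {
    "sin": tf.math.sin,
    "tanh": tf.math.tanh,
    "relu": tf.nn.relu,
}

def get_activation(name):
    return ACTIVATIONS[name]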
From the original list we are tracking Soft-argmax at https://github.com/tensorflow/addons/issues/1364
@bionicles I'm actually quite interested in chaotic activation functions - https://github.com/tensorflow/addons/issues/437#issuecomment-535587510 - Logistic Map. Thank you very much for sharing this code.
I have a couple of questions regarding this code snippet.
Thanks! Idan
@jvishnuvardhan @yongtang @seanpmorgan Follow-up on the TF issue.
System information
Describe the feature and the current behavior/state. Activations are high-yield because they dramatically influence performance for very little code.
Will this change the current api? How? It just adds more activations.
Who will benefit with this feature? People doing hyperparameter search will benefit especially.
Any Other info. Here is an updated Python file with some activations (the if/elif dispatch was converted into a lookup table at the bottom).