Problem reimplementing the M4 results from the original paper

fecet commented 2 years ago

I'm trying to reproduce the results on the M4 dataset and have done most of the work, but some problems remain: as the frequency and lookback increase, the model tends to output large values and the sMAPE loss gets stuck at around 199.02. The input size is determined by the horizon (forecast_length) and the lookback:

backcast_length=forecast_length*lookback
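To make the growth concrete, here is a small sketch of the input sizes this rule gives on M4. It uses the M4 forecast horizon of each frequency and assumes lookback multipliers from 2 to 7, as in the paper's ensemble; the helper name is illustrative only.

```python
# M4 forecast horizons per frequency
M4_HORIZONS = {"Yearly": 6, "Quarterly": 8, "Monthly": 18,
               "Weekly": 13, "Daily": 14, "Hourly": 48}

def backcast_length(forecast_length, lookback):
    # Input window is a fixed multiple of the forecast horizon
    return forecast_length * lookback

# Assumed lookback multipliers: 2H ... 7H
for freq, h in M4_HORIZONS.items():
    sizes = [backcast_length(h, lookback) for lookback in range(2, 8)]
    print(f"{freq:<9} H={h:<2} backcast lengths: {sizes}")
```

At the hourly frequency with lookback 7 the network already receives 336 input points, versus 12 for yearly series at lookback 2.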

I suspect the problem comes from the large input size, but I have no idea how to fix it. Here is my sMAPE loss:

import tensorflow as tf
import tensorflow.experimental.numpy as tnp

def smape_loss(y_true, y_pred):
    """
    sMAPE loss as defined in "Appendix A" of
    http://www.forecastingprinciples.com/files/pdf/Makridakia-The%20M3%20Competition.pdf
    :param y_true: Target values. Shape: batch, time
    :param y_pred: Forecast values. Shape: batch, time
    :return: Loss value
    """
    # 0/1 mask: positions where the target is exactly zero are ignored
    # mask=tf.where(y_true,1.,0.)
    mask = tf.cast(y_true, tf.bool)
    mask = tf.cast(mask, tf.float32)
    sym_sum = tf.abs(y_true) + tf.abs(y_pred)
    # Zero weight where target and forecast are both zero, to avoid dividing by zero
    condition = tf.cast(sym_sum, tf.bool)
    weights = tf.where(condition, 1. / (sym_sum + 1e-8), 0.0)
    return 200 * tnp.nanmean(tf.abs(y_pred - y_true) * weights * mask)
    # return 200 * tnp.nanmean(tf.abs(y_pred - y_true)*weights )
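One property worth keeping in mind when reading the 199.02 plateau: with this per-element weighting, sMAPE is bounded above by 200, and it approaches 200 only when forecasts dwarf the targets (or have the opposite sign). A NumPy reference version (a sketch with a hypothetical name, using the same masking idea as smape_loss) shows the bound:

```python
import numpy as np

def smape_np(y_true, y_pred):
    # NumPy reference sMAPE: zero-target positions masked, 0/0 terms weighted by 0
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = (y_true != 0).astype(float)
    denom = np.abs(y_true) + np.abs(y_pred)
    weights = np.divide(1.0, denom, out=np.zeros_like(denom), where=denom != 0)
    return 200 * np.mean(np.abs(y_pred - y_true) * weights * mask)

y = np.array([10.0, 12.0, 11.0])
print(smape_np(y, y))          # perfect forecast
print(smape_np(y, 1.1 * y))    # 10% over-forecast
print(smape_np(y, 1000 * y))   # forecast blows up: close to the 200 ceiling
```

A loss stuck just below 200 is therefore consistent with the network producing outputs orders of magnitude larger than the (positive) M4 targets.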

And my model config:

net = NBeatsNet(
        # stack_types=(NBeatsNet.GENERIC_BLOCK, NBeatsNet.GENERIC_BLOCK),
        stack_types=(NBeatsNet.TREND_BLOCK, NBeatsNet.SEASONALITY_BLOCK),
        nb_blocks_per_stack=3,
        forecast_length=outsample_size,
        backcast_length=insample_size,
        hidden_layer_units=512,
        thetas_dim=(4,4),
        share_weights_in_stack=True,
        nb_harmonics=1
        )
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate,
        decay_steps=epoch // 3,
        decay_rate=0.5,
        staircase=True)

net.compile(loss=smape_loss, 
        # optimizer='adam',
        optimizer=tf.keras.optimizers.Adam(
            learning_rate=lr_schedule,
            clipnorm=1.0,
            clipvalue=0.5
        ),
    )
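Note that decay_steps in ExponentialDecay counts optimizer updates (batches), not epochs. If epoch in the config is the number of training epochs (an assumption; the snippet does not define it), the learning rate halves every epoch // 3 batches and collapses toward zero early in training. A plain-Python sketch of the staircase formula, with hypothetical values for the learning rate, epochs and steps_per_epoch:

```python
import math

def staircase_decay(step, initial_learning_rate, decay_steps, decay_rate=0.5):
    # Same formula ExponentialDecay applies with staircase=True
    return initial_learning_rate * decay_rate ** math.floor(step / decay_steps)

initial_learning_rate = 1e-3   # hypothetical values for illustration
epochs = 30
steps_per_epoch = 100

step = 10 * steps_per_epoch    # after 10 epochs of training

# decay_steps = epochs // 3 = 10 optimizer steps: 100 halvings by this point
print(staircase_decay(step, initial_learning_rate, decay_steps=epochs // 3))

# decay_steps expressed in optimizer steps: one halving per third of the run
print(staircase_decay(step, initial_learning_rate,
                      decay_steps=steps_per_epoch * epochs // 3))
```

Multiplying by the number of batches per epoch keeps the intended "halve three times over the run" behaviour.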

If anyone is interested, I will post the full code here.

fecet commented 2 years ago

[UPDATE] I found that the original paper mentions a stop gradient in the denominator of the sMAPE loss, but I don't see this trick in his code, and its absence may explain the relatively poor performance. I will keep investigating.
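A minimal sketch of sMAPE with the denominator excluded from backpropagation via tf.stop_gradient, assuming the same Keras setup as above (the function name is hypothetical). Without the stop gradient, for ŷ > y > 0 the gradient of |ŷ − y| / (|y| + |ŷ|) with respect to ŷ is 2y / (y + ŷ)², which vanishes quickly once forecasts blow up; with the denominator held constant it is 1 / (y + ŷ).

```python
import tensorflow as tf

def smape_loss_stop_gradient(y_true, y_pred):
    # Hypothetical variant of smape_loss: the denominator is a constant for backprop
    y_true = tf.cast(y_true, y_pred.dtype)
    # 0/1 mask: ignore positions where the target is exactly zero
    mask = tf.cast(tf.not_equal(y_true, 0.0), y_pred.dtype)
    # No gradient flows through |y_true| + |y_pred|
    denom = tf.stop_gradient(tf.abs(y_true) + tf.abs(y_pred))
    # divide_no_nan returns 0 where denom == 0 (target and forecast both zero)
    weights = tf.math.divide_no_nan(tf.ones_like(denom), denom)
    return 200.0 * tf.reduce_mean(tf.abs(y_pred - y_true) * weights * mask)
```

It can be passed to net.compile(loss=smape_loss_stop_gradient, ...) in place of smape_loss; the loss value is unchanged, only the gradients differ.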

fecet commented 2 years ago

The linear_space function should use an adapted horizon, so that the forecast and backcast grids are each normalized by their own length:

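from tensorflow.keras import backend as K

def linear_space(backcast_length, forecast_length, is_forecast=True):
    # Previous version: one grid scaled by forecast_length, so backcast values reach backcast_length / forecast_length
    # ls = K.arange(-float(backcast_length), float(forecast_length), 1) / forecast_length
    # return ls[backcast_length:] if is_forecast else K.abs(K.reverse(ls[:backcast_length], axes=0))
    # Adapted version: t = [0, 1, ..., H-1] / H, with H the length of the part being generated
    horizon = forecast_length if is_forecast else backcast_length
    # float32 grid (int32 / int would otherwise produce float64)
    return K.arange(0, horizon, dtype='float32') / horizon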