speechLabBcCuny / messlJsalt15

MESSL wrappers etc for JSALT 2015, including CHiME3

Hyper Parameter Exploration #14

Open grezesf opened 7 years ago

grezesf commented 7 years ago

Run experiments continuously to explore the hyper-parameter space; the parameters and scope are to be determined shortly. (Possibly use hyperas: https://github.com/maxpumperla/hyperas)

grezesf commented 7 years ago

I've (finally) gotten hyper-parameter search to work. Here is the possible search space. Before I launch it on the server, which options should I remove?

hyper-parameter space

    # size of LSTM output: [256, 512, 1024, 2048]
    # make LSTM bidirectional or not
    # number of LSTM layers: [1, 2, 3, more?]
    # merge_mode for bidirectional LSTM: ['sum', 'mul', 'concat', 'ave', None]
    # activation function of output Dense layer: [softmax, softplus, softsign, relu, tanh, sigmoid, hard_sigmoid, linear]
    # loss for whole model: [mean_squared_error / mse, mean_absolute_error / mae, mean_absolute_percentage_error / mape, mean_squared_logarithmic_error / msle,
    # (continued) squared_hinge, hinge, binary_crossentropy, kullback_leibler_divergence, poisson, cosine_proximity]
    # optimizer: [SGD, RMSprop, Adagrad, Adadelta, Adam, Adamax, Nadam]
    # batch_size: [32, 64, 128, 256, 512]
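
To give a sense of how large this space is, here is a minimal sketch that encodes the list above as a plain Python dictionary and draws random configurations from it. It deliberately does not use hyperas; the key names, `sample_config`, and `grid_size` are hypothetical and chosen only for illustration, and "more?" for the layer count is left out.

```python
import random

# Hypothetical encoding of the search space listed above (key names are illustrative).
SEARCH_SPACE = {
    "lstm_units": [256, 512, 1024, 2048],
    "bidirectional": [True, False],
    "num_lstm_layers": [1, 2, 3],
    "merge_mode": ["sum", "mul", "concat", "ave", None],
    "output_activation": ["softmax", "softplus", "softsign", "relu",
                          "tanh", "sigmoid", "hard_sigmoid", "linear"],
    "loss": ["mse", "mae", "mape", "msle", "squared_hinge", "hinge",
             "binary_crossentropy", "kullback_leibler_divergence",
             "poisson", "cosine_proximity"],
    "optimizer": ["SGD", "RMSprop", "Adagrad", "Adadelta",
                  "Adam", "Adamax", "Nadam"],
    "batch_size": [32, 64, 128, 256, 512],
}


def grid_size(space):
    """Number of points in the full Cartesian product of the space."""
    size = 1
    for values in space.values():
        size *= len(values)
    return size


def sample_config(space, rng):
    """Draw one configuration uniformly at random from each list."""
    config = {name: rng.choice(values) for name, values in space.items()}
    # merge_mode is only meaningful when the LSTM is bidirectional.
    if not config["bidirectional"]:
        config.pop("merge_mode")
    return config


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run can be repeated
    print("full grid size:", grid_size(SEARCH_SPACE))
    for _ in range(3):
        print(sample_config(SEARCH_SPACE, rng))
```

Under this encoding the full grid has 336,000 combinations, so an exhaustive sweep is unrealistic and pruning options before launching makes sense.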
mim commented 7 years ago

What does no merge for bidirectional LSTM mean?

For the loss for the whole model, I thought we were either using the mask-aware loss or the phase-aware loss, right?

And for the activation function of the output, if it is predicting a mask, it should be sigmoid.

The other parameters look good for searching.

grezesf commented 7 years ago
  1. "If None, the outputs will not be combined, they will be returned as a list." (I have to admit I'm not 100% clear on how bidirectional networks function)
  2. The model right now is mask-aware, but I guess there is more than one way to compute a loss between a predicted and a target mask. MSE corresponds to the Erdogan paper.
  3. I'll restrict the search to sigmoid and hard_sigmoid.
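
On point 1, a bidirectional layer runs one LSTM forward in time and a second one backward, then combines the two outputs for each time step. The sketch below illustrates what each `merge_mode` value does to a single pair of output vectors. It is a simplified pure-Python illustration, not the library's implementation, and `merge_bidirectional` is a hypothetical name.

```python
def merge_bidirectional(forward, backward, merge_mode):
    """Combine forward and backward outputs for one time step.

    forward, backward: equal-length lists of floats (feature vectors).
    merge_mode: 'sum', 'mul', 'ave', 'concat', or None.
    """
    if merge_mode == "sum":
        return [f + b for f, b in zip(forward, backward)]
    if merge_mode == "mul":
        return [f * b for f, b in zip(forward, backward)]
    if merge_mode == "ave":
        return [(f + b) / 2 for f, b in zip(forward, backward)]
    if merge_mode == "concat":
        # Feature dimension doubles: forward features, then backward features.
        return list(forward) + list(backward)
    if merge_mode is None:
        # No combination: the two outputs are returned separately as a list.
        return [list(forward), list(backward)]
    raise ValueError("unknown merge_mode: %r" % (merge_mode,))


if __name__ == "__main__":
    fwd, bwd = [1.0, 2.0], [3.0, 4.0]
    for mode in ["sum", "mul", "ave", "concat", None]:
        print(mode, merge_bidirectional(fwd, bwd, mode))
```

With `None`, the next layer receives two tensors instead of one, so it would need a model that accepts multiple inputs at that point. With `concat`, the next layer sees twice as many features as with the other modes.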
mim commented 7 years ago
  1. Try the other merge modes, but not None.
  2. I think the loss between the predicted and target mask should be cross entropy and the loss between the masked noisy speech and clean speech should be MSE.
  3. Sounds good.
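
The two losses in point 2 can be sketched as follows: cross entropy compares the predicted mask with the target mask directly, while MSE compares the masked noisy signal with the clean signal. This is a minimal pure-Python illustration over flattened time-frequency bins. The function names are hypothetical, magnitude values are assumed as inputs, and this is not the repository's mask-aware or phase-aware loss.

```python
import math


def binary_cross_entropy(pred_mask, target_mask, eps=1e-7):
    """Mean binary cross entropy between predicted and target mask values.

    Both arguments are equal-length lists of floats in [0, 1].
    """
    total = 0.0
    for p, t in zip(pred_mask, target_mask):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred_mask)


def masked_signal_mse(pred_mask, noisy_mag, clean_mag):
    """Mean squared error between the masked noisy signal and the clean signal.

    All arguments are equal-length lists, one value per time-frequency bin.
    """
    total = 0.0
    for m, y, s in zip(pred_mask, noisy_mag, clean_mag):
        total += (m * y - s) ** 2
    return total / len(pred_mask)


if __name__ == "__main__":
    pred = [0.9, 0.2, 0.6]
    target = [1.0, 0.0, 0.5]
    noisy = [2.0, 1.5, 4.0]
    clean = [1.9, 0.1, 2.0]
    print("mask cross entropy:", binary_cross_entropy(pred, target))
    print("masked-signal MSE:", masked_signal_mse(pred, noisy, clean))
```

Cross entropy assumes mask values in [0, 1], which fits the decision to use a sigmoid-type output activation.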