Closed bragilee closed 5 years ago
hey, thanks for the kind words.
The hidden size & kernel in the example are just chosen randomly. I've done no experiments to optimize them for any particular data set.
In the literature, you'll find that these sorts of hyper-parameter choices are often made empirically or by convention (e.g. choosing some nice round numbers). You want to ensure your model has enough capacity to represent the data you're feeding in, so there are some reasonable lower bounds you can set on possible values.
That said, it's near impossible to determine what particular parameter combinations will lead to optimal performance a priori.
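If you do want to tune them for your own task, the usual approach is to try a handful of combinations and compare validation scores. Here is a minimal sketch of that idea, not something taken from this repo: `grid_search` and `toy_validation_loss` are hypothetical names, the candidate values are arbitrary, and the toy loss is only a stand-in for "train the model with these settings and return its validation loss".

```python
import itertools


def grid_search(evaluate, hidden_sizes, kernel_sizes):
    """Try every (hidden_size, kernel_size) pair and return (best_score, best_config).

    `evaluate` is any callable that returns a score where lower is better.
    """
    best = None
    for hidden, kernel in itertools.product(hidden_sizes, kernel_sizes):
        score = evaluate(hidden, kernel)
        # Keep the first configuration that achieves the lowest score seen so far.
        if best is None or score < best[0]:
            best = (score, {"hidden_size": hidden, "kernel_size": kernel})
    return best


def toy_validation_loss(hidden_size, kernel_size):
    # Placeholder only (assumption): replace this with code that trains your
    # model with the given settings and returns its validation loss.
    return abs(hidden_size - 64) / 64 + abs(kernel_size - 5) / 5


if __name__ == "__main__":
    score, config = grid_search(toy_validation_loss, [16, 32, 64, 128], [3, 5, 7])
    print(score, config)
```

A grid like this gets expensive quickly, since every extra candidate value multiplies the number of training runs, so it is common to start with a coarse grid and refine around the best result.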
Got it. Thanks for your explanations.
Thanks for your work. I am wondering whether the hidden size and kernel size in your examples were found to be optimal through experiments, or whether they depend entirely on the task?
Thanks. :)