jcjohnson / torch-rnn

Efficient, reusable RNNs and LSTMs for torch

Explanation of hyperparameters? #75

Open atomicthumbs opened 8 years ago

atomicthumbs commented 8 years ago

Howdy. As someone who is completely new to machine learning, I'm enjoying the process, but trying to generate coherent things off some types of text has left me stumped.

With a movie script, I can generate a decent facsimile of a script that works well for my purposes. However, when I feed torch-rnn a bunch of forum posts of about the same total length (~200kb), using the same hyperparameters, I come out with a string of words (some of which are real words) and no punctuation or sentence structure. I assume that some of this is because the structure of the script is VERY regular and thus easily learned.

A more detailed explanation of the hyperparameters in flags.md, and how they might be useful when feeding different sorts of training data into torch-rnn, would be very helpful. I set up a really crappy shell script to do a search across some of them, but it's a pain in the ass and my GPU is somewhat slow.
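For reference, the search amounts to something like this minimal sketch (the input files and value grids are placeholders; the flags themselves are the ones documented in flags.md):

```
# Hypothetical grid search over two torch-rnn hyperparameters.
# Assumes the corpus is already preprocessed into data.h5/data.json.
for rnn_size in 128 256 512; do
  for seq_length in 50 100; do
    th train.lua \
      -input_h5 data.h5 -input_json data.json \
      -rnn_size $rnn_size -seq_length $seq_length \
      -checkpoint_name cv/rs${rnn_size}_sl${seq_length}
  done
done
```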

atomicthumbs commented 8 years ago

Perhaps also of note is that I was using another neural network framework, Darknet, before I installed Linux and switched to torch-rnn (to use OpenCL). The hyperparameters for that seemed to work well for what I'm doing, but I'm not really sure how to translate them to torch-rnn.

I'm an artist, not a coder, so please go easy on me :v

robinsloan commented 8 years ago

I think training data volume is your culprit. ~200kb is just not enough, especially because, as you say, the format isn't nearly as regular as what you find in a movie script. If you could gather 5X as many forum posts, I think you'd see better results, even just using the default hyperparameters. Lacking that, I don't think there's any combination of hyperparameters that will drastically improve the situation.

I've found that data volume REALLY makes a difference. I've gotten my best results, by far, from a model trained on a ~150MB corpus -- even though it's quite "noisy," with lots of OCR artifacts, etc.

AlekzNet commented 8 years ago

@robinsloan how big is the model you are using for the 150MB corpus?

robinsloan commented 8 years ago

@AlekzNet Currently using: sequence length 64, 2 layers, 1024 hidden units per layer. I also got good results with 512 per layer.
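In flag terms that's something like the following (input paths are placeholders):

```
th train.lua \
  -input_h5 corpus.h5 -input_json corpus.json \
  -seq_length 64 -num_layers 2 -rnn_size 1024
```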

AlekzNet commented 8 years ago

What learning rate are you using for the 1024 model? I have problems with any model over 512 hidden cells per layer. Even if the 1st epoch looks OK, the second destroys the brain completely.

robinsloan commented 8 years ago

Ha! "The second destroys the brain completely" sounds like you're talking about some gnarly experiment gone horribly wrong. I love it.

I use the default learning rate (2e-3) and it seems to work well. I end up with a training loss of ~0.99, a validation loss of ~1.06, and, I guess more importantly, the sampled text is qualitatively good and works great for my purposes.
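Written out explicitly, that's just the out-of-the-box settings (the decay values shown are also torch-rnn's defaults; input paths are placeholders):

```
th train.lua \
  -input_h5 corpus.h5 -input_json corpus.json \
  -learning_rate 2e-3 \
  -lr_decay_every 5 -lr_decay_factor 0.5
```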

I'm covetous of https://github.com/maxpumperla/hyperas, though -- it would be great to have something similarly structured for hyperparameter tuning in torch-rnn or Torch generally.

AlekzNet commented 8 years ago

I'm experimenting with the learning rate and trying to come up with the highest rate for a particular model. For example, a rate that is perfectly suitable for 128 neurons per layer will "destroy" anything with more than, say, 700. And the results are quite unpredictable (see https://github.com/jcjohnson/torch-rnn/issues/65 and https://github.com/jcjohnson/torch-rnn/issues/67). The loss may tell you nothing: in many cases it rapidly decreases to around 1.10, but sampling either fails with various errors, or you get complete garbage, like a couple of letters and spaces repeating at random. Or the iterations become 10 times slower, etc. Fun! ;)
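As an illustration of the kind of adjustment I mean (the values are guesses, not documented recommendations; -grad_clip is torch-rnn's gradient-clipping flag, default 5):

```
# Hypothetical: start a wider model at a lower learning rate.
th train.lua \
  -input_h5 corpus.h5 -input_json corpus.json \
  -rnn_size 1024 -num_layers 2 \
  -learning_rate 1e-3 -grad_clip 5
```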