karpathy / recurrentjs

Deep Recurrent Neural Networks and LSTMs in Javascript. More generally also arbitrary expression graphs with automatic differentiation.

text prediction #4

Open hardmaru opened 9 years ago

hardmaru commented 9 years ago

Hi Karpathy,

I have trouble understanding a few points in the character prediction demo:

1) What is the meaning of letter_size? letter_size = 5; // size of letter embeddings

My understanding is that the inputs to the network are just vectors of length 50 (or however many unique characters are in our dataset) that look like [0, 0, 0, 0, 1, 0, 0, 0 ... ], and the output is similar.
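
For concreteness, here is a tiny sketch of the input I have in mind (charToIndex is a hypothetical lookup table I'm assuming gets built from the dataset's unique characters):

```js
// Hypothetical sketch of the 1-of-k input described above.
// charToIndex maps each unique character to an integer 0..vocabSize-1.
function oneHot(ch, charToIndex, vocabSize) {
  var v = [];
  for (var i = 0; i < vocabSize; i++) { v.push(0); }
  v[charToIndex[ch]] = 1; // a single 1 at this character's index
  return v; // e.g. [0, 0, 0, 0, 1, 0, 0, 0, ...] of length vocabSize
}
```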

2) To tick the model forward, you used 'rowPluck': x = G.rowPluck(model['Wil'], ix);

and I think ix is the integer index that represents the character. I examined x in the inspector and it is a vector of floats of length letter_size, rather than a large binary vector of length 50.
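
Roughly, my reading of the tick step is the sketch below (assuming recurrentjs's Graph/Mat API; charToIndex is again my hypothetical lookup):

```js
// Sketch of the forward tick for one character (my reading, not the demo verbatim).
var G = new R.Graph();                 // expression graph, records ops for backprop
var ix = charToIndex[ch];              // hypothetical: character -> integer index
var x = G.rowPluck(model['Wil'], ix);  // pulls out row ix of the 50x5 matrix Wil
// x.w is then an array of letter_size (5) floats, not a length-50 binary vector
```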

So I am having a bit of a hard time following what is going on, and I am currently quite puzzled. Any advice or guidance is appreciated!

Thanks

David

karpathy commented 9 years ago

Hey David, sorry. recurrentjs was not really meant for production or cleanliness; it's an "are you a neural nets expert? ok, here's some dump of code you might like" kind of thing.

The characters are encoded 1-of-k over 50 characters, but then there's a 50x5 linear transformation operating over that. When you look at the math, this is basically equivalent to plucking a row of the 50x5 matrix (since all elements except one are 0). So effectively letter_size is the dimension of the "embedding space" that each character occupies before it's fed into the net.
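
To make the equivalence concrete, here's a quick plain-JS sketch (just the math, not the library code):

```js
// Multiplying a 1-of-k row vector by a 50x5 matrix W (an array of 50 length-5 rows)
// returns exactly row ix of W, so the multiply can be skipped entirely.
function embed(oneHotVec, W) {
  var out = [0, 0, 0, 0, 0];
  for (var i = 0; i < W.length; i++) {
    for (var j = 0; j < 5; j++) {
      out[j] += oneHotVec[i] * W[i][j]; // only the row where oneHotVec[i] === 1 contributes
    }
  }
  return out; // identical to W[ix], which is what rowPluck grabs directly
}
```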

hardmaru commented 9 years ago

I find the code quite neat and okay-readable, and I'm able to learn from it after playing around with it.

I see, so basically we transform a one-hot vector of size 50 (or whatever) into a vector of 5 floats before feeding it in.

I was trying to implement something similar from scratch but got stuck because the output remained a bit on the gibberish side after many generations, so I wanted to take a look at this code for some guidance. Maybe this form of dense representation, rather than one-hot, would help improve the performance.

Thanks again

gustavofuhr commented 7 years ago

On a similar note @karpathy, it seems that you used one-hot-encoded inputs, is that right? Why not encode each character as a single integer from 1 to N and leave the one-hot encoding for the output?

ericlovesmath commented 7 years ago

@karpathy Where is the code for the online demo of the text prediction program?