karpathy / char-rnn

Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch

CR line terminators affecting training #171

Open gregorykan opened 8 years ago

gregorykan commented 8 years ago

Hi y'all!

Not sure if this is obvious to you, as I am very new to programming. It took me a long time to understand why training on a certain text file finished so quickly, and why sampling returned only a single line no matter what I set length to.

The reason was that my input text used CR (carriage return) line terminators instead of LF, as the file command reports:

input_with_cr.txt:    ASCII English text, with CR line terminators
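
If the file utility is not at hand, the same condition can be checked by counting bytes directly. This is only a rough sketch: count_eol is a hypothetical helper name, not something from char-rnn, and it assumes a POSIX shell with tr and wc available.

```shell
# Hypothetical helper: print how many CR and LF bytes a file contains.
count_eol() {
    # tr -cd deletes every byte except the one listed; wc -c counts what is left.
    # The trailing tr strips the padding some wc implementations add.
    cr=$(tr -cd '\r' < "$1" | wc -c | tr -d ' ')
    lf=$(tr -cd '\n' < "$1" | wc -c | tr -d ' ')
    echo "CR=$cr LF=$lf"
}
```

Calling count_eol input_with_cr.txt on a file with CR-only terminators should report a nonzero CR count and LF=0, meaning that anything splitting on LF sees the whole file as one line.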

I fixed it by converting every CR to LF with a simple tr command:

tr '\r' '\n' < input_with_cr.txt > input.txt

After the conversion, the two files are reported as:

input.txt:            ASCII English text
input_with_cr.txt:    ASCII English text, with CR line terminators

Training is looking good now! Just a small gotcha.
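
One caveat for anyone with a different kind of file: translating CR to LF with tr suits CR-only input like mine, but on a file with CRLF (Windows) terminators it would turn every CR LF pair into two LFs and leave a blank line after each line. For pure CRLF input, deleting the CR with tr -d '\r' is enough. A sketch that handles CR, CRLF, and mixed input in one pass might look like the following; normalize_newlines is a hypothetical name, and the approach assumes a POSIX awk.

```shell
# Hypothetical helper: read text on stdin, write it with LF-only line endings.
normalize_newlines() {
    # awk splits records on LF, so a CRLF line arrives with a trailing CR
    # and a CR-only file arrives as one long record.
    #   sub:   drop a CR at the end of the record (the CR of a CRLF pair)
    #   gsub:  turn every remaining bare CR into LF
    #   print: emit the record followed by a single LF
    awk '{ sub(/\r$/, ""); gsub(/\r/, "\n"); print }'
}
```

Usage would be normalize_newlines < input_with_cr.txt > input.txt. Note that print always ends the output with an LF, even if the input had no final terminator.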