iamsarthakk closed 1 year ago
This looks like a Unicode issue with your text: the input file is not valid UTF-8. Try replacing line 91 in train.py:
data_loader = TextLoader(args.data_dir, args.batch_size, args.seq_length)
with
data_loader = TextLoader(args.data_dir, args.batch_size, args.seq_length, "ISO-8859-1")
Getting the following message while training:
  File "train.py", line 179, in <module>
    main()
  File "train.py", line 76, in main
    train(args)
  File "train.py", line 91, in train
    data_loader = TextLoader(args.data_dir, args.batch_size, args.seq_length)
  File "/spell/training-lstm/utils.py", line 21, in __init__
    self.preprocess(input_file, vocab_file, tensor_file)
  File "/spell/training-lstm/utils.py", line 30, in preprocess
    data = f.read()
  File "/usr/lib/python3.5/codecs.py", line 698, in read
    return self.reader.read(size)
  File "/usr/lib/python3.5/codecs.py", line 501, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 7047: invalid start byte
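For anyone hitting the same error, here is a minimal sketch of why the byte 0x92 from the traceback fails under UTF-8 but loads under ISO-8859-1. The sample bytes, the `try_decode` helper, and the temporary file are made up for illustration; only the standard library is used, and nothing here is taken from train.py or utils.py. It assumes the text was saved in a Windows-style single-byte encoding, which is a common source of 0x92 (a curly apostrophe in cp1252) but is not confirmed for this dataset.

```python
import codecs
import os
import tempfile

# Hypothetical sample: 0x92 is a curly apostrophe in cp1252,
# but it is not a valid start byte in UTF-8.
raw = b"don\x92t stop"


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid for that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


utf8_result = try_decode(raw, "utf-8")         # fails, same error as in the traceback
latin1_result = try_decode(raw, "ISO-8859-1")  # never fails: every byte maps to a code point
cp1252_result = try_decode(raw, "cp1252")      # maps 0x92 to U+2019 (right single quote)

# Reading a file with an explicit encoding through codecs.open,
# the module that appears in the traceback.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    with codecs.open(path, "r", encoding="ISO-8859-1") as f:
        file_text = f.read()
finally:
    os.remove(path)

print(repr(utf8_result), repr(latin1_result), repr(cp1252_result))
```

Note that ISO-8859-1 accepts any byte, so it makes the error go away but turns 0x92 into an invisible control character (U+0092) rather than an apostrophe. If the text really came from a Windows editor, "cp1252" may give cleaner characters; that is a guess about the data, so check a sample of the decoded output either way.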