The conversion fails because the data/glove/glove.840B.300d.txt file contains non-ASCII and non-UTF-8 characters. Has anyone else run into this issue with this file?
I changed the for loop so that a line is skipped (no conversion to number and no write to vocab) when its second token is not a number. The problem arises because tonumber returns nil when the token is not a number.
    for i = 1, count do
      -- repeat ... until true runs once, so `break` acts like `continue`
      repeat
        xlua.progress(i, count)
        local tokens = stringx.split(file:read())
        -- skip the line when the second token is not a number
        if tonumber(tokens[2]) == nil then break end
        local word = tokens[1]
        vocab:write(word .. '\n')
        for j = 1, dim do
          vecs[{i, j}] = tonumber(tokens[j + 1])
        end
      until true
    end
The above fix solves the issue, but I would like to know if this is the correct solution for the problem.
The error occurs in convert-wordvecs.lua at this line:
    vecs[{i, j}] = tonumber(tokens[j + 1])
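One thing to watch with the skip approach: a skipped line writes nothing to vocab but still uses up row i of vecs, so vocab lines and vector rows stop lining up after the first skipped line. Below is a minimal sketch of an alternative in plain Lua. It assumes (not verified against this file) that the failing lines are ones whose word itself contains spaces or unusual bytes, so the vector is read from the last dim fields instead of from token 2 onward. The names parse_line and load_vectors are hypothetical helpers, not part of convert-wordvecs.lua, and plain Lua tables stand in for the torch tensor.

```lua
-- Hypothetical helper: parse one line of a GloVe text file into (word, values).
-- Assumption: the last `dim` space-separated fields are the vector, and
-- everything before them is the word (which may contain spaces).
local function parse_line(line, dim)
  local fields = {}
  -- split on ASCII spaces only, so non-ASCII bytes stay inside their field
  for field in line:gmatch('[^ ]+') do
    fields[#fields + 1] = field
  end
  if #fields < dim + 1 then return nil end  -- too short for a word plus vector
  local first = #fields - dim + 1           -- index of the first vector component
  local values = {}
  for j = 1, dim do
    local v = tonumber(fields[first + j - 1])
    if v == nil then return nil end         -- malformed number: reject the line
    values[j] = v
  end
  -- rejoin the leading fields so a multi-token word is kept whole
  local word = table.concat(fields, ' ', 1, first - 1)
  return word, values
end

-- Hypothetical driver: collects words and rows under the same index,
-- so a rejected line never leaves a gap in either list.
local function load_vectors(lines, dim)
  local words, rows = {}, {}
  for _, line in ipairs(lines) do
    local word, values = parse_line(line, dim)
    if word then
      words[#words + 1] = word
      rows[#rows + 1] = values
    end
  end
  return words, rows
end

-- Usage: the third line has a non-numeric component and is dropped.
local sample = { 'cat 0.1 0.2', '. . . 0.3 0.4', 'bad x 0.5' }
local sample_words, sample_rows = load_vectors(sample, 2)
print(#sample_words, sample_words[2])
```

In the real script the same idea would mean keeping a separate row counter for vecs and trimming the tensor to the number of accepted rows at the end, rather than indexing rows by the loop variable i.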