Closed: ghost closed this issue 8 years ago
SentenceSampler only works with a SentenceSet, which is what the BillionWords DataSource uses internally. The TextSource uses TextSets internally, so it won't work. To get this to work, you would either need to modify TextSet to support sentence-level sampling, or pass your text to a SentenceSet in the format it requires (and wrap it in a DataSource). Sorry this doesn't work out of the box.
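For anyone attempting the second route, here is a rough sketch of the preprocessing step in plain Lua. It rests on an assumption that is not stated in this thread and should be checked against the dp documentation: that SentenceSet expects one row per word with two columns, the first holding the row index where that word's sentence starts and the second holding the word id, with each sentence terminated by an end-of-sentence id. The names buildRows, readSentences and END_ID are made up for illustration. Converting the rows to a tensor and constructing the SentenceSet and DataSource are left out, since the exact constructor arguments are not given here.

```lua
-- Sketch only. ASSUMPTION: SentenceSet wants one row per word, laid out as
-- {row index where the sentence starts, word id}, and each sentence is
-- terminated by a dedicated end-of-sentence id. Verify against the dp docs.

-- Hypothetical convention: id 1 is reserved for the end-of-sentence marker.
local END_ID = 1

-- Turn a list of sentences (strings) into two-column rows plus a vocabulary.
local function buildRows(sentences)
   local vocab, vocabSize = {}, 1   -- id 1 is already taken by END_ID
   local rows = {}
   for _, sentence in ipairs(sentences) do
      local start = #rows + 1       -- row index of this sentence's first word
      for word in sentence:gmatch("%S+") do
         if not vocab[word] then
            vocabSize = vocabSize + 1
            vocab[word] = vocabSize
         end
         rows[#rows + 1] = {start, vocab[word]}
      end
      rows[#rows + 1] = {start, END_ID}  -- close the sentence
   end
   return rows, vocab
end

-- Read a file that holds one sentence per line, skipping blank lines.
local function readSentences(path)
   local sentences = {}
   for line in io.lines(path) do
      if line:match("%S") then
         sentences[#sentences + 1] = line
      end
   end
   return sentences
end

-- Small in-memory example instead of a real file.
local rows, vocab = buildRows({"the cat sat", "the dog ran"})
```

With a real file, the last line would become something like buildRows(readSentences("test.txt")). Keeping the vocabulary table around matters, because the validation and test sets have to be encoded with the same word ids.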
Thank you for your answer. I'll try it out, post the solution here (if it works) and then close the issue.
Edit: I switched to a different Torch framework.
I want to replace TextSampler with SentenceSampler for training in recurrentlanguagemodel.lua (l. 298), like so:
Replace
with
To the best of my knowledge, given the (kind of sparse) documentation, this should work. But running
results in
test.txt contains one sentence per line, all of equal length. TextSampler works fine.
I cannot figure out how to tackle this problem, or where it actually comes from. Is this a bug or did I miss something?