nicholas-leonard / dp

A deep learning library for streamlining research and development using the Torch7 distribution.

recurrentlanguagemodel.lua with SentenceSampler #167

Closed by ghost 8 years ago

ghost commented 9 years ago

I want to replace the TextSampler used for training in recurrentlanguagemodel.lua (line 298) with a SentenceSampler, like so:

Replace

and dp.TextSampler{epoch_size = opt.trainEpochSize, batch_size = opt.batchSize}

with

and dp.SentenceSampler{epoch_size = opt.trainEpochSize, batch_size = opt.batchSize}

To the best of my knowledge, given the (kind of sparse) documentation, this should work. But running

th recurrentlanguagemodel.lua --dataset TextSource --trainFile test.txt --validFile test.txt --testFile test.txt --dataPath . --maxEpoch 1 --hiddenSize {5}

results in

==> epoch # 1 for optimizer :
...rch/install/share/lua/5.1/dp/sampler/sentencesampler.lua:47: attempt to call method 'startId' (a nil value)  on dataset :    train
/Users/kmnns/Applications/torch/install/bin/luajit: ...rch/install/share/lua/5.1/dp/sampler/sentencesampler.lua:39: corountine error
stack traceback:
    [C]: in function 'error'
    ...rch/install/share/lua/5.1/dp/sampler/sentencesampler.lua:39: in function 'sampler'
    ...torch/install/share/lua/5.1/dp/propagator/propagator.lua:117: in function 'propagateEpoch'
    ...torch/install/share/lua/5.1/dp/propagator/experiment.lua:110: in function 'run'
    rnnlm.lua:363: in main chunk
    [C]: in function 'dofile'
    ...ions/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x010c848170

test.txt contains one sentence per line, all of equal length. With TextSampler, the same command works fine.

I cannot figure out how to tackle this problem, or where it actually comes from. Is this a bug or did I miss something?

nicholas-leonard commented 9 years ago

SentenceSampler only works with a SentenceSet, which is what the BillionWords DataSource uses internally. The TextSource uses TextSets internally, so it won't work. To get this to work, you would either need to modify TextSet to provide the methods SentenceSampler calls (such as the startId method in your traceback), or pass your text to a SentenceSet in the format it requires (and wrap it in a DataSource). Sorry this doesn't work out of the box.
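To illustrate why the swap fails only once the first epoch starts, here is a minimal plain-Lua sketch. It does not use dp: FakeTextSet, FakeSentenceSet and supportsMethods are hypothetical stand-ins made up for this example, and the only name taken from the traceback is the startId method. The idea is that the sampler simply calls methods on whatever dataset it is handed, so a check like this could surface the mismatch before training begins.

```lua
-- Stand-ins for illustration only; these are NOT dp's real classes.
-- The only name taken from the issue is the method `startId`, which the
-- traceback shows SentenceSampler calling on the dataset.
local FakeTextSet = {}
FakeTextSet.__index = FakeTextSet

local FakeSentenceSet = {}
FakeSentenceSet.__index = FakeSentenceSet
function FakeSentenceSet:startId() return self._start_id end

-- Hypothetical helper: report whether an object exposes every method
-- a sampler is going to call, and list the ones that are missing.
local function supportsMethods(obj, names)
   local missing = {}
   for _, name in ipairs(names) do
      if type(obj[name]) ~= 'function' then
         table.insert(missing, name)
      end
   end
   return #missing == 0, missing
end

local textSet = setmetatable({}, FakeTextSet)
local sentenceSet = setmetatable({_start_id = 1}, FakeSentenceSet)

-- The TextSet-like object lacks startId; the SentenceSet-like one has it.
local ok1, missing1 = supportsMethods(textSet, {'startId'})
local ok2 = supportsMethods(sentenceSet, {'startId'})
print(ok1, table.concat(missing1, ', '))
print(ok2)
```

The list of required methods here contains only startId because that is the one the error message names; the real SentenceSampler may well rely on others.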

ghost commented 9 years ago

Thank you for your answer. I'll try it out, post the solution here (if it works) and then close the issue.

Edit: I switched to a different Torch framework.