Closed. tomsbergmanis closed this issue 5 years ago.
Hi,
the problem is on our side: the tutorial is outdated. In recent versions of Neural Monkey, you need to create the vocabulary file yourself and load it with the vocabulary.from_wordlist method. An example vocabulary file is located in this repository, e.g. in tests/data/encoder_vocab.tsv. You can also use third-party tools to create the vocabulary, such as github.com/google/sentencepiece, but then you need to match the vocabulary format to Neural Monkey's (the first four lines correspond to the four special symbols). This could be done using SentencePiece's command-line options --bos_id, --unk_id, --eos_id, and so on.
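To illustrate the comment above, here is a minimal sketch of building such a wordlist file from a tokenized corpus. The four special symbols (<pad>, <s>, </s>, <unk>) and the one-token-per-line layout are assumptions based on the description above, not Neural Monkey's confirmed format; check tests/data/encoder_vocab.tsv for the exact layout (it may also carry a frequency column) that your version expects.

```python
from collections import Counter

# Assumed special symbols; the first four lines of the vocabulary file
# must correspond to these, per the comment above. Verify the exact
# tokens against tests/data/encoder_vocab.tsv in your checkout.
SPECIAL_SYMBOLS = ["<pad>", "<s>", "</s>", "<unk>"]


def build_wordlist(corpus_lines, max_size=50000):
    """Return the vocabulary: special symbols first, then the most
    frequent tokens of a whitespace-tokenized corpus."""
    counts = Counter(tok for line in corpus_lines for tok in line.split())
    words = [w for w, _ in counts.most_common(max_size)]
    return SPECIAL_SYMBOLS + words


def write_wordlist(path, corpus_lines, max_size=50000):
    """Write one token per line, suitable (under the assumptions above)
    for loading with vocabulary.from_wordlist."""
    with open(path, "w", encoding="utf-8") as f:
        for word in build_wordlist(corpus_lines, max_size):
            f.write(word + "\n")
```

If you build the vocabulary with SentencePiece instead, the same constraint applies: the special-symbol IDs must be pinned to the first four positions before the file can be used.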
Cool, thanks. I thought that it might be the case. Thank you for the prompt reply!
On Sun, 17 Feb 2019, 21:20, Jindra Helcl notifications@github.com wrote:
Hi! Thank you for developing Neural Monkey! I am excited to start using it, but I ran into a problem. Following the APE tutorial I have specified:

[source_vocabulary]
class=vocabulary.from_dataset
datasets=[]
series_ids=["source"]
max_size=50000

yet I get the following error suggesting that from_dataset does not exist:

[image: capture] https://user-images.githubusercontent.com/3306933/52918895-663a0c00-32f4-11e9-90ce-2a123eb978ab.PNG

What have I done wrong? Thank you in advance! T.
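Per the reply above, the fix is to replace from_dataset with from_wordlist. A hedged sketch of how the section might look afterwards; the path points at the example file mentioned in the reply, and the parameter name `path` is an assumption, so consult the vocabulary module of your Neural Monkey version for the exact signature:

```ini
[source_vocabulary]
class=vocabulary.from_wordlist
path="tests/data/encoder_vocab.tsv"
```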