ufal / neuralmonkey

An open-source tool for sequence learning in NLP built on TensorFlow.
BSD 3-Clause "New" or "Revised" License

from_dataset does not exist #794

Closed: tomsbergmanis closed this issue 5 years ago

tomsbergmanis commented 5 years ago

Hi! Thank you for developing neuralmonkey! I am excited to start using it, but I ran into a problem. Following the APE tutorial, I have specified:

    [source_vocabulary]
    class=vocabulary.from_dataset
    datasets=[]
    series_ids=["source"]
    max_size=50000

yet I get the following error suggesting that from_dataset does not exist:

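[image: capture] https://user-images.githubusercontent.com/3306933/52918895-663a0c00-32f4-11e9-90ce-2a123eb978ab.PNG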

What have I done wrong? Thank you in advance! T.

jindrahelcl commented 5 years ago

Hi,

the problem is on our side: the tutorial is outdated. In recent versions of Neural Monkey, you need to create the vocabulary file yourself and load it with the vocabulary.from_wordlist method. An example vocabulary file is located in this repository, e.g. in tests/data/encoder_vocab.tsv. You can also use third-party tools to create the vocabulary, such as github.com/google/sentencepiece, but then you need to match the vocabulary format to Neural Monkey's (the first four lines correspond to the four special symbols). This can be done using sentencepiece's command-line options --bos_id, --unk_id, --eos_id, and so on.
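For readers who want to build the word list by hand, here is a minimal sketch in plain Python. It is not taken from the Neural Monkey code base. The special-symbol strings, their order, and the tab-separated "token, count" layout are assumptions for illustration; compare the output against tests/data/encoder_vocab.tsv before loading it with vocabulary.from_wordlist. The function names build_wordlist and write_wordlist are hypothetical.

```python
from collections import Counter
from typing import Iterable, List

# ASSUMPTION: the four special symbols and their order. Verify them against
# tests/data/encoder_vocab.tsv in the Neural Monkey repository.
SPECIAL_SYMBOLS = ["<pad>", "<s>", "</s>", "<unk>"]


def build_wordlist(sentences: Iterable[str], max_size: int) -> List[str]:
    """Return word-list lines: special symbols first, then frequent tokens.

    Sentences are expected to be tokenized already (whitespace-separated).
    ASSUMPTION: each line has the form "token<TAB>count".
    """
    counts = Counter(tok for line in sentences for tok in line.split())

    # Special symbols get their own reserved lines, so drop them if they
    # also occur in the data.
    for sym in SPECIAL_SYMBOLS:
        counts.pop(sym, None)

    # Keep the most frequent tokens; break ties alphabetically so that the
    # output is deterministic.
    n_regular = max(max_size - len(SPECIAL_SYMBOLS), 0)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n_regular]

    lines = [f"{sym}\t0" for sym in SPECIAL_SYMBOLS]
    lines += [f"{tok}\t{cnt}" for tok, cnt in ranked]
    return lines


def write_wordlist(lines: List[str], path: str) -> None:
    """Write the word-list lines to a UTF-8 text file, one entry per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
```

A typical call would read the tokenized source side of the training data line by line, pass it to build_wordlist with max_size=50000 (the limit from the original config), and save the result with write_wordlist. If the example file in the repository has a header line or a different column layout, adjust the output to match it.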


tomsbergmanis commented 5 years ago

Cool, thanks. I thought that it might be the case. Thank you for the prompt reply!
