snukky / news-translit-nmt

Training scripts and instructions on how to reproduce our systems submitted to the NEWS 2018 Task on Transliteration of Named Entities: R. Grundkiewicz, K. Heafield: Neural Machine Translation Techniques for Named Entity Transliteration, NEWS 2018, ACL
MIT License

Error: File './data/EnHe.train.src' is empty #2

Closed: yonatanbitton closed this issue 5 years ago

yonatanbitton commented 5 years ago

Hello. I've successfully prepared the data:

jon@jon:~/PycharmProjects/news-translit-nmt-master/experiments$  bash prepare-data.sh
Preparing ThEn devset
Preparing EnTh devset
Preparing PeEn devset
Preparing EnPe devset
Preparing ChEn devset
Preparing EnCh devset
Preparing EnVi devset
Preparing HeEn devset
Preparing EnBa devset
Preparing EnHi devset
Preparing EnKa devset
Preparing EnTa devset
Preparing EnHe devset
Preparing ThEn trainset
Preparing EnTh trainset
Preparing PeEn trainset
Preparing EnPe trainset
Preparing ChEn trainset
Preparing EnCh trainset
Preparing EnVi trainset
Preparing HeEn trainset
Preparing EnBa trainset
Preparing EnHi trainset
Preparing EnKa trainset
Preparing EnTa trainset
Preparing EnHe trainset
Extra data for EnCh/ChEn
Extra data for EnTh/ThEn
Extra data for EnPe/PeEn
Extra data for EnHe/HeEn
[2019-08-14 11:33:18] Creating vocabulary...
[2019-08-14 11:33:18] [data] Creating vocabulary stdout from stdin
[2019-08-14 11:33:18] Finished
[2019-08-14 11:33:18] Creating vocabulary...
[2019-08-14 11:33:18] [data] Creating vocabulary stdout from stdin
[2019-08-14 11:33:18] Finished
[2019-08-14 11:33:18] Creating vocabulary...
[2019-08-14 11:33:18] [data] Creating vocabulary stdout from stdin
[2019-08-14 11:33:18] Finished
[2019-08-14 11:33:18] Creating vocabulary...
[2019-08-14 11:33:18] [data] Creating vocabulary stdout from stdin
[2019-08-14 11:33:18] Finished
[2019-08-14 11:33:18] Creating vocabulary...
[2019-08-14 11:33:18] [data] Creating vocabulary stdout from stdin
[2019-08-14 11:33:18] Finished
[2019-08-14 11:33:18] Creating vocabulary...
[2019-08-14 11:33:18] [data] Creating vocabulary stdout from stdin
[2019-08-14 11:33:18] Finished
[2019-08-14 11:33:18] Creating vocabulary...
[2019-08-14 11:33:18] [data] Creating vocabulary stdout from stdin
[2019-08-14 11:33:18] Finished
[2019-08-14 11:33:18] Creating vocabulary...
[2019-08-14 11:33:18] [data] Creating vocabulary stdout from stdin
[2019-08-14 11:33:18] Finished
[2019-08-14 11:33:18] Creating vocabulary...
[2019-08-14 11:33:18] [data] Creating vocabulary stdout from stdin
[2019-08-14 11:33:19] Finished
[2019-08-14 11:33:19] Creating vocabulary...
[2019-08-14 11:33:19] [data] Creating vocabulary stdout from stdin
[2019-08-14 11:33:19] Finished
[2019-08-14 11:33:19] Creating vocabulary...
[2019-08-14 11:33:19] [data] Creating vocabulary stdout from stdin
[2019-08-14 11:33:19] Finished
[2019-08-14 11:33:19] Creating vocabulary...
[2019-08-14 11:33:19] [data] Creating vocabulary stdout from stdin
[2019-08-14 11:33:19] Finished
[2019-08-14 11:33:19] Creating vocabulary...
[2019-08-14 11:33:19] [data] Creating vocabulary stdout from stdin
[2019-08-14 11:33:19] Finished

Now I want to train the English-to-Hebrew model, so I run: bash train.sh '0 1' EnHe

And I get the following error:

[2019-08-14 11:35:06] [marian] Marian v1.7.8 c65c26d 2019-08-11 18:27:00 +0100
[2019-08-14 11:35:06] [marian] Running on jon as process 24103 with command line:
[2019-08-14 11:35:06] [marian] ../tools/marian-dev/build/marian --devices 0 1 --model ./models/EnHe.1/model.npz --type s2s --train-sets ./data/EnHe.train.src ./data/EnHe.train.trg --vocabs ./models/EnHe.1/vocab.yml ./models/EnHe.1/vocab.yml --sqlite ./models/EnHe.1/corpus.sqlite3 --max-length 80 --mini-batch-fit -w 3000 --mini-batch 100 --maxi-batch 1000 --best-deep --dropout-rnn 0.2 --dropout-src 0.2 --dropout-trg 0.1 --tied-embeddings-all --layer-normalization --exponential-smoothing --learn-rate 0.0001 --lr-decay 0.8 --lr-decay-strategy stalled --lr-decay-start 1 --lr-report --valid-freq 500 --save-freq 2000 --disp-freq 100 --valid-metrics ce-mean-words translation --valid-translation-output ./models/EnHe.1/dev.out --quiet-translation --valid-sets ./data/EnHe.valid.src ./data/EnHe.valid.trg --valid-script-path ./models/EnHe.1/validate.sh --valid-mini-batch 64 --beam-size 10 --normalize 1.0 --early-stopping 10 --cost-type ce-mean-words --overwrite --keep-best --log ./models/EnHe.1/train.log --valid-log ./models/EnHe.1/valid.log
[2019-08-14 11:35:06] [config] after-batches: 0
[2019-08-14 11:35:06] [config] after-epochs: 0
[2019-08-14 11:35:06] [config] allow-unk: false
[2019-08-14 11:35:06] [config] beam-size: 10
[2019-08-14 11:35:06] [config] bert-class-symbol: "[CLS]"
[2019-08-14 11:35:06] [config] bert-mask-symbol: "[MASK]"
[2019-08-14 11:35:06] [config] bert-masking-fraction: 0.15
[2019-08-14 11:35:06] [config] bert-sep-symbol: "[SEP]"
[2019-08-14 11:35:06] [config] bert-train-type-embeddings: true
[2019-08-14 11:35:06] [config] bert-type-vocab-size: 2
[2019-08-14 11:35:06] [config] clip-gemm: 0
[2019-08-14 11:35:06] [config] clip-norm: 5
[2019-08-14 11:35:06] [config] cost-type: ce-mean-words
[2019-08-14 11:35:06] [config] cpu-threads: 0
[2019-08-14 11:35:06] [config] data-weighting: ""
[2019-08-14 11:35:06] [config] data-weighting-type: sentence
[2019-08-14 11:35:06] [config] dec-cell: gru
[2019-08-14 11:35:06] [config] dec-cell-base-depth: 4
[2019-08-14 11:35:06] [config] dec-cell-high-depth: 2
[2019-08-14 11:35:06] [config] dec-depth: 4
[2019-08-14 11:35:06] [config] devices:
[2019-08-14 11:35:06] [config]   - 0
[2019-08-14 11:35:06] [config]   - 1
[2019-08-14 11:35:06] [config] dim-emb: 512
[2019-08-14 11:35:06] [config] dim-rnn: 1024
[2019-08-14 11:35:06] [config] dim-vocabs:
[2019-08-14 11:35:06] [config]   - 0
[2019-08-14 11:35:06] [config]   - 0
[2019-08-14 11:35:06] [config] disp-first: 0
[2019-08-14 11:35:06] [config] disp-freq: 100
[2019-08-14 11:35:06] [config] disp-label-counts: false
[2019-08-14 11:35:06] [config] dropout-rnn: 0.2
[2019-08-14 11:35:06] [config] dropout-src: 0.2
[2019-08-14 11:35:06] [config] dropout-trg: 0.1
[2019-08-14 11:35:06] [config] dump-config: ""
[2019-08-14 11:35:06] [config] early-stopping: 10
[2019-08-14 11:35:06] [config] embedding-fix-src: false
[2019-08-14 11:35:06] [config] embedding-fix-trg: false
[2019-08-14 11:35:06] [config] embedding-normalization: false
[2019-08-14 11:35:06] [config] embedding-vectors:
[2019-08-14 11:35:06] [config]   []
[2019-08-14 11:35:06] [config] enc-cell: gru
[2019-08-14 11:35:06] [config] enc-cell-depth: 2
[2019-08-14 11:35:06] [config] enc-depth: 4
[2019-08-14 11:35:06] [config] enc-type: alternating
[2019-08-14 11:35:06] [config] exponential-smoothing: 0.0001
[2019-08-14 11:35:06] [config] grad-dropping-momentum: 0
[2019-08-14 11:35:06] [config] grad-dropping-rate: 0
[2019-08-14 11:35:06] [config] grad-dropping-warmup: 100
[2019-08-14 11:35:06] [config] guided-alignment: none
[2019-08-14 11:35:06] [config] guided-alignment-cost: mse
[2019-08-14 11:35:06] [config] guided-alignment-weight: 0.1
[2019-08-14 11:35:06] [config] ignore-model-config: false
[2019-08-14 11:35:06] [config] input-types:
[2019-08-14 11:35:06] [config]   []
[2019-08-14 11:35:06] [config] interpolate-env-vars: false
[2019-08-14 11:35:06] [config] keep-best: true
[2019-08-14 11:35:06] [config] label-smoothing: 0.1
[2019-08-14 11:35:06] [config] layer-normalization: true
[2019-08-14 11:35:06] [config] learn-rate: 0.0001
[2019-08-14 11:35:06] [config] log: ./models/EnHe.1/train.log
[2019-08-14 11:35:06] [config] log-level: info
[2019-08-14 11:35:06] [config] log-time-zone: ""
[2019-08-14 11:35:06] [config] lr-decay: 0.8
[2019-08-14 11:35:06] [config] lr-decay-freq: 50000
[2019-08-14 11:35:06] [config] lr-decay-inv-sqrt:
[2019-08-14 11:35:06] [config]   - 16000
[2019-08-14 11:35:06] [config] lr-decay-repeat-warmup: false
[2019-08-14 11:35:06] [config] lr-decay-reset-optimizer: false
[2019-08-14 11:35:06] [config] lr-decay-start:
[2019-08-14 11:35:06] [config]   - 1
[2019-08-14 11:35:06] [config] lr-decay-strategy: stalled
[2019-08-14 11:35:06] [config] lr-report: true
[2019-08-14 11:35:06] [config] lr-warmup: 0
[2019-08-14 11:35:06] [config] lr-warmup-at-reload: false
[2019-08-14 11:35:06] [config] lr-warmup-cycle: false
[2019-08-14 11:35:06] [config] lr-warmup-start-rate: 0
[2019-08-14 11:35:06] [config] max-length: 80
[2019-08-14 11:35:06] [config] max-length-crop: false
[2019-08-14 11:35:06] [config] max-length-factor: 3
[2019-08-14 11:35:06] [config] maxi-batch: 1000
[2019-08-14 11:35:06] [config] maxi-batch-sort: trg
[2019-08-14 11:35:06] [config] mini-batch: 100
[2019-08-14 11:35:06] [config] mini-batch-fit: true
[2019-08-14 11:35:06] [config] mini-batch-fit-step: 10
[2019-08-14 11:35:06] [config] mini-batch-overstuff: 1
[2019-08-14 11:35:06] [config] mini-batch-track-lr: false
[2019-08-14 11:35:06] [config] mini-batch-understuff: 1
[2019-08-14 11:35:06] [config] mini-batch-warmup: 0
[2019-08-14 11:35:06] [config] mini-batch-words: 0
[2019-08-14 11:35:06] [config] mini-batch-words-ref: 0
[2019-08-14 11:35:06] [config] model: ./models/EnHe.1/model.npz
[2019-08-14 11:35:06] [config] multi-loss-type: sum
[2019-08-14 11:35:06] [config] multi-node: false
[2019-08-14 11:35:06] [config] multi-node-overlap: true
[2019-08-14 11:35:06] [config] n-best: false
[2019-08-14 11:35:06] [config] no-nccl: false
[2019-08-14 11:35:06] [config] no-reload: false
[2019-08-14 11:35:06] [config] no-restore-corpus: false
[2019-08-14 11:35:06] [config] no-shuffle: false
[2019-08-14 11:35:06] [config] normalize: 1
[2019-08-14 11:35:06] [config] num-devices: 0
[2019-08-14 11:35:06] [config] optimizer: adam
[2019-08-14 11:35:06] [config] optimizer-delay: 1
[2019-08-14 11:35:06] [config] optimizer-params:
[2019-08-14 11:35:06] [config]   []
[2019-08-14 11:35:06] [config] overwrite: true
[2019-08-14 11:35:06] [config] pretrained-model: ""
[2019-08-14 11:35:06] [config] quiet: false
[2019-08-14 11:35:06] [config] quiet-translation: true
[2019-08-14 11:35:06] [config] relative-paths: false
[2019-08-14 11:35:06] [config] right-left: false
[2019-08-14 11:35:06] [config] save-freq: 2000
[2019-08-14 11:35:06] [config] seed: 0
[2019-08-14 11:35:06] [config] shuffle-in-ram: false
[2019-08-14 11:35:06] [config] skip: true
[2019-08-14 11:35:06] [config] sqlite: ./models/EnHe.1/corpus.sqlite3
[2019-08-14 11:35:06] [config] sqlite-drop: false
[2019-08-14 11:35:06] [config] sync-sgd: true
[2019-08-14 11:35:06] [config] tempdir: /tmp
[2019-08-14 11:35:06] [config] tied-embeddings: true
[2019-08-14 11:35:06] [config] tied-embeddings-all: true
[2019-08-14 11:35:06] [config] tied-embeddings-src: false
[2019-08-14 11:35:06] [config] train-sets:
[2019-08-14 11:35:06] [config]   - ./data/EnHe.train.src
[2019-08-14 11:35:06] [config]   - ./data/EnHe.train.trg
[2019-08-14 11:35:06] [config] transformer-aan-activation: swish
[2019-08-14 11:35:06] [config] transformer-aan-depth: 2
[2019-08-14 11:35:06] [config] transformer-aan-nogate: false
[2019-08-14 11:35:06] [config] transformer-decoder-autoreg: self-attention
[2019-08-14 11:35:06] [config] transformer-dim-aan: 2048
[2019-08-14 11:35:06] [config] transformer-dim-ffn: 2048
[2019-08-14 11:35:06] [config] transformer-dropout: 0
[2019-08-14 11:35:06] [config] transformer-dropout-attention: 0
[2019-08-14 11:35:06] [config] transformer-dropout-ffn: 0
[2019-08-14 11:35:06] [config] transformer-ffn-activation: swish
[2019-08-14 11:35:06] [config] transformer-ffn-depth: 2
[2019-08-14 11:35:06] [config] transformer-guided-alignment-layer: last
[2019-08-14 11:35:06] [config] transformer-heads: 8
[2019-08-14 11:35:06] [config] transformer-no-projection: false
[2019-08-14 11:35:06] [config] transformer-postprocess: dan
[2019-08-14 11:35:06] [config] transformer-postprocess-emb: d
[2019-08-14 11:35:06] [config] transformer-preprocess: ""
[2019-08-14 11:35:06] [config] transformer-tied-layers:
[2019-08-14 11:35:06] [config]   []
[2019-08-14 11:35:06] [config] transformer-train-position-embeddings: false
[2019-08-14 11:35:06] [config] type: s2s
[2019-08-14 11:35:06] [config] ulr: false
[2019-08-14 11:35:06] [config] ulr-dim-emb: 0
[2019-08-14 11:35:06] [config] ulr-dropout: 0
[2019-08-14 11:35:06] [config] ulr-keys-vectors: ""
[2019-08-14 11:35:06] [config] ulr-query-vectors: ""
[2019-08-14 11:35:06] [config] ulr-softmax-temperature: 1
[2019-08-14 11:35:06] [config] ulr-trainable-transformation: false
[2019-08-14 11:35:06] [config] valid-freq: 500
[2019-08-14 11:35:06] [config] valid-log: ./models/EnHe.1/valid.log
[2019-08-14 11:35:06] [config] valid-max-length: 1000
[2019-08-14 11:35:06] [config] valid-metrics:
[2019-08-14 11:35:06] [config]   - ce-mean-words
[2019-08-14 11:35:06] [config]   - translation
[2019-08-14 11:35:06] [config] valid-mini-batch: 64
[2019-08-14 11:35:06] [config] valid-script-path: ./models/EnHe.1/validate.sh
[2019-08-14 11:35:06] [config] valid-sets:
[2019-08-14 11:35:06] [config]   - ./data/EnHe.valid.src
[2019-08-14 11:35:06] [config]   - ./data/EnHe.valid.trg
[2019-08-14 11:35:06] [config] valid-translation-output: ./models/EnHe.1/dev.out
[2019-08-14 11:35:06] [config] vocabs:
[2019-08-14 11:35:06] [config]   - ./models/EnHe.1/vocab.yml
[2019-08-14 11:35:06] [config]   - ./models/EnHe.1/vocab.yml
[2019-08-14 11:35:06] [config] word-penalty: 0
[2019-08-14 11:35:06] [config] workspace: 3000
[2019-08-14 11:35:06] [config] Model is being created with Marian v1.7.8 c65c26d 2019-08-11 18:27:00 +0100
[2019-08-14 11:35:06] Using synchronous training
[2019-08-14 11:35:06] [data] Loading vocabulary from JSON/Yaml file ./models/EnHe.1/vocab.yml
[2019-08-14 11:35:06] [data] Setting vocabulary size for input 0 to 2
[2019-08-14 11:35:06] [data] Loading vocabulary from JSON/Yaml file ./models/EnHe.1/vocab.yml
[2019-08-14 11:35:06] [data] Setting vocabulary size for input 1 to 2
[2019-08-14 11:35:06] Error: File './data/EnHe.train.src' is empty
[2019-08-14 11:35:06] Error: Aborted from marian::data::CorpusBase::CorpusBase(marian::Ptr<marian::Options>, bool) in /home/jon/PycharmProjects/news-translit-nmt-master/tools/marian-dev/src/data/corpus_base.cpp:159

[CALL STACK]
[0x7252e2]          marian::data::CorpusBase::  CorpusBase  (std::shared_ptr<marian::Options>,  bool) + 0x1182
[0x7424cb]          marian::data::CorpusSQLite::  CorpusSQLite  (std::shared_ptr<marian::Options>,  bool) + 0x4b
[0x6184fd]          std::shared_ptr<marian::data::CorpusSQLite> marian::  New  <marian::data::CorpusSQLite,std::shared_ptr<marian::Options>&>(std::shared_ptr<marian::Options>&) + 0x3d
[0x6702b8]          marian::Train<marian::SyncGraphGroup>::  run  ()   + 0xb8
[0x5a0cd9]          mainTrainer  (int,  char**)                        + 0x2c9
[0x57e77a]          main                                               + 0x8a
[0x7fc1e4048830]    __libc_start_main                                  + 0xf0
[0x59e4f9]          _start                                             + 0x29

train-model.sh: line 49: 24103 Aborted                 (core dumped) $MARIAN/marian --devices $GPUS $OPTIONS --model $MODEL/model.npz --type s2s --train-sets $DATA/$LANGS.train.{src,trg} --vocabs $MODEL/vocab.yml $MODEL/vocab.yml --sqlite $MODEL/corpus.sqlite3 --max-length 80 --mini-batch-fit -w 3000 --mini-batch 100 --maxi-batch 1000 --best-deep --dropout-rnn 0.2 --dropout-src 0.2 --dropout-trg 0.1 --tied-embeddings-all --layer-normalization --exponential-smoothing --learn-rate 0.0001 --lr-decay 0.8 --lr-decay-strategy stalled --lr-decay-start 1 --lr-report --valid-freq 500 --save-freq 2000 --disp-freq 100 --valid-metrics ce-mean-words translation --valid-translation-output $MODEL/dev.out --quiet-translation --valid-sets $DATA/$LANGS.valid.{src,trg} --valid-script-path $MODEL/validate.sh --valid-mini-batch 64 --beam-size 10 --normalize 1.0 --early-stopping 10 --cost-type ce-mean-words --overwrite --keep-best --log $MODEL/train.log --valid-log $MODEL/valid.log

And indeed, when I look at the train.src files, some are not empty (EnCh) and some are empty (EnHe).

-rw-rw-r-- 1 jon jon 2.2M Aug 14 11:33 EnCh.train.src
-rw-rw-r-- 1 jon jon    0 Aug 14 11:33 EnCh.train.src.filtered
-rw-rw-r-- 1 jon jon 2.1M Aug 14 11:33 EnCh.train.trg
-rw-rw-r-- 1 jon jon 753K Aug 14 11:22 EnCh.train.txt
-rw-rw-r-- 1 jon jon 7.1K Aug 14 11:22 EnCh.valid.src
-rw-rw-r-- 1 jon jon 4.1K Aug 14 11:22 EnCh.valid.src.orig
-rw-rw-r-- 1 jon jon 6.6K Aug 14 11:22 EnCh.valid.trg
-rw-rw-r-- 1 jon jon 5.4K Aug 14 11:22 EnCh.valid.trg.orig
-rw-rw-r-- 1 jon jon 9.4K Aug 14 11:22 EnCh.valid.txt
-rw-rw-r-- 1 jon jon  50K Aug 14 11:22 EnCh.valid.xml
-rw-rw-r-- 1 jon jon 5.7K Aug 14 11:33 EnCh.vocab.yml
-rw-rw-r-- 1 jon jon    0 Aug 14 10:22 EnHe.dev.src
-rw-rw-r-- 1 jon jon 7.4K Aug 14 10:22 EnHe.dev.src.orig
-rw-rw-r-- 1 jon jon  20K Aug 14 10:22 EnHe.dev.txt
-rw-rw-r-- 1 jon jon 109K Aug 14 10:22 EnHe.dev.xml
-rw-rw-r-- 1 jon jon    0 Aug 14 11:33 EnHe.filter.src
-rw-rw-r-- 1 jon jon    0 Aug 14 11:33 EnHe.train.extra.txt
-rw-rw-r-- 1 jon jon    0 Aug 14 10:22 EnHe.train.src
-rw-rw-r-- 1 jon jon    0 Aug 14 11:33 EnHe.train.src.filtered
-rw-rw-r-- 1 jon jon    0 Aug 14 10:22 EnHe.train.trg
-rw-rw-r-- 1 jon jon 195K Aug 14 10:22 EnHe.train.txt
-rw-rw-r-- 1 jon jon    0 Aug 14 10:22 EnHe.valid.src
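
A quick way to enumerate every zero-byte file the preparation step left behind (a generic diagnostic, not one of the repo's scripts), run from the experiments directory:

```shell
# Print all empty files under the data directory, sorted for readability.
# 2>/dev/null keeps the command quiet if data/ does not exist yet.
find data -type f -empty 2>/dev/null | sort
```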

How can I solve it?

I really want to be able to run your project; I'd like to cite your work in my article :)

Thanks

snukky commented 5 years ago

Does your datasets directory contain all the files shown here? I have added an error exit flag to the script, so please run it again and check where it stops.
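
To fail fast instead of letting Marian abort later, a small pre-flight check could run before train.sh; this is a sketch, not part of the repository:

```shell
# Fail when any of the given files is missing or empty.
check_nonempty() {
    for f in "$@"; do
        [ -s "$f" ] || { echo "Empty or missing: $f" >&2; return 1; }
    done
}
```

For example, check_nonempty data/EnHe.train.src data/EnHe.train.trg before launching training would report the empty file immediately.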

snukky commented 5 years ago

Your EnHe.dev.src is empty, which suggests an issue with ./tools/tokenize.py.

Make sure that you delete the data directory before running the script again.
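
Note that prepare-data.sh invokes the tokenizer with python while other steps use python3, so a Python 2 default interpreter is one possible cause of silently empty outputs. Assuming tokenize.py separates each name into space-delimited characters (an assumption about its behavior, not the script itself), that output can be emulated with sed on ASCII input for a quick comparison:

```shell
# Insert a space after every character, then strip trailing spaces.
# For non-Latin scripts (Hebrew, Thai, ...) sed's notion of "." depends
# on the locale, so this emulation is only reliable for ASCII input.
echo "Aachen" | sed 's/./& /g; s/ *$//'
# prints: A a c h e n
```

If echo "Aachen" | python ../tools/tokenize.py prints something similar while the dev files stay empty, the problem is more likely in how the script is invoked than in the data.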

yonatanbitton commented 5 years ago

I've deleted the project, cloned it again, and ran prepare-data.sh.

This is the output:

jon@jon:~/PycharmProjects/news-translit-nmt-master/experiments$  ./prepare-data.sh
#!/bin/bash -v

set -eo pipefail

dataset_dev=../datasets/NEWS2018_DATASET_0?/*_dev.xml
dataset_train=../datasets/NEWS2018_DATASET_0?/*_trn.xml

mkdir -p data

for path in $dataset_dev; do
    lang=`basename $path | cut -c12-15`
    file=data/$lang

    echo "Preparing $lang devset"

    test -e $file.dev.xml || cp $path $file.dev.xml
    test -e $file.dev.txt || python3 ../tools/extract-xml.py --one-target < $file.dev.xml > $file.dev.txt
    test -e $file.dev.src.orig || cut -f1 $file.dev.txt > $file.dev.src.orig
    test -e $file.dev.src || cut -f1 $file.dev.txt | python ../tools/tokenize.py > $file.dev.src
done
basename $path | cut -c12-15
Preparing ThEn devset
basename $path | cut -c12-15
Preparing EnTh devset
basename $path | cut -c12-15
Preparing PeEn devset
basename $path | cut -c12-15
Preparing EnPe devset
basename $path | cut -c12-15
Preparing ChEn devset
basename $path | cut -c12-15
Preparing EnCh devset
basename $path | cut -c12-15
Preparing EnVi devset
basename $path | cut -c12-15
Preparing HeEn devset
basename $path | cut -c12-15
Preparing EnBa devset
basename $path | cut -c12-15
Preparing EnHi devset
basename $path | cut -c12-15
Preparing EnKa devset
basename $path | cut -c12-15
Preparing EnTa devset
basename $path | cut -c12-15
Preparing EnHe devset

for path in $dataset_train; do
    lang=`basename $path | cut -c12-15`
    file=data/$lang

    echo "Preparing $lang trainset"

    test -e $file.train.txt || python3 ../tools/split-valid-data.py -n 500 < $path 2> $file.train.txt | sort > $file.valid.txt

    test -e $file.train.src || cut -f1 $file.train.txt | python ../tools/tokenize.py > $file.train.src
    test -e $file.train.trg || cut -f2 $file.train.txt | python ../tools/tokenize.py > $file.train.trg
    test -e $file.valid.src.orig || cut -f1 $file.valid.txt > $file.valid.src.orig
    test -e $file.valid.trg.orig || cut -f2 $file.valid.txt > $file.valid.trg.orig
    test -e $file.valid.xml || python ../tools/wrapper_xml.py -c $file.valid.{src,trg}.orig > $file.valid.xml
    test -e $file.valid.src || python ../tools/tokenize.py < $file.valid.src.orig > $file.valid.src
    test -e $file.valid.trg || python ../tools/tokenize.py < $file.valid.trg.orig > $file.valid.trg
done
basename $path | cut -c12-15
Preparing ThEn trainset
basename $path | cut -c12-15
Preparing EnTh trainset
basename $path | cut -c12-15
Preparing PeEn trainset
basename $path | cut -c12-15
Preparing EnPe trainset
basename $path | cut -c12-15
Preparing ChEn trainset
basename $path | cut -c12-15
Preparing EnCh trainset
basename $path | cut -c12-15
Preparing EnVi trainset
basename $path | cut -c12-15
Preparing HeEn trainset
basename $path | cut -c12-15
Preparing EnBa trainset
basename $path | cut -c12-15
Preparing EnHi trainset
basename $path | cut -c12-15
Preparing EnKa trainset
basename $path | cut -c12-15
Preparing EnTa trainset
basename $path | cut -c12-15
Preparing EnHe trainset

for lang in Ch Th Pe He; do
    file1=data/En${lang}
    file2=data/${lang}En

    test -e $file1.dev.xml || continue
    test -e $file2.dev.xml || continue

    echo "Extra data for En${lang}/${lang}En"

    if [[ ! -e $file1.train.filter.src ]]; then
        cat $file1.dev.src $file1.valid.src | sort > $file1.filter.src
        cat $file2.dev.src $file2.valid.src | sort > $file2.filter.src

        paste $file2.train.trg $file2.train.src | python ../tools/filter-testset.py $file1.filter.src > $file1.train.extra.txt 2> $file1.train.src.filtered
        paste $file1.train.trg $file1.train.src | python ../tools/filter-testset.py $file2.filter.src > $file2.train.extra.txt 2> $file2.train.src.filtered

        cut -f1 $file1.train.extra.txt >> $file1.train.src
        cut -f2 $file1.train.extra.txt >> $file1.train.trg
        cut -f1 $file2.train.extra.txt >> $file2.train.src
        cut -f2 $file2.train.extra.txt >> $file2.train.trg
    fi
done
Extra data for EnCh/ChEn
Extra data for EnTh/ThEn
Extra data for EnPe/PeEn
Extra data for EnHe/HeEn

marian=../tools/marian-dev/build

for lang in `ls data/????.train.src | cut -c6-9`; do
    file=data/$lang

    test -e $lang.vocab.yml || cat $file.train.src $file.train.trg | $marian/marian-vocab > data/$lang.vocab.yml
done
ls data/????.train.src | cut -c6-9
[2019-08-14 13:32:02] Creating vocabulary...
[2019-08-14 13:32:02] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:03] Finished
[2019-08-14 13:32:03] Creating vocabulary...
[2019-08-14 13:32:03] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:03] Finished
[2019-08-14 13:32:03] Creating vocabulary...
[2019-08-14 13:32:03] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:03] Finished
[2019-08-14 13:32:03] Creating vocabulary...
[2019-08-14 13:32:03] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:03] Finished
[2019-08-14 13:32:03] Creating vocabulary...
[2019-08-14 13:32:03] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:03] Finished
[2019-08-14 13:32:03] Creating vocabulary...
[2019-08-14 13:32:03] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:03] Finished
[2019-08-14 13:32:03] Creating vocabulary...
[2019-08-14 13:32:03] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:03] Finished
[2019-08-14 13:32:03] Creating vocabulary...
[2019-08-14 13:32:03] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:03] Finished
[2019-08-14 13:32:03] Creating vocabulary...
[2019-08-14 13:32:03] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:03] Finished
[2019-08-14 13:32:03] Creating vocabulary...
[2019-08-14 13:32:03] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:03] Finished
[2019-08-14 13:32:03] Creating vocabulary...
[2019-08-14 13:32:03] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:03] Finished
[2019-08-14 13:32:03] Creating vocabulary...
[2019-08-14 13:32:03] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:03] Finished
[2019-08-14 13:32:03] Creating vocabulary...
[2019-08-14 13:32:03] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:04] Finished
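
A 431-byte EnHe.vocab.yml is plausible: marian-vocab maps each distinct token to an id, and character-level data has only a few dozen distinct tokens. The size of that token set can be cross-checked without Marian (a rough estimate, not the tool itself):

```shell
# Count distinct whitespace-separated tokens in the EnHe training data.
cat data/EnHe.train.src data/EnHe.train.trg \
    | tr -s '[:space:]' '\n' | sort -u | wc -l
```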

And these are the file sizes now (I only need the EnHe files):

(base) jon@jon:~/PycharmProjects/news-translit-nmt-master/experiments/data$ ls -la
total 32608
drwxrwxr-x 2 jon jon   12288 Aug 14 13:51 .
drwxrwxr-x 4 jon jon    4096 Aug 14 13:51 ..
-rw-rw-r-- 1 jon jon   13000 Aug 14 13:51 EnHe.dev.src
-rw-rw-r-- 1 jon jon    7500 Aug 14 13:51 EnHe.dev.src.orig
-rw-rw-r-- 1 jon jon   20079 Aug 14 13:51 EnHe.dev.txt
-rw-rw-r-- 1 jon jon  111207 Aug 14 13:51 EnHe.dev.xml
-rw-rw-r-- 1 jon jon   19588 Aug 14 13:51 EnHe.filter.src
-rw-rw-r-- 1 jon jon  296484 Aug 14 13:51 EnHe.train.extra.txt
-rw-rw-r-- 1 jon jon  255673 Aug 14 13:51 EnHe.train.src
-rw-rw-r-- 1 jon jon       0 Aug 14 13:51 EnHe.train.src.filtered
-rw-rw-r-- 1 jon jon  341852 Aug 14 13:51 EnHe.train.trg
-rw-rw-r-- 1 jon jon  199030 Aug 14 13:51 EnHe.train.txt
-rw-rw-r-- 1 jon jon    6588 Aug 14 13:51 EnHe.valid.src
-rw-rw-r-- 1 jon jon    3794 Aug 14 13:51 EnHe.valid.src.orig
-rw-rw-r-- 1 jon jon    8785 Aug 14 13:51 EnHe.valid.trg
-rw-rw-r-- 1 jon jon    6346 Aug 14 13:51 EnHe.valid.trg.orig
-rw-rw-r-- 1 jon jon   10140 Aug 14 13:51 EnHe.valid.txt
-rw-rw-r-- 1 jon jon   50749 Aug 14 13:51 EnHe.valid.xml
-rw-rw-r-- 1 jon jon     431 Aug 14 13:51 EnHe.vocab.yml
-rw-rw-r-- 1 jon jon   17661 Aug 14 13:51 HeEn.dev.src
-rw-rw-r-- 1 jon jon   12752 Aug 14 13:51 HeEn.dev.src.orig
-rw-rw-r-- 1 jon jon   20332 Aug 14 13:51 HeEn.dev.txt
-rw-rw-r-- 1 jon jon  116403 Aug 14 13:51 HeEn.dev.xml
-rw-rw-r-- 1 jon jon   27186 Aug 14 13:51 HeEn.filter.src
-rw-rw-r-- 1 jon jon  301041 Aug 14 13:51 HeEn.train.extra.txt
-rw-rw-r-- 1 jon jon  341852 Aug 14 13:51 HeEn.train.src
-rw-rw-r-- 1 jon jon       0 Aug 14 13:51 HeEn.train.src.filtered
-rw-rw-r-- 1 jon jon  255673 Aug 14 13:51 HeEn.train.trg
-rw-rw-r-- 1 jon jon  195985 Aug 14 13:51 HeEn.train.txt
-rw-rw-r-- 1 jon jon    9525 Aug 14 13:51 HeEn.valid.src
-rw-rw-r-- 1 jon jon    6890 Aug 14 13:51 HeEn.valid.src.orig
-rw-rw-r-- 1 jon jon    7173 Aug 14 13:51 HeEn.valid.trg
-rw-rw-r-- 1 jon jon    4137 Aug 14 13:51 HeEn.valid.trg.orig
-rw-rw-r-- 1 jon jon   11027 Aug 14 13:51 HeEn.valid.txt
-rw-rw-r-- 1 jon jon   52743 Aug 14 13:51 HeEn.valid.xml
-rw-rw-r-- 1 jon jon     431 Aug 14 13:51 HeEn.vocab.yml

I'm not sure whether those are the correct sizes. And here is my next error:


(base) jon@jon:~/PycharmProjects/news-translit-nmt-master/experiments$ bash train.sh '0' EnHe
[2019-08-14 13:57:12] [marian] Marian v1.7.8 0f7b1e2 2019-08-14 12:11:53 +0200
[2019-08-14 13:57:12] [marian] Running on jon as process 29033 with command line:
[2019-08-14 13:57:12] [marian] ../tools/marian-dev/build/marian --devices 0 --model ./models/EnHe.1/model.npz --type s2s --train-sets ./data/EnHe.train.src ./data/EnHe.train.trg --vocabs ./models/EnHe.1/vocab.yml ./models/EnHe.1/vocab.yml --sqlite ./models/EnHe.1/corpus.sqlite3 --max-length 80 --mini-batch-fit -w 3000 --mini-batch 100 --maxi-batch 1000 --best-deep --dropout-rnn 0.2 --dropout-src 0.2 --dropout-trg 0.1 --tied-embeddings-all --layer-normalization --exponential-smoothing --learn-rate 0.0001 --lr-decay 0.8 --lr-decay-strategy stalled --lr-decay-start 1 --lr-report --valid-freq 500 --save-freq 2000 --disp-freq 100 --valid-metrics ce-mean-words translation --valid-translation-output ./models/EnHe.1/dev.out --quiet-translation --valid-sets ./data/EnHe.valid.src ./data/EnHe.valid.trg --valid-script-path ./models/EnHe.1/validate.sh --valid-mini-batch 64 --beam-size 10 --normalize 1.0 --early-stopping 10 --cost-type ce-mean-words --overwrite --keep-best --log ./models/EnHe.1/train.log --valid-log ./models/EnHe.1/valid.log
[2019-08-14 13:57:12] [config] after-batches: 0
[2019-08-14 13:57:12] [config] after-epochs: 0
[2019-08-14 13:57:12] [config] allow-unk: false
[2019-08-14 13:57:12] [config] beam-size: 10
[2019-08-14 13:57:12] [config] bert-class-symbol: "[CLS]"
[2019-08-14 13:57:12] [config] bert-mask-symbol: "[MASK]"
[2019-08-14 13:57:12] [config] bert-masking-fraction: 0.15
[2019-08-14 13:57:12] [config] bert-sep-symbol: "[SEP]"
[2019-08-14 13:57:12] [config] bert-train-type-embeddings: true
[2019-08-14 13:57:12] [config] bert-type-vocab-size: 2
[2019-08-14 13:57:12] [config] clip-gemm: 0
[2019-08-14 13:57:12] [config] clip-norm: 5
[2019-08-14 13:57:12] [config] cost-type: ce-mean-words
[2019-08-14 13:57:12] [config] cpu-threads: 0
[2019-08-14 13:57:12] [config] data-weighting: ""
[2019-08-14 13:57:12] [config] data-weighting-type: sentence
[2019-08-14 13:57:12] [config] dec-cell: gru
[2019-08-14 13:57:12] [config] dec-cell-base-depth: 4
[2019-08-14 13:57:12] [config] dec-cell-high-depth: 2
[2019-08-14 13:57:12] [config] dec-depth: 4
[2019-08-14 13:57:12] [config] devices:
[2019-08-14 13:57:12] [config]   - 0
[2019-08-14 13:57:12] [config] dim-emb: 512
[2019-08-14 13:57:12] [config] dim-rnn: 1024
[2019-08-14 13:57:12] [config] dim-vocabs:
[2019-08-14 13:57:12] [config]   - 0
[2019-08-14 13:57:12] [config]   - 0
[2019-08-14 13:57:12] [config] disp-first: 0
[2019-08-14 13:57:12] [config] disp-freq: 100
[2019-08-14 13:57:12] [config] disp-label-counts: false
[2019-08-14 13:57:12] [config] dropout-rnn: 0.2
[2019-08-14 13:57:12] [config] dropout-src: 0.2
[2019-08-14 13:57:12] [config] dropout-trg: 0.1
[2019-08-14 13:57:12] [config] dump-config: ""
[2019-08-14 13:57:12] [config] early-stopping: 10
[2019-08-14 13:57:12] [config] embedding-fix-src: false
[2019-08-14 13:57:12] [config] embedding-fix-trg: false
[2019-08-14 13:57:12] [config] embedding-normalization: false
[2019-08-14 13:57:12] [config] embedding-vectors:
[2019-08-14 13:57:12] [config]   []
[2019-08-14 13:57:12] [config] enc-cell: gru
[2019-08-14 13:57:12] [config] enc-cell-depth: 2
[2019-08-14 13:57:12] [config] enc-depth: 4
[2019-08-14 13:57:12] [config] enc-type: alternating
[2019-08-14 13:57:12] [config] exponential-smoothing: 0.0001
[2019-08-14 13:57:12] [config] grad-dropping-momentum: 0
[2019-08-14 13:57:12] [config] grad-dropping-rate: 0
[2019-08-14 13:57:12] [config] grad-dropping-warmup: 100
[2019-08-14 13:57:12] [config] guided-alignment: none
[2019-08-14 13:57:12] [config] guided-alignment-cost: mse
[2019-08-14 13:57:12] [config] guided-alignment-weight: 0.1
[2019-08-14 13:57:12] [config] ignore-model-config: false
[2019-08-14 13:57:12] [config] input-types:
[2019-08-14 13:57:12] [config]   []
[2019-08-14 13:57:12] [config] interpolate-env-vars: false
[2019-08-14 13:57:12] [config] keep-best: true
[2019-08-14 13:57:12] [config] label-smoothing: 0.1
[2019-08-14 13:57:12] [config] layer-normalization: true
[2019-08-14 13:57:12] [config] learn-rate: 0.0001
[2019-08-14 13:57:12] [config] log: ./models/EnHe.1/train.log
[2019-08-14 13:57:12] [config] log-level: info
[2019-08-14 13:57:12] [config] log-time-zone: ""
[2019-08-14 13:57:12] [config] lr-decay: 0.8
[2019-08-14 13:57:12] [config] lr-decay-freq: 50000
[2019-08-14 13:57:12] [config] lr-decay-inv-sqrt:
[2019-08-14 13:57:12] [config]   - 16000
[2019-08-14 13:57:12] [config] lr-decay-repeat-warmup: false
[2019-08-14 13:57:12] [config] lr-decay-reset-optimizer: false
[2019-08-14 13:57:12] [config] lr-decay-start:
[2019-08-14 13:57:12] [config]   - 1
[2019-08-14 13:57:12] [config] lr-decay-strategy: stalled
[2019-08-14 13:57:12] [config] lr-report: true
[2019-08-14 13:57:12] [config] lr-warmup: 0
[2019-08-14 13:57:12] [config] lr-warmup-at-reload: false
[2019-08-14 13:57:12] [config] lr-warmup-cycle: false
[2019-08-14 13:57:12] [config] lr-warmup-start-rate: 0
[2019-08-14 13:57:12] [config] max-length: 80
[2019-08-14 13:57:12] [config] max-length-crop: false
[2019-08-14 13:57:12] [config] max-length-factor: 3
[2019-08-14 13:57:12] [config] maxi-batch: 1000
[2019-08-14 13:57:12] [config] maxi-batch-sort: trg
[2019-08-14 13:57:12] [config] mini-batch: 100
[2019-08-14 13:57:12] [config] mini-batch-fit: true
[2019-08-14 13:57:12] [config] mini-batch-fit-step: 10
[2019-08-14 13:57:12] [config] mini-batch-overstuff: 1
[2019-08-14 13:57:12] [config] mini-batch-track-lr: false
[2019-08-14 13:57:12] [config] mini-batch-understuff: 1
[2019-08-14 13:57:12] [config] mini-batch-warmup: 0
[2019-08-14 13:57:12] [config] mini-batch-words: 0
[2019-08-14 13:57:12] [config] mini-batch-words-ref: 0
[2019-08-14 13:57:12] [config] model: ./models/EnHe.1/model.npz
[2019-08-14 13:57:12] [config] multi-loss-type: sum
[2019-08-14 13:57:12] [config] multi-node: false
[2019-08-14 13:57:12] [config] multi-node-overlap: true
[2019-08-14 13:57:12] [config] n-best: false
[2019-08-14 13:57:12] [config] no-nccl: false
[2019-08-14 13:57:12] [config] no-reload: false
[2019-08-14 13:57:12] [config] no-restore-corpus: false
[2019-08-14 13:57:12] [config] no-shuffle: false
[2019-08-14 13:57:12] [config] normalize: 1
[2019-08-14 13:57:12] [config] num-devices: 0
[2019-08-14 13:57:12] [config] optimizer: adam
[2019-08-14 13:57:12] [config] optimizer-delay: 1
[2019-08-14 13:57:12] [config] optimizer-params:
[2019-08-14 13:57:12] [config]   []
[2019-08-14 13:57:12] [config] overwrite: true
[2019-08-14 13:57:12] [config] pretrained-model: ""
[2019-08-14 13:57:12] [config] quiet: false
[2019-08-14 13:57:12] [config] quiet-translation: true
[2019-08-14 13:57:12] [config] relative-paths: false
[2019-08-14 13:57:12] [config] right-left: false
[2019-08-14 13:57:12] [config] save-freq: 2000
[2019-08-14 13:57:12] [config] seed: 0
[2019-08-14 13:57:12] [config] shuffle-in-ram: false
[2019-08-14 13:57:12] [config] skip: true
[2019-08-14 13:57:12] [config] sqlite: ./models/EnHe.1/corpus.sqlite3
[2019-08-14 13:57:12] [config] sqlite-drop: false
[2019-08-14 13:57:12] [config] sync-sgd: true
[2019-08-14 13:57:12] [config] tempdir: /tmp
[2019-08-14 13:57:12] [config] tied-embeddings: true
[2019-08-14 13:57:12] [config] tied-embeddings-all: true
[2019-08-14 13:57:12] [config] tied-embeddings-src: false
[2019-08-14 13:57:12] [config] train-sets:
[2019-08-14 13:57:12] [config]   - ./data/EnHe.train.src
[2019-08-14 13:57:12] [config]   - ./data/EnHe.train.trg
[2019-08-14 13:57:12] [config] transformer-aan-activation: swish
[2019-08-14 13:57:12] [config] transformer-aan-depth: 2
[2019-08-14 13:57:12] [config] transformer-aan-nogate: false
[2019-08-14 13:57:12] [config] transformer-decoder-autoreg: self-attention
[2019-08-14 13:57:12] [config] transformer-dim-aan: 2048
[2019-08-14 13:57:12] [config] transformer-dim-ffn: 2048
[2019-08-14 13:57:12] [config] transformer-dropout: 0
[2019-08-14 13:57:12] [config] transformer-dropout-attention: 0
[2019-08-14 13:57:12] [config] transformer-dropout-ffn: 0
[2019-08-14 13:57:12] [config] transformer-ffn-activation: swish
[2019-08-14 13:57:12] [config] transformer-ffn-depth: 2
[2019-08-14 13:57:12] [config] transformer-guided-alignment-layer: last
[2019-08-14 13:57:12] [config] transformer-heads: 8
[2019-08-14 13:57:12] [config] transformer-no-projection: false
[2019-08-14 13:57:12] [config] transformer-postprocess: dan
[2019-08-14 13:57:12] [config] transformer-postprocess-emb: d
[2019-08-14 13:57:12] [config] transformer-preprocess: ""
[2019-08-14 13:57:12] [config] transformer-tied-layers:
[2019-08-14 13:57:12] [config]   []
[2019-08-14 13:57:12] [config] transformer-train-position-embeddings: false
[2019-08-14 13:57:12] [config] type: s2s
[2019-08-14 13:57:12] [config] ulr: false
[2019-08-14 13:57:12] [config] ulr-dim-emb: 0
[2019-08-14 13:57:12] [config] ulr-dropout: 0
[2019-08-14 13:57:12] [config] ulr-keys-vectors: ""
[2019-08-14 13:57:12] [config] ulr-query-vectors: ""
[2019-08-14 13:57:12] [config] ulr-softmax-temperature: 1
[2019-08-14 13:57:12] [config] ulr-trainable-transformation: false
[2019-08-14 13:57:12] [config] valid-freq: 500
[2019-08-14 13:57:12] [config] valid-log: ./models/EnHe.1/valid.log
[2019-08-14 13:57:12] [config] valid-max-length: 1000
[2019-08-14 13:57:12] [config] valid-metrics:
[2019-08-14 13:57:12] [config]   - ce-mean-words
[2019-08-14 13:57:12] [config]   - translation
[2019-08-14 13:57:12] [config] valid-mini-batch: 64
[2019-08-14 13:57:12] [config] valid-script-path: ./models/EnHe.1/validate.sh
[2019-08-14 13:57:12] [config] valid-sets:
[2019-08-14 13:57:12] [config]   - ./data/EnHe.valid.src
[2019-08-14 13:57:12] [config]   - ./data/EnHe.valid.trg
[2019-08-14 13:57:12] [config] valid-translation-output: ./models/EnHe.1/dev.out
[2019-08-14 13:57:12] [config] vocabs:
[2019-08-14 13:57:12] [config]   - ./models/EnHe.1/vocab.yml
[2019-08-14 13:57:12] [config]   - ./models/EnHe.1/vocab.yml
[2019-08-14 13:57:12] [config] word-penalty: 0
[2019-08-14 13:57:12] [config] workspace: 3000
[2019-08-14 13:57:12] [config] Model is being created with Marian v1.7.8 0f7b1e2 2019-08-14 12:11:53 +0200
[2019-08-14 13:57:12] Using synchronous training
[2019-08-14 13:57:12] [data] Loading vocabulary from JSON/Yaml file ./models/EnHe.1/vocab.yml
[2019-08-14 13:57:12] [data] Setting vocabulary size for input 0 to 66
[2019-08-14 13:57:12] [data] Loading vocabulary from JSON/Yaml file ./models/EnHe.1/vocab.yml
[2019-08-14 13:57:12] [data] Setting vocabulary size for input 1 to 66
[2019-08-14 13:57:12] [sqlite] Reusing persistent database ./models/EnHe.1/corpus.sqlite3
[2019-08-14 13:57:12] Compiled without MPI support. Falling back to FakeMPIWrapper
[2019-08-14 13:57:12] [batching] Collecting statistics for batch fitting with step size 10
[2019-08-14 13:57:13] [memory] Extending reserved space to 3072 MB (device gpu0)
[2019-08-14 13:57:13] Error: CUDA error 2 'out of memory' - /home/jon/PycharmProjects/news-translit-nmt-master/tools/marian-dev/src/tensors/gpu/device.cu:38: cudaMalloc(&data_, size)
[2019-08-14 13:57:13] Error: Aborted from virtual void marian::gpu::Device::reserve(size_t) in /home/jon/PycharmProjects/news-translit-nmt-master/tools/marian-dev/src/tensors/gpu/device.cu:38

[CALL STACK]
[0x1b7a951]         marian::gpu::Device::  reserve  (unsigned long)    + 0x1401
[0x94ad2b]          marian::SyncGraphGroup::  SyncGraphGroup  (std::shared_ptr<marian::Options>,  std::shared_ptr<marian::IMPIWrapper>) + 0xdcb
[0x605850]          std::shared_ptr<marian::SyncGraphGroup> marian::  New  <marian::SyncGraphGroup,std::shared_ptr<marian::Options>&,std::shared_ptr<marian::IMPIWrapper>&>(std::shared_ptr<marian::Options>&,  std::shared_ptr<marian::IMPIWrapper>&) + 0x70
[0x67055c]          marian::Train<marian::SyncGraphGroup>::  run  ()   + 0x35c
[0x5a0cd9]          mainTrainer  (int,  char**)                        + 0x2c9
[0x57e77a]          main                                               + 0x8a
[0x7fcab3e1d830]    __libc_start_main                                  + 0xf0
[0x59e4f9]          _start                                             + 0x29

train-model.sh: line 49: 29033 Aborted                 (core dumped) $MARIAN/marian --devices $GPUS $OPTIONS --model $MODEL/model.npz --type s2s --train-sets $DATA/$LANGS.train.{src,trg} --vocabs $MODEL/vocab.yml $MODEL/vocab.yml --sqlite $MODEL/corpus.sqlite3 --max-length 80 --mini-batch-fit -w 3000 --mini-batch 100 --maxi-batch 1000 --best-deep --dropout-rnn 0.2 --dropout-src 0.2 --dropout-trg 0.1 --tied-embeddings-all --layer-normalization --exponential-smoothing --learn-rate 0.0001 --lr-decay 0.8 --lr-decay-strategy stalled --lr-decay-start 1 --lr-report --valid-freq 500 --save-freq 2000 --disp-freq 100 --valid-metrics ce-mean-words translation --valid-translation-output $MODEL/dev.out --quiet-translation --valid-sets $DATA/$LANGS.valid.{src,trg} --valid-script-path $MODEL/validate.sh --valid-mini-batch 64 --beam-size 10 --normalize 1.0 --early-stopping 10 --cost-type ce-mean-words --overwrite --keep-best --log $MODEL/train.log --valid-log $MODEL/valid.log

I see that I need a stronger GPU.

[memory] Extending reserved space to 3072 MB (device gpu0)
[2019-08-14 13:57:13] Error: CUDA error 2 'out of memory'

I'm very determined to run your code, so please tell me if the data looks fine and the GPU memory error is the last remaining problem, and I will find a stronger machine to run it on. Thanks.

snukky commented 5 years ago

You don't have enough free memory on your GPU. Could you provide the output of nvidia-smi?

yonatanbitton commented 5 years ago

Sure!

(base) jon@jon:~/UMLS/UMLS-Similarity-1.47$ nvidia-smi
Wed Aug 14 15:58:49 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   41C    P5    N/A /  72W |    714MiB /  4033MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1075      G   /usr/lib/xorg/Xorg                           336MiB |
|    0      1535      G   compiz                                       231MiB |
|    0      2045      G   ...uest-channel-token=17727314116889282632   144MiB |
+-----------------------------------------------------------------------------+

This error is clear. I hope that this time the data was prepared correctly and that GPU memory is the final obstacle. To get a stronger machine, I will try to follow this tutorial: https://medium.com/coinmonks/a-step-by-step-guide-to-set-up-an-aws-ec2-for-deep-learning-8f1b96bf7984 (tell me if you have another suggestion).

snukky commented 5 years ago

Take a look at your training data to check that it looks OK. In train-model.sh, try replacing -w 3000 with -w 2000 (or an even smaller value) to decrease the required workspace. If possible, stop other processes to free up GPU memory so that you can use a slightly higher workspace.
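As a minimal sketch of the suggested change (the exact layout of train-model.sh is assumed), the workspace reduction is a one-line substitution of Marian's `-w` flag, which sets the preallocated workspace in MB. The snippet below demonstrates the substitution on a sample command line; in practice you would run the same `sed` expression with `-i` on train-model.sh itself:

```shell
# Demonstrate lowering Marian's preallocated workspace (-w, in MB)
# from 3000 to 2000 on a sample command line.
# For the real script: sed -i 's/-w 3000/-w 2000/' train-model.sh
echo 'marian --devices 0 -w 3000 --mini-batch 100' \
  | sed 's/-w 3000/-w 2000/'
```

On a 4 GB card that already has ~700 MiB taken by the desktop session, a 2000 MB workspace should leave more headroom for the model and optimizer allocations; if training still aborts, a smaller value such as 1500 may be worth trying.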

yonatanbitton commented 5 years ago

OK. I'm closing this issue because the original problem was fixed; I will reopen it if the problem persists on the VM. Thanks.