Does your datasets directory contain all the files as shown here? I have added an error-exit flag to the script, so please run it again and check where it stops.
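For reference, a minimal sketch of the flag in question, exactly as it appears at the top of the traced script below (the -v in the shebang only echoes each line as it runs):

set -eo pipefail  # -e: abort on the first failing command; -o pipefail: a pipeline fails if any stage in it fails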
Your EnHe.dev.src is empty, which suggests an issue with ./tools/tokenize.py.
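A quick way to test the tokenizer in isolation, assuming only what the script itself shows (it reads plain text on stdin and writes the tokenized form to stdout):

echo 'hello world' | python ../tools/tokenize.py
# non-empty tokenized output means the tokenizer works; an exception or empty output points at it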
Make sure that you delete the data directory before running the script again.
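For example, from the experiments directory (the path is assumed from the session prompts below):

rm -rf data
./prepare-data.sh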
I've deleted the project, cloned it again, and ran prepare-data.sh.
This is the output:
jon@jon:~/PycharmProjects/news-translit-nmt-master/experiments$ ./prepare-data.sh
#!/bin/bash -v
set -eo pipefail
dataset_dev=../datasets/NEWS2018_DATASET_0?/*_dev.xml
dataset_train=../datasets/NEWS2018_DATASET_0?/*_trn.xml
mkdir -p data
for path in $dataset_dev; do
lang=`basename $path | cut -c12-15`
file=data/$lang
echo "Preparing $lang devset"
test -e $file.dev.xml || cp $path $file.dev.xml
test -e $file.dev.txt || python3 ../tools/extract-xml.py --one-target < $file.dev.xml > $file.dev.txt
test -e $file.dev.src.orig || cut -f1 $file.dev.txt > $file.dev.src.orig
test -e $file.dev.src || cut -f1 $file.dev.txt | python ../tools/tokenize.py > $file.dev.src
done
basename $path | cut -c12-15
Preparing ThEn devset
basename $path | cut -c12-15
Preparing EnTh devset
basename $path | cut -c12-15
Preparing PeEn devset
basename $path | cut -c12-15
Preparing EnPe devset
basename $path | cut -c12-15
Preparing ChEn devset
basename $path | cut -c12-15
Preparing EnCh devset
basename $path | cut -c12-15
Preparing EnVi devset
basename $path | cut -c12-15
Preparing HeEn devset
basename $path | cut -c12-15
Preparing EnBa devset
basename $path | cut -c12-15
Preparing EnHi devset
basename $path | cut -c12-15
Preparing EnKa devset
basename $path | cut -c12-15
Preparing EnTa devset
basename $path | cut -c12-15
Preparing EnHe devset
for path in $dataset_train; do
lang=`basename $path | cut -c12-15`
file=data/$lang
echo "Preparing $lang trainset"
test -e $file.train.txt || python3 ../tools/split-valid-data.py -n 500 < $path 2> $file.train.txt | sort > $file.valid.txt
test -e $file.train.src || cut -f1 $file.train.txt | python ../tools/tokenize.py > $file.train.src
test -e $file.train.trg || cut -f2 $file.train.txt | python ../tools/tokenize.py > $file.train.trg
test -e $file.valid.src.orig || cut -f1 $file.valid.txt > $file.valid.src.orig
test -e $file.valid.trg.orig || cut -f2 $file.valid.txt > $file.valid.trg.orig
test -e $file.valid.xml || python ../tools/wrapper_xml.py -c $file.valid.{src,trg}.orig > $file.valid.xml
test -e $file.valid.src || python ../tools/tokenize.py < $file.valid.src.orig > $file.valid.src
test -e $file.valid.trg || python ../tools/tokenize.py < $file.valid.trg.orig > $file.valid.trg
done
basename $path | cut -c12-15
Preparing ThEn trainset
basename $path | cut -c12-15
Preparing EnTh trainset
basename $path | cut -c12-15
Preparing PeEn trainset
basename $path | cut -c12-15
Preparing EnPe trainset
basename $path | cut -c12-15
Preparing ChEn trainset
basename $path | cut -c12-15
Preparing EnCh trainset
basename $path | cut -c12-15
Preparing EnVi trainset
basename $path | cut -c12-15
Preparing HeEn trainset
basename $path | cut -c12-15
Preparing EnBa trainset
basename $path | cut -c12-15
Preparing EnHi trainset
basename $path | cut -c12-15
Preparing EnKa trainset
basename $path | cut -c12-15
Preparing EnTa trainset
basename $path | cut -c12-15
Preparing EnHe trainset
for lang in Ch Th Pe He; do
file1=data/En${lang}
file2=data/${lang}En
test -e $file1.dev.xml || continue
test -e $file2.dev.xml || continue
echo "Extra data for En${lang}/${lang}En"
if [[ ! -e $file1.train.filter.src ]]; then
cat $file1.dev.src $file1.valid.src | sort > $file1.filter.src
cat $file2.dev.src $file2.valid.src | sort > $file2.filter.src
paste $file2.train.trg $file2.train.src | python ../tools/filter-testset.py $file1.filter.src > $file1.train.extra.txt 2> $file1.train.src.filtered
paste $file1.train.trg $file1.train.src | python ../tools/filter-testset.py $file2.filter.src > $file2.train.extra.txt 2> $file2.train.src.filtered
cut -f1 $file1.train.extra.txt >> $file1.train.src
cut -f2 $file1.train.extra.txt >> $file1.train.trg
cut -f1 $file2.train.extra.txt >> $file2.train.src
cut -f2 $file2.train.extra.txt >> $file2.train.trg
fi
done
Extra data for EnCh/ChEn
Extra data for EnTh/ThEn
Extra data for EnPe/PeEn
Extra data for EnHe/HeEn
marian=../tools/marian-dev/build
for lang in `ls data/????.train.src | cut -c6-9`; do
file=data/$lang
test -e $lang.vocab.yml || cat $file.train.src $file.train.trg | $marian/marian-vocab > data/$lang.vocab.yml
done
ls data/????.train.src | cut -c6-9
[2019-08-14 13:32:02] Creating vocabulary...
[2019-08-14 13:32:02] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:03] Finished
[2019-08-14 13:32:03] Creating vocabulary...
[2019-08-14 13:32:03] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:03] Finished
[2019-08-14 13:32:03] Creating vocabulary...
[2019-08-14 13:32:03] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:03] Finished
[2019-08-14 13:32:03] Creating vocabulary...
[2019-08-14 13:32:03] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:03] Finished
[2019-08-14 13:32:03] Creating vocabulary...
[2019-08-14 13:32:03] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:03] Finished
[2019-08-14 13:32:03] Creating vocabulary...
[2019-08-14 13:32:03] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:03] Finished
[2019-08-14 13:32:03] Creating vocabulary...
[2019-08-14 13:32:03] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:03] Finished
[2019-08-14 13:32:03] Creating vocabulary...
[2019-08-14 13:32:03] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:03] Finished
[2019-08-14 13:32:03] Creating vocabulary...
[2019-08-14 13:32:03] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:03] Finished
[2019-08-14 13:32:03] Creating vocabulary...
[2019-08-14 13:32:03] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:03] Finished
[2019-08-14 13:32:03] Creating vocabulary...
[2019-08-14 13:32:03] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:03] Finished
[2019-08-14 13:32:03] Creating vocabulary...
[2019-08-14 13:32:03] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:03] Finished
[2019-08-14 13:32:03] Creating vocabulary...
[2019-08-14 13:32:03] [data] Creating vocabulary stdout from stdin
[2019-08-14 13:32:04] Finished
These are the file sizes now (I only need the EnHe files):
(base) jon@jon:~/PycharmProjects/news-translit-nmt-master/experiments/data$ ls -la
total 32608
drwxrwxr-x 2 jon jon 12288 Aug 14 13:51 .
drwxrwxr-x 4 jon jon 4096 Aug 14 13:51 ..
-rw-rw-r-- 1 jon jon 13000 Aug 14 13:51 EnHe.dev.src
-rw-rw-r-- 1 jon jon 7500 Aug 14 13:51 EnHe.dev.src.orig
-rw-rw-r-- 1 jon jon 20079 Aug 14 13:51 EnHe.dev.txt
-rw-rw-r-- 1 jon jon 111207 Aug 14 13:51 EnHe.dev.xml
-rw-rw-r-- 1 jon jon 19588 Aug 14 13:51 EnHe.filter.src
-rw-rw-r-- 1 jon jon 296484 Aug 14 13:51 EnHe.train.extra.txt
-rw-rw-r-- 1 jon jon 255673 Aug 14 13:51 EnHe.train.src
-rw-rw-r-- 1 jon jon 0 Aug 14 13:51 EnHe.train.src.filtered
-rw-rw-r-- 1 jon jon 341852 Aug 14 13:51 EnHe.train.trg
-rw-rw-r-- 1 jon jon 199030 Aug 14 13:51 EnHe.train.txt
-rw-rw-r-- 1 jon jon 6588 Aug 14 13:51 EnHe.valid.src
-rw-rw-r-- 1 jon jon 3794 Aug 14 13:51 EnHe.valid.src.orig
-rw-rw-r-- 1 jon jon 8785 Aug 14 13:51 EnHe.valid.trg
-rw-rw-r-- 1 jon jon 6346 Aug 14 13:51 EnHe.valid.trg.orig
-rw-rw-r-- 1 jon jon 10140 Aug 14 13:51 EnHe.valid.txt
-rw-rw-r-- 1 jon jon 50749 Aug 14 13:51 EnHe.valid.xml
-rw-rw-r-- 1 jon jon 431 Aug 14 13:51 EnHe.vocab.yml
-rw-rw-r-- 1 jon jon 17661 Aug 14 13:51 HeEn.dev.src
-rw-rw-r-- 1 jon jon 12752 Aug 14 13:51 HeEn.dev.src.orig
-rw-rw-r-- 1 jon jon 20332 Aug 14 13:51 HeEn.dev.txt
-rw-rw-r-- 1 jon jon 116403 Aug 14 13:51 HeEn.dev.xml
-rw-rw-r-- 1 jon jon 27186 Aug 14 13:51 HeEn.filter.src
-rw-rw-r-- 1 jon jon 301041 Aug 14 13:51 HeEn.train.extra.txt
-rw-rw-r-- 1 jon jon 341852 Aug 14 13:51 HeEn.train.src
-rw-rw-r-- 1 jon jon 0 Aug 14 13:51 HeEn.train.src.filtered
-rw-rw-r-- 1 jon jon 255673 Aug 14 13:51 HeEn.train.trg
-rw-rw-r-- 1 jon jon 195985 Aug 14 13:51 HeEn.train.txt
-rw-rw-r-- 1 jon jon 9525 Aug 14 13:51 HeEn.valid.src
-rw-rw-r-- 1 jon jon 6890 Aug 14 13:51 HeEn.valid.src.orig
-rw-rw-r-- 1 jon jon 7173 Aug 14 13:51 HeEn.valid.trg
-rw-rw-r-- 1 jon jon 4137 Aug 14 13:51 HeEn.valid.trg.orig
-rw-rw-r-- 1 jon jon 11027 Aug 14 13:51 HeEn.valid.txt
-rw-rw-r-- 1 jon jon 52743 Aug 14 13:51 HeEn.valid.xml
-rw-rw-r-- 1 jon jon 431 Aug 14 13:51 HeEn.vocab.yml
I'm not sure whether those are the correct sizes. Here is my next error:
(base) jon@jon:~/PycharmProjects/news-translit-nmt-master/experiments$ bash train.sh '0' EnHe
[2019-08-14 13:57:12] [marian] Marian v1.7.8 0f7b1e2 2019-08-14 12:11:53 +0200
[2019-08-14 13:57:12] [marian] Running on jon as process 29033 with command line:
[2019-08-14 13:57:12] [marian] ../tools/marian-dev/build/marian --devices 0 --model ./models/EnHe.1/model.npz --type s2s --train-sets ./data/EnHe.train.src ./data/EnHe.train.trg --vocabs ./models/EnHe.1/vocab.yml ./models/EnHe.1/vocab.yml --sqlite ./models/EnHe.1/corpus.sqlite3 --max-length 80 --mini-batch-fit -w 3000 --mini-batch 100 --maxi-batch 1000 --best-deep --dropout-rnn 0.2 --dropout-src 0.2 --dropout-trg 0.1 --tied-embeddings-all --layer-normalization --exponential-smoothing --learn-rate 0.0001 --lr-decay 0.8 --lr-decay-strategy stalled --lr-decay-start 1 --lr-report --valid-freq 500 --save-freq 2000 --disp-freq 100 --valid-metrics ce-mean-words translation --valid-translation-output ./models/EnHe.1/dev.out --quiet-translation --valid-sets ./data/EnHe.valid.src ./data/EnHe.valid.trg --valid-script-path ./models/EnHe.1/validate.sh --valid-mini-batch 64 --beam-size 10 --normalize 1.0 --early-stopping 10 --cost-type ce-mean-words --overwrite --keep-best --log ./models/EnHe.1/train.log --valid-log ./models/EnHe.1/valid.log
[2019-08-14 13:57:12] [config] after-batches: 0
[2019-08-14 13:57:12] [config] after-epochs: 0
[2019-08-14 13:57:12] [config] allow-unk: false
[2019-08-14 13:57:12] [config] beam-size: 10
[2019-08-14 13:57:12] [config] bert-class-symbol: "[CLS]"
[2019-08-14 13:57:12] [config] bert-mask-symbol: "[MASK]"
[2019-08-14 13:57:12] [config] bert-masking-fraction: 0.15
[2019-08-14 13:57:12] [config] bert-sep-symbol: "[SEP]"
[2019-08-14 13:57:12] [config] bert-train-type-embeddings: true
[2019-08-14 13:57:12] [config] bert-type-vocab-size: 2
[2019-08-14 13:57:12] [config] clip-gemm: 0
[2019-08-14 13:57:12] [config] clip-norm: 5
[2019-08-14 13:57:12] [config] cost-type: ce-mean-words
[2019-08-14 13:57:12] [config] cpu-threads: 0
[2019-08-14 13:57:12] [config] data-weighting: ""
[2019-08-14 13:57:12] [config] data-weighting-type: sentence
[2019-08-14 13:57:12] [config] dec-cell: gru
[2019-08-14 13:57:12] [config] dec-cell-base-depth: 4
[2019-08-14 13:57:12] [config] dec-cell-high-depth: 2
[2019-08-14 13:57:12] [config] dec-depth: 4
[2019-08-14 13:57:12] [config] devices:
[2019-08-14 13:57:12] [config] - 0
[2019-08-14 13:57:12] [config] dim-emb: 512
[2019-08-14 13:57:12] [config] dim-rnn: 1024
[2019-08-14 13:57:12] [config] dim-vocabs:
[2019-08-14 13:57:12] [config] - 0
[2019-08-14 13:57:12] [config] - 0
[2019-08-14 13:57:12] [config] disp-first: 0
[2019-08-14 13:57:12] [config] disp-freq: 100
[2019-08-14 13:57:12] [config] disp-label-counts: false
[2019-08-14 13:57:12] [config] dropout-rnn: 0.2
[2019-08-14 13:57:12] [config] dropout-src: 0.2
[2019-08-14 13:57:12] [config] dropout-trg: 0.1
[2019-08-14 13:57:12] [config] dump-config: ""
[2019-08-14 13:57:12] [config] early-stopping: 10
[2019-08-14 13:57:12] [config] embedding-fix-src: false
[2019-08-14 13:57:12] [config] embedding-fix-trg: false
[2019-08-14 13:57:12] [config] embedding-normalization: false
[2019-08-14 13:57:12] [config] embedding-vectors:
[2019-08-14 13:57:12] [config] []
[2019-08-14 13:57:12] [config] enc-cell: gru
[2019-08-14 13:57:12] [config] enc-cell-depth: 2
[2019-08-14 13:57:12] [config] enc-depth: 4
[2019-08-14 13:57:12] [config] enc-type: alternating
[2019-08-14 13:57:12] [config] exponential-smoothing: 0.0001
[2019-08-14 13:57:12] [config] grad-dropping-momentum: 0
[2019-08-14 13:57:12] [config] grad-dropping-rate: 0
[2019-08-14 13:57:12] [config] grad-dropping-warmup: 100
[2019-08-14 13:57:12] [config] guided-alignment: none
[2019-08-14 13:57:12] [config] guided-alignment-cost: mse
[2019-08-14 13:57:12] [config] guided-alignment-weight: 0.1
[2019-08-14 13:57:12] [config] ignore-model-config: false
[2019-08-14 13:57:12] [config] input-types:
[2019-08-14 13:57:12] [config] []
[2019-08-14 13:57:12] [config] interpolate-env-vars: false
[2019-08-14 13:57:12] [config] keep-best: true
[2019-08-14 13:57:12] [config] label-smoothing: 0.1
[2019-08-14 13:57:12] [config] layer-normalization: true
[2019-08-14 13:57:12] [config] learn-rate: 0.0001
[2019-08-14 13:57:12] [config] log: ./models/EnHe.1/train.log
[2019-08-14 13:57:12] [config] log-level: info
[2019-08-14 13:57:12] [config] log-time-zone: ""
[2019-08-14 13:57:12] [config] lr-decay: 0.8
[2019-08-14 13:57:12] [config] lr-decay-freq: 50000
[2019-08-14 13:57:12] [config] lr-decay-inv-sqrt:
[2019-08-14 13:57:12] [config] - 16000
[2019-08-14 13:57:12] [config] lr-decay-repeat-warmup: false
[2019-08-14 13:57:12] [config] lr-decay-reset-optimizer: false
[2019-08-14 13:57:12] [config] lr-decay-start:
[2019-08-14 13:57:12] [config] - 1
[2019-08-14 13:57:12] [config] lr-decay-strategy: stalled
[2019-08-14 13:57:12] [config] lr-report: true
[2019-08-14 13:57:12] [config] lr-warmup: 0
[2019-08-14 13:57:12] [config] lr-warmup-at-reload: false
[2019-08-14 13:57:12] [config] lr-warmup-cycle: false
[2019-08-14 13:57:12] [config] lr-warmup-start-rate: 0
[2019-08-14 13:57:12] [config] max-length: 80
[2019-08-14 13:57:12] [config] max-length-crop: false
[2019-08-14 13:57:12] [config] max-length-factor: 3
[2019-08-14 13:57:12] [config] maxi-batch: 1000
[2019-08-14 13:57:12] [config] maxi-batch-sort: trg
[2019-08-14 13:57:12] [config] mini-batch: 100
[2019-08-14 13:57:12] [config] mini-batch-fit: true
[2019-08-14 13:57:12] [config] mini-batch-fit-step: 10
[2019-08-14 13:57:12] [config] mini-batch-overstuff: 1
[2019-08-14 13:57:12] [config] mini-batch-track-lr: false
[2019-08-14 13:57:12] [config] mini-batch-understuff: 1
[2019-08-14 13:57:12] [config] mini-batch-warmup: 0
[2019-08-14 13:57:12] [config] mini-batch-words: 0
[2019-08-14 13:57:12] [config] mini-batch-words-ref: 0
[2019-08-14 13:57:12] [config] model: ./models/EnHe.1/model.npz
[2019-08-14 13:57:12] [config] multi-loss-type: sum
[2019-08-14 13:57:12] [config] multi-node: false
[2019-08-14 13:57:12] [config] multi-node-overlap: true
[2019-08-14 13:57:12] [config] n-best: false
[2019-08-14 13:57:12] [config] no-nccl: false
[2019-08-14 13:57:12] [config] no-reload: false
[2019-08-14 13:57:12] [config] no-restore-corpus: false
[2019-08-14 13:57:12] [config] no-shuffle: false
[2019-08-14 13:57:12] [config] normalize: 1
[2019-08-14 13:57:12] [config] num-devices: 0
[2019-08-14 13:57:12] [config] optimizer: adam
[2019-08-14 13:57:12] [config] optimizer-delay: 1
[2019-08-14 13:57:12] [config] optimizer-params:
[2019-08-14 13:57:12] [config] []
[2019-08-14 13:57:12] [config] overwrite: true
[2019-08-14 13:57:12] [config] pretrained-model: ""
[2019-08-14 13:57:12] [config] quiet: false
[2019-08-14 13:57:12] [config] quiet-translation: true
[2019-08-14 13:57:12] [config] relative-paths: false
[2019-08-14 13:57:12] [config] right-left: false
[2019-08-14 13:57:12] [config] save-freq: 2000
[2019-08-14 13:57:12] [config] seed: 0
[2019-08-14 13:57:12] [config] shuffle-in-ram: false
[2019-08-14 13:57:12] [config] skip: true
[2019-08-14 13:57:12] [config] sqlite: ./models/EnHe.1/corpus.sqlite3
[2019-08-14 13:57:12] [config] sqlite-drop: false
[2019-08-14 13:57:12] [config] sync-sgd: true
[2019-08-14 13:57:12] [config] tempdir: /tmp
[2019-08-14 13:57:12] [config] tied-embeddings: true
[2019-08-14 13:57:12] [config] tied-embeddings-all: true
[2019-08-14 13:57:12] [config] tied-embeddings-src: false
[2019-08-14 13:57:12] [config] train-sets:
[2019-08-14 13:57:12] [config] - ./data/EnHe.train.src
[2019-08-14 13:57:12] [config] - ./data/EnHe.train.trg
[2019-08-14 13:57:12] [config] transformer-aan-activation: swish
[2019-08-14 13:57:12] [config] transformer-aan-depth: 2
[2019-08-14 13:57:12] [config] transformer-aan-nogate: false
[2019-08-14 13:57:12] [config] transformer-decoder-autoreg: self-attention
[2019-08-14 13:57:12] [config] transformer-dim-aan: 2048
[2019-08-14 13:57:12] [config] transformer-dim-ffn: 2048
[2019-08-14 13:57:12] [config] transformer-dropout: 0
[2019-08-14 13:57:12] [config] transformer-dropout-attention: 0
[2019-08-14 13:57:12] [config] transformer-dropout-ffn: 0
[2019-08-14 13:57:12] [config] transformer-ffn-activation: swish
[2019-08-14 13:57:12] [config] transformer-ffn-depth: 2
[2019-08-14 13:57:12] [config] transformer-guided-alignment-layer: last
[2019-08-14 13:57:12] [config] transformer-heads: 8
[2019-08-14 13:57:12] [config] transformer-no-projection: false
[2019-08-14 13:57:12] [config] transformer-postprocess: dan
[2019-08-14 13:57:12] [config] transformer-postprocess-emb: d
[2019-08-14 13:57:12] [config] transformer-preprocess: ""
[2019-08-14 13:57:12] [config] transformer-tied-layers:
[2019-08-14 13:57:12] [config] []
[2019-08-14 13:57:12] [config] transformer-train-position-embeddings: false
[2019-08-14 13:57:12] [config] type: s2s
[2019-08-14 13:57:12] [config] ulr: false
[2019-08-14 13:57:12] [config] ulr-dim-emb: 0
[2019-08-14 13:57:12] [config] ulr-dropout: 0
[2019-08-14 13:57:12] [config] ulr-keys-vectors: ""
[2019-08-14 13:57:12] [config] ulr-query-vectors: ""
[2019-08-14 13:57:12] [config] ulr-softmax-temperature: 1
[2019-08-14 13:57:12] [config] ulr-trainable-transformation: false
[2019-08-14 13:57:12] [config] valid-freq: 500
[2019-08-14 13:57:12] [config] valid-log: ./models/EnHe.1/valid.log
[2019-08-14 13:57:12] [config] valid-max-length: 1000
[2019-08-14 13:57:12] [config] valid-metrics:
[2019-08-14 13:57:12] [config] - ce-mean-words
[2019-08-14 13:57:12] [config] - translation
[2019-08-14 13:57:12] [config] valid-mini-batch: 64
[2019-08-14 13:57:12] [config] valid-script-path: ./models/EnHe.1/validate.sh
[2019-08-14 13:57:12] [config] valid-sets:
[2019-08-14 13:57:12] [config] - ./data/EnHe.valid.src
[2019-08-14 13:57:12] [config] - ./data/EnHe.valid.trg
[2019-08-14 13:57:12] [config] valid-translation-output: ./models/EnHe.1/dev.out
[2019-08-14 13:57:12] [config] vocabs:
[2019-08-14 13:57:12] [config] - ./models/EnHe.1/vocab.yml
[2019-08-14 13:57:12] [config] - ./models/EnHe.1/vocab.yml
[2019-08-14 13:57:12] [config] word-penalty: 0
[2019-08-14 13:57:12] [config] workspace: 3000
[2019-08-14 13:57:12] [config] Model is being created with Marian v1.7.8 0f7b1e2 2019-08-14 12:11:53 +0200
[2019-08-14 13:57:12] Using synchronous training
[2019-08-14 13:57:12] [data] Loading vocabulary from JSON/Yaml file ./models/EnHe.1/vocab.yml
[2019-08-14 13:57:12] [data] Setting vocabulary size for input 0 to 66
[2019-08-14 13:57:12] [data] Loading vocabulary from JSON/Yaml file ./models/EnHe.1/vocab.yml
[2019-08-14 13:57:12] [data] Setting vocabulary size for input 1 to 66
[2019-08-14 13:57:12] [sqlite] Reusing persistent database ./models/EnHe.1/corpus.sqlite3
[2019-08-14 13:57:12] Compiled without MPI support. Falling back to FakeMPIWrapper
[2019-08-14 13:57:12] [batching] Collecting statistics for batch fitting with step size 10
[2019-08-14 13:57:13] [memory] Extending reserved space to 3072 MB (device gpu0)
[2019-08-14 13:57:13] Error: CUDA error 2 'out of memory' - /home/jon/PycharmProjects/news-translit-nmt-master/tools/marian-dev/src/tensors/gpu/device.cu:38: cudaMalloc(&data_, size)
[2019-08-14 13:57:13] Error: Aborted from virtual void marian::gpu::Device::reserve(size_t) in /home/jon/PycharmProjects/news-translit-nmt-master/tools/marian-dev/src/tensors/gpu/device.cu:38
[CALL STACK]
[0x1b7a951] marian::gpu::Device:: reserve (unsigned long) + 0x1401
[0x94ad2b] marian::SyncGraphGroup:: SyncGraphGroup (std::shared_ptr<marian::Options>, std::shared_ptr<marian::IMPIWrapper>) + 0xdcb
[0x605850] std::shared_ptr<marian::SyncGraphGroup> marian:: New <marian::SyncGraphGroup,std::shared_ptr<marian::Options>&,std::shared_ptr<marian::IMPIWrapper>&>(std::shared_ptr<marian::Options>&, std::shared_ptr<marian::IMPIWrapper>&) + 0x70
[0x67055c] marian::Train<marian::SyncGraphGroup>:: run () + 0x35c
[0x5a0cd9] mainTrainer (int, char**) + 0x2c9
[0x57e77a] main + 0x8a
[0x7fcab3e1d830] __libc_start_main + 0xf0
[0x59e4f9] _start + 0x29
train-model.sh: line 49: 29033 Aborted (core dumped) $MARIAN/marian --devices $GPUS $OPTIONS --model $MODEL/model.npz --type s2s --train-sets $DATA/$LANGS.train.{src,trg} --vocabs $MODEL/vocab.yml $MODEL/vocab.yml --sqlite $MODEL/corpus.sqlite3 --max-length 80 --mini-batch-fit -w 3000 --mini-batch 100 --maxi-batch 1000 --best-deep --dropout-rnn 0.2 --dropout-src 0.2 --dropout-trg 0.1 --tied-embeddings-all --layer-normalization --exponential-smoothing --learn-rate 0.0001 --lr-decay 0.8 --lr-decay-strategy stalled --lr-decay-start 1 --lr-report --valid-freq 500 --save-freq 2000 --disp-freq 100 --valid-metrics ce-mean-words translation --valid-translation-output $MODEL/dev.out --quiet-translation --valid-sets $DATA/$LANGS.valid.{src,trg} --valid-script-path $MODEL/validate.sh --valid-mini-batch 64 --beam-size 10 --normalize 1.0 --early-stopping 10 --cost-type ce-mean-words --overwrite --keep-best --log $MODEL/train.log --valid-log $MODEL/valid.log
I see that I need a stronger GPU:
[memory] Extending reserved space to 3072 MB (device gpu0)
[2019-08-14 13:57:13] Error: CUDA error 2 'out of memory'
I'm very determined to run your code, so please tell me if the data looks fine and the GPU memory error is the last remaining problem, and I will find a stronger computer to run it. Thanks
You don't have enough free memory on your GPU. Could you provide the output of nvidia-smi?
Sure!
(base) jon@jon:~/UMLS/UMLS-Similarity-1.47$ nvidia-smi
Wed Aug 14 15:58:49 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39 Driver Version: 418.39 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 105... Off | 00000000:01:00.0 On | N/A |
| 0% 41C P5 N/A / 72W | 714MiB / 4033MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1075 G /usr/lib/xorg/Xorg 336MiB |
| 0 1535 G compiz 231MiB |
| 0 2045 G ...uest-channel-token=17727314116889282632 144MiB |
+-----------------------------------------------------------------------------+
This error is clear. I hope that this time the data was prepared correctly and that GPU memory is the final obstacle. In order to get a stronger machine, I will try to follow this tutorial: https://medium.com/coinmonks/a-step-by-step-guide-to-set-up-an-aws-ec2-for-deep-learning-8f1b96bf7984 (tell me if you have another suggestion).
Take a look at your training data to check if it looks OK. In train-model.sh, try replacing -w 3000 with -w 2000 (or even smaller) to decrease the required workspace. If possible, stop other processes to free GPU memory so you can use a slightly higher workspace.
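A minimal sketch of both suggestions, assuming the parallel line-aligned .src/.trg layout shown in the listing above; the workspace flag is the -w 3000 visible in the command echoed in the abort message:

# source and target line counts must match, and pairs should look sane
wc -l data/EnHe.train.src data/EnHe.train.trg
paste data/EnHe.train.{src,trg} | head
# in train-model.sh, shrink the workspace passed to marian:
#   -w 3000  ->  -w 2000   (or lower, until it fits in free GPU memory)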
OK, I'm closing this issue because the original problem was fixed; I will reopen it if the problem persists on the VM. Thanks.
Hello. I've successfully prepared the data:
Now I want to run the English-to-Hebrew training, so I execute:
bash train.sh '0 1' EnHe
And getting the following error:
Indeed, when I look at the train.src files, some are not empty (EnCh) and some are empty (EnHe).
How can I solve it?
I really want to be able to run your project; I want to cite you in my article :)
Thanks