sdlmw opened 4 years ago
Could you provide the command/config you use? More details would be helpful, e.g. what model do you use and how large is it? What workspace do you set? Do you train with --mini-batch-fit?
Is this the only process running on the GPU?
Hi snukky,
Thank you for your support.
./build/marian --train-sets /Marian/1_ForTrain/TagRemoved_Source_Train_2560035.tok.en /Marian/1_ForTrain/TagRemoved_Target_Train_2560035.tok.de --vocabs /Marian/source_vocab.yml /Marian/target_vocab.yml --model /Marian/pre-train_model.npz --devices 0 --dim-emb 500 --after-epochs 13 --max-length 70 --max-length-crop
The training file has 2,560,035 lines. I used a GTX 1080 Ti to train this engine.
Your vocabularies are huge, is that intended? Normally we would use something 10x smaller. This explains your model size, which comes from the embedding matrices:
[2019-12-06 00:18:08] [data] Setting vocabulary size for input 0 to 328116
[2019-12-06 00:18:09] [data] Setting vocabulary size for input 1 to 225581
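If keeping the full vocabularies is not intended, one option (a hedged sketch; the file names and the 32000 caps are placeholders, not tuned values) is to cap the vocabulary size directly from the training command with --dim-vocabs, which keeps only the top-ranked entries of each vocabulary file:

./build/marian --train-sets corpus.tok.en corpus.tok.de \
    --vocabs source_vocab.yml target_vocab.yml --dim-vocabs 32000 32000 \
    --model model.npz --devices 0

Subword segmentation, discussed below, is still the cleaner fix, since a capped word-level vocabulary maps everything outside the cap to the unknown token.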
Hi emjotde, I built the vocabularies with the /build/marian-vocab command. Commas and periods are not split off as separate tokens.
You need to tokenize your data first; I also recommend using subword segmentation. Look at these examples:
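In the spirit of those examples, a minimal sketch of the pre-processing, assuming the Moses tokenizer.perl script and the subword-nmt package are installed (file names and the 32000 merge count are placeholders):

# tokenize both sides with the Moses tokenizer
perl tokenizer.perl -l en < corpus.en > corpus.tok.en
perl tokenizer.perl -l de < corpus.de > corpus.tok.de
# learn a joint BPE model with 32000 merges and per-language vocabularies
subword-nmt learn-joint-bpe-and-vocab --input corpus.tok.en corpus.tok.de -s 32000 \
    -o bpe.codes --write-vocabulary vocab.en vocab.de
# apply BPE to both sides
subword-nmt apply-bpe -c bpe.codes --vocabulary vocab.en < corpus.tok.en > corpus.bpe.en
subword-nmt apply-bpe -c bpe.codes --vocabulary vocab.de < corpus.tok.de > corpus.bpe.de

marian-vocab can then be run on the BPE-segmented files, which should bring both vocabularies down to roughly the number of merge operations.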
Hi emjotde,
As you said, I've already tokenized the data. I built the vocabulary via the command
./build/marian-vocab < /Marian/1_ForTrain/TagRemoved_Source_Train_3318741.tok.en > /Marian/source_vocab.yml
but the result still contains those symbols.
Subword segmentation is your best bet here. See the provided examples for either BPE or SentencePiece. Also, your tokenizer doesn't seem to be particularly good if it kept those words together.
To complement Marcin's response, replacing
--vocabs /Marian/source_vocab.yml /Marian/target_vocab.yml
with
--vocabs /Marian/source_vocab.spm /Marian/target_vocab.spm
should solve the issue, but following the examples mentioned above will allow for better understanding of data pre-processing for NMT.
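As a hedged sketch, assuming Marian was built with SentencePiece support (the -DUSE_SENTENCEPIECE=on CMake option), the earlier training command would become:

./build/marian --train-sets /Marian/1_ForTrain/TagRemoved_Source_Train_2560035.tok.en /Marian/1_ForTrain/TagRemoved_Target_Train_2560035.tok.de \
    --vocabs /Marian/source_vocab.spm /Marian/target_vocab.spm \
    --model /Marian/pre-train_model.npz --devices 0 --dim-emb 500 \
    --after-epochs 13 --max-length 70 --max-length-crop

If the .spm files do not exist yet, Marian should train SentencePiece models on the training data and then use them for segmentation, so a separate tokenization step would not even be needed.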
I personally fixed the tcmalloc: large alloc ... issue by updating CUDA.
Make sure to completely remove previous installations.
Installation instructions can be found here: https://askubuntu.com/questions/799184/how-can-i-install-cuda-on-ubuntu-16-04
Hm. The tcmalloc: large alloc ... thing isn't really anything that needs to be fixed. It is just an unnecessary log message by Google's libtcmalloc whenever it allocates a larger (actually not that large) chunk of memory. It can be relatively safely ignored. It should also not go away from updating CUDA; these are rather unrelated.
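If the log line itself is the only annoyance, it can usually be silenced by raising tcmalloc's reporting threshold before launching Marian. This is a sketch under the assumption that Marian is linked against gperftools' tcmalloc (which is where the message comes from); the threshold value is arbitrary and config.yml is a placeholder:

# report only allocations above 16 GiB instead of the default 1 GiB
export TCMALLOC_LARGE_ALLOC_REPORT_THRESHOLD=17179869184
./build/marian -c config.yml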
In my case, it's not the log message that bothers me. I had memory crashes during training: memory usage increases suddenly between epochs even though I have enough GPU memory available.
I use a GTX 1080 Ti with 11 GB of memory. I allocate a 1 GB workspace and it still crashes. Before crashing it shows me the tcmalloc: large alloc ... message.
The only fix that worked for me was updating CUDA.
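For reference, a hedged sketch of the training options usually used to keep Marian's GPU memory bounded on an 11 GB card; the values are placeholders to tune, not verified settings, and config.yml stands in for the rest of the training options:

./build/marian -c config.yml --devices 0 \
    --workspace 9000 --mini-batch-fit \
    --max-length 70 --max-length-crop

--mini-batch-fit sizes each mini-batch to fit the reserved workspace instead of using a fixed sentence count, which makes sudden memory spikes from unusually long batches less likely.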
Hi! I've run into this problem. I've observed that decreasing --mini-batch
kind of mitigates the problem. But why does this problem happen? Why does the memory usage keep increasing? Does Marian apply some kind of cache, or is the problem just related to the fr-en model?
My command is:
/home/cgarcia/Documentos/experiment_crawling/marian/marian-dev/build/marian-decoder \
-c /home/cgarcia/Documentos/experiment_crawling/marian/students/fren/fren.student.tiny11/config.intgemm8bitalpha.yml \
--quiet --max-length-crop --cpu-threads 64 --mini-batch 8
UPDATE:
It seems that decreasing --cpu-threads also mitigates the problem somewhat.
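For anyone hitting the same thing, a hedged sketch of a more conservative variant of the command above; the thread count is a placeholder, and the assumption (not verified here) is that each CPU worker keeps its own buffers, so memory grows roughly with --cpu-threads times the batch size:

/home/cgarcia/Documentos/experiment_crawling/marian/marian-dev/build/marian-decoder \
    -c /home/cgarcia/Documentos/experiment_crawling/marian/students/fren/fren.student.tiny11/config.intgemm8bitalpha.yml \
    --quiet --max-length-crop --cpu-threads 16 --mini-batch 8

If memory really does scale with the number of workers, lowering --cpu-threads further would be the next thing to try.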