marian-nmt / marian

Fast Neural Machine Translation in C++
https://marian-nmt.github.io

tcmalloc: large alloc 2818572288 bytes == 0x33daa000 @ #300

Open sdlmw opened 4 years ago

sdlmw commented 4 years ago

[2019-12-06 00:18:07] [data] Loading vocabulary from JSON/Yaml file /191206/source_vocab.yml
[2019-12-06 00:18:08] [data] Setting vocabulary size for input 0 to 328116
[2019-12-06 00:18:08] [data] Loading vocabulary from JSON/Yaml file /191206/target_vocab.yml
[2019-12-06 00:18:09] [data] Setting vocabulary size for input 1 to 225581
[2019-12-06 00:18:10] [memory] Extending reserved space to 2048 MB (device gpu0)
[2019-12-06 00:18:10] Training started
[2019-12-06 00:18:10] [data] Shuffling files
[2019-12-06 00:18:10] [data] Done reading 1754741 sentences
[2019-12-06 00:18:14] [data] Done shuffling 1754741 sentences to temp files
[2019-12-06 00:18:14] [memory] Reserving 1615 MB, device gpu0
[2019-12-06 00:18:15] [memory] Reserving 1615 MB, device gpu0
tcmalloc: large alloc 2147483648 bytes == 0x33daa000 @
tcmalloc: large alloc 2281701376 bytes == 0x33daa000 @
tcmalloc: large alloc 2415919104 bytes == 0x33daa000 @
tcmalloc: large alloc 2550136832 bytes == 0x33daa000 @
tcmalloc: large alloc 2684354560 bytes == 0x33daa000 @
tcmalloc: large alloc 2818572288 bytes == 0x33daa000 @
tcmalloc: large alloc 2952790016 bytes == 0x33daa000 @
tcmalloc: large alloc 3087007744 bytes == 0x33daa000 @
tcmalloc: large alloc 3221225472 bytes == 0x33daa000 @
tcmalloc: large alloc 3355443200 bytes == 0x33daa000 @
tcmalloc: large alloc 3489660928 bytes == 0x33daa000 @
[2019-12-06 00:18:34] [memory] Reserving 3231 MB, device gpu0
tcmalloc: large alloc 4026531840 bytes == 0x33daa000 @
tcmalloc: large alloc 4429185024 bytes == 0x33daa000 @
[2019-12-06 00:18:54] Error: CUDA error 2 'out of memory' - /marian/src/tensors/gpu/device.cu:32: cudaMalloc(&data_, size)
[2019-12-06 00:18:54] Error: Aborted from virtual void marian::gpu::Device::reserve(size_t) in /marian/src/tensors/gpu/device.cu:32

[CALL STACK]
[0xb70bb7]
[0x5d028c]
[0x66a074]

This is a French demo model I trained. Memory increases significantly during training, resulting in out-of-memory. But a German model trained on the same number of sentences does not have this problem. What is the cause of this problem?
thanks
snukky commented 4 years ago

Could you provide the command/config you use? More details would be helpful, e.g. what model you use and how large it is. What is your workspace? Do you train with --mini-batch-fit?

Is this the only process running on the GPU?
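
For reference, the memory-related options in question typically look like this on the command line (a minimal sketch with illustrative values, not the reporter's actual setup):

```bash
# Illustrative values: --workspace (-w) pre-allocates the given amount of GPU
# memory in MB, and --mini-batch-fit sizes mini-batches to fit that workspace.
./build/marian \
  --devices 0 \
  --workspace 9000 \
  --mini-batch-fit \
  --maxi-batch 1000
```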

sdlmw commented 4 years ago

Hi snukky,

Thank you for your support.
./build/marian \
  --train-sets /Marian/1_ForTrain/TagRemoved_Source_Train_2560035.tok.en /Marian/1_ForTrain/TagRemoved_Target_Train_2560035.tok.de \
  --vocabs /Marian/source_vocab.yml /Marian/target_vocab.yml \
  --model /Marian/pre-train_model.npz \
  --devices 0 --dim-emb 500 --after-epochs 13 \
  --max-length 70 --max-length-crop

The training file has 2560035 lines. I used a GTX 1080 Ti to train this engine.

emjotde commented 4 years ago

Your vocabularies are huge, is that planned? Normally we would use something about 10x smaller. This explains your model size, due to the embedding matrices:

[2019-12-06 00:18:08] [data] Setting vocabulary size for input 0 to 328116
[2019-12-06 00:18:09] [data] Setting vocabulary size for input 1 to 225581
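
As a rough back-of-the-envelope check (assuming float32 parameters and the --dim-emb 500 from the command above; these numbers are not from the log):

```bash
# Approximate size of the two embedding matrices alone, in MB
echo $(( 328116 * 500 * 4 / 1024 / 1024 ))   # source embeddings: ~625 MB
echo $(( 225581 * 500 * 4 / 1024 / 1024 ))   # target embeddings: ~430 MB
```

The output projection adds another matrix of roughly the target-embedding size unless it is tied, and optimizer state keeps further copies of all of these parameters.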
sdlmw commented 4 years ago

Hi emjotde, I built the vocabularies with the /build/marian-vocab command. Commas and periods are not split off as separate tokens. [screenshots attached]

emjotde commented 4 years ago

You need to tokenize your data first, and I also recommend using subword segmentation. Take a look at these examples:
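
A minimal sketch of what that pre-processing could look like with the subword-nmt package (file names and the 32k merge count are illustrative; the linked examples do this more thoroughly):

```bash
# Learn a joint BPE model on the already-tokenized training data
cat train.tok.en train.tok.de | subword-nmt learn-bpe -s 32000 > bpe.codes

# Apply it to both sides before building vocabularies and training
subword-nmt apply-bpe -c bpe.codes < train.tok.en > train.bpe.en
subword-nmt apply-bpe -c bpe.codes < train.tok.de > train.bpe.de
```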

sdlmw commented 4 years ago

HI emjotde,

As you said, I have already tokenized the data. I built the vocabulary via the command ./build/marian-vocab < /Marian/1_ForTrain/TagRemoved_Source_Train_3318741.tok.en > /Marian/source_vocab.yml, but the result still contains tokens with punctuation attached.

emjotde commented 4 years ago

Subword segmentation is your best bet here. See the provided examples for either BPE or SentencePiece. Also, your tokenizer doesn't seem to be particularly good if it kept those words together.

snukky commented 4 years ago

To complement Marcin's response, replacing

--vocabs /Marian/source_vocab.yml /Marian/target_vocab.yml

with

--vocabs /Marian/source_vocab.spm /Marian/target_vocab.spm

should solve the issue, but following the examples mentioned above will give you a better understanding of data pre-processing for NMT.
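
A hedged sketch of how the relevant part of the training command might then look, assuming Marian was built with SentencePiece support (-DUSE_SENTENCEPIECE=on); vocabulary sizes and corpus file names are illustrative:

```bash
# If the .spm files do not exist yet, Marian trains SentencePiece models from
# the training data; --dim-vocabs caps the resulting vocabulary sizes.
./build/marian \
  --train-sets corpus.tok.en corpus.tok.de \
  --vocabs /Marian/source_vocab.spm /Marian/target_vocab.spm \
  --dim-vocabs 32000 32000 \
  --devices 0
```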

adjouama commented 4 years ago

I personally fixed tcmalloc: large alloc ... by updating CUDA. Make sure to completely remove previous installations first.

Installation instructions can be found here: https://askubuntu.com/questions/799184/how-can-i-install-cuda-on-ubuntu-16-04

emjotde commented 4 years ago

Hm. The tcmalloc: large alloc ... thing isn't really anything that needs to be fixed. It is just an unnecessary log message from Google's libtcmalloc whenever it allocates a larger (actually not that large) chunk of memory. It can be relatively safely ignored. It should also not go away after updating CUDA; the two are rather unrelated.
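
If the message itself is a nuisance, it can usually be silenced by raising gperftools' reporting threshold via an environment variable; this is a tcmalloc knob, not a Marian option, and assumes Marian is linked against libtcmalloc:

```bash
# Only report allocations above ~4 GB instead of the default ~1 GB
# (config.yml is a placeholder for your actual training config)
export TCMALLOC_LARGE_ALLOC_REPORT_THRESHOLD=4294967296
./build/marian -c config.yml
```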

adjouama commented 4 years ago

In my case, it's not the log text that bothers me. I had a memory crash during training: memory usage increases suddenly between epochs even though I have enough GPU memory available.

I use a GTX 1080 Ti with 11 GB of memory. I allocate a 1 GB workspace and it still crashes. Before crashing it shows me the tcmalloc: large alloc ... messages.

The only fix that worked for me was updating CUDA.
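
When debugging crashes like this, it can help to watch GPU memory while training to confirm whether the device is actually full at the moment of the crash; this is standard nvidia-smi polling, nothing Marian-specific:

```bash
# Print used/total GPU memory once per second alongside the training run
nvidia-smi --query-gpu=timestamp,memory.used,memory.total --format=csv -l 1
```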

cgr71ii commented 2 years ago

Hi! I've run into this problem. I've observed that decreasing --mini-batch somewhat mitigates it. But why does this problem happen? Why does the memory usage keep increasing? Does Marian apply some kind of caching, or is the problem just related to the fr-en model?

My command is:

/home/cgarcia/Documentos/experiment_crawling/marian/marian-dev/build/marian-decoder \
  -c /home/cgarcia/Documentos/experiment_crawling/marian/students/fren/fren.student.tiny11/config.intgemm8bitalpha.yml \
  --quiet --max-length-crop --cpu-threads 64 --mini-batch 8

UPDATE: It seems that decreasing --cpu-threads also mitigates the problem somewhat.
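
For completeness, a hedged sketch of a lower-memory decoder invocation along those lines (values and the binary path are illustrative; each CPU thread typically gets its own workspace, so peak memory tends to scale with --cpu-threads):

```bash
# Fewer threads and smaller batches trade throughput for a lower memory footprint
/path/to/marian-decoder \
  -c config.intgemm8bitalpha.yml \
  --cpu-threads 16 \
  --mini-batch 8 --maxi-batch 100 \
  --workspace 256 \
  --max-length-crop --quiet
```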