marian-nmt / marian

Fast Neural Machine Translation in C++
https://marian-nmt.github.io

Marian decoder crashes on memory allocation #324

Closed: CatarinaSilva closed this issue 4 years ago

CatarinaSilva commented 4 years ago

After training a model, running marian-decoder crashes after a few log messages, following a series of large memory allocations:

[2020-03-25 19:39:00] [config] Loaded model has been created with Marian v1.7.6 9fd5ba9 2019-11-27 19:28:16 -0800
[2020-03-25 19:39:00] [data] Loading vocabulary from JSON/Yaml file end-to-end/models/vocab.en.json
[2020-03-25 19:39:00] [data] Loading vocabulary from JSON/Yaml file end-to-end/models/vocab.pl.json
[2020-03-25 19:39:01] [memory] Extending reserved space to 512 MB (device gpu0)
[2020-03-25 19:39:01] Loading scorer of type transformer as feature F0
[2020-03-25 19:39:01] Loading model from model.npz.best-valid-script.npz
[2020-03-25 19:39:02] [memory] Reserving 237 MB, device gpu0
tcmalloc: large alloc 1073741824 bytes == 0x24120000 @
tcmalloc: large alloc 1207959552 bytes == 0x24120000 @
tcmalloc: large alloc 1476395008 bytes == 0x24120000 @
tcmalloc: large alloc 1610612736 bytes == 0x24120000 @
tcmalloc: large alloc 1744830464 bytes == 0x24120000 @
tcmalloc: large alloc 2013265920 bytes == 0x24120000 @
tcmalloc: large alloc 2281701376 bytes == 0x24120000 @
tcmalloc: large alloc 2550136832 bytes == 0x24120000 @
tcmalloc: large alloc 2818572288 bytes == 0x24120000 @
tcmalloc: large alloc 3221225472 bytes == 0x24120000 @
tcmalloc: large alloc 3489660928 bytes == 0x24120000 @
tcmalloc: large alloc 4026531840 bytes == 0x24120000 @
tcmalloc: large alloc 4429185024 bytes == 0x24120000 @
tcmalloc: large alloc 4966055936 bytes == 0x24120000 @
tcmalloc: large alloc 5637144576 bytes == 0x7f77c4000000 @
tcmalloc: large alloc 6308233216 bytes == 0x7f69b4000000 @
tcmalloc: large alloc 7113539584 bytes == 0x7f56c4000000 @

Is this intended, or is it a bug? Why does decoding need so much memory?

The GPU has 16 GB of memory, and so does the machine's RAM.

emjotde commented 4 years ago

Hi, do you have some very long input lines there?

CatarinaSilva commented 4 years ago

The longest line has 96 tokens (already split into final subwords); that doesn't strike me as a super long line.

emjotde commented 4 years ago

With a mini-batch of 80 sentences times 96 tokens I could imagine something like this happening, especially for a standard transformer, where the decoder history keeps growing as decoding proceeds. Maybe just reduce the batch size?
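As a hedged illustration of the suggested fix (not from the thread itself): the sketch below shows one way the decoding batch could be reduced on the command line, reusing the model and vocabulary paths from the log above. The input/output file names are placeholders, and the exact flag names and defaults should be confirmed against marian-decoder --help for the installed version.

# Placeholder invocation: input.en / output.pl are hypothetical file names;
# model and vocabulary paths are taken from the log above.
# --mini-batch sets how many sentences are decoded at once,
# --maxi-batch how many mini-batches are pre-loaded and sorted by length,
# -d selects the GPU device.
./marian-decoder \
    -m model.npz.best-valid-script.npz \
    -v end-to-end/models/vocab.en.json end-to-end/models/vocab.pl.json \
    -d 0 \
    --mini-batch 16 --maxi-batch 100 \
    < input.en > output.pl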

CatarinaSilva commented 4 years ago

Yes, reducing the batch size was my go-to quick fix. I was just wondering whether there might be something else I am missing, or a bug in this particular version, especially because training ran fine with the same batch size. Then again, the training data may not have contained such a long sentence; I will check that as well (see the sketch after this comment).

Thanks anyway :)
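As a side note on the length check mentioned above, a minimal sketch, assuming the training source is a plain whitespace-tokenized subword file (the file name is a placeholder):

# Prints the maximum number of whitespace-separated tokens per line,
# to compare against the 96-token input mentioned in the thread.
awk '{ if (NF > max) max = NF } END { print max }' train.subwords.en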