I am trying to use nmt for extraction of structured text, but have run into an unexpected out-of-GPU-memory problem. I'm using only the default parameters (two layers with 32 units, no attention) and a limited vocabulary of size 9998 (including `<unk>` and the sentence delimiters). The embedding matrices are shared between source and target.

During training and evaluation the model is nice and small and uses only a few hundred MB of the 4 GB GPU, but during testing with the inference model it explodes: it seems to allocate more than 2500 copies of the 32x9998 embedding matrix and naturally runs out of memory. The offending part of the graph seems to be created in a `control_flow_ops.while_loop` in the dynamic decoder.

Have I used the model in a sub-optimal way, or is this some strange bug?
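For reference, a minimal sketch of what I understand the inference path to look like, assuming the standard `tf.contrib.seq2seq` decoding setup (`GreedyEmbeddingHelper` + `dynamic_decode`). The token ids, the zero encoder state, and the batch size are placeholders, not my actual values:

```python
import tensorflow as tf
from tensorflow.contrib import seq2seq

vocab_size, embed_dim, num_units, batch_size = 9998, 32, 32, 1
sos_id, eos_id = 1, 2  # placeholder ids for the sentence delimiters

# Shared source/target embedding matrix (the 9998x32 one in question).
embedding = tf.get_variable("embedding", [vocab_size, embed_dim])

cell = tf.nn.rnn_cell.MultiRNNCell(
    [tf.nn.rnn_cell.BasicLSTMCell(num_units) for _ in range(2)])
encoder_state = cell.zero_state(batch_size, tf.float32)  # stand-in for the real encoder state
projection_layer = tf.layers.Dense(vocab_size, use_bias=False)

helper = seq2seq.GreedyEmbeddingHelper(
    embedding, tf.fill([batch_size], sos_id), eos_id)
decoder = seq2seq.BasicDecoder(cell, helper, encoder_state,
                               output_layer=projection_layer)

# dynamic_decode builds the control_flow_ops.while_loop mentioned above.
outputs, _, _ = seq2seq.dynamic_decode(
    decoder,
    maximum_iterations=100,  # bounds the number of decode steps
    swap_memory=True)        # lets loop tensors spill to host memory
```

As far as I can tell, `maximum_iterations` and `swap_memory` are the obvious knobs `dynamic_decode` exposes for this loop, but even an unbounded loop shouldn't need thousands of copies of the embedding matrix resident at once, which is why I suspect something else is going on.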