jadore801120 / attention-is-all-you-need-pytorch

A PyTorch implementation of the Transformer model in "Attention is All You Need".
MIT License
8.83k stars · 1.98k forks

Batch size limitation #24

Open rawmarshmellows opened 7 years ago

rawmarshmellows commented 7 years ago

Hi, I was wondering why the maximum batch size here is only ~100 on a GPU with ~11 GB of RAM, whereas in tensor2tensor the maximum batch size is 1024?

jadore801120 commented 7 years ago

Hi @kevinlu1211,

Thanks for reporting this!

I have not run the tensor2tensor code, so I am also curious about its memory usage with a batch size of 1024. Do you have that number? If the difference in memory usage is large, there may be a memory-efficiency problem here.

Thanks, Yu-Hsiang

rawmarshmellows commented 7 years ago

What do you mean by number?


jadore801120 commented 7 years ago

Sorry for the ambiguity. I mean the memory usage of the tensor2tensor project.


rawmarshmellows commented 7 years ago

It uses around 10 GB of RAM for a batch size of 1024.


shaform commented 5 years ago

It might be that the definition of batch_size is different between the two projects?

https://github.com/tensorflow/tensor2tensor/blob/a4f958a887f4f4466644dd0602bdd33985d61dd7/tensor2tensor/utils/data_reader.py#L86

    batch_size: int, total number of tokens in a batch.
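If tensor2tensor's `batch_size` counts tokens rather than sentences, the two numbers are not comparable. A minimal sketch of token-capped batching (the function name `token_batches` and the toy corpus are illustrative, not from either repo):

```python
# Hypothetical sketch: tensor2tensor-style batching caps the total number of
# tokens per batch, while batching by sentence count caps the number of
# examples. A "batch size" of 1024 tokens is therefore far fewer sentences.

def token_batches(sentences, max_tokens=1024):
    """Group sentences so each batch holds at most max_tokens tokens."""
    batch, tokens = [], 0
    for sent in sentences:
        # Start a new batch if adding this sentence would exceed the cap.
        if batch and tokens + len(sent) > max_tokens:
            yield batch
            batch, tokens = [], 0
        batch.append(sent)
        tokens += len(sent)
    if batch:
        yield batch

# With ~25-token sentences, a 1024-token cap fits 40 sentences per batch,
# so "batch size 1024" (tokens) is roughly "batch size 40" (sentences).
corpus = [["tok"] * 25 for _ in range(100)]
batches = list(token_batches(corpus))
print(len(batches), len(batches[0]))  # → 3 40
```

Under this reading, a token-based batch of 1024 would actually be smaller per step than a sentence-based batch of 100, which would be consistent with the similar memory footprints reported above.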