microsoft / computervision-recipes

Best Practices, code samples, and documentation for Computer Vision.
MIT License

[BUG] OOM error occurs during training #661

Open shanyun123456 opened 2 years ago

shanyun123456 commented 2 years ago

Description

Hi @soumyadeepdey, when I use your code to train a model on a GPU, I always run into an OOM (out-of-memory) error. The batch size is 1, and I haven't changed any other parameters. Please take a look, thanks.

In which platform does it happen?

linux gpu

How do we replicate the issue?

Run the command python3 sample_train.py to reproduce the issue.

Expected behavior (i.e. solution)

Other Comment

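[Image: screenshot, 2021-10-14 2:18:37 PM] https://user-images.githubusercontent.com/88327139/137262946-737e0f53-b5f6-44b7-832b-0952a573c60a.png

[Image: screenshot, 2021-10-14 2:19:49 PM] https://user-images.githubusercontent.com/88327139/137262993-ca07462d-b265-4146-a7aa-65ecfea1972b.png
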
soumyadeepdey commented 2 years ago

What is the memory size of the GPU you are using?


shanyun123456 commented 2 years ago

Hi, my GPU is a V100 with about 29 GB of memory.

soumyadeepdey commented 2 years ago

Hello, I used the same GPU but never hit an OOM error.

However, you can add the following lines to your code so that TensorFlow uses only GPU 0 and allocates GPU memory as needed, which reduces the memory footprint.

    import os
    import tensorflow as tf

    os.environ["CUDA_VISIBLE_DEVICES"] = '0'
    gpu_devices = tf.config.experimental.list_physical_devices('GPU')
    tf.config.experimental.set_memory_growth(gpu_devices[0], True)

Thanks, Soumyadeep
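A slightly more defensive variant of the suggestion above, as a minimal sketch: it enables memory growth on every visible GPU rather than only the first, and handles the case where no GPU is found or the GPU has already been initialized. The function name `enable_gpu_memory_growth` and its `visible_devices` parameter are hypothetical and not part of the repository; the TensorFlow calls assume TensorFlow 2.x.

```python
import os


def enable_gpu_memory_growth(visible_devices="0"):
    """Restrict TensorFlow to the given GPUs and turn on memory growth.

    Hypothetical helper (not from the repository). Returns the number of
    GPUs for which memory growth was enabled.
    """
    # CUDA_VISIBLE_DEVICES must be set before CUDA is initialized, so
    # TensorFlow is imported only after the variable is in place.
    os.environ["CUDA_VISIBLE_DEVICES"] = visible_devices
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    enabled = 0
    for gpu in gpus:
        try:
            # Allocate GPU memory on demand instead of reserving it all up front.
            tf.config.experimental.set_memory_growth(gpu, True)
            enabled += 1
        except RuntimeError as err:
            # Raised when the GPU has already been initialized.
            print(f"Could not enable memory growth on {gpu}: {err}")
    return enabled
```

Calling `enable_gpu_memory_growth("0")` at the very top of sample_train.py, before any model or tensor is created, would presumably have the same effect as the lines above. Note that memory growth changes how memory is reserved, not how much the model needs, so it may not resolve an OOM caused by the model itself.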


dongcin commented 2 years ago

I am running into this issue as well.

soumyadeepdey commented 2 years ago

What input image size and batch size are you using?
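To illustrate why image size and batch size matter here, a rough sketch of how the memory for a single dense tensor scales with them. The helper `tensor_bytes` and the example shapes are assumptions for illustration only; they do not describe the actual model in this repository, whose real usage also includes weights, gradients, optimizer state, and many intermediate activations.

```python
from math import prod


def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor; the default of 4 assumes float32."""
    return prod(shape) * bytes_per_element


MIB = 1024 ** 2

# A single 1024x1024 RGB image as a float32 batch of size 1.
input_mib = tensor_bytes((1, 1024, 1024, 3)) / MIB

# One hypothetical 64-channel feature map at the same resolution.
feature_mib = tensor_bytes((1, 1024, 1024, 64)) / MIB

print(f"input batch: {input_mib:.0f} MiB, feature map: {feature_mib:.0f} MiB")
# prints "input batch: 12 MiB, feature map: 256 MiB"
```

Memory grows linearly with batch size and with the number of pixels, so doubling the image's height and width quadruples every activation of this kind.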


Kdjhsa commented 1 month ago

I am running into this issue too.

soumyadeepdey commented 1 month ago

Did you try the previous solutions?
