microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

Memory Utilization: AlexNet vs Inception-V3 in CNTK #3805

Open JohnCraigPublic opened 4 years ago

JohnCraigPublic commented 4 years ago

I have created networks (CNTK 2.7) that do the same computation in two ways: one with AlexNet, and one with Inception-V3. Both are modified only in the output stage so that they do regression instead of classification. When I run inference (C++, VS2017) on the same input data, I see very different memory utilization:

1. AlexNet grabs ~1 GB of RAM when I load the network. Thereafter, RAM utilization does not increase as I run inference multiple times.
2. Inception-V3 grabs ~1.5 GB of RAM when I load the network. On the first run of inference, it gobbles up an additional 5 GB of RAM. Thereafter, RAM utilization does not increase on subsequent inferences.

My question is: is it known behavior that Inception-V3 grabs additional RAM the first time it's run?
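
(For reference, my eval loop is essentially the following sketch, using CNTK's C++ API; `model.dnn` and the constant input are placeholders, not my actual model or data.)

```cpp
#include "CNTKLibrary.h"
#include <unordered_map>
#include <vector>

using namespace CNTK;

int main() {
    auto device = DeviceDescriptor::UseDefaultDevice();

    // Stage 1: loading the graph + weights accounts for the initial RAM grab.
    FunctionPtr model = Function::Load(L"model.dnn", device);

    Variable input = model->Arguments()[0];
    Variable output = model->Output();

    // Placeholder input batch: one sample filled with a constant value.
    std::vector<float> data(input.Shape().TotalSize(), 0.5f);
    ValuePtr inputValue = Value::CreateBatch(input.Shape(), data, device);

    std::unordered_map<Variable, ValuePtr> inputs = { { input, inputValue } };
    std::unordered_map<Variable, ValuePtr> outputs = { { output, nullptr } };

    // Stage 2: the first Evaluate() call is where the additional RAM
    // gets allocated for Inception-V3; subsequent calls are stable.
    model->Evaluate(inputs, outputs, device);
    return 0;
}
```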

haixpham commented 4 years ago

AlexNet is small, so it has a small memory footprint. Inception-V3 is a much bigger network and requires more memory for its hidden feature maps. That's normal behavior, really.

JohnCraigPublic commented 4 years ago

I understand Inception is bigger, but does it make sense that I see memory usage occur in two stages: 1) when the model is read into memory, and 2) when the model is first called for inference?

AlexNet, regardless of size, did not display that behavior: it just occupied memory at load time, with no increase on the first run.

delzac commented 4 years ago

My understanding is that CUDA (via cuDNN) picks different optimized algorithms to compute the conv ops. Depending on the filter size and the number of filters, it will pick an algorithm with a different memory footprint.

So it's not surprising that memory increases after the first run for Inception and not for AlexNet.
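
To make that concrete, here is a sketch of cuDNN's C API (not CNTK's actual selection code): each candidate convolution algorithm reports its own workspace requirement, and these can differ enormously for the same layer. The shapes below are only loosely modeled on an Inception-style 3x3 conv.

```cpp
#include <cudnn.h>
#include <cstdio>

int main() {
    cudnnHandle_t handle;
    cudnnCreate(&handle);

    cudnnTensorDescriptor_t in, out;
    cudnnFilterDescriptor_t filt;
    cudnnConvolutionDescriptor_t conv;
    cudnnCreateTensorDescriptor(&in);
    cudnnCreateTensorDescriptor(&out);
    cudnnCreateFilterDescriptor(&filt);
    cudnnCreateConvolutionDescriptor(&conv);

    // Hypothetical Inception-style layer: 192 input channels, 64 3x3 filters.
    cudnnSetTensor4dDescriptor(in, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, 1, 192, 35, 35);
    cudnnSetFilter4dDescriptor(filt, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW, 64, 192, 3, 3);
    cudnnSetConvolution2dDescriptor(conv, 1, 1, 1, 1, 1, 1,
                                    CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);

    int n, c, h, w;
    cudnnGetConvolution2dForwardOutputDim(conv, in, filt, &n, &c, &h, &w);
    cudnnSetTensor4dDescriptor(out, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, n, c, h, w);

    // Each candidate algorithm can demand a very different scratch workspace;
    // the "extra" RAM seen on the first inference is this kind of allocation.
    for (int algo = 0; algo < CUDNN_CONVOLUTION_FWD_ALGO_COUNT; ++algo) {
        size_t bytes = 0;
        auto status = cudnnGetConvolutionForwardWorkspaceSize(
            handle, in, filt, conv, out,
            static_cast<cudnnConvolutionFwdAlgo_t>(algo), &bytes);
        if (status == CUDNN_STATUS_SUCCESS)
            printf("algo %d needs %zu bytes of workspace\n", algo, bytes);
    }
    cudnnDestroy(handle);
    return 0;
}
```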

JohnCraigPublic commented 4 years ago

OK, thanks for the comments. And I suppose that if this network were exported to ONNX and then inferenced in ONNX Runtime (ORT), we should see the same behaviour?
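
(The export I have in mind is roughly the sketch below, assuming CNTK 2.7's `ModelFormat::ONNX` overload of `Function::Save`; the file names are placeholders.)

```cpp
#include "CNTKLibrary.h"

int main() {
    // Load the trained CNTK-format model, then re-save it in ONNX format.
    auto model = CNTK::Function::Load(L"model.dnn");
    model->Save(L"model.onnx", CNTK::ModelFormat::ONNX);
    return 0;
}
```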

delzac commented 4 years ago

I'm not sure. You can always test it and let us know :)

JohnCraigPublic commented 4 years ago

I'll try, but I can't really seem to get anything converted from CNTK to run in ORT.
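
(A minimal smoke test like the following sketch, using the ONNX Runtime C++ API, is what I'm trying; `inception_v3.onnx` is a placeholder, and the `Ort::Exception` message typically names the offending node.)

```cpp
#include <onnxruntime_cxx_api.h>
#include <iostream>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "cntk-export-check");
    Ort::SessionOptions opts;
    try {
        // Session creation parses and validates the exported graph.
        Ort::Session session(env, L"inception_v3.onnx", opts);
        std::cout << "Model loaded; inputs: " << session.GetInputCount() << "\n";
    } catch (const Ort::Exception& e) {
        // The message usually pinpoints the unsupported or malformed op.
        std::cerr << "Load failed: " << e.what() << "\n";
    }
    return 0;
}
```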

JohnCraigPublic commented 4 years ago

Follow-on to this, if anyone is interested: I trained an Inception-V3 in TensorFlow, exported it to ONNX, and ran it in ORT on Windows 10. I do not see this 'double gulp' of memory utilization; it just grabs its ~120 MB of RAM when the model is loaded and does not grab more when it is first run.