microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

RAM Usage explodes during Write Command #1324

Open Martlgap opened 7 years ago

Martlgap commented 7 years ago

Hey guys,

First of all, some information about my version and network (I downloaded the latest CNTK binary about two weeks ago):

Build info:

    Built time: Dec 22 2016 01:45:54
    Last modified date: Thu Dec 22 01:35:08 2016
    Build type: Release
    Build target: GPU
    With 1bit-SGD: no
    With ASGD: yes
    Math lib: mkl
    CUDA_PATH: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0
    CUB_PATH: c:\src\cub-1.4.1
    CUDNN_PATH: C:\local\cudnn-8.0-windows10-x64-v5.1
    Build Branch: HEAD
    Build SHA1: 8e8b5ff92eff4647be5d41a5a515956907567126
    Built by svcphil on Philly-Pool3
    Build Path: C:\Jenkins\workspace\CNTK-Build-Windows\Source\CNTK\


GPU info:

    Device[0]: cores = 2496; computeCapability = 5.2; type = "GeForce GTX 970"; memory = 4096 MB



Network Structure:

    Layer 1: Convolution + ReLU with 3x3x1x64 filters
    Layers 2-19: Convolutions + ReLUs with 3x3x64x64 filters
    Layer 20: Convolution with 3x3x64x1 filters


I use only convolutions and ReLUs, with an image as the feature and an image as the label. I trained with an image input size of 41x41x1 (both features and labels). After that I use the ModelEdit() function to change the input size to 512x512x1 (both features and labels).

My MiniBatchSize for the write command is set to 1. I perform write with CPUOnly.

The write command takes more than 20 GB of RAM. My output file size is approximately 2 MB (only one image stored).

The amount of parameters to store should be approx. 1 million, i.e. a few MB. Even if CNTK stores every feature map in RAM, that should take approx. 64 maps x 19 layers x ~2 MB ~ 3 GB.
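(A quick sanity check of these figures in plain Python, independent of CNTK, assuming only the kernel weights and one set of forward activations per layer are kept in memory:)

    # Back-of-the-envelope memory estimate for the 20-layer network at
    # 512x512x1 input, float precision (4 bytes per value).
    BYTES = 4

    # Kernel weights: 3x3x1x64, 18 x 3x3x64x64, 3x3x64x1
    weights = 3*3*1*64 + 18 * (3*3*64*64) + 3*3*64*1
    print("weights:", weights, "->", weights * BYTES / 2**20, "MiB")   # ~0.66M values, ~2.5 MiB

    # Feature maps: 19 intermediate layers of 512x512x64 activations
    fmaps = 19 * 512 * 512 * 64
    print("feature maps:", fmaps * BYTES / 2**30, "GiB")               # ~1.2 GiB

Either way, weights plus activations stay in the low single-digit GB range, nowhere near the 20 GB observed.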

Why does my RAM usage climb to 20 GB or even higher?

Can anybody help?

eldakms commented 7 years ago

Hi Martlgap,

Could you please share your config? Is this BrainScript or Python?

Martlgap commented 7 years ago

Hi eldakms,

I wrote a minimum working example, which shows my problem. It is attached to this comment.

I use the latest CNTK binary from February 10, 2017.

When I run my code on my machine with device -> CPU, I get the following CNTK RAM usage:

    Training: ~5 GB RAM
    Edit: ~6 GB RAM
    Write: ~6 GB RAM, then -> Exception occurred: Bad Allocation

Architecture of the example network: Input 41x41x1 -> Convolution with 21x21 filters -> Feature maps 41x41x1024 -> Convolution with 21x21 filters -> Output 41x41x1

In total 904192 weights are learned.

I don't understand why CNTK needs so much RAM. I use "float" precision, which in my opinion means that every number takes 4 bytes of storage.

    Input: 41x41x1 x 4 Byte -> 6.7 kB
    Feature maps: 41x41x1024 x 4 Byte -> 6.9 MB
    Output: 41x41x1 x 4 Byte -> 6.7 kB
    Weights: 904192 x 4 Byte -> 3.6 MB
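(The same figures recomputed in plain Python; the weight count below assumes the two 21x21 convolutions plus 1024 biases for the hidden layer, which reproduces the 904192 mentioned above:)

    # Sanity check of the sizes listed above, float precision (4 bytes).
    BYTES = 4

    sizes = {
        "input":        41 * 41 * 1,
        "feature maps": 41 * 41 * 1024,
        "output":       41 * 41 * 1,
        # two 21x21 convolutions; 1024 biases assumed for the hidden layer
        "weights":      21*21*1*1024 + 1024 + 21*21*1024*1,
    }
    for name, n in sizes.items():
        print(f"{name}: {n} values = {n * BYTES / 1e6:.4f} MB")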

Why does training even take GBs of RAM? Why does Model Edit take so much RAM? Why does Bad Allocation occur when performing write?

Could it be that CNTK stores the convolution weights for each pixel in parallel? That would mean that for the second convolution the 21x21x1024 filter is copied 41x41 times, which would lead to a much larger memory footprint.

Attachments: MinWorkingExample.txt, features.txt, featuresbig.txt, labels.txt, image1, image2, image3

cha-zhang commented 7 years ago

It has something to do with our CPU convolution implementation: it uses matrix unwrapping, and your kernel size is unusually big. We are adding Intel MKL-DNN, and that should resolve the issue.
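For anyone else hitting this: the cost of that unwrapping (an im2col-style approach) can be estimated directly. A minimal sketch for the second convolution of the attached example (41x41x1024 input, 21x21 kernel, output kept at 41x41), in plain Python and independent of CNTK's actual workspace layout:

    # im2col ("matrix unwrapping") workspace estimate: each 21x21x1024
    # input patch is copied out once per output position, so the unwrapped
    # matrix has (output positions) x (kernel elements) entries.
    BYTES = 4

    out_positions   = 41 * 41            # 41x41 output with same padding
    kernel_elements = 21 * 21 * 1024     # kernel footprint over all input maps

    unwrapped = out_positions * kernel_elements
    print(f"{unwrapped} values = {unwrapped * BYTES / 2**30:.2f} GiB")   # ~2.8 GiB

So a single layer of the small example already needs close to 3 GiB just for the unwrapped input, which is consistent with the bad allocation above; the same arithmetic for the original 512x512 run with 3x3x64 kernels gives roughly 0.55 GiB per layer, which, if a workspace of that size is held per layer across ~19 layers, lands in the range reported at the top of this issue.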

cha-zhang commented 7 years ago

Hopefully this will be resolved by end of this iteration (ETA Nov. 3, 2017).