wanderine / BROCCOLI

BROCCOLI: Software for Fast fMRI Analysis on Many-Core CPUs and GPUs
GNU General Public License v3.0

Memory issue #42

Open cmehta126 opened 6 years ago

cmehta126 commented 6 years ago

I'm running Broccoli for permutation tests on MRI data. I'm getting an error of the type:

Run kernel error for kernel 'CalculateBetaWeightsGLM' is 'CL_MEM_OBJECT_ALLOCATION_FAILURE'
Run kernel error for kernel 'CalculateStatisticalMapsGLMTTest' is 'CL_MEM_OBJECT_ALLOCATION_FAILURE'
Run kernel error for kernel 'CalculateStatisticalMapsGLMTTestSecondLevelPermutation' is 'CL_MEM_OBJECT_ALLOCATION_FAILURE'
Run kernel error for kernel 'TransformData' is 'CL_MEM_OBJECT_ALLOCATION_FAILURE'

It seems this is a memory issue. Is there any way of getting around this? The data I'm permuting are spatial maps of dimension 256x256x256 for several hundred samples. I'm using a mask generated from "Smoothing".

What is more, prior to this error, the output of RandomiseGroupLevel states

Permutation threshold for contrast 1 for a significance level of 0.050000 is 0.000000
Permutation threshold for contrast 2 for a significance level of 0.050000 is 0.000000
Permutation threshold for contrast 3 for a significance level of 0.050000 is 0.000000

Could this imply that the data is too highly correlated over voxels for "RandomiseGroupLevel" to work properly or is it most likely a memory issue? The volumes file is 2.5 GB in total with 356 subjects. My device information is:

Device info

Platform number: 0
Platform vendor: NVIDIA Corporation
Platform name: NVIDIA CUDA
Platform extentions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
Platform profile: FULL_PROFILE

Device number: 0
Device vendor: NVIDIA Corporation
Device name: Tesla K80
Hardware version: OpenCL 1.2 CUDA
Software version: 375.66
OpenCL C version: OpenCL C 1.2
Device extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
Global memory size in MB: 11439
Size of largest memory object in MB: 2859
Global memory cache size in KB: 208
Local memory size in KB: 48
Constant memory size in KB: 64
Parallel compute units: 13
Clock frequency in MHz: 823
Max number of threads per block: 1024
Max number of threads in each dimension: 1024 1024 64

It seems the hardware could theoretically handle loading 2.5 GB of data (though I'm not sure whether that is enough for permutation tests).

Thank you.

Best, Chintan

wanderine commented 6 years ago

Can you show the output of

GetOpenCLInfo

and the full output of your call to RandomiseGroupLevel ?

-- Anders Eklund, PhD

cmehta126 commented 6 years ago

RandomiseGroupLevel worked as I expected on my dataset after downsampling spatial maps in the input volume from 256x256x256 (1mm x 1mm x 1mm) to 128x128x128 (2mm x 2mm x 2mm). That significantly reduced the amount of memory needed for loading this data, without sacrificing specificity. The voxel resolution of the original data from diffusion weighted imaging (DWI) was on the order of (2mm x 2mm x 2mm) to begin with. I registered the DWI to FreeSurfer's CVS template which has voxel resolution (1mm x 1mm x 1mm) to enable group analysis, but I don't believe there is much loss of information by downsampling.
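Halving the resolution along each axis leaves one eighth of the voxels, which is where the memory saving comes from. The comment does not say which tool or interpolation was used, so the following is only a minimal sketch of one possible approach (averaging non-overlapping 2x2x2 blocks) on a tiny nested-list volume; the function name `downsample_by_two` is hypothetical.

```python
def downsample_by_two(volume):
    """Average non-overlapping 2x2x2 blocks of a nested-list 3D volume.

    Assumes every dimension of the volume is even.
    """
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    out = []
    for x in range(0, nx, 2):
        plane = []
        for y in range(0, ny, 2):
            row = []
            for z in range(0, nz, 2):
                # Collect the eight voxels of the current 2x2x2 block.
                block = [
                    volume[x + dx][y + dy][z + dz]
                    for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)
                ]
                row.append(sum(block) / 8.0)
            plane.append(row)
        out.append(plane)
    return out


# Tiny 4x4x4 example volume whose voxel value is x + y + z.
small = [[[float(x + y + z) for z in range(4)] for y in range(4)] for x in range(4)]
half = downsample_by_two(small)

voxels_before = 4 ** 3
voxels_after = len(half) * len(half[0]) * len(half[0][0])
print("reduction factor:", voxels_before // voxels_after)
```

Applied to 256x256x256 volumes, the same factor of 8 brings 362 volumes stored as 32-bit floats down to roughly 2.8 GiB.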

Regardless, here is the output from GetOpenCLInfo and RandomiseGroupLevel when using volumes of spatial maps with dimensionality 256x256x256 (as I did originally, which produced the error).

GetOpenCLInfo:

Device info

Platform number: 0
Platform vendor: NVIDIA Corporation
Platform name: NVIDIA CUDA
Platform extentions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
Platform profile: FULL_PROFILE

Device number: 0
Device vendor: NVIDIA Corporation
Device name: Tesla K80
Hardware version: OpenCL 1.2 CUDA
Software version: 375.66
OpenCL C version: OpenCL C 1.2
Device extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
Global memory size in MB: 11439
Size of largest memory object in MB: 2859
Global memory cache size in KB: 208
Local memory size in KB: 48
Constant memory size in KB: 64
Parallel compute units: 13
Clock frequency in MHz: 823
Max number of threads per block: 1024
Max number of threads in each dimension: 1024 1024 64

The output of the call to RandomiseGroupLevel (this run used 500 permutations, but the same thing happened with 5000 permutations):

Authored by K.A. Eklund
Data size: 256 x 256 x 256 x 362
Number of permutations: 500
Number of regressors: 8
Number of contrasts: 3
Performing 3 t-tests
Correlation design detected for t-contrast 1
Correlation design detected for t-contrast 2
Correlation design detected for t-contrast 3
Max number of permutations for contrast 1 is inf
Max number of permutations for contrast 2 is inf
Max number of permutations for contrast 3 is inf
Starting permutation 1
Starting permutation 101
Starting permutation 201
Starting permutation 301
Starting permutation 401
Permutation threshold for contrast 1 for a significance level of 0.050000 is 0.000000
Starting permutation 1
Starting permutation 101
Starting permutation 201
Starting permutation 301
Starting permutation 401
Permutation threshold for contrast 2 for a significance level of 0.050000 is 0.000000
Starting permutation 1
Starting permutation 101
Starting permutation 201
Starting permutation 301
Starting permutation 401
Permutation threshold for contrast 3 for a significance level of 0.050000 is 0.000000
Run kernel error for kernel 'CalculateBetaWeightsGLM' is 'CL_MEM_OBJECT_ALLOCATION_FAILURE'
Run kernel error for kernel 'CalculateStatisticalMapsGLMTTest' is 'CL_MEM_OBJECT_ALLOCATION_FAILURE'
Run kernel error for kernel 'CalculateStatisticalMapsGLMTTestSecondLevelPermutation' is 'CL_MEM_OBJECT_ALLOCATION_FAILURE'
Run kernel error for kernel 'TransformData' is 'CL_MEM_OBJECT_ALLOCATION_FAILURE'
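A threshold of exactly 0.000000 for every contrast is what a max-statistic permutation test would report if every permuted statistic map were zero. The sketch below illustrates that general technique only. It is not BROCCOLI's code, and the idea that failed kernels leave the maps at zero is an assumption, not something the log confirms. The function name `max_stat_threshold` and both sample distributions are hypothetical.

```python
import random


def max_stat_threshold(max_stats, alpha=0.05):
    """Return the (1 - alpha) empirical quantile of per-permutation maxima."""
    ordered = sorted(max_stats)
    # Position of the (1 - alpha) quantile, clamped to the last element.
    index = min(len(ordered) - 1, int((1.0 - alpha) * len(ordered)))
    return ordered[index]


random.seed(0)
# Hypothetical null distribution: 500 permutations with non-trivial maxima.
healthy = [abs(random.gauss(4.0, 0.5)) for _ in range(500)]
# Hypothetical failure case: every permutation produced an all-zero map.
degenerate = [0.0] * 500

print("healthy threshold is positive:", max_stat_threshold(healthy) > 0.0)
print("degenerate threshold:", max_stat_threshold(degenerate))
```

Under these assumptions, highly correlated data would still give a non-zero threshold, whereas a run in which no statistics were computed gives exactly zero.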

wanderine commented 6 years ago

362 volumes of size 256 x 256 x 256 require about 22.6 GB of memory in float format, while your graphics card has 11 GB of memory.
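This estimate can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming 4 bytes per voxel (32-bit float) and using the data size and the global memory size reported in the logs; the function name `dataset_bytes` is hypothetical and not part of BROCCOLI. The 22.6 figure corresponds to binary gigabytes (GiB).

```python
# Back-of-the-envelope footprint of the 4D input stored as 32-bit floats.
# Dimensions and device limits are taken from the logs in this thread.

BYTES_PER_FLOAT = 4
MIB = 1024 ** 2
GIB = 1024 ** 3


def dataset_bytes(nx, ny, nz, n_volumes, bytes_per_value=BYTES_PER_FLOAT):
    """Bytes needed to hold n_volumes volumes of nx*ny*nz voxels each."""
    return nx * ny * nz * n_volumes * bytes_per_value


full = dataset_bytes(256, 256, 256, 362)
global_mem = 11439 * MIB  # "Global memory size in MB: 11439"

print(f"full-resolution data: {full / GIB:.1f} GiB")
print(f"device global memory: {global_mem / GIB:.1f} GiB")
print("fits on device:", full <= global_mem)
```

The calculation counts only the input data. Any further buffers the analysis allocates would add to it.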

cmehta126 commented 6 years ago

Thank you. Does Broccoli have a way of using additional memory to augment the RAM of a graphics card, given that my graphics card is limited to 11 GB of memory (with the largest object size capped at 2.8 GB)? I have ~1 TB of fast SSD storage mounted.

wanderine commented 6 years ago

No, but you can install an OpenCL driver for the CPU, and then run BROCCOLI in parallel on the CPU cores.

cmehta126 commented 6 years ago

Thank you, that is very helpful.
