nod-ai / SHARK-Studio

SHARK Studio -- Web UI for SHARK+IREE High Performance Machine Learning Distribution
Apache License 2.0

Efficientnet-b0 Torch model fails on CUDA trying to allocate buffer. #1243

Open monorimet opened 1 year ago

monorimet commented 1 year ago

The following error occurs when running the PyTorch-sourced efficientnet_b0 model on the CUDA backend:

E     RuntimeError: Error registering modules: c/runtime/src/iree/hal/drivers/cuda/native_executable.c:133: INTERNAL; CUDA driver error: Requested shared memory size of 200704 larger than allowed size of 166912; while invoking native function hal.executable.create; while calling import; 
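For context, the two numbers in the error are the shared memory the compiled convolution kernel requests and the device's per-block limit. On an A100 (compute capability 8.0) the maximum shared memory per thread block is 163 KiB (166912 bytes) even with the opt-in attribute, while the kernel asks for 196 KiB (200704 bytes), so the CUDA driver rejects executable creation at module load time. A minimal sketch of the arithmetic (byte values taken from the error message above; the 163 KiB limit is the documented sm_80 hardware maximum):

```python
# Figures from the error message: requested vs. allowed shared memory.
requested_bytes = 200704   # shared memory the compiled conv kernel asks for
allowed_bytes = 166912     # A100 (sm_80) per-block opt-in limit: 163 KiB

print(f"requested: {requested_bytes / 1024:.0f} KiB")   # 196 KiB
print(f"allowed:   {allowed_bytes / 1024:.0f} KiB")     # 163 KiB
print(f"over by:   {requested_bytes - allowed_bytes} bytes")

# The CUDA driver refuses to load a kernel whose shared memory use exceeds
# the device limit; IREE surfaces that as the INTERNAL error during
# hal.executable.create seen in the trace below.
assert requested_bytes > allowed_bytes
```

This suggests the codegen picked a tile size whose shared-memory footprint exceeds what compute capability 8.0 allows, rather than a true allocation failure at runtime.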

Stack Trace:

tank/test_models.py:379: in test_module                                                                                                                          
    self.module_tester.create_and_check_module(dynamic, device)                                                                                                  
tank/test_models.py:192: in create_and_check_module                                                                                                              
    shark_module.compile()                                                                                                                                       
shark/shark_inference.py:119: in compile                                                                                                                         
    self.shark_runner = SharkRunner(                                                                                                                             
shark/shark_runner.py:84: in __init__                                                                                                                            
    ) = get_iree_compiled_module(                                                                                                                                
shark/iree_utils/compile_utils.py:325: in get_iree_compiled_module                                                                                               
    return get_iree_module(flatbuffer_blob, device, device_idx=device_idx)                                                                                       
shark/iree_utils/compile_utils.py:308: in get_iree_module                                                                                                        
    ctx.add_vm_module(vm_module)                                                                                                                                 
shark.venv/lib/python3.11/site-packages/iree/runtime/system_api.py:255: in add_vm_module                                                                         
    self.add_vm_modules((vm_module,))                                                                                                                            
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _                                                  

self = <iree.runtime.system_api.SystemContext object at 0x7fbec36f7ed0>                                                                                          
vm_modules = (<VmModule module : [forward, __init]>,)                                                                                                            

    def add_vm_modules(self, vm_modules):                                                                                                                        
      assert self._is_dynamic, "Cannot 'add_module' on a static context"                                                                                         
      for m in vm_modules:                                                                                                                                       
        if m.name in self._bound_modules:                                                                                                                        
          raise ValueError(f"Attempt to register duplicate VmModule: '{m.name}'")                                                                                
        bound_module = BoundModule(self, m)                                                                                                                      
        self._bound_modules[m.name] = bound_module                              
        if self._tracer:                                                        
          self._tracer.add_module(bound_module.traced_module)                                                                                                    
>     self._vm_context.register_modules(vm_modules)                                                                                                              
E     RuntimeError: Error registering modules: c/runtime/src/iree/hal/drivers/cuda/native_executable.c:133: INTERNAL; CUDA driver error: Requested shared memory size of 200704 larger than allowed size of 166912; while invoking native function hal.executable.create; while calling import;                                    
E     [ 1]   native hal.executable.create:0 -                                                                                                                    
E     [ 0] bytecode module.__init:1928 /home/ean/SHARK/shark.venv/lib/python3.11/site-packages/torch/nn/modules/conv.py:459:0                                    

shark.venv/lib/python3.11/site-packages/iree/runtime/system_api.py:252: RuntimeError    
-------------------------------------------- Captured stdout call ---------------------------------------------
1
Using cached models from /home/ean/SHARK/shark/../gen_shark_tank/...
Updating artifacts for model efficientnet_b0...
Found 1 device(s).
Device: 0
  Name: NVIDIA A100-SXM4-40GB
  Compute Capability: 8.0
=========================================== short test summary info ===========================================
FAILED tank/test_models.py::SharkModuleTest::test_module_efficientnet_b0_torch_dynamic_cuda - RuntimeError: Error registering modules: c/runtime/src/iree/hal/drivers/cuda/native_executable.c:133: INTE...
FAILED tank/test_models.py::SharkModuleTest::test_module_efficientnet_b0_torch_static_cuda - RuntimeError: Error registering modules: c/runtime/src/iree/hal/drivers/cuda/native_executable.c:133: INTE...

fyi @mariecwhite

mariecwhite commented 1 year ago

Is this for batch size 1 or a larger batch size?

monorimet commented 1 year ago

> Is this for batch size 1 or a larger batch size?

Batch size 1.

mariecwhite commented 1 year ago

This is new -- it was definitely working ~3 weeks ago. Do you mind filing an issue with open-xla/iree?