Closed. gregbugaj closed this issue 1 year ago.
Branch issue-27-Cuda_Error_Out_of_memory created!
Another instance occurs while the overlay is being processed.
I suspect this is an issue with TorchVision.
Creating overlay for : segment > /tmp/segment.png
dst_file_name : /tmp/form-segmentation/segment/dataroot_overlay/overlay_segment.png
opt.preprocess = none
dataset [SingleDataset] was created
__extract_segmentation_mask in 0.19 seconds
Segmented in 0.24 seconds
Traceback (most recent call last):
File "/home/greg/environment/marie/lib/python3.10/site-packages/gradio/routes.py", line 298, in run_predict
output = await app.blocks.process_api(
File "/home/greg/environment/marie/lib/python3.10/site-packages/gradio/blocks.py", line 790, in process_api
result = await self.call_function(fn_index, inputs, iterator)
File "/home/greg/environment/marie/lib/python3.10/site-packages/gradio/blocks.py", line 697, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/greg/environment/marie/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/greg/environment/marie/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/greg/environment/marie/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/home/greg/dev/marieai/marie-ai/workspaces/overlay-gradio/./app.py", line 16, in process_image
real, fake, blended = overlay_processor.segment(docId, src_img_path)
File "/usr/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/greg/dev/marieai/marie-ai/marie/overlay/overlay.py", line 248, in segment
fake_mask = self.__extract_segmentation_mask(
File "/usr/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/greg/dev/marieai/marie-ai/marie/overlay/overlay.py", line 129, in __extract_segmentation_mask
for i, data in enumerate(dataset):
File "/home/greg/dev/marieai/marie-ai/marie/models/pix2pix/data/__init__.py", line 101, in __iter__
for i, data in enumerate(self.dataloader):
File "/home/greg/environment/marie/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 635, in __next__
data = self._next_data()
File "/home/greg/environment/marie/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 679, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/home/greg/environment/marie/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/greg/environment/marie/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/greg/dev/marieai/marie-ai/marie/models/pix2pix/data/single_dataset.py", line 54, in __getitem__
A = self.transform(tensor_image.cuda())
File "/home/greg/environment/marie/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
File "/home/greg/environment/marie/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
def forward(self, input):
for module in self:
input = module(input)
~~~~~~ <--- HERE
return input
File "/home/greg/environment/marie/lib/python3.10/site-packages/torchvision/transforms/transforms.py", line 270, in forward
Tensor: Normalized Tensor image.
"""
return F.normalize(tensor, self.mean, self.std, self.inplace)
~~~~~~~~~~~ <--- HERE
File "/home/greg/environment/marie/lib/python3.10/site-packages/torchvision/transforms/functional.py", line 363, in normalize
raise TypeError(f"img should be Tensor Image. Got {type(tensor)}")
return F_t.normalize(tensor, mean=mean, std=std, inplace=inplace)
~~~~~~~~~~~~~ <--- HERE
File "/home/greg/environment/marie/lib/python3.10/site-packages/torchvision/transforms/functional_tensor.py", line 911, in normalize
if not inplace:
tensor = tensor.clone()
~~~~~~~~~~~~ <--- HERE
dtype = tensor.dtype
RuntimeError: CUDA error: operation failed due to a previous error during capture
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
This error needs to be handled more gracefully instead of crashing the Gradio worker.
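One way to avoid crashing the worker would be to wrap the GPU call (the `self.transform(tensor_image.cuda())` line from the traceback) in a guard that catches CUDA runtime errors and falls back to a CPU path. This is only a sketch of the pattern, not the project's actual fix; `with_cpu_fallback` and its arguments are hypothetical names.

```python
def with_cpu_fallback(fn, *args, on_cuda_error=None):
    """Call fn(*args); if it raises a RuntimeError that looks like a CUDA
    failure (e.g. out of memory, capture errors), retry via on_cuda_error
    instead of crashing the caller. Hypothetical helper, illustrative only."""
    try:
        return fn(*args)
    except RuntimeError as e:
        msg = str(e)
        # Only intercept GPU-related failures; re-raise everything else.
        if "CUDA" not in msg and "out of memory" not in msg.lower():
            raise
        if on_cuda_error is None:
            raise
        # In the real code this fallback could first call
        # torch.cuda.empty_cache() and then run the transform on CPU,
        # e.g. self.transform(tensor_image)  # no .cuda()
        return on_cuda_error(*args)
```

In `single_dataset.py` the call site could then read something like `A = with_cpu_fallback(lambda t: self.transform(t.cuda()), tensor_image, on_cuda_error=self.transform)`, keeping the CPU transform as the degraded path when the GPU is out of memory.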