Open ShichenLiu opened 5 years ago
I think it is because the RGB image size is (1200, 1600) during evaluation, while it is only (512, 640) during training, which is why the out-of-memory error appears at evaluation time.
Hope this answer helps you!
@ShichenLiu Hi, have you solved the problem?
RuntimeError: CUDA out of memory. Tried to allocate 2.29 GiB (GPU 0; 10.73 GiB total capacity; 7.34 GiB already allocated; 1.63 GiB free; 997.85 MiB cached)
@tiffany61706 Hi, but if I use a (512, 640) image, I get the error "AssertionError: assert np_img.shape[:2] == (1200, 1600)" at /MVSNet_pytorch/datasets/dtu_yao_eval.py, line 63, in read_img.
And after I change that line to assert np_img.shape[:2] == (512, 640), I get the error "RuntimeError: The size of tensor a (31) must match the size of tensor b (32) at non-singleton dimension 3" in /MVSNet_pytorch/models/mvsnet.py.
So I think it is not a good idea to change the size of the input image.
@whubaichuan Hi, if you want to use (1200, 1600) as input without causing OOM, I suggest adding a few lines in mvsnet.py such as below. I think OOM won't happen again.
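Something like the following (a sketch only; it assumes your mvsnet.py builds the cost volume with the same variable names as the original MVSNet_pytorch loop, i.e. ref_volume, volume_sum, volume_sq_sum, and homo_warping — adjust to your copy):

    # Inside MVSNet.forward(): free each intermediate volume as soon as it has
    # been folded into the accumulators, instead of keeping them all alive.
    volume_sum = ref_volume
    volume_sq_sum = ref_volume ** 2
    del ref_volume  # drop the now-redundant name
    for src_fea, src_proj in zip(src_features, src_projs):
        warped_volume = homo_warping(src_fea, src_proj, ref_proj, depth_values)
        volume_sum = volume_sum + warped_volume
        # square in place (safe under torch.no_grad() at eval time);
        # warped_volume is not needed intact after this point
        volume_sq_sum = volume_sq_sum + warped_volume.pow_(2)
        del warped_volume  # release this source view's volume immediately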
@tiffany61706
Thanks for your reply. I just tried your method, but it failed again.
RuntimeError: CUDA out of memory. Tried to allocate 2.29 GiB (GPU 0; 10.73 GiB total capacity; 7.34 GiB already allocated; 1.63 GiB free; 997.85 MiB cached)
@whubaichuan Hi, can you paste the full terminal message?
@tiffany61706 Hi, here is the full terminal message.
Traceback (most recent call last):
File "eval.py", line 302, in
@whubaichuan Sorry, I don't have WeChat. I think all you need is to add a few lines in mvsnet.py. If there is still an OOM problem, please delete arrays that are already finished to release the memory.
Did anyone solve this issue? I have the same problem. I have an 11 GB GPU; is this not enough?
@soulslicer Delete some variables to release the memory.
The original size 1600x1184 caused OOM on my 11GB GPU. I resized the image to 1152x864 and it works (costs 6831MB). Don't forget to change the intrinsics as follows:
    def read_cam_file(self, filename):
        with open(filename) as f:
            lines = f.readlines()
        lines = [line.rstrip() for line in lines]
        # extrinsics: lines [1, 5), 4x4 matrix
        extrinsics = np.fromstring(' '.join(lines[1:5]), dtype=np.float32, sep=' ').reshape((4, 4))
        # intrinsics: lines [7, 10), 3x3 matrix
        intrinsics = np.fromstring(' '.join(lines[7:10]), dtype=np.float32, sep=' ').reshape((3, 3))
        intrinsics[:2, :] /= 4
        # CHANGE K ACCORDING TO SIZE!
        intrinsics[0] *= 1152 / 1600
        intrinsics[1] *= 864 / 1200
        ###############################
        # depth_min & depth_interval: line 11
        depth_min = float(lines[11].split()[0])
        depth_interval = float(lines[11].split()[1]) * self.interval_scale
        return intrinsics, extrinsics, depth_min, depth_interval

    def read_img(self, filename):
        img = Image.open(filename)
        # RESIZE IMAGE
        img = img.resize((1152, 864), Image.BILINEAR)
        # scale 0~255 to 0~1
        np_img = np.array(img, dtype=np.float32) / 255.
        return np_img
Also, you need to change the code in eval.py:

line 56:
    intrinsics[0] *= 1152 / 1600
    intrinsics[1] *= 864 / 1200

line 60:
    def read_img(filename):
        img = Image.open(filename).resize((1152, 864), Image.BILINEAR)

line 268:
    color = ref_img[::4, ::4, :][valid_points]  # hardcoded for DTU dataset
@kwea123 Hi, do you know why the intrinsics need to be divided ("intrinsics[:2, :] /= 4") in read_cam_file in eval mode, but not in read_cam_file in train mode? See your code of the train mode here. Is that because of the dataset?
It depends on which data it uses. For training, the intrinsic numbers are already preprocessed to match the size (160, 128), so there is no need for dividing. In the test data, however, the intrinsics are the original numbers, which are for size (1600, 1200); so to match the output of the network (which is 1/4 of the original scale, since there are two stride-2 downsamplings in the CNN), we need to divide the intrinsics by 4.
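To make this concrete, here is the scaling rule as a small numpy sketch (the intrinsic values below are made up for illustration):

    import numpy as np

    def scale_intrinsics(K, scale_x, scale_y):
        # Row 0 of K holds fx and cx (x axis), row 1 holds fy and cy (y axis),
        # so each row is scaled by the resize factor along its own axis.
        K = K.copy()
        K[0] *= scale_x
        K[1] *= scale_y
        return K

    # Test-time intrinsics are given at the original 1600x1200 resolution.
    K = np.array([[2890.0,    0.0, 800.0],
                  [   0.0, 2880.0, 600.0],
                  [   0.0,    0.0,   1.0]], dtype=np.float32)

    K = scale_intrinsics(K, 1152 / 1600, 864 / 1200)  # after resizing to 1152x864
    K[:2] /= 4  # match the network output, which is 1/4 of the input resolution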
@kwea123 Hi! When I use your method of resizing the input image to solve the OOM in evaluation, the following problem occurs:
Traceback (most recent call last):
File "eval.py", line 314, in <module>
save_depth()
File "eval.py", line 118, in save_depth
for batch_idx, sample in enumerate(TestImgLoader):
File "/home/data1/hzc/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/home/data1/hzc/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
return self._process_data(data)
File "/home/data1/hzc/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
data.reraise()
File "/home/data1/hzc/anaconda3/lib/python3.6/site-packages/torch/_utils.py", line 394, in reraise
raise self.exc_type(msg)
AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/data1/hzc/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/data1/hzc/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/data1/hzc/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/diskC/hzc/MVSNet_pytorch-master/datasets/dtu_yao_eval.py", line 93, in __getitem__
imgs.append(self.read_img(img_filename))
File "/diskC/hzc/MVSNet_pytorch-master/datasets/dtu_yao_eval.py", line 68, in read_img
assert np_img.shape[:2] == (1152, 864)
AssertionError
Do you know how to solve it?
Notice that images will be downsized in feature extraction; plus, with the four-scale encoder-decoder structure in the 3D regularization part, the input image size must be divisible by a factor of 32. Considering this requirement and also the limited GPU memory, we downsize the image resolution from 1600×1200 to 800×600, and then crop the image patch with W = 640 and H = 512 from the center as the training input. The input camera parameters are changed accordingly.
@agenthong Just remove that assertion (assert np_img.shape[:2] == (1152, 864)). The shape is actually (864, 1152), but you don't need that assertion anyway.
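This is also why 1152x864 works: both dimensions are divisible by 32. If you want a different size, snap it down to a multiple of 32 first, e.g. (a hypothetical helper, not from the repo):

    def round_down_to_multiple(x, base=32):
        # largest multiple of `base` not exceeding x
        return (x // base) * base

    w, h = 1600 // 2, 1200 // 2      # 800 x 600, as in the paper's preprocessing
    w = round_down_to_multiple(w)    # 800 (already divisible by 32)
    h = round_down_to_multiple(h)    # 576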
Also, it seems that the author has abandoned this repo and doesn't respond anymore. I have no interest in debugging others' code in detail, so my responses to this thread end here. I have my own implementation; feel free to contact me if there's any bug, thank you.
@kwea123 Got it. Thanks anyway.
@whubaichuan I've already downsized the image to (640, 512) but still got OOM in evaluation: RuntimeError: CUDA out of memory. Tried to allocate 2.71 GiB (GPU 0; 7.77 GiB total capacity; 6.45 GiB already allocated; 424.50 MiB free; 6.56 GiB reserved in total by PyTorch). He said it only costs around 6.8 GB his way.
The following problem occurs:
Do you know how to solve it? @whubaichuan
To free up more memory, one can also delete volume_sq_sum and volume_sum after volume_variance is computed. After this change, it works on my 11G GPU without resizing the image. Hope it may help someone.
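Concretely, assuming the variance is computed with in-place ops as in the original code, the change amounts to the following sketch (not a tested patch):

    # cost-volume variance: Var(X) = E[X^2] - (E[X])^2, computed in place
    volume_variance = volume_sq_sum.div_(num_views).sub_(volume_sum.div_(num_views).pow_(2))
    del volume_sq_sum, volume_sum  # frees roughly two full cost volumes before 3D regularization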
File "eval.py", line 307, in save_depth()
File "eval.py", line 118, in save_depth outputs = model(sample_cuda["imgs"], sample_cuda["proj_matrices"], sample_cuda["depth_values"])
File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call result = self.forward(*input, **kwargs)
File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 143, in forward outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 153, in parallel_apply return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply raise output
File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker output = module(*input, **kwargs)
File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call result = self.forward(*input, **kwargs)
File "/home/amax/shenye/colmapTest1/MVSNet_pytorch-master/models/mvsnet.py", line 132, in forward cost_reg = self.cost_regularization(volume_variance)
File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call result = self.forward(*input, **kwargs)
File "/home/amax/shenye/colmapTest1/MVSNet_pytorch-master/models/mvsnet.py", line 66, in forward x = conv4 + self.conv7(x)
RuntimeError: The size of tensor a (31) must match the size of tensor b (32) at non-singleton dimension 3`
Hello, I got the same problem as you. How did you deal with this problem?
Following @kwea123's guide above, it finally works on my device (Tesla K80 11G)!!!🎉 Also, I had manually resized the images to 600*800 and changed the camera intrinsics myself before, but that failed. Besides, after the depth map generation finished, it failed as below:
File "eval.py", line 269, in filter_depth
color = ref_img[::4, ::4, :][valid_points]
IndexError: boolean index did not match indexed array along dimension 0; dimension is 300 but corresponding boolean dimension is 216
The final .pfm depth maps are nearly empty, and point cloud files cannot be generated.
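Maybe relevant: 300 = 1200/4 and 216 = 864/4, so it looks like ref_img was still read at the original 1600x1200 resolution while the depth map was produced at 1152x864. A quick check one could drop in before that line (hypothetical, using the names from the eval.py snippet above):

    # the 4x-downsampled reference image must match the mask's resolution
    assert ref_img[::4, ::4].shape[:2] == valid_points.shape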
Hello, I modified the code according to your method, but the following problem still occurred. @Willyzw
Traceback (most recent call last):
File "/home/ly/Work/MVSNet_pytorch/train.py", line 276, in
@whubaichuan I'm sorry to bother you. Have you solved this problem? Can you tell me how to solve this problem?
Where should the code be changed to run on multiple cards?
Traceback (most recent call last):
File "/home/camellia/zyf/MVSNet_pytorch-master/eval.py", line 302, in
I have two 3060 cards. How can I solve this problem? Thank you!
Did you solve this problem? How did you solve it?
@agenthong please refer to my project MVSNet
@ChenLiufeng please refer to my project MVSNet
@Innocence4822 please refer to my project MVSNet
@zhao-you-fei please refer to my project MVSNet
You changed mvsnet.py? Does this approach solve the memory problem?
@zhao-you-fei yes, solved
Hi,
Thanks for the excellent project! I use multiple RTX 2080s to run the code. However, the code causes OOM during evaluation (eval.sh). Since the batch size is 1, only a single GPU is used. Yet I could not figure out why it doesn't cause OOM during training.
Can you give an example of which kind of GPU is enough for testing? Thanks!