Closed nitthilan closed 4 years ago
No I didn't. What do you mean by pre-processing? I think the only thing you need to do is to know the camera poses.
When the background is black, the network seems to overfit to zero and the PSNR gets stuck at a low value like 11. However, when I make the background grey, the PSNR increases to around 25. How can I fix this and make it work for a black background too?
The outputs I get with 96 images trained for 100 epochs with a grey background are shown below:
The output I get with 96 images trained for 100 epochs with a black background is shown below. Here, however, I had to make the number of zero-valued outputs the same as the number of non-zero values before I was able to train the network.
I'm not sure why I get these artifacts in both outputs. Any thoughts?
Regards, K. J. Nitthilan
Can you try to manually change this line to True? Also change the image to WHITE background. https://github.com/kwea123/nerf_pl/blob/16684b40eb3a23df384c27cdec0db1d928cefa3c/datasets/llff.py#L174 It will consider the white background in the rendering process. This problem originates from the fact that you need to specify the background (where the rays don't collide with the object) color beforehand, which is tunable but not suggested here https://github.com/kwea123/nerf_pl/blob/16684b40eb3a23df384c27cdec0db1d928cefa3c/models/rendering.py#L169-L170 For example if you absolutely need a black background, you can change the 1 to 0 so that the model learns to output black color.
If you leave self.white_back = False, it will consider that the object occupies the whole space and try to predict the colors even for empty space. Since the background looks the same from every angle, the model gets confused about the true spatial composition of the object and yields artifacts. That's why we need to tell the model the background color beforehand, to give it a hint that these spaces don't actually contain anything.
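To make the role of the background color concrete, here is a minimal NumPy sketch of volume-rendering compositing with a fixed background. It is not the repository's code: the function name composite and the bg_color parameter are made up for illustration, and the real logic lives in the linked rendering.py. Whatever opacity a ray has not accumulated is filled with bg_color, so bg_color=1.0 corresponds to a white background and 0.0 to black.

```python
import numpy as np

def composite(rgbs, sigmas, deltas, bg_color=1.0):
    """Alpha-composite samples along each ray, then fill the rest with bg_color.

    rgbs:   (n_rays, n_samples, 3) sample colors in [0, 1]
    sigmas: (n_rays, n_samples) non-negative densities
    deltas: (n_rays, n_samples) distances between consecutive samples
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)              # opacity of each sample
    # transmittance: how much light reaches sample i without being blocked
    trans = np.cumprod(
        np.concatenate([np.ones_like(alphas[:, :1]), 1.0 - alphas[:, :-1]], axis=1),
        axis=1)
    weights = alphas * trans                             # (n_rays, n_samples)
    weights_sum = weights.sum(axis=1)                    # accumulated opacity per ray
    rgb = (weights[..., None] * rgbs).sum(axis=1)        # (n_rays, 3)
    # rays that hit nothing have weights_sum close to 0 and take the background color
    rgb = rgb + bg_color * (1.0 - weights_sum[:, None])
    return rgb, weights_sum
```

A ray through empty space (zero density everywhere) returns exactly bg_color, so the network does not need to invent density to explain background pixels.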
I tried the same. However, I still see the color values learning all zeros and so the PSNR gets stuck at 11.
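For a sense of scale, assuming the usual PSNR definition for images with values in [0, 1] (the helper names below are illustrative, not taken from the repository), a PSNR of 11 corresponds to a mean squared error of roughly 0.08:

```python
import numpy as np

def psnr(pred, target):
    """PSNR in dB, assuming pixel values in [0, 1]."""
    mse = np.mean((pred - target) ** 2)
    return -10.0 * np.log10(mse)

def mse_from_psnr(psnr_db):
    """Invert the PSNR formula to recover the mean squared error."""
    return 10.0 ** (-psnr_db / 10.0)

print(mse_from_psnr(11.0))  # about 0.079
```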
Can you share the data (images+poses) on google drive? I'll test
This drive folder has all three combinations: grey, white and black backgrounds. https://drive.google.com/drive/folders/1Zxth_6fwpUkpN9shh00UeabiLWYYC17D?usp=sharing
The poses are stored in render_airplane_2.pkl:
w, h = self.meta["width"], self.meta["height"]
self.meta["focal"]
pose = np.array(self.meta[frame])[:3,:]
How do you handle the bounds? Also did you verify that the poses are in the correct order (right up back)? It would be great if you can share the whole code of the dataloader so that I can check quickly.
Please find attached the dataloader files: shapenet_copy.txt and shapenet1_copy.txt. shapenet1 is used for the grey background, while shapenet is used for the black background.
Probably it's due to the excessive amount of background. In the blender data the foreground covers 33% of the image area, but in your data it covers only 20%, so the network learns only the background because it dominates the loss...
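A quick way to check this ratio on your own images is sketched below. It assumes an RGB image already scaled to [0, 1] with a uniform background color; the function name and the tol parameter are hypothetical.

```python
import numpy as np

def foreground_fraction(img, bg_value=1.0, tol=1e-6):
    """Fraction of pixels that differ from a uniform background color.

    img: (H, W, 3) float array with values in [0, 1].
    """
    # a pixel is background only if all three channels match bg_value
    is_background = np.all(np.abs(img - bg_value) <= tol, axis=-1)
    return 1.0 - is_background.mean()

# toy example: white 10x10 image whose top two rows belong to the object
img = np.ones((10, 10, 3))
img[:2] = 0.3
print(foreground_fraction(img))  # about 0.2
```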
There are two solutions in my opinion. The second is to use valid_mask to train exactly on the object region in the early stages, then train on the full image later.
I tried 2. and the result is quite good. More precisely, I trained with the mask for 13 epochs and without the mask (full image) for 7 epochs, with the default learning rate and cosine annealing. Some numbers: after 13 epochs the training loss is 0.013 (PSNR 22.71) and the validation loss is 4e-3 (PSNR 26.95). After 20 epochs, the training loss is 3e-3 (PSNR 29.28) and the validation loss is 2e-3 (PSNR 29). I trained with image size 160x120 (half), so if you train on 320x240 I think it will be better.
This is the dataloader I use for the white background. For later epochs, remove valid_mask in lines 58 and 68 (the two lines marked "remove valid_mask for later epochs").
import torch
from torch.utils.data import Dataset
import json
import numpy as np
import os
from PIL import Image
from torchvision import transforms as T
from .ray_utils import *
import pickle

class ShapeNetDataset(Dataset):
    def __init__(self, root_dir, split='train', object_type='airplane', obj_num=2, img_wh=(320, 240)):
        self.root_dir = root_dir
        self.split = split
        self.img_wh = img_wh
        self.define_transforms()
        self.read_meta(object_type, obj_num)
        self.white_back = True
        self.object_type = object_type
        self.obj_num = obj_num

    def read_meta(self, object_type, obj_num):
        with open(os.path.join(self.root_dir,
                               "render_"+object_type+"_"+str(obj_num)+".pkl"), "rb") as f:
            self.meta = pickle.load(f)
        w, h = self.img_wh
        self.focal = 0.5*320/np.tan(0.5*self.meta["focal"]) # original focal length
                                                            # when W=320
        self.focal *= w/320
        self.near = 1.4
        self.far = 2.3
        self.bounds = np.array([self.near, self.far])
        # ray directions for all pixels, same for all images (same H, W, focal)
        self.directions = \
            get_ray_directions(h, w, self.focal) # (h, w, 3)
        self.image_paths = []
        self.all_rays = []
        self.all_rgbs = []
        for frame in range(1, 96):
            pose = np.array(self.meta[frame])[:3]
            c2w = torch.FloatTensor(pose)
            image_path = os.path.join(self.root_dir, "images/",
                                      "render_"+object_type+"_"+str(obj_num)+"_"+str(frame)+".png")
            self.image_paths += [image_path]
            img = Image.open(image_path)
            img = img.resize(self.img_wh, Image.LANCZOS)
            img = self.transform(img) # (3, h, w)
            img = img.view(3, -1).permute(1, 0) # (h*w, 3) RGBA
            valid_mask = (img.sum(1)<3).flatten() # (H*W) valid color area
            self.all_rgbs += [img[valid_mask]] # remove valid_mask for later epochs
            check_output = torch.sum(img, axis=1)
            rays_o, rays_d = get_rays(self.directions, c2w) # both (h*w, 3)
            ray_array = torch.cat([rays_o, rays_d,
                                   self.near*torch.ones_like(rays_o[:, :1]),
                                   self.far*torch.ones_like(rays_o[:, :1])],
                                  1) # (h*w, 8)
            self.all_rays += [ray_array[valid_mask]] # remove valid_mask for later epochs
        self.all_rays = torch.cat(self.all_rays, 0)
        self.all_rgbs = torch.cat(self.all_rgbs, 0)

    def define_transforms(self):
        self.transform = T.ToTensor()

    def __len__(self):
        if self.split == 'train':
            return len(self.all_rays)
        if self.split == 'val':
            return 1
        return 96

    def __getitem__(self, idx):
        if self.split == 'train': # use data in the buffers
            sample = {'rays': self.all_rays[idx],
                      'rgbs': self.all_rgbs[idx]}
        else: # create data for each image separately
            image_path = os.path.join(self.root_dir, "images/",
                                      "render_"+self.object_type+"_"+str(self.obj_num)+"_"+str(idx)+".png")
            c2w = torch.FloatTensor(self.meta[idx])[:3]
            img = Image.open(image_path)
            img = img.resize(self.img_wh, Image.LANCZOS)
            img = self.transform(img) # (3, H, W)
            img = img.view(3, -1).permute(1, 0) # (H*W, 3) RGBA
            valid_mask = (img.sum(1)<3).flatten() # (H*W) valid color area
            rays_o, rays_d = get_rays(self.directions, c2w)
            rays = torch.cat([rays_o, rays_d,
                              self.near*torch.ones_like(rays_o[:, :1]),
                              self.far*torch.ones_like(rays_o[:, :1])],
                             1) # (H*W, 8)
            sample = {'rays': rays,
                      'rgbs': img,
                      'c2w': c2w,
                      'valid_mask': valid_mask}
        return sample
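Instead of editing the two masked lines by hand between stages, the masking could be switched with a flag. The sketch below shows the idea on plain NumPy arrays; select_training_pixels and use_mask are hypothetical names and are not part of the dataloader above.

```python
import numpy as np

def select_training_pixels(rays, rgbs, use_mask):
    """Keep only non-white pixels when use_mask is True, otherwise keep everything.

    rays: (H*W, 8) ray origins, directions, near and far
    rgbs: (H*W, 3) colors in [0, 1]
    """
    if not use_mask:
        return rays, rgbs
    # same test as the dataloader: a pixel is valid if it is not pure white
    valid_mask = rgbs.sum(axis=1) < 3
    return rays[valid_mask], rgbs[valid_mask]
```

With something like this, the first stage would call it with use_mask=True and the second stage with use_mask=False, leaving the rest of the loading code unchanged.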
Thanks a lot for the modified code. Can you share the configuration you used to run the code with too? Currently, I am using this and am not able to reproduce the results:
python train.py --dataset_name shapenet --root_dir ../data/shapenet_white/ --N_importance 64 --img_wh 320 240 --noise_std 0 --num_epochs 100 --batch_size 1024 --optimizer adam --lr 5e-4 --lr_scheduler cosine --decay_step 2 4 8 --decay_gamma 0.5 --exp_name white_96_images_mask --num_gpus 2
You need to run 2 stages
With the above dataloader (with mask), run
python train.py --dataset_name shapenet --root_dir ../data/shapenet_white/ \
--N_importance 64 --img_wh 320 240 --noise_std 0 --num_epochs 20 \
--batch_size 1024 --optimizer adam --lr 5e-4 --lr_scheduler cosine \
--exp_name white_96_images_mask --num_gpus 2
After it finishes, you'll get a checkpoint epoch=x.ckpt, where x is the best epoch; in my case it was 13. Now remove valid_mask in lines 58 and 68 of the dataloader (the two lines marked "remove valid_mask for later epochs"), then run
python train.py --dataset_name shapenet --root_dir ../data/shapenet_white/ \
--N_importance 64 --img_wh 320 240 --noise_std 0 --num_epochs 20 \
--batch_size 1024 --optimizer adam --lr 5e-4 --lr_scheduler cosine \
--exp_name white_96_images_mask --num_gpus 2 \
--ckpt_path ckpts/epoch=x.ckpt
We just added the argument --ckpt_path ckpts/epoch=x.ckpt; nothing else is changed. It will load the best epoch (the model weights, the optimizer learning rate, etc.) and continue training from that epoch. Since I set --num_epochs 20 and the first stage ended with epoch 13, the second stage continues training for 7 more epochs.
After that just evaluate with the best epoch in the second stage, in my case it was epoch 17.
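As a rough illustration of what resuming with the optimizer state means for the learning rate, here is a sketch of a cosine-annealed schedule. It assumes the schedule spans all --num_epochs epochs, is stepped once per epoch, and decays to a minimum of 0; the function name and these assumptions are mine and are not taken from train.py.

```python
import math

def cosine_lr(epoch, base_lr=5e-4, total_epochs=20, eta_min=0.0):
    """Cosine-annealed learning rate at a given epoch (illustrative assumptions)."""
    cosine = 1.0 + math.cos(math.pi * epoch / total_epochs)
    return eta_min + 0.5 * (base_lr - eta_min) * cosine

print(cosine_lr(0))   # 5e-4, the value passed with --lr
print(cosine_lr(13))  # about 1.37e-4, where the second stage would pick up
```

Under these assumptions the second stage starts at a much lower learning rate than the first, so it acts as a fine-tuning pass on the full images rather than a fresh run.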
Thanks a lot for sharing this info. However, it looks like this configuration works only for the 160x120 size. When I try it at 320x240, the validation accuracy drops and it seems to overfit. The validation PSNR drops to 5.31 and the loss stays at 0.308, while the training PSNR hovers in the 17-20 range.
Sure. Let me try the cropping idea too. The current problem seems to be an overfitting problem.
On Sun, Aug 9, 2020, 13:31 Phong Nguyen Ha notifications@github.com wrote:
I think when you use a larger image, the ratio between background pixels and object pixels is not the same as at the smaller resolution. Also, I think the problem might become much more difficult if the number of white pixels increases along with the size of the images.
Have you tried to train the model using cropped images first and then fine-tune it on the whole image later?
Indeed, training on 320x240 is more difficult... If possible, I would suggest modifying the code so that it can be trained to explicitly predict alpha=0 for background pixels.
Another idea is to first train on 160x120, then fine-tune on 320x240. I think I'll leave the experiments to you. You are welcome to report any findings.
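One possible reading of the alpha=0 suggestion is an extra loss term that penalizes accumulated opacity on background rays. The sketch below only computes that term on NumPy arrays; the function name is hypothetical, the idea is untested here, and in real training it would have to be written with torch tensors and added to the color loss with some weight so that gradients flow.

```python
import numpy as np

def background_opacity_penalty(weights_sum, valid_mask):
    """Mean squared accumulated opacity over background rays.

    weights_sum: (n_rays,) accumulated opacity per ray, in [0, 1]
    valid_mask:  (n_rays,) bool, True where the pixel belongs to the object
    """
    background = ~valid_mask
    if not background.any():
        return 0.0  # nothing to penalize when every ray hits the object
    return float(np.mean(weights_sum[background] ** 2))
```

The penalty is zero when every background ray has zero accumulated opacity, which is the behavior the suggestion asks the network to learn.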
I tried cropping the images. The data still had the same distribution as 320x240, so it did not help much. So I first trained with equal amounts of background and foreground pixels for the initial 10-13 epochs and then trained on the entire image. I could reach a validation PSNR of 24, and the training PSNR reaches 30.
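The balancing step described above (equal amounts of background and foreground pixels) can be sketched as follows. This is one way to do it, not necessarily how it was implemented here; balanced_indices is a hypothetical helper that keeps every foreground pixel and a random background subset of the same size.

```python
import numpy as np

def balanced_indices(valid_mask, rng):
    """Indices of all foreground pixels plus an equally sized random background subset.

    valid_mask: (n_pixels,) bool, True where the pixel belongs to the object
    rng:        a numpy.random.Generator, e.g. np.random.default_rng(0)
    """
    fg = np.flatnonzero(valid_mask)
    bg = np.flatnonzero(~valid_mask)
    n_bg = min(len(fg), len(bg))                  # never ask for more than exist
    bg_subset = rng.choice(bg, size=n_bg, replace=False)
    return np.concatenate([fg, bg_subset])
```

The returned indices can be used to index both the ray buffer and the color buffer so the two stay aligned.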
One half of the plane looks fine. The other half is corrupted.
Will experiment with the other ideas too.
I will close this issue for now.
Hi, Did you test the code with ShapeNet dataset? If so what are the pre-processing steps done to get good results?
Thanking you.
Regards, K. J. Nitthilan