kwea123 / nerf_pl

NeRF (Neural Radiance Fields) and NeRF in the Wild using pytorch-lightning
https://www.youtube.com/playlist?list=PLDV2CyUo4q-K02pNEyDr7DYpTQuka3mbV
MIT License

ShapeNet dataset configuration #31

Closed nitthilan closed 4 years ago

nitthilan commented 4 years ago

Hi, did you test the code with the ShapeNet dataset? If so, what pre-processing steps are needed to get good results?

Thank you.

Regards, K. J. Nitthilan

kwea123 commented 4 years ago

No, I didn't. What do you mean by pre-processing? I think the only thing you need is the camera poses.

nitthilan commented 4 years ago

When the background is black, the network output seems to collapse to zero and the PSNR gets stuck at a low value like 11. However, when I make the background grey, the PSNR increases to around 25. How can I fix this so it also works with a black background?

nitthilan commented 4 years ago

The outputs I get with 96 images trained for 100 epochs with a grey background are below: 96_images

The output I get with 96 images trained for 100 epochs with a black background is below. Here I had to balance the number of zero (background) pixels against the non-zero pixels before the network would train at all. black_96_images

I am not sure why I get these artifacts in both outputs. Any thoughts?

Regards, K. J. Nitthilan

kwea123 commented 4 years ago

Can you try manually changing this line to True, and also change the images to a WHITE background? https://github.com/kwea123/nerf_pl/blob/16684b40eb3a23df384c27cdec0db1d928cefa3c/datasets/llff.py#L174 This makes the rendering process account for the white background. The problem originates from the fact that you need to specify the background color (where the rays don't hit the object) beforehand. It is tunable here, although changing it isn't the suggested approach: https://github.com/kwea123/nerf_pl/blob/16684b40eb3a23df384c27cdec0db1d928cefa3c/models/rendering.py#L169-L170 For example, if you absolutely need a black background, you can change the 1 to 0 so that the model learns to output black.

If you leave self.white_back = False, the model assumes the object occupies the whole space and tries to predict colors even for empty regions. Since the background looks the same from every angle, the model gets confused about the true spatial composition of the object and yields artifacts. That's why we need to tell the model the background color beforehand, giving it a hint that these regions don't actually contain anything.
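
Roughly, the background color enters the volume rendering step like this (a minimal sketch with illustrative names, not the exact code in models/rendering.py):

import torch

def composite_with_background(rgbs, weights, bg_color=1.0):
    """rgbs: (N_rays, N_samples, 3), weights: (N_rays, N_samples)."""
    rgb_final = torch.sum(weights.unsqueeze(-1) * rgbs, dim=1)  # (N_rays, 3)
    weights_sum = weights.sum(dim=1, keepdim=True)              # (N_rays, 1)
    # rays that accumulate little density are filled with the assumed
    # background color: bg_color=1.0 for white, 0.0 for black
    return rgb_final + (1 - weights_sum) * bg_color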

nitthilan commented 4 years ago

I tried that. However, the predicted colors still collapse to all zeros, so the PSNR stays stuck at 11.

kwea123 commented 4 years ago

Can you share the data (images + poses) on Google Drive? I'll test it.

nitthilan commented 4 years ago

This drive has all three versions, with grey, white, and black backgrounds: https://drive.google.com/drive/folders/1Zxth_6fwpUkpN9shh00UeabiLWYYC17D?usp=sharing

The poses are stored in render_airplane_2.pkl; the metadata is accessed as:

    w, h = self.meta["width"], self.meta["height"]
    self.meta["focal"]  # field-of-view angle, converted to a focal length in the dataloader
    pose = np.array(self.meta[frame])[:3,:]

kwea123 commented 4 years ago

How do you handle the bounds? Also, did you verify that the poses use the correct axis convention (right, up, back)? It would be great if you could share the whole dataloader code so that I can check quickly.
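
For reference, if the renderer outputs OpenCV-style camera-to-world matrices (right, down, forward), a conversion like the following rough sketch is usually needed before feeding them to the dataloader (illustrative code, not from this repo):

import numpy as np

def opencv_to_nerf(c2w_cv):
    """Convert a (3, 4) or (4, 4) OpenCV-convention c2w matrix (right, down,
    forward) to the (right, up, back) convention expected by NeRF."""
    c2w = np.array(c2w_cv, dtype=np.float32).copy()
    c2w[:3, 1] *= -1  # flip the y axis (down -> up)
    c2w[:3, 2] *= -1  # flip the z axis (forward -> back)
    return c2w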

nitthilan commented 4 years ago

Please find attached the dataloader files: shapenet1 is used for the grey background, while shapenet is used for the black background. shapenet_copy.txt shapenet1_copy.txt

kwea123 commented 4 years ago

It's probably due to the excessive amount of background. In the blender dataset the foreground covers about 33% of the image area, but in your data it covers only about 20%, so the network mostly learns the background because it dominates the loss...

There are two solutions in my opinion,

  1. https://github.com/bmild/nerf/issues/29 the author proposed to crop the central region and train on it in the early stages, then train on the full image later (a rough sketch of such a crop mask is shown after this list).
  2. A similar idea, but we can use valid_mask to train exactly on the object region in the early stages, then train on the full image later.
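
A rough sketch of the central-crop mask mentioned in option 1 (names are illustrative; this is not code from the linked issue):

import torch

def central_crop_mask(h, w, keep_ratio=0.5):
    """Boolean mask over the h*w flattened pixels that keeps only the central crop."""
    ys = torch.arange(h).view(-1, 1).expand(h, w)
    xs = torch.arange(w).view(1, -1).expand(h, w)
    y0, y1 = int(h * (1 - keep_ratio) / 2), int(h * (1 + keep_ratio) / 2)
    x0, x1 = int(w * (1 - keep_ratio) / 2), int(w * (1 + keep_ratio) / 2)
    mask = (ys >= y0) & (ys < y1) & (xs >= x0) & (xs < x1)
    return mask.flatten()  # (h*w,), usable like valid_mask to filter rays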

I tried 2. and the result is quite good: airplane

More precisely, I trained with the mask for 13 epochs and without the mask (full image) for 7 epochs, with the default learning rate and cosine annealing. Some numbers: after 13 epochs the training loss is 0.013 (PSNR 22.71) and the validation loss is 4e-3 (PSNR 26.95). After 20 epochs, the training loss is 3e-3 (PSNR 29.28) and the validation loss is 2e-3 (PSNR 29). I trained with image size 160x120 (half resolution), so if you train on 320x240 I think it will be even better.

The dataloader I use for the white background is below; remove valid_mask in lines 58 and 68 (marked with comments) for the later epochs.

import torch
from torch.utils.data import Dataset
import json
import numpy as np
import os
from PIL import Image
from torchvision import transforms as T

from .ray_utils import *

import pickle

class ShapeNetDataset(Dataset):
    def __init__(self, root_dir, split='train', object_type='airplane', obj_num=2, img_wh=(320, 240)):
        self.root_dir = root_dir
        self.split = split
        self.img_wh = img_wh
        self.define_transforms()

        self.read_meta(object_type, obj_num)
        self.white_back = True # tell the renderer to composite rays onto a white background

        self.object_type = object_type
        self.obj_num = obj_num

    def read_meta(self, object_type, obj_num):
        with open(os.path.join(self.root_dir, 
                    "render_"+object_type+"_"+str(obj_num)+".pkl"), "rb") as f:
            self.meta = pickle.load(f)
        w, h = self.img_wh
        self.focal = 0.5*320/np.tan(0.5*self.meta["focal"]) # original focal length
                                                            # when W=320
        self.focal *= w/320

        self.near = 1.4 # hard-coded scene bounds for these ShapeNet renders
        self.far = 2.3
        self.bounds = np.array([self.near, self.far])

        # ray directions for all pixels, same for all images (same H, W, focal)
        self.directions = \
            get_ray_directions(h, w, self.focal) # (h, w, 3)

        self.image_paths = []
        self.all_rays = []
        self.all_rgbs = []
        for frame in range(1, 96):
            pose = np.array(self.meta[frame])[:3]
            c2w = torch.FloatTensor(pose)

            image_path = os.path.join(self.root_dir, "images/", 
                "render_"+object_type+"_"+str(obj_num)+"_"+str(frame)+".png")
            self.image_paths += [image_path]
            img = Image.open(image_path)
            img = img.resize(self.img_wh, Image.LANCZOS)
            img = self.transform(img) # (3, h, w)
            img = img.view(3, -1).permute(1, 0) # (h*w, 3) RGB
            valid_mask = (img.sum(1)<3).flatten() # (h*w) pixels that are not pure white
            self.all_rgbs += [img[valid_mask]] # remove valid_mask for later epochs

            check_output = torch.sum(img, axis=1) # per-pixel channel sum (debug only, unused)

            rays_o, rays_d = get_rays(self.directions, c2w) # both (h*w, 3)

            ray_array = torch.cat([rays_o, rays_d, 
                                   self.near*torch.ones_like(rays_o[:, :1]),
                                   self.far*torch.ones_like(rays_o[:, :1])],
                                   1) # (h*w, 8)
            self.all_rays += [ray_array[valid_mask]] # remove valid_mask for later epochs

        self.all_rays = torch.cat(self.all_rays, 0)
        self.all_rgbs = torch.cat(self.all_rgbs, 0)

    def define_transforms(self):
        self.transform = T.ToTensor()

    def __len__(self):
        if self.split == 'train':
            return len(self.all_rays)
        if self.split == 'val':
            return 1
        return 96

    def __getitem__(self, idx):
        if self.split == 'train': # use data in the buffers
            sample = {'rays': self.all_rays[idx],
                      'rgbs': self.all_rgbs[idx]}

        else: # create data for each image separately
            image_path = os.path.join(self.root_dir, "images/", 
                "render_"+self.object_type+"_"+str(self.obj_num)+"_"+str(idx)+".png")
            c2w = torch.FloatTensor(self.meta[idx])[:3]

            img = Image.open(image_path)
            img = img.resize(self.img_wh, Image.LANCZOS)
            img = self.transform(img) # (3, H, W)
            img = img.view(3, -1).permute(1, 0) # (H*W, 3) RGB
            valid_mask = (img.sum(1)<3).flatten() # (H*W) pixels that are not pure white

            rays_o, rays_d = get_rays(self.directions, c2w)

            rays = torch.cat([rays_o, rays_d, 
                              self.near*torch.ones_like(rays_o[:, :1]),
                              self.far*torch.ones_like(rays_o[:, :1])],
                              1) # (H*W, 8)

            sample = {'rays': rays,
                      'rgbs': img,
                      'c2w': c2w,
                      'valid_mask': valid_mask}

        return sample

nitthilan commented 4 years ago

Thanks a lot for the modified code. Can you also share the configuration you used to run it? Currently I am using the following and am not able to reproduce your results:

python train.py --dataset_name shapenet --root_dir ../data/shapenet_white/ --N_importance 64 --img_wh 320 240 --noise_std 0 --num_epochs 100 --batch_size 1024 --optimizer adam --lr 5e-4 --lr_scheduler cosine --decay_step 2 4 8 --decay_gamma 0.5 --exp_name white_96_images_mask --num_gpus 2

kwea123 commented 4 years ago

You need to run two stages:

  1. With the above dataloader (with mask), run

    python train.py --dataset_name shapenet --root_dir ../data/shapenet_white/ \
    --N_importance 64 --img_wh 320 240 --noise_std 0 --num_epochs 20 \
    --batch_size 1024 --optimizer adam --lr 5e-4 --lr_scheduler cosine \
    --exp_name white_96_images_mask --num_gpus 2
  2. After it finishes, you'll get a checkpoint epoch=x.ckpt, where x is the best epoch (in my case it was 13). Now remove valid_mask in lines 58 and 68 of the dataloader, then run

    python train.py --dataset_name shapenet --root_dir ../data/shapenet_white/ \
    --N_importance 64 --img_wh 320 240 --noise_std 0 --num_epochs 20 \
    --batch_size 1024 --optimizer adam --lr 5e-4 --lr_scheduler cosine \
    --exp_name white_96_images_mask --num_gpus 2 \
    --ckpt_path ckpts/epoch=x.ckpt

    We just added the argument --ckpt_path ckpts/epoch=x.ckpt; nothing else changes. It will load the best checkpoint (the model weights, the optimizer learning rate, etc.) and continue training from that epoch. Since I set --num_epochs 20 and the first stage ended at epoch 13, the second stage continues training for 7 more epochs.

After that, just evaluate with the best epoch from the second stage; in my case it was epoch 17.

nitthilan commented 4 years ago

Thanks a lot for sharing this info. However, it looks like this configuration only works at 160x120. When I try 320x240, the validation accuracy drops and the model seems to overfit: the validation PSNR drops to 5.31 (the loss stays at 0.308), while the training PSNR hovers in the 17-20 range.

nitthilan commented 4 years ago

Sure. Let me try the cropping idea too. The current issue seems to be an overfitting problem.

On Sun, Aug 9, 2020, 13:31 Phong Nguyen Ha notifications@github.com wrote:

I think when you use a larger image, the ratio between background pixels and object pixels is not the same as at the smaller resolution. Also, I think the problem might become much harder if the number of white pixels increases along with the image size.

Have you tried training the model on cropped images first and then fine-tuning it on the whole image later?

kwea123 commented 4 years ago

Indeed, training on 320x240 is more difficult... If possible, I would suggest making some modifications to the code so that it can be trained to explicitly predict alpha=0 for background pixels.
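
A minimal sketch of such a modification, assuming you have the per-ray accumulated weights from the renderer and a background mask for each batch (all names here are illustrative, this is not the repo's loss):

import torch
import torch.nn.functional as F

def rgb_and_alpha_loss(rgb_pred, rgb_gt, weights_sum, bg_mask, lambda_alpha=0.1):
    """weights_sum: (N_rays,) accumulated opacity; bg_mask: (N_rays,) bool, True for background rays."""
    color_loss = F.mse_loss(rgb_pred, rgb_gt)
    # push accumulated opacity towards 0 on background rays and 1 on object rays
    alpha_target = (~bg_mask).float()
    alpha_loss = F.mse_loss(weights_sum, alpha_target)
    return color_loss + lambda_alpha * alpha_loss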

Another idea is to first train on 160x120, then fine-tune on 320x240. I think I'll leave the experiments to you. You are welcome to report any findings.

nitthilan commented 4 years ago

I tried cropping the images, but the data still had the same distribution as 320x240, so it did not help much. Instead, I first trained with equal amounts of background and foreground pixels for the initial 10-13 epochs and then trained on the entire image. The validation PSNR reaches 24 and the training PSNR reaches 30. white_96_images_crop

One half of the plane looks fine. The other half is corrupted.

Will experiment with the other ideas too.
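
For reference, one way to sample equal numbers of background and foreground pixels could look like this (an illustrative sketch, not the exact dataloader code used here):

import torch

def balance_rays(all_rays, all_rgbs, fg_mask):
    """Keep all foreground rays and an equally sized random subset of background rays."""
    fg_idx = torch.nonzero(fg_mask, as_tuple=False).squeeze(1)
    bg_idx = torch.nonzero(~fg_mask, as_tuple=False).squeeze(1)
    keep_bg = bg_idx[torch.randperm(len(bg_idx))[:len(fg_idx)]]
    keep = torch.cat([fg_idx, keep_bg])
    return all_rays[keep], all_rgbs[keep]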

nitthilan commented 4 years ago

I will close this issue for now.