sxyu / svox2

Plenoxels: Radiance Fields without Neural Networks
BSD 2-Clause "Simplified" License

cam_scale_factor parameter #91

Open povolann opened 1 year ago

povolann commented 1 year ago

Hello, I tried searching and looked through the code, but I am kind of lost. What exactly is cam_scale_factor? Thank you for your answer!

Learningm commented 1 year ago

Hi, did you figure out what this parameter is?

povolann commented 1 year ago

Hi, not really, but I didn't play around with it.

Learningm commented 1 year ago

Got it. BTW, have you run into the "floaters" problem when trying your custom data?

povolann commented 1 year ago

Yes, I still have floaters, but the results are pretty good for my purpose, especially after playing around with the parameters lambda_tv, lambda_tv_sh, and background_brightness.
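For context, lambda_tv and lambda_tv_sh weight total variation penalties on the density and spherical-harmonic grids, which smooth neighboring voxels and help suppress isolated floaters. A minimal dense-grid sketch of the idea (svox2's real implementation is a fused CUDA kernel over its sparse grid):

```python
import torch

def tv_loss(grid: torch.Tensor) -> torch.Tensor:
    """Total variation over a dense (X, Y, Z) voxel grid: the mean
    gradient magnitude between each voxel and its +x/+y/+z neighbors."""
    # Crop so all three difference tensors share the same shape.
    dx = grid[1:, :-1, :-1] - grid[:-1, :-1, :-1]
    dy = grid[:-1, 1:, :-1] - grid[:-1, :-1, :-1]
    dz = grid[:-1, :-1, 1:] - grid[:-1, :-1, :-1]
    return torch.sqrt(dx ** 2 + dy ** 2 + dz ** 2 + 1e-12).mean()

# total_loss = rgb_loss + lambda_tv * tv_loss(density_grid)
```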

Learningm commented 1 year ago

Thanks! I will try your suggestions.

reconlabs-sergio commented 1 year ago

In most NeRFs, you need to make sure your region of interest is in a "controlled region" of space. You cannot really predict where COLMAP will put your poses. Therefore, most algorithms (1) recenter the poses (e.g. they compute the average of all poses and assume this is the center, or, for non-centered captures, they compute some form of "look-at" point) and (2) rescale all the poses to a certain radius. The cam_scale_factor refers to that. It's mentioned in section 3.6 of the paper: "we pre-scale the inner scene to be approximately contained in the unit sphere".
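A rough sketch of what that preprocessing looks like (not the repo's exact code; the function and variable names here are made up):

```python
import numpy as np

def normalize_poses(c2w: np.ndarray, cam_scale_factor: float = 0.95) -> np.ndarray:
    """c2w: (N, 4, 4) camera-to-world matrices, e.g. from COLMAP.
    (1) Recenter on the mean camera position, then (2) rescale so the
    farthest camera sits at radius cam_scale_factor, i.e. just inside
    the unit sphere."""
    c2w = c2w.copy()
    center = c2w[:, :3, 3].mean(axis=0)
    c2w[:, :3, 3] -= center                         # (1) recenter
    radius = np.linalg.norm(c2w[:, :3, 3], axis=1).max()
    c2w[:, :3, 3] *= cam_scale_factor / radius      # (2) rescale
    return c2w
```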

Learningm commented 1 year ago

I ran into another problem: the surface of my custom object is blurry. Have you met similar issues?

The 'floaters' seem to get better when I tune the parameters mentioned above.

povolann commented 1 year ago

Hm, it depends on what kind of blurry problem. Can you share an example?

Learningm commented 1 year ago

Hi, see the video below; the object surface is very blurry.

https://user-images.githubusercontent.com/13192241/199918064-e2490803-6f67-4526-9df4-51eb3c516c04.mp4

Learningm commented 1 year ago

> In most NeRFs, you need to make sure your region of interest is in a "controlled region" of space. […] It's mentioned in section 3.6 of the paper: "we pre-scale the inner scene to be approximately contained in the unit sphere".

Cool! I have a follow-up question: why do we need to set cam_scale_factor at all? In this implementation it's set to 0.9 or 0.95, but after computing the average of all poses there is already a rescaling, using the default nsvf dataset format. If we set cam_scale_factor to 1.0, it seems nothing special happens, is that right?

reconlabs-sergio commented 1 year ago

I haven't checked the code in detail, but I assume they just make the poses "a tiny bit" smaller than the unit sphere, to be safe. I don't think it's a big deal. What might be important is to take this number into account if you have any "scale-dependent" components in your pipeline, e.g. if you want to reconstruct a physically accurate object, or something like that.

reconlabs-sergio commented 1 year ago

From your video, it would seem your problem could be related to inaccurate poses. Have you tried putting some nice features on the table (stickers, QR codes, newspapers)?

povolann commented 1 year ago

> From your video, it would seem your problem could be related to inaccurate poses. Have you tried putting some nice features on the table (stickers, QR codes, newspapers)?

I also think this might be because of inaccurate poses. I tried swapping the poses between 2 similar datasets and got similar rendering results.

But in this regard, is there any advice on how to change the COLMAP parameters when this happens and I can't retake my photos? Or how to change the COLMAP parameters in case it registers fewer photos than I have in the dataset?

Learningm commented 1 year ago

> From your video, it would seem your problem could be related to inaccurate poses. Have you tried putting some nice features on the table (stickers, QR codes, newspapers)?

No, I haven't tried adding manual feature patterns when taking photos.
I double-checked and visualized the features extracted by COLMAP; the object in the video has few feature points on its smooth, white surface. Inaccurate poses seem to be the reason. As @povolann mentioned, how can I get more accurate camera poses from COLMAP without retaking my photos?

Learningm commented 1 year ago

> I haven't checked the code in detail, but I assume they just make the poses "a tiny bit" smaller than the unit sphere, to be safe. […]

I see. I have another question: can we adjust the "control region" in this repo the way other codebases do with aabb_scale? I am reading the code in detail and want to shrink the "control region" (the SparseGrid in this repo) to a smaller region so that it focuses more on the foreground. But the SparseGrid seems to be normalized already (center [0, 0, 0], radius [1, 1, 1]), i.e. a [-1, 1] bounding box covering the whole scene.
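One experiment I'm considering, assuming center and radius really are constructor arguments (the [0, 0, 0] / [1, 1, 1] defaults above suggest they are), is to shrink the grid region directly:

```python
import svox2

# Hypothetical sketch: if SparseGrid accepts center/radius as its defaults
# suggest, halving the radius would spend all grid resolution on the
# foreground instead of the whole [-1, 1]^3 box.
grid = svox2.SparseGrid(reso=256,
                        center=[0.0, 0.0, 0.0],
                        radius=[0.5, 0.5, 0.5])
```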

reconlabs-sergio commented 1 year ago

> From your video, it would seem your problem could be related to inaccurate poses. […]

> No, I haven't tried adding manual feature patterns when taking photos. […] As @povolann mentioned, how can I get more accurate camera poses from COLMAP without retaking my photos?

Only thing I can think of is:

  • Trying to evaluate why your poses are bad: are your pictures blurry? Are they downsampled? If so, maybe try some AI-based upsampling/deblurring algorithm, but it's kind of a long shot.
  • Using commercial software to compute the poses (e.g. RealityCapture, etc...)

But anyway, your results will probably be bound to the garbage-in -> garbage-out motto if your original images don't have enough quality. Otherwise, depending on your purpose, you may want to try another algorithm (NeRFactor, INGP, Nerfstudio).
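On the first point, one concrete thing worth trying before moving on is re-running COLMAP with the more exhaustive SIFT settings its FAQ recommends for weakly-textured scenes. A sketch (the database/image/output paths are placeholders for your dataset):

```python
import subprocess

def colmap(*args: str) -> None:
    # Thin wrapper so each COLMAP stage reads as one call.
    subprocess.run(["colmap", *args], check=True)

colmap("feature_extractor",
       "--database_path", "db.db", "--image_path", "images",
       "--ImageReader.single_camera", "1",           # if all photos share one camera
       "--SiftExtraction.estimate_affine_shape", "1",
       "--SiftExtraction.domain_size_pooling", "1")  # more/better features
colmap("exhaustive_matcher",
       "--database_path", "db.db",
       "--SiftMatching.guided_matching", "1")        # epipolar-guided re-matching
colmap("mapper",
       "--database_path", "db.db", "--image_path", "images",
       "--output_path", "sparse")
```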

I don't have enough experience with this code to answer your AABB question. What you say sounds reasonable; have you also tried increasing the grid resolution?

Learningm commented 1 year ago

> Only thing I can think of is: trying to evaluate why your poses are bad […] or using commercial software to compute the poses (e.g. RealityCapture). […] What you say sounds reasonable; have you also tried increasing the grid resolution?

Thanks for your quick reply. I should evaluate the poses with some other method to check whether they are accurate enough. Let me think about it later.

I tried increasing the grid resolution, but in my opinion the grid resolution applies to the bounding box covering the whole scene: [128, 128, 128] or [256, 256, 256] just divides the whole scene into 128³ or 256³ small voxels. It has no relationship to the "control region" (the center object).

That is, the center object may occupy only a small proportion of the voxels: when the whole scene consists of 128³ (over 2 million) voxels, the object may occupy 10k or so. There also doesn't seem to be a way to visualize voxel occupancy the way an octree does. I wonder whether my understanding is correct.
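If it helps, a crude way to check that is to export the trained grid's densities as a dense array and count voxels above a threshold (a sketch; how you dump the volume depends on the svox2 API, so `density` here is just an assumed (X, Y, Z) numpy array):

```python
import numpy as np

def occupancy_stats(density: np.ndarray, thresh: float = 5.0) -> np.ndarray:
    """Count voxels whose sigma clears a threshold, as a rough proxy
    for how much of the grid the object actually uses."""
    occupied = density > thresh
    print(f"{occupied.sum():,} / {density.size:,} voxels above sigma={thresh} "
          f"({100.0 * occupied.mean():.2f}% of the volume)")
    return occupied
```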

Wuziyi616 commented 1 year ago

@Learningm @povolann sorry to bother you. I also encountered lots of floaters when training on my custom data. I tried the parameters you mentioned above; they help, but the results are still not very good. So I wonder, besides the lambda_tv parameters, have you tried lambda_beta and lambda_sparsity? They seem more related to floaters, as they directly push sigma toward either 0 or 1.

Learningm commented 1 year ago

> […] besides the lambda_tv parameters, have you tried lambda_beta and lambda_sparsity? They seem more related to floaters, as they directly push sigma toward either 0 or 1.

I have tried lambda_beta and lambda_sparsity. Since the default values are already small, I guessed they should be larger to have an effect on the foreground and background, but 10x or 100x the defaults always gives worse results.
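For reference, my reading of these two regularizers from the paper (tensor shapes and names here are made up): the sparsity term is a Cauchy penalty pulling sampled densities toward 0, while the beta term pushes each ray's accumulated foreground opacity toward exactly 0 or 1:

```python
import torch

def sparsity_loss(sigma: torch.Tensor) -> torch.Tensor:
    # Cauchy prior on sampled densities: log(1 + 2*sigma^2) pulls sigma -> 0,
    # emptying out the half-transparent voxels that tend to become floaters.
    return torch.log1p(2.0 * sigma ** 2).mean()

def beta_loss(acc: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Beta prior on per-ray accumulated opacity acc in [0, 1]: the sum of logs
    # is most negative at acc = 0 or acc = 1, so minimizing it drives each ray
    # to be fully opaque or fully transparent.
    return (torch.log(acc + eps) + torch.log(1.0 - acc + eps)).mean()
```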

Wuziyi616 commented 1 year ago

@Learningm Thanks for your reply! Interesting, I also feel that 1e-5 and 1e-11 are too small, so maybe I should increase them. But it seems that in your experiments increasing them didn't lead to better results? That's confusing, hmm.

Learningm commented 1 year ago

@Wuziyi616 I got the blurry-edge problem shown in the video posted above, and it seems hard to solve by tuning parameters. Did you get good results without the blurry problem on your data?

Wuziyi616 commented 1 year ago

I'm training on CO3D, which has good camera poses. So my object surfaces all look good, just some floaters:

https://user-images.githubusercontent.com/37072215/201841878-1aac4ead-527c-4cbe-9663-0764b3782faa.mp4

Learningm commented 1 year ago

@Wuziyi616 Your result looks good. I got a similar result after tuning the parameters mentioned above.

Wuziyi616 commented 1 year ago

Hi @Learningm, I use this codebase. They've tuned the parameters very well. I think compared to the CO3D setting here, they increase the sparsity loss weight by 10x.

Learningm commented 1 year ago

@Wuziyi616 Cool! Thanks for sharing.

sxyu commented 1 year ago

Yeah, sorry, I have not maintained this codebase very much at all and will try to do some things when I have time. This parameter simply directly scales the overall scene, on top of the normalization method. While our background model allows for modelling unbounded scenes, it still matters which portion of the scene is inside the unit sphere.