princeton-computational-imaging / NSF

Official code repository for the paper: "Neural Spline Fields for Burst Image Fusion and Layer Separation"
MIT License

Testing using section 3 in the Colab tutorial #8

mohiiieldin opened this issue 1 month ago

mohiiieldin commented 1 month ago

I tried running section 3 on this image: 2017_Train_00010

and duplicated it 5 times to match the number of examples in the tutorial. I trained for 50 epochs, and this is the final output:

NFS-output

All the pixels = 1 in the transmission, reference, and obstruction outputs: image

Ilya-Muromets commented 1 month ago

If you just duplicate the image then there is no parallax (motion between frames) for obstruction removal, so the method will not work. You will need to sample from a video with some hand or scene motion in it.

mohiiieldin commented 1 month ago

@Ilya-Muromets Thanks for your reply.

If I sample frames using OpenCV from a normal video (not taken by your Android app), will it work?

Also, when sampling, what's your recommendation for the number of frames and the interval between sampled frames?
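
For context, a minimal sketch of evenly spaced frame sampling with OpenCV might look like the snippet below. The video path, output directory, and frame count are placeholders, not part of the NSF codebase:

```python
# Sketch: sample a few evenly spaced frames from a video with OpenCV.
# Paths and counts are placeholders; adjust to match the tutorial's input layout.
import os
import cv2

video_path = "vid1.mp4"      # placeholder: your handheld video
out_dir = "sampled_frames"   # placeholder: directory the notebook reads frames from
num_frames = 5               # placeholder: how many frames to keep

os.makedirs(out_dir, exist_ok=True)
cap = cv2.VideoCapture(video_path)
total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

# Pick num_frames indices spread evenly across the clip.
indices = [round(i * (total - 1) / (num_frames - 1)) for i in range(num_frames)]

for k, idx in enumerate(indices):
    cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
    ok, frame = cap.read()
    if ok:
        cv2.imwrite(os.path.join(out_dir, f"frame_{k:03d}.png"), frame)
cap.release()
```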

Ilya-Muromets commented 1 month ago

Yeah, that should work; in general the approach is tuned towards handheld burst photography settings, where you have ~1cm of camera motion. If you look at the preview videos on our website, https://light.princeton.edu/publication/nsf/, it should give you a feel for how much motion there is in our captures.

mohiiieldin commented 1 month ago

Okay got it, will try and see.

Thanks for the reply

mohiiieldin commented 1 month ago

Did you open-source the dataset that you used for training?

Ilya-Muromets commented 1 month ago

You can download scenes from the links in the README. There's no "training data" per se, since the network is trained from scratch for every scene (similar to a NeRF).

mohiiieldin commented 1 month ago

Thanks 🙏

mohiiieldin commented 1 month ago

I tested using the following video that I shot using my mobile camera:

https://github.com/princeton-computational-imaging/NSF/assets/43294096/972fb9d5-c997-4db9-a31e-a3eae06d6c5b

and these are the frames sampled from the video using OpenCV:

image

Now I'm starting to see results, but they're not as good as in the demo:

image

Do you have any thoughts about what can be improved?

mohiiieldin commented 1 month ago

I also tried to fit the model on 5 images using: WhatsApp Image 2024-05-13 at 17 09 31_63cafe3b

but I got a white image, the same as with the normal images. Isn't the burst feature in the mobile camera what you meant?

Ilya-Muromets commented 1 month ago

Nice, vid1.mp4 looks closer to the intended data. The motion there is still very large, however. You can try just recording natural hand motion (i.e. try to keep your hand still while recording the video).

mohiiieldin commented 1 month ago

I tried another video with less motion in it (just natural hand motion):

https://github.com/princeton-computational-imaging/NSF/assets/43294096/86000e30-413b-4031-b37b-b1b73dac1bcb

but the output was pretty much the same as the input: image

The output is the image on the right.

Ilya-Muromets commented 1 month ago

Interesting. Try passing in the first 30 frames of that? I think this kind of occluder should work pretty well, but there's definitely room for improvement.
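
For reference, keeping only the first 30 consecutive frames could be done with a short OpenCV sketch like the one below; the video path and output directory are placeholders, not from the NSF repo:

```python
# Sketch: dump the first 30 consecutive frames of a clip instead of
# sampling across the whole video. Paths are placeholders.
import os
import cv2

cap = cv2.VideoCapture("vid2.mp4")   # placeholder: the lower-motion video
out_dir = "first_30_frames"          # placeholder output directory
os.makedirs(out_dir, exist_ok=True)

for i in range(30):
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imwrite(os.path.join(out_dir, f"frame_{i:03d}.png"), frame)
cap.release()
```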

mohiiieldin commented 1 month ago

I was passing 5 frames with systematic sampling. I will try 30 frames now, but should I use occlusion.sh or occlusion-wild.sh?

Ilya-Muromets commented 1 month ago

You can try both! Try the first 30 frames; you may also need to set the size of the occlusion alpha to "large". You can find that in the config.

mohiiieldin commented 1 month ago

I tried both with 30 frames and also set the size of the occlusion alpha to "large". The final result is better, but the fence is still visible.

Should I play with the focus, for example focusing behind the fence rather than on it, or is that not relevant?

image

Ilya-Muromets commented 1 month ago

Focus should be fine. The z-motion (camera moving towards the occluder) might be a bit difficult, but shouldn't be unreasonable to estimate.

I'm working towards a SIGGRAPH Asia submission, so unfortunately I can't really help debug too much right now, but I'm happy to chat about the method more at a later time. In general I encourage playing with the settings (e.g., small/med/large encodings, how many camera control points there are) and maybe trying even less motion (e.g., just the last 10 frames).

mohiiieldin commented 1 month ago

@Ilya-Muromets Thanks for your help.

When you are available for more debugging, just ping me.