markomih / ResFields

[ICLR 2024 Spotlight ✨] ResFields: Residual Neural Fields for Spatiotemporal Signals
https://markomih.github.io/ResFields
MIT License

A parameter question #9

Closed grisaiakaziki closed 2 months ago

grisaiakaziki commented 2 months ago

Dear Author,

First of all, congratulations on completing such an excellent piece of work. After reading your paper, I have a few points that are not entirely clear to me, and I would appreciate your help in understanding them.

Question One: In the appendix of your paper, you mentioned that "we set the number of coefficients Ti to the number of frames unless specified otherwise." What impact does the setting of Ti have?

Question Two: In the code, in the file resfield.py, the forward function takes input of shape (B, S, F_in). How should I understand S? In the SIREN example it seems that S is the number of pixels.

Question Three: Following up on the previous question: my input does not have the batch dimension B. Have you encountered such an input? How should I handle this?

Thank you, I look forward to your reply.

markomih commented 2 months ago

Hi @grisaiakaziki,

Thank you for your interest in our work!

  1. We conducted a small experiment in the supplementary material (Tab. A.2) to understand this parameter. Since we were mostly interested in dynamic reconstruction, which usually doesn't require time interpolation, we haven't experimented with it extensively.
  2. (B, S, F_in): B is the batch size and S is the total number of samples in the batch (e.g., the number of pixels in the video example or the number of point samples in the NeRF experiments).
  3. You could temporarily add an extra dimension for B (e.g., torch.unsqueeze(input, dim=0) before the forward pass and torch.squeeze(output, dim=0) afterwards); see the sketch below.
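
A minimal sketch of that wrapping, with a plain nn.Linear standing in for the ResField layer (the real module in resfield.py additionally takes input_time/frame_id):

```python
import torch
import torch.nn as nn

layer = nn.Linear(256, 256)  # stand-in for the ResField layer

x = torch.randn(24677, 256)  # unbatched input of shape (S, F_in)
x = x.unsqueeze(0)           # add the batch dimension: (1, S, F_in)
out = layer(x)               # forward pass with B = 1
out = out.squeeze(0)         # drop it again: (S, F_out)
```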

Best, Marko

grisaiakaziki commented 2 months ago

@markomih Thank you for your prompt response. Regarding question 3, I tried this method before, but because dimension 1 of the input is so large, it leads to an out-of-memory (OOM) error. Do you have any good solutions? I also tried reshaping the input, which solves the OOM problem, but the model's performance decreases. Once again, thank you for your answer. Good luck.

markomih commented 2 months ago

@grisaiakaziki Does the regular MLP have the same OOM problem? If that's the case, there's not much to do except reduce the number of samples per iteration.
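
A hypothetical helper along those lines (not part of this repo) that processes the S samples in chunks and concatenates the outputs, trading speed for peak memory:

```python
import torch

def forward_in_chunks(layer, x, chunk=8192, **kwargs):
    """Run `layer` over x of shape (B, S, F_in) in chunks along dim 1."""
    outs = [layer(x[:, i:i + chunk], **kwargs)
            for i in range(0, x.shape[1], chunk)]
    return torch.cat(outs, dim=1)
```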

grisaiakaziki commented 2 months ago

@markomih The ordinary MLP does not have this OOM issue. After executing the line `return (weight @ input.permute(0, 2, 1) + self.bias.view(1, -1, 1)).permute(0, 2, 1)`, GPU memory usage increases significantly.

markomih commented 2 months ago

@grisaiakaziki How do the input_time and frame_id parameters look for you? The current implementation assumes that one batch contains samples from a single frame/time step.

grisaiakaziki commented 2 months ago

@markomih I apologize for the late reply. The shape of input is (1, 24677, 256), the shape of input_time is [160], and the shape of frame_id is [160]. Additionally, input is detached from the computation graph before being fed to the network. Thank you for your careful response.

markomih commented 2 months ago

@grisaiakaziki That could be the problem. The frame_id tensor should have the shape of the batch size (in your case, (1,)).
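
If the per-frame weight is assembled from frame_id (my reading of the quoted line; the shapes below are scaled-down stand-ins), a frame_id of length 160 against a batch of 1 would make the batched matmul broadcast and materialize a huge intermediate:

```python
import torch

# Toy shapes; the real ones in this thread would be
# weight (160, 256, 256) against input (1, 24677, 256).
weight = torch.randn(160, 8, 8)    # (len(frame_id), F_out, F_in)
x = torch.randn(1, 100, 8)         # (B, S, F_in) with B = 1
out = weight @ x.permute(0, 2, 1)  # batch dims broadcast: (160, 8, 100)
print(out.shape)                   # torch.Size([160, 8, 100])
# With the real shapes this is a (160, 256, 24677) tensor,
# roughly 4 GB in float32, instead of the intended (1, 256, 24677).
```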

grisaiakaziki commented 2 months ago

@markomih I apologize for disturbing you again. Do you have time to elaborate on the relationship between capacity, input_time, frame_id, and the batch? In my understanding, capacity is the number of training images, input_time is the time corresponding to each image, and frame_id is the index of the image. Which parameters need to have the same dimensions? For example, you mentioned before that the dimensions of frame_id and the batch should match. In my experiments, I found that the model performs poorly on unseen viewpoints of the training set. I look forward to your reply, thank you.

markomih commented 2 months ago

@grisaiakaziki input_time is just the normalized frame_id and is used only if you specify mode == 'interpolation'.

The code assumes that all samples in a batch (of input tensor) come from a single frame/image, hence the frame_id tensor just contains the index of your frame/image.

For your particular input with 24677 samples and a batch size of 1, i.e., shape (1, 24677, 256), all 24677 samples must come from a single image, and frame_id holds that image/frame index with shape (1,).
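
Putting that together, a minimal usage sketch (the layer call is hypothetical; adapt the names and normalization to your copy of resfield.py):

```python
import torch

num_frames, S, F_in = 160, 24677, 256
fid = 42                        # the single frame this batch comes from
x = torch.randn(1, S, F_in)     # (B, S, F_in): all S samples from frame fid
frame_id = torch.tensor([fid])  # shape (1,), matching the batch size B = 1
input_time = frame_id.float() / (num_frames - 1)  # one plausible normalization, shape (1,)
# out = layer(x, input_time=input_time, frame_id=frame_id)  # hypothetical call
```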

grisaiakaziki commented 2 months ago

@markomih Thank you for your patient replies, and I look forward to more great work from you in the future.