Closed grisaiakaziki closed 2 months ago
Hi @grisaiakaziki,
Thank you for your interest in our work!
Best, Marko
@markomih Thank you for your prompt response. Regarding question 3, I have tried this method before, but because dimension (1) of the input is too large, it leads to an out-of-memory (OOM) error. Do you have any suggestions? I have also tried reshaping the input, which avoids the OOM error, but the model's performance decreases. Once again, thank you for your answer. Good luck.
@grisaiakaziki Does a regular MLP have the same OOM problem? If so, there's not much to do except reduce the number of samples per iteration.
@markomih An ordinary MLP does not produce this OOM issue. After executing the line `return (weight @ input.permute(0, 2, 1) + self.bias.view(1, -1, 1)).permute(0, 2, 1)`, GPU memory usage increases significantly.
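For context, here is a NumPy sketch of what that PyTorch line computes (the shapes are illustrative stand-ins, much smaller than a real run):

```python
import numpy as np

# Illustrative (hypothetical) shapes
B, S, F_in, F_out = 1, 5, 8, 16

inp = np.random.randn(B, S, F_in)         # network input: (B, S, F_in)
weight = np.random.randn(B, F_out, F_in)  # one weight matrix per batch element
bias = np.random.randn(F_out)

# NumPy equivalent of:
# (weight @ input.permute(0, 2, 1) + self.bias.view(1, -1, 1)).permute(0, 2, 1)
out = (weight @ inp.transpose(0, 2, 1) + bias.reshape(1, -1, 1)).transpose(0, 2, 1)
print(out.shape)  # (1, 5, 16)
```

Unlike a plain MLP, whose single `(F_out, F_in)` weight matrix is shared across the whole batch, this layer materializes a separate `(F_out, F_in)` matrix per batch element, which is where the extra memory goes.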
@grisaiakaziki How do the `input_time` and `frame_id` parameters look for you? The current implementation assumes that one batch contains samples from a single frame/time step.
@markomih I apologize for the late reply. The shape of `input` is (1, 24677, 256), the shape of `input_time` is [160], and the shape of `frame_id` is [160]. Additionally, `input` is detached from the computation graph before being passed to the network. Thank you for your careful response.
@grisaiakaziki That could be the problem. The `frame_id` tensor should have the shape of the batch size (in your case, 1).
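A small NumPy sketch of why a mismatched `frame_id` length inflates memory (the shapes are shrunk for illustration; in the report above it would be 160 frame ids against a batch of 1):

```python
import numpy as np

S, F_in, F_out = 5, 4, 4
T = 3  # stands in for the 160 frame ids reported above

weight = np.zeros((T, F_out, F_in))  # one weight matrix per frame id
x = np.zeros((1, S, F_in))           # batch size 1

# Broadcasting expands the batch dimension: the result is (T, F_out, S)
# instead of the intended (1, F_out, S). With T = 160 and S = 24677,
# that is 160x the intended activation memory, which would explain the OOM.
out = weight @ x.transpose(0, 2, 1)
print(out.shape)  # (3, 4, 5)
```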
@markomih I apologize for disturbing you again. Could you elaborate on the relationship between `capacity`, `input_time`, `frame_id`, and the batch? My understanding is that `capacity` is the number of training images, `input_time` is the corresponding time for each image, and `frame_id` is the image's index. Which parameters need to have the same dimensions? For example, you mentioned earlier that `frame_id` and the batch should have the same size. In my experiments, I found that the model performs poorly on viewpoints not seen during training. I look forward to your reply, thank you.
@grisaiakaziki `input_time` is just the normalized `frame_id` and is used only if you specify `mode == 'interpolation'`.
The code assumes that all samples in a batch (of the `input` tensor) come from a single frame/image, hence the `frame_id` tensor just contains the index of your frame/image.
For your particular `input` with 24677 samples and a batch size of 1, i.e. shape (1, 24677, 256), all 24677 samples must come from a single image, and `frame_id` contains that image/frame id with shape (1,).
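To make the shapes concrete, here is a minimal sketch of a correctly formed batch. The division by `num_frames - 1` is my assumption of what "normalized `frame_id`" means; the frame id 42 is an arbitrary example:

```python
import numpy as np

num_frames = 160
S, F_in = 24677, 256

inp = np.zeros((1, S, F_in))              # all S samples come from one image
frame_id = np.array([42])                 # shape (1,): matches the batch size
input_time = frame_id / (num_frames - 1)  # normalized frame id, for 'interpolation' mode

print(inp.shape, frame_id.shape, input_time.shape)
```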
@markomih Thank you for your patient replies. I wish you continued success with your future work.
Dear Author,
First of all, congratulations on such an excellent piece of work. After reading your paper, a few points are not entirely clear to me, and I would appreciate your help in understanding them.
Question One: In the appendix of your paper, you mentioned that "we set the number of coefficients Ti to the number of frames unless specified otherwise." What impact does the setting of Ti have?
Question Two: In the code, in the file `resfield.py`, the input to the forward function has shape (B, S, F_in). How should I understand S? It seems that in SIREN, S is the number of pixels?
Question Three: Following up on the previous question, my input does not have the batch dimension B. Have you encountered such an input? How should I handle it?
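For reference, here is a minimal NumPy sketch of what I mean (the shapes are illustrative); prepending a batch dimension of 1 is one option I have considered:

```python
import numpy as np

x = np.zeros((1000, 256))  # (S, F_in): no batch dimension
x_b = x[None]              # prepend B: (1, S, F_in), a batch of one frame
print(x_b.shape)  # (1, 1000, 256)
```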
Thank you, I look forward to your reply.