Closed agelosk closed 1 week ago
Hello, thank you for your interest in our work. Your conclusions are absolutely correct. Regarding your questions:
My colleague recently provided a script for processing custom videos; please refer to https://github.com/mikeqzy/3dgs-avatar-release/issues/11#issuecomment-2308474585. Hope it helps!
Thanks for your reply. The code provided by your colleague is indeed very useful. A few follow-up questions:
R = quaternion_to_matrix(qw,qx,qy,qz)
and T = - R.T * [tx,ty,tz]
where qw, qx, qy, qz, tx, ty, tz are taken directly from images.txt.
q1[0] = 1. # [1,0,0,0] represents identity rotation.
IndexError: index 0 is out of bounds for dimension 0 with size 0
This concerns the Neuman dataset. Clearly something is wrong with my K, R, T inputs. When I use 3 to initialize K, R, T, the model does train, but the results are very poor, which indicates that something is still wrong. Could you explain how I should initialize K, R, T from COLMAP?
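For reference, here is a minimal sketch of how COLMAP's text export maps to K, R, T. It assumes the consumer expects world-to-camera extrinsics (x_cam = R @ x_world + T), which is the convention COLMAP itself uses in images.txt, and a PINHOLE camera model in cameras.txt (parameters fx, fy, cx, cy). Under that assumption the translation is used as-is, while -R.T @ t is the camera center in world coordinates and not the translation. The function names are hypothetical, and whether this repository expects this convention is not confirmed here.

```python
import numpy as np


def qvec_to_rotmat(qw, qx, qy, qz):
    """Convert a (w, x, y, z) quaternion to a 3x3 rotation matrix."""
    q = np.array([qw, qx, qy, qz], dtype=np.float64)
    q /= np.linalg.norm(q)  # guard against slightly non-unit quaternions
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def colmap_extrinsics(qw, qx, qy, qz, tx, ty, tz):
    """Return (R, T, C) from one images.txt pose entry.

    R, T are world-to-camera (COLMAP's stored convention);
    C is the camera center in world coordinates.
    """
    R = qvec_to_rotmat(qw, qx, qy, qz)
    T = np.array([tx, ty, tz], dtype=np.float64).reshape(3, 1)
    C = -R.T @ T
    return R, T, C


def pinhole_intrinsics(fx, fy, cx, cy):
    """Build K from the four parameters of a COLMAP PINHOLE camera."""
    return np.array([
        [fx, 0.0, cx],
        [0.0, fy, cy],
        [0.0, 0.0, 1.0],
    ])
```

If the downstream code instead expects camera-to-world poses, R.T and C would be the values to pass, so it is worth checking which convention the preprocessing script assumes before training.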
Appreciate your time, Agelos
I believe the model will behave normally on the Neuman dataset if you use the default camera parameters from 3. Hope it helps!
Hello and thank you for your work.
I am trying to run your method on a custom dataset and want to determine the minimum data requirements. Given a dataset structurally similar to ZJU-MoCap, your method can run on it directly after applying this preprocess script. Thus, whatever this script needs in order to run is also sufficient to run your method.
After studying the preprocessing script, I concluded it needs the following to run:
Thus, if we are given only an RGB video and want to run your method, we should 1) run a masking method to produce the human mask per frame, 2) run a method that gives us the intrinsics and extrinsics per camera (most probably COLMAP), 3) run a 3D human reconstruction method to get poses, shapes, and Rh, Th per pose, and 4) format the data properly to run the preprocess script and then run your train.py file.
Hopefully, that sums up the procedure correctly. I have two questions, though: 1) What is D and how do we obtain this? 2) How do we obtain 'poses', 'Rh', 'Th' and 'shapes' per pose? Does SPIN predict this information per pose?
Thank you for your work and time, Agelos