kszpxxzmc opened 7 months ago
I think it's already out there. Pick an off-the-shelf text-to-image model, and get the images. Input the images to EscherNet to get novel views. Run NeuS to reconstruct 3D.
I get pose.npy from eval_eschernet.py, lines 476-489, as below:
```python
elif DATA_TYPE == "MVDream" or DATA_TYPE == "Text2Img":
    img_path = None
    azimuth, polar = angles_out[T_out_index]
    if CaPE_TYPE == "4DoF":
        pose_out.append(torch.tensor([np.deg2rad(polar), np.deg2rad(azimuth), 0., 0.]))
    elif CaPE_TYPE == "6DoF":
        pose = look_at(origin, xyzs[T_out_index], up)
        pose = np.linalg.inv(pose)
        pose[2, :] *= -1
        pose_out.append(torch.from_numpy(get_pose(pose)))
    print(len(pose_out))
    if len(pose_out) == 100:
        np.save('pose.npy', pose_out)
```
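For reference, here is a minimal, self-contained sketch of what building such a pose.npy can look like. Note this is my own OpenGL-style `look_at` and an illustrative spiral of camera positions, not the repo's `look_at`/`get_archimedean_spiral`, so the exact conventions may differ:

```python
import numpy as np

def look_at(eye, center, up):
    # World-to-camera (view) matrix: camera sits at `eye` and looks toward
    # `center` down its -z axis (OpenGL convention). This is an assumed
    # convention, not necessarily the one used in eval_eschernet.py.
    f = (center - eye) / np.linalg.norm(center - eye)
    s = np.cross(f, up)
    s = s / np.linalg.norm(s)
    u = np.cross(s, f)
    m = np.eye(4)
    m[0, :3], m[1, :3], m[2, :3] = s, u, -f
    m[:3, 3] = -m[:3, :3] @ eye
    return m

# 100 camera positions spiralling over a sphere of radius 1.5 around the
# origin (values illustrative; endpoints avoid the poles, where `up` would
# be parallel to the view direction).
radius = 1.5
thetas = np.linspace(0.001, np.pi - 0.001, 100)   # polar angle sweep
azimuths = np.linspace(0.0, 8 * np.pi, 100)       # several full turns
xyzs = np.stack([radius * np.sin(thetas) * np.cos(azimuths),
                 radius * np.sin(thetas) * np.sin(azimuths),
                 radius * np.cos(thetas)], axis=-1)

pose_out = []
for xyz in xyzs:
    w2c = look_at(xyz, np.zeros(3), np.array([0.0, 0.0, 1.0]))
    pose_out.append(np.linalg.inv(w2c))           # camera-to-world

poses = np.stack(pose_out)                        # (100, 4, 4)
np.save('pose.npy', poses)
print(poses.shape)                                # (100, 4, 4)
```

The camera centre of each camera-to-world matrix is its translation column, so every saved pose should sit exactly on the radius-1.5 sphere.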
and I use pose.npy in 3drecon/renderer.py, lines 501-512, for mesh extraction, as below:
```python
for index in range(self.num_images):
    pose_path = '/mnt/petrelfs/xxx/xxx/EscherNet/pose.npy'
    pose_all = np.load(pose_path)
    # print(pose_all)
    pose = pose_all[index][:3, :]  # in blender
    self.poses.append(pose)
    theta, azimuth, radius = get_pose(pose)
    print(theta, azimuth, radius)
    self.azs.append(azimuth)
    self.els.append(theta)
    self.dists.append(radius)
and I get theta, azimuth, radius as below:
```
# each print emits one triple: [theta] [azimuth] [radius]; four triples per row here
[0.0007854] [-0.01] [1.5]  [0.03220132] [-0.41] [1.5]  [0.06361725] [-0.81] [1.5]  [0.09503318] [-1.21] [1.5]
[0.1264491] [-1.61] [1.5]  [0.15786503] [-2.01] [1.5]  [0.18928096] [-2.41] [1.5]  [0.22069688] [-2.81] [1.5]
[0.25211281] [3.07318531] [1.5]  [0.28352874] [2.67318531] [1.5]  [0.31494466] [2.27318531] [1.5]  [0.34636059] [1.87318531] [1.5]
[0.37777652] [1.47318531] [1.5]  [0.40919244] [1.07318531] [1.5]  [0.44060837] [0.67318531] [1.5]  [0.4720243] [0.27318531] [1.5]
[0.50344022] [-0.12681469] [1.5]  [0.53485615] [-0.52681469] [1.5]  [0.56627208] [-0.92681469] [1.5]  [0.597688] [-1.32681469] [1.5]
[0.62910393] [-1.72681469] [1.5]  [0.66051986] [-2.12681469] [1.5]  [0.69193578] [-2.52681469] [1.5]  [0.72335171] [-2.92681469] [1.5]
[0.75476764] [2.95637061] [1.5]  [0.78618356] [2.55637061] [1.5]  [0.81759949] [2.15637061] [1.5]  [0.84901541] [1.75637061] [1.5]
[0.88043134] [1.35637061] [1.5]  [0.91184727] [0.95637061] [1.5]  [0.94326319] [0.55637061] [1.5]  [0.97467912] [0.15637061] [1.5]
[1.00609505] [-0.24362939] [1.5]  [1.03751097] [-0.64362939] [1.5]  [1.0689269] [-1.04362939] [1.5]  [1.10034283] [-1.44362939] [1.5]
[1.13175875] [-1.84362939] [1.5]  [1.16317468] [-2.24362939] [1.5]  [1.19459061] [-2.64362939] [1.5]  [1.22600653] [-3.04362939] [1.5]
[1.25742246] [2.83955592] [1.5]  [1.28883839] [2.43955592] [1.5]  [1.32025431] [2.03955592] [1.5]  [1.35167024] [1.63955592] [1.5]
[1.38308617] [1.23955592] [1.5]  [1.41450209] [0.83955592] [1.5]  [1.44591802] [0.43955592] [1.5]  [1.47733395] [0.03955592] [1.5]
[1.50874987] [-0.36044408] [1.5]  [1.5401658] [-0.76044408] [1.5]  [1.57158172] [-1.16044408] [1.5]  [1.60299765] [-1.56044408] [1.5]
[1.63441358] [-1.96044408] [1.5]  [1.6658295] [-2.36044408] [1.5]  [1.69724543] [-2.76044408] [1.5]  [1.72866136] [3.12274123] [1.5]
[1.76007728] [2.72274123] [1.5]  [1.79149321] [2.32274123] [1.5]  [1.82290914] [1.92274123] [1.5]  [1.85432506] [1.52274123] [1.5]
[1.88574099] [1.12274123] [1.5]  [1.91715692] [0.72274123] [1.5]  [1.94857284] [0.32274123] [1.5]  [1.97998877] [-0.07725877] [1.5]
[2.0114047] [-0.47725877] [1.5]  [2.04282062] [-0.87725877] [1.5]  [2.07423655] [-1.27725877] [1.5]  [2.10565248] [-1.67725877] [1.5]
[2.1370684] [-2.07725877] [1.5]  [2.16848433] [-2.47725877] [1.5]  [2.19990026] [-2.87725877] [1.5]  [2.23131618] [3.00592654] [1.5]
[2.26273211] [2.60592654] [1.5]  [2.29414804] [2.20592654] [1.5]  [2.32556396] [1.80592654] [1.5]  [2.35697989] [1.40592654] [1.5]
[2.38839581] [1.00592654] [1.5]  [2.41981174] [0.60592654] [1.5]  [2.45122767] [0.20592654] [1.5]  [2.48264359] [-0.19407346] [1.5]
[2.51405952] [-0.59407346] [1.5]  [2.54547545] [-0.99407346] [1.5]  [2.57689137] [-1.39407346] [1.5]  [2.6083073] [-1.79407346] [1.5]
[2.63972323] [-2.19407346] [1.5]  [2.67113915] [-2.59407346] [1.5]  [2.70255508] [-2.99407346] [1.5]  [2.73397101] [2.88911184] [1.5]
[2.76538693] [2.48911184] [1.5]  [2.79680286] [2.08911184] [1.5]  [2.82821879] [1.68911184] [1.5]  [2.85963471] [1.28911184] [1.5]
[2.89105064] [0.88911184] [1.5]  [2.92246657] [0.48911184] [1.5]  [2.95388249] [0.08911184] [1.5]  [2.98529842] [-0.31088816] [1.5]
[3.01671435] [-0.71088816] [1.5]  [3.04813027] [-1.11088816] [1.5]  [3.0795462] [-1.51088816] [1.5]  [3.11096213] [-1.91088816] [1.5]
```
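As a sanity check on the triples above: if the object sits at the world origin, (theta, azimuth, radius) can be recovered from the camera centre of a camera-to-world pose alone. This is a hedged stand-in for the repo's `get_pose`, not its actual implementation:

```python
import numpy as np

def spherical_from_pose(c2w):
    # Recover (polar angle, azimuth, radius) from the camera centre stored in
    # the translation column of a 4x4 camera-to-world matrix. Assumption: the
    # object is centred at the world origin and z is "up".
    cam = c2w[:3, 3]
    radius = np.linalg.norm(cam)
    theta = np.arccos(cam[2] / radius)      # polar angle measured from +z
    azimuth = np.arctan2(cam[1], cam[0])    # in (-pi, pi], matching the wrap-around above
    return theta, azimuth, radius

# Round-trip check on a camera placed by hand on the +y axis at distance 1.5.
c2w = np.eye(4)
c2w[:3, 3] = [0.0, 1.5, 0.0]
theta, azimuth, radius = spherical_from_pose(c2w)
print(theta, azimuth, radius)  # polar ~ pi/2, azimuth ~ pi/2, radius = 1.5
```

The radius should come out as a constant 1.5 and theta should sweep 0 to pi, which matches the printed log, so the pose translations themselves look consistent.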
However, I cannot reconstruct the 3D object. Could you explain how to get pose.npy for N1M100 3D generation?
hey dear author,
thanks for this great work! I have a similar question about 3D reconstruction from a single image input.
I have tried two methods, but so far neither has succeeded.
The first method: use `data_type='Text2Img'`, as you described above; EscherNet then generates multi-view images according to the poses created from `get_archimedean_spiral`.
But in rendering, according to this line https://github.com/kxhit/EscherNet/blob/10b650492ba97b5104a3136de07d1a67f4ada458/3drecon/renderer/renderer.py#L494 the renderer uses fixed camera poses from the GSO dataset. So there is a mismatch between the generated multi-view images and the poses the NeuS renderer assumes, which causes bad reconstruction quality.
The second method: use `data_type='GSO3D'`, create a subfolder with a similar structure (for example 'bottle') in Data/GSO30/, and copy camera poses (*.npy) over from other cases. The generated images are already a mess, since those camera poses are not correct for my input. I tried this because, this way, the generation pipeline can use the same camera poses as the 3D reconstruction.
Essentially, I want to achieve results similar to the demo on the project page, locally and from a single image input. Can you maybe give some hints? Should I estimate the camera pose using DUSt3R?
thanks!
Thanks @Dipan-Zhang. Yeah, you need to modify the camera coordinates accordingly. In the 3drecon code the poses are assumed to be in the GSO setting by default; we didn't focus on 3D reconstruction, since there are many methods for that. It should be easy to modify accordingly.
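In case it helps others hitting the same mismatch: one common "modify the camera coordinates" step is converting a camera-to-world pose between the OpenGL/Blender convention (x right, y up, z backward) and the OpenCV convention (x right, y down, z forward), which amounts to flipping the camera's y and z axes. Whether this exact flip is the one needed between EscherNet's outputs and the 3drecon renderer is an assumption; check which convention each codebase expects:

```python
import numpy as np

# Right-multiplying a camera-to-world matrix by this diagonal flips the
# camera's local y and z axes while leaving the camera centre untouched.
FLIP_YZ = np.diag([1.0, -1.0, -1.0, 1.0])

def opengl_to_opencv(c2w):
    # The same flip also converts the other way (it is its own inverse).
    return c2w @ FLIP_YZ

# Example: identity rotation, camera 1.5 units along +z.
c2w = np.eye(4)
c2w[:3, 3] = [0.0, 0.0, 1.5]
cv = opengl_to_opencv(c2w)
print(np.allclose(cv[:3, 3], [0.0, 0.0, 1.5]))  # True: centre preserved
```

Because the flip acts in the camera frame (right multiplication), the translation column, and hence the azimuth/elevation/radius derived from it, stays the same; only the viewing-axis convention changes.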
got it, thanks a lot for the tip ;)
When will this come?