alandonald opened this issue 2 years ago
You first need to calibrate your cameras and obtain both the extrinsic and intrinsic parameters.
Our method can adapt to any number of cameras and any camera positions.
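Once you have the parameters, a quick way to sanity-check that they are consistent is to project a known 3D point with the standard pinhole model and confirm the pixel location looks right. This is a minimal sketch with toy values for `K`, `R`, and `t` (they are illustrative assumptions, not from any real calibration):

```python
import numpy as np

def project_point(X_world, K, R, t):
    """Project a 3D world point to pixel coordinates using pinhole
    intrinsics K (3x3) and extrinsics R (3x3 rotation), t (3,) translation."""
    X_cam = R @ X_world + t      # world frame -> camera frame
    x = K @ X_cam                # camera frame -> homogeneous image coords
    return x[:2] / x[2]          # perspective divide -> (u, v) pixels

# Toy parameters (assumed, for illustration only):
K = np.array([[1000.0, 0.0, 320.0],
              [0.0, 1000.0, 240.0],
              [0.0,    0.0,   1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])    # camera 2 m in front of the world origin

# A point 0.1 m to the right of the origin should land right of the
# principal point (320, 240):
uv = project_point(np.array([0.1, 0.0, 0.0]), K, R, t)  # -> [370., 240.]
```

If points you know to be in view project far outside the image, the extrinsics are likely in the wrong direction (camera-to-world vs. world-to-camera) or the units disagree.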
Thank you for your reply. I have the extrinsic and intrinsic parameters after calibration. Can I use the provided pretrained models, such as occlusion_person_8view.pth.tar or h36m_4view.pth.tar, on a 6-view camera system without any training? Or should I retrain the model?
@alandonald Yes, you can directly use these pretrained models.
However, I would suggest using a 2D pose model trained on COCO or on your own dataset, and concatenating it with our cross-view fusion model parameters.
To report the 3D MPJPE metric, we have to align the keypoint positions with these datasets' joint definitions. The variance in scenes and subjects in these datasets is rather small, so the 2D pose network might not generalize well to other scenarios.
The fusion model parameters can easily be extracted from our pretrained models: just use torch.load() and then pick out the 3D fusion model parameters by name.
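A minimal sketch of that extraction. The `"fusion"` substring filter and the `"state_dict"` checkpoint key are assumptions; inspect `state_dict.keys()` on the real checkpoint and adjust the filter to match the actual parameter names:

```python
def extract_fusion_params(state_dict, name_filter="fusion"):
    """Keep only parameters whose names mark them as part of the
    cross-view fusion module. The "fusion" substring is an assumption;
    print state_dict.keys() on the real checkpoint and adjust."""
    return {k: v for k, v in state_dict.items() if name_filter in k}

# Usage, assuming you downloaded one of the checkpoints named in this thread:
# import torch
# ckpt = torch.load("h36m_4view.pth.tar", map_location="cpu")
# state_dict = ckpt.get("state_dict", ckpt)  # unwrap if saved as a dict
# fusion_only = extract_fusion_params(state_dict)
# model.load_state_dict(fusion_only, strict=False)  # merge into your model
```

Loading with `strict=False` lets you combine the fusion parameters with a 2D backbone trained elsewhere, since the backbone keys will be missing from the filtered dict.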
Hi, I now have two sets of image sequences, the intrinsic parameters of the two cameras, and the extrinsic parameters between them. How can I use your code to get absolute 3D joint coordinates? Could you tell me what I should do?
Hi, thanks for the impressive work. I encountered an accuracy drop of about 2% compared with the non-fusion version. When I verified the code on the H36M dataset, the model performed well, so I suspect my camera intrinsics and extrinsics are incorrect. In my experiment, I used the open-source COLMAP project to acquire these parameters. I wonder whether there are other ways to obtain them. Thanks a lot.
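One hedged suggestion for diagnosing this: COLMAP reconstructions are only determined up to scale unless you anchor them to known distances, so a unit or scale mismatch against the metric coordinates the model expects is a plausible cause. A cheap check is the mean reprojection error of known 2D/3D correspondences under your recovered parameters. This sketch uses the standard pinhole model; the function and variable names are hypothetical, not from the repo:

```python
import numpy as np

def mean_reprojection_error(pts3d, pts2d, K, R, t):
    """Mean pixel distance between observed 2D points (N,2) and the
    projections of their 3D counterparts (N,3) under intrinsics K and
    extrinsics (R, t). A large value suggests wrong extrinsics, wrong
    convention (world-to-camera vs. camera-to-world), or a scale/unit
    mismatch (e.g. COLMAP's arbitrary scale vs. metric millimetres)."""
    X_cam = (R @ pts3d.T).T + t            # (N,3) world -> camera frame
    proj = (K @ X_cam.T).T                 # (N,3) homogeneous pixels
    uv = proj[:, :2] / proj[:, 2:3]        # perspective divide
    return float(np.linalg.norm(uv - pts2d, axis=1).mean())

# Toy consistency check with assumed parameters:
K = np.array([[1000.0, 0.0, 320.0],
              [0.0, 1000.0, 240.0],
              [0.0,    0.0,   1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 2.0])
pts3d = np.array([[0.1, 0.0, 0.0]])
pts2d = np.array([[370.0, 240.0]])         # matches the projection exactly
err = mean_reprojection_error(pts3d, pts2d, K, R, t)  # -> 0.0
```

If the reprojection error is small but 3D accuracy still drops, the scale ambiguity is the more likely culprit; an alternative to COLMAP would be a checkerboard-based calibration, which yields metric extrinsics directly.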
Hello, I have several cameras whose number and positions are quite different from Human3.6M. How should I get predictions using my multi-camera data?