perfanalytics / pose2sim

Markerless kinematics with any cameras — From 2D Pose estimation to 3D OpenSim motion
BSD 3-Clause "New" or "Revised" License

Can I import output of Sports2D into pose2sim and then into OpenSim #71

Closed gabriead closed 6 months ago

gabriead commented 6 months ago

Hey guys, I am pretty new to this domain, so I need some help to get going. Can I take the output of Sports2D, feed it into Pose2Sim, and then get an OpenSim model? What output data from Sports2D would I have to use (and in what format), and what would the workflow look like in Pose2Sim? Thanks a lot in advance!

davidpagnon commented 6 months ago

Hi, why do you want to use the output of Sports2D?

In any case, Sports2D outputs json files that can be used in Pose2Sim, but you also need several cameras and a calibration file. The CSV position files can be turned into trc files. This is easy (open a CSV file and a trc file side by side: only the header changes), but I haven't done it.
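For reference, a trc file starts with a short tab-separated header before the data rows, something like this (marker names and values here are placeholders, not actual Sports2D output):

```
PathFileType	4	(X/Y/Z)	trial.trc
DataRate	CameraRate	NumFrames	NumMarkers	Units	OrigDataRate	OrigDataStartFrame	OrigNumFrames
60.00	60.00	120	2	m	60.00	1	120
Frame#	Time	RShoulder			LShoulder
		X1	Y1	Z1	X2	Y2	Z2

1	0.00000	0.12345	1.45600	0.00000	-0.12345	1.45600	0.00000
```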

I hope this answers your question!

gabriead commented 6 months ago

Hi @davidpagnon, thanks for your response! Exactly, I only have one camera, so it will be the third use case you describe. Is there a model/workflow out there to transform Sports2D 2D data into 3D data and get an OpenSim 3D model? The authors of this paper (https://www.frontiersin.org/articles/10.3389/fspor.2022.994221/full) seem to have done something similar. If not, I would be happy to start with an OpenSim 2D model. What would I have to do to implement that? Can we collaborate on that? I would love to make such a solution available in your package for everyone!

davidpagnon commented 6 months ago

I did not know about this paper, thanks!

The main difference is that they used a Kinect, which has a depth sensor that can retrieve 3D information. Some alternative approaches can lift 3D poses from a single video (BlazePose being the easiest to use). I've been thinking of implementing this in Pose2Sim, but I have not found time for it. Also, I highly suspect it would not be very accurate.

But I don't know, it may be worth trying! It would still be more accurate than fitting an OpenSim model on just the 2D points from Sports2D (although Sports2D supports multi-person detection, unlike BlazePose, which you would use otherwise). In both cases, you could get good angles and timing, but positions would be in pixels. To get them in meters, you would need to calibrate the camera.

The workflow would go this way:

You could scale the pixel coordinates to obtain meter coordinates by filming a static sequence. Note that this sequence needs to be taken in the frontal or sagittal plane, or distances will be skewed.

All of it is pretty doable, but unfortunately, I am not sure when or if I'll find time for it. If you feel like doing it, feel absolutely free; I'll be happy to review and accept pull requests! Also, just in case, here is the Discord server where other users participate in the development of Pose2Sim; it is easier than discussing via GitHub issues.

gabriead commented 6 months ago

Thanks for the invitation, happy to discuss the topic further there!

davidpagnon commented 6 months ago

The other approach would use this model (2D_model.zip), with the OpenPose MarkerSet (see there). This is actually a 3D model, but it has limited degrees of freedom (only in the sagittal plane).

With this approach, you could convert the json or csv coordinates to a trc file, with depth set to zero. Scaling would be done only in the X and Y directions, not Z, and IK would be mostly unchanged.
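A minimal sketch of that conversion, assuming the 2D coordinates have already been parsed into a NumPy array (the function name, array layout, and pixel-to-meter factor are illustrative, not part of Sports2D's or Pose2Sim's API):

```python
import numpy as np

def write_trc_2d(coords_px, marker_names, px_to_m, rate=60.0, path='trial.trc'):
    """coords_px: array of shape (n_frames, n_markers, 2), in pixels."""
    n_frames, n_markers, _ = coords_px.shape
    xy = coords_px * px_to_m                # scale X and Y only
    z = np.zeros((n_frames, n_markers, 1))  # depth set to zero
    xyz = np.concatenate([xy, z], axis=2)

    with open(path, 'w') as f:
        # trc header, then one row of X/Y/Z triplets per frame
        f.write(f'PathFileType\t4\t(X/Y/Z)\t{path}\n')
        f.write('DataRate\tCameraRate\tNumFrames\tNumMarkers\tUnits\t'
                'OrigDataRate\tOrigDataStartFrame\tOrigNumFrames\n')
        f.write(f'{rate:.2f}\t{rate:.2f}\t{n_frames}\t{n_markers}\tm\t'
                f'{rate:.2f}\t1\t{n_frames}\n')
        f.write('Frame#\tTime\t' + '\t\t\t'.join(marker_names) + '\n')
        f.write('\t\t' + '\t'.join(f'X{i+1}\tY{i+1}\tZ{i+1}'
                                   for i in range(n_markers)) + '\n\n')
        for i, frame in enumerate(xyz):
            row = '\t'.join(f'{c:.5f}' for c in frame.reshape(-1))
            f.write(f'{i+1}\t{i/rate:.5f}\t{row}\n')
```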

You could scale the pixel coordinates to obtain meter coordinates by letting the user select two points on the image and specify their real-world distance. This would only work if the person stays in the frontal or sagittal plane, though. Just imagine that you face the camera, and that your shoulder width is 100 px in the image and 50 cm in reality. You could infer that 1 m <-> 200 px. However, if you turn sideways, your shoulder width would be more like 10 px in the image, and still 50 cm in reality. The same remark applies if you get closer to or further from the camera.
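A quick sketch of that scale computation, using the numbers above (the point coordinates and function name are made up for illustration):

```python
import numpy as np

def pixel_to_meter_scale(pt1_px, pt2_px, known_dist_m):
    """Scale factor (m/px) from two image points and their real distance."""
    dist_px = np.linalg.norm(np.asarray(pt1_px, float) - np.asarray(pt2_px, float))
    return known_dist_m / dist_px

# Shoulders 100 px apart in the image, 0.50 m apart in reality:
scale = pixel_to_meter_scale((320, 240), (420, 240), 0.50)
print(scale)  # 0.005 m/px, i.e. 1 m <-> 200 px
```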

davidpagnon commented 6 months ago

Sorry, I still haven't read the article in depth, but I just noticed that they did not use the depth sensor of their Kinect. So they used an approach similar to BlazePose's to estimate depth.

Note that they set themselves up in a pretty simplified situation, where the person stays in the frontal plane and does not move forwards, backwards, or to the side. This removes most of the constraints I was telling you about.

gabriead commented 6 months ago

No worries! I have contacted one of the authors to get more information on the exact details of their approach. What I am currently struggling with the most is understanding how to transform the data into a format that can be used in OpenSim. With the RealSense, I can obtain the following data (small example): [left_elbow[x_coord, y_coord, z_coord], right_elbow[...]]. Is that data sufficient to use your transformation scripts to get a first OpenSim model? As soon as I understand how to map the RealSense data into OpenSim, I can start to understand more details and follow up with you on the constraints.

davidpagnon commented 6 months ago

You will need to convert it yourself, but it should not be too complicated: just have a look at the trc file resulting from the Demo; that should make it a bit clearer.
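For illustration, a hedged sketch of how per-joint [x, y, z] readings like the RealSense example above could be flattened into one trc data row (the joint names and ordering are assumptions; they must match the markers declared in the trc header):

```python
# Marker order must match the names declared in the trc header.
joint_order = ['left_elbow', 'right_elbow']

def trc_row(frame_idx, time_s, joints):
    """joints: dict mapping joint name -> (x, y, z), in meters."""
    coords = [f'{c:.5f}' for name in joint_order for c in joints[name]]
    return '\t'.join([str(frame_idx), f'{time_s:.5f}'] + coords)

print(trc_row(1, 0.0, {'left_elbow': (0.3, 1.1, 0.2),
                       'right_elbow': (-0.3, 1.1, 0.2)}))
```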

davidpagnon commented 6 months ago

This issue has been closed as stale. Feel free to reopen it if needed!