matyasbohacek / spoter

Repository accompanying the "Sign Pose-based Transformer for Word-level Sign Language Recognition" paper
https://spoter.signlanguagerecognition.com
Apache License 2.0

training new data #7

Open herochen7372 opened 2 years ago

herochen7372 commented 2 years ago

Hi, I like your work. I want to create new data, but I don't understand the structure of your skeletal data. Would you be willing to explain it to me? [screenshot: sendpix7]

RodGal-2020 commented 2 years ago

I believe those are the Y-axis positions of the middlePIP_left keypoint throughout the whole video (take a look at this to see the encoding of the hand keypoints). Take into account that some other variables, like labels and height, appear as independent variables, not inside lists.
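To make that layout concrete, here is a minimal sketch of how such a file could be read. I'm assuming one video per row, list-valued keypoint columns stored as strings, and a scalar labels column; the file name and the column name `middlePIP_left_Y` are only illustrative, not guaranteed to match the actual dataset.

```python
# Minimal sketch: reading a skeletal-data CSV where each keypoint column
# stores one value per frame as a stringified list, while columns such as
# "labels" hold a single scalar per video.
# File name and column names are assumptions for illustration.
import ast
import pandas as pd

df = pd.read_csv("skeletal_data.csv")                 # hypothetical file name

row = df.iloc[0]                                      # one row = one video
label = row["labels"]                                 # scalar, not a list
y_series = ast.literal_eval(row["middlePIP_left_Y"])  # list of length n_frames

print(label, len(y_series), y_series[:5])
```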

However, I don't understand the order of the variables in the header section.

herochen7372 commented 2 years ago

> I believe those are the Y-axis positions of the middlePIP_left keypoint throughout the whole video (take a look at this to see the encoding of the hand keypoints). Take into account that some other variables, like labels and height, appear as independent variables, not inside lists.
>
> However, I don't understand the order of the variables in the header section.

I think so, but these should be the features extracted from the coordinates. Now I don't understand why their dimensions are different.

herochen7372 commented 2 years ago

How do you convert from (x, y, score) to an image?

RodGal-2020 commented 2 years ago

Let's make a minimal example:

For each frame $i$, with $i \in [1, n_{frames}]$, using the Vision API or MediaPipe you get your

$$(x, y, confidence)_{i,j}$$

estimates for each keypoint $j$; let's say $j \in \{ left.wrist, right.wrist \}$. So, for each wrist, you really have a series of coordinates and confidences of length $n_{frames}$, which can be reorganized as:

$$ left.wrist_x = [\, x \ : \ \exists\, (x, y, confidence)_{i,\ left.wrist}, \ i \in [1, n_{frames}] \,] $$

$$ left.wrist_y = [\, y \ : \ \exists\, (x, y, confidence)_{i,\ left.wrist}, \ i \in [1, n_{frames}] \,] $$

$$ right.wrist_x = [\, x \ : \ \exists\, (x, y, confidence)_{i,\ right.wrist}, \ i \in [1, n_{frames}] \,] $$

$$ right.wrist_y = [\, y \ : \ \exists\, (x, y, confidence)_{i,\ right.wrist}, \ i \in [1, n_{frames}] \,] $$

As you can see, each of these objects has length $n_{frames}$, which I believe explains the case you showed.

In your particular example, however, the list of zeros is the encoding of an incorrect estimation, i.e., if the confidence is lower than a given threshold, the pose estimation method will return $x = y = 0$.
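In code, that reorganization could look roughly like the sketch below. It is only illustrative: the per-frame dictionaries, keypoint names, and confidence threshold are assumptions, not the exact output of any particular estimator.

```python
# Sketch of the reorganization above: per-frame (x, y, confidence) estimates
# are turned into one coordinate series per keypoint, each of length n_frames.
# `frames` is assumed to be a list of dicts like
# {"left.wrist": (x, y, conf), "right.wrist": (x, y, conf)}, one dict per frame.
CONF_THRESHOLD = 0.3  # illustrative value, not taken from the paper

def reorganize(frames, keypoints=("left.wrist", "right.wrist")):
    series = {f"{kp}_{axis}": [] for kp in keypoints for axis in ("x", "y")}
    for frame in frames:
        for kp in keypoints:
            x, y, conf = frame.get(kp, (0.0, 0.0, 0.0))
            if conf < CONF_THRESHOLD:
                # low-confidence estimates are encoded as zeros, as in the data
                x, y = 0.0, 0.0
            series[f"{kp}_x"].append(x)
            series[f"{kp}_y"].append(y)
    return series

# Example: two frames, the second with a low-confidence left wrist.
frames = [
    {"left.wrist": (0.41, 0.62, 0.95), "right.wrist": (0.58, 0.60, 0.91)},
    {"left.wrist": (0.40, 0.63, 0.10), "right.wrist": (0.57, 0.61, 0.88)},
]
print(reorganize(frames))
```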

Hope this helps!

herochen7372 commented 2 years ago

Thank you, that's a great explanation.

RodGal-2020 commented 2 years ago

You're welcome, thanks!

matyasbohacek commented 1 year ago

Thanks to you both, especially @RodGal-2020 for explaining things to other members in the thread during my absence.

We will very soon be providing support for more pose estimator formats, such as OpenPose, MMPose, and MediaPipe, including example code. This will include code for conversion from standardized formats. Please stay tuned; I will update you here.

For now, you can convert the data yourself (as suggested above) or obtain it (directly in the supported format) using Vision API, which was used in our original work (WACV'22 paper). To do so, you can use our Pose Data Annotator app (Mac App Store, GitHub with Swift code).
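Until that example code is released, a rough sketch of writing converted estimates into the CSV layout discussed in this thread could look like the following. The column names, output file name, and the `labels` field are assumptions for illustration, not a final specification of the supported format.

```python
# Sketch: turning the per-keypoint series from the previous comments into one
# CSV row per video, matching the layout discussed in this thread
# (list-valued keypoint columns plus a scalar "labels" column).
# Column and file names are illustrative, not a fixed specification.
import pandas as pd

def to_csv_row(series: dict, label: int) -> dict:
    row = {name: str(values) for name, values in series.items()}
    row["labels"] = label
    return row

rows = [
    to_csv_row({"left.wrist_x": [0.41, 0.0], "left.wrist_y": [0.62, 0.0]}, label=3),
]
pd.DataFrame(rows).to_csv("my_skeletal_data.csv", index=False)  # hypothetical output file
```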

NaNtaisuike commented 2 months ago

> Thanks to you both, especially @RodGal-2020 for explaining things to other members in the thread during my absence.
>
> We will very soon be providing support for more pose estimator formats, such as OpenPose, MMPose, and MediaPipe, including example code. This will include code for conversion from standardized formats. Please stay tuned; I will update you here.
>
> For now, you can convert the data yourself (as suggested above) or obtain it (directly in the supported format) using Vision API, which was used in our original work (WACV'22 paper). To do so, you can use our Pose Data Annotator app (Mac App Store, GitHub with Swift code).

May I ask whether the app is Mac-only, or is there a version that can be used on Windows?

NaNtaisuike commented 2 months ago

> Thanks to you both, especially @RodGal-2020 for explaining things to other members in the thread during my absence.
>
> We will very soon be providing support for more pose estimator formats, such as OpenPose, MMPose, and MediaPipe, including example code. This will include code for conversion from standardized formats. Please stay tuned; I will update you here.
>
> For now, you can convert the data yourself (as suggested above) or obtain it (directly in the supported format) using Vision API, which was used in our original work (WACV'22 paper). To do so, you can use our Pose Data Annotator app (Mac App Store, GitHub with Swift code).

Has the conversion code for the current release of OpenPose been published?