sign-language-processing / sign-language-processing.github.io

Documentation and background of sign language processing

Add Video-Based CSL features and working URL #46

Closed · cleong110 closed this 1 month ago

cleong110 commented 1 month ago

Video-Based CSL:

  1. The link http://home.ustc.edu.cn/~pjh/openresources/cslr-dataset-2015/index.html gives a 404.
  2. Progressively walking up the URL hierarchy to http://home.ustc.edu.cn/ shows it is a student FTP server, and http://home.ustc.edu.cn/~pjh/ is also a 404; possibly the student with initials "pjh" has graduated.
  3. I believe this is the citation: https://arxiv.org/pdf/1801.10111, which lists the project website as http://mccipc.ustc.edu.cn/mediawiki/index.php/SLR_Dataset, also a 404.
  4. Searching on Google led me to https://github.com/Skye601/SLR, which in turn led me to http://home.ustc.edu.cn/~hagjie/, which seems to be the right link: the statistics there match what's in the paper.

While I'm at it, I should note that the paper and website give the following details about the features:

> The CSL dataset in Tab. 1 is collected by us and released on our project web page. A Microsoft Kinect camera is used for all recording, providing RGB, depth and body joints modalities in all videos. The additional modalities should provide helpful additional information as proven in hyper-spectral imaging efforts (Ran et al. 2017a; Zhang et al. 2011; 2012; Abeida et al. 2013), which is potentially helpful in future works. In this paper, only the RGB modality is used. The CSL dataset contains 25K labeled video instances, with 100+ hours of total video footage by 50 signers. Every video instance is annotated with a complete sentence by a professional CSL teacher. 17K instances are selected for training, 2K for validation, and the rest 6K for testing. The RWTH-PHOENIX-Weather dataset contains 7K weather forecasts sentences from 9 signers. All videos are of 25 frames per second (FPS) and at resolution of 210 × 260. Following (Koller, Forster, and Ney 2015), 5,672 instances are used for training, 540 for validation, and 629 for testing.

So I suppose we ought to add `video:RGBD` and `pose:Kinect`? The website suggests that they basically just took the Kinect output and released it as-is.
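For concreteness, here is a rough sketch of what the updated entry could look like. The field names (`features`, `url`, `#items`, `notes`) and the overall shape are guesses for illustration, not necessarily the repo's actual schema; the values come from the paper and the working link above, and the `text:Chinese` entry reflects the sentence-level annotations the paper describes:

```json
{
  "name": "Video-Based CSL",
  "url": "http://home.ustc.edu.cn/~hagjie/",
  "paper": "https://arxiv.org/abs/1801.10111",
  "features": ["video:RGBD", "pose:Kinect", "text:Chinese"],
  "#items": 25000,
  "notes": "25K sentence-level videos, 100+ hours, 50 signers; recorded with a Microsoft Kinect (RGB, depth, body joints)"
}
```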

This is big enough to be its own issue/pull request I think.

Originally posted by @cleong110 in https://github.com/sign-language-processing/sign-language-processing.github.io/issues/45#issuecomment-2129573181

cleong110 commented 1 month ago

Things I'm unsure about:

cleong110 commented 1 month ago

Oh hey, the link on the project site doesn't seem to work either.

... should we just remove the whole dataset?

cleong110 commented 1 month ago

Also, the paper says 25K videos, but the JSON currently says 125K. Adjusting to match.
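Assuming the count lives in a field like `#items` (the key name is a guess, not necessarily what the JSON actually uses), the adjustment would be roughly:

```diff
- "#items": 125000,
+ "#items": 25000,
```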

cleong110 commented 1 month ago

Well, I'll do a PR anyway and we can discuss.