sign-language-processing / sign-language-processing.github.io

Documentation and background of sign language processing

Check/disambiguate CSL-Daily citations #72

Closed · cleong110 closed 4 days ago

cleong110 commented 2 weeks ago

Something else I discovered: a number of citations go to

@dataset:huang2018video

which _is_ apparently a CSL dataset, but not the CSL-Daily dataset

Originally posted by @cleong110 in https://github.com/sign-language-processing/sign-language-processing.github.io/issues/56#issuecomment-2159263480

cleong110 commented 2 weeks ago

The correct citation for CSL-Daily should be https://ieeexplore.ieee.org/document/9578398
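
For the record, the corrected entry in the bib file would presumably look something like this; the key name and field details below are my guesses and should be verified against the IEEE page:

```bibtex
@inproceedings{dataset:zhou2021improving,
  % Hypothetical key and fields; verify against
  % https://ieeexplore.ieee.org/document/9578398
  title     = {Improving Sign Language Translation with Monolingual Data by Sign Back-Translation},
  author    = {Zhou, Hao and Zhou, Wengang and Qi, Weizhen and Pu, Junfu and Li, Houqiang},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2021}
}
```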

cleong110 commented 4 days ago

Uses of huang2018video:

@adaloglou2020comprehensive perform a comparative experimental assessment of computer vision-based methods for the video-to-gloss task.
They implement various approaches from previous research [@camgoz2017subunets;@cui2019deep;@dataset:joze2018ms]
and test them on multiple datasets [@dataset:huang2018video;@cihan2018neural;@dataset:von2007towards;@dataset:joze2018ms]
either for isolated sign recognition or continuous sign recognition.
They conclude that 3D convolutional models outperform models using only recurrent networks to capture the temporal information,
and that these models are more scalable given the restricted receptive field, which results from the CNN "sliding window" technique.
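
As an aside, here is a minimal PyTorch sketch of what that "sliding window" point means in practice; the shapes are made up for illustration. Each output time step of a 3D convolution only sees a fixed window of frames, i.e. a restricted temporal receptive field, unlike a recurrent network that carries state across the whole sequence:

```python
import torch
import torch.nn as nn

# Hypothetical video batch: (batch, channels, frames, height, width).
video = torch.randn(1, 3, 32, 112, 112)

# A 3D convolution slides a fixed-size temporal window (kernel_size[0] frames),
# so each output step has a bounded ("restricted") temporal receptive field.
conv3d = nn.Conv3d(in_channels=3, out_channels=64, kernel_size=(5, 7, 7),
                   stride=(1, 2, 2), padding=(2, 3, 3))

features = conv3d(video)
print(features.shape)  # torch.Size([1, 64, 32, 56, 56])
```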
@xiao2020skeleton closed the loop by proposing a text-to-pose-to-text model for the case of isolated sign language recognition.
They first trained a classifier to take a sequence of poses encoded by a BiLSTM and classify the relevant sign, then proposed a production system to take a single sign and sample a constant-length sequence of 50 poses from a Gaussian Mixture Model.
These components are combined such that given a sign class $y$, a pose sequence is generated, then classified back into a sign class $\hat{y}$,
and the loss is applied between $y$ and $\hat{y}$, not directly on the generated pose sequence.
They evaluate their approach on the CSL dataset [@dataset:huang2018video] and show that their generated pose sequences 
almost reach the same classification performance as the reference sequences.
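
To make that training loop concrete, here is a minimal sketch of the text-to-pose-to-text cycle; the module structure and all sizes are stand-ins I made up, not xiao2020skeleton's actual implementation (in particular, the generator below is a plain embedding standing in for their Gaussian Mixture Model sampler):

```python
import torch
import torch.nn as nn

NUM_SIGNS, POSE_DIM, SEQ_LEN, HIDDEN = 100, 50, 50, 128  # hypothetical sizes

class PoseClassifier(nn.Module):
    """Pose-to-text: encode a pose sequence with a BiLSTM, classify the sign."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.LSTM(POSE_DIM, HIDDEN, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * HIDDEN, NUM_SIGNS)

    def forward(self, poses):                 # poses: (batch, SEQ_LEN, POSE_DIM)
        encoded, _ = self.encoder(poses)
        return self.head(encoded[:, -1])      # logits over sign classes

class PoseGenerator(nn.Module):
    """Text-to-pose: map a sign class to a fixed-length pose sequence.
    (Stand-in for sampling from a per-class Gaussian Mixture Model.)"""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(NUM_SIGNS, SEQ_LEN * POSE_DIM)

    def forward(self, sign_class):            # sign_class: (batch,)
        return self.embed(sign_class).view(-1, SEQ_LEN, POSE_DIM)

generator, classifier = PoseGenerator(), PoseClassifier()
y = torch.randint(0, NUM_SIGNS, (8,))         # ground-truth sign classes
poses = generator(y)                          # generate a pose sequence per class
y_hat_logits = classifier(poses)              # classify it back into a sign
# The loss compares y with y-hat, not the generated poses with reference poses.
loss = nn.functional.cross_entropy(y_hat_logits, y)
loss.backward()
```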
###### Isolated sign corpora {-}
are collections of annotated single signs. They are synthesized [@dataset:ebling2018smile;@dataset:huang2018video;@dataset:sincan2020autsl;@dataset:hassan-etal-2020-isolated] or mined from online resources [@dataset:joze2018ms;@dataset:li2020word], and can be used for isolated sign language recognition or contrastive analysis of minimal signing pairs [@dataset:imashev2020dataset]. However, like dictionaries, they do not describe relations between signs, nor do they capture coarticulation during the signing, and are often limited in vocabulary size (20-1000 signs).
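
Side note: to double-check the remaining uses, something like this can list every occurrence of the key in the site's markdown source (run from the repo root; the glob is an assumption about where the source lives):

```python
import re
from pathlib import Path

KEY = "dataset:huang2018video"  # citation key to search for

# Assumes the site text lives in markdown files under the repo root.
for path in Path(".").rglob("*.md"):
    for lineno, line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
        if re.search(rf"@{re.escape(KEY)}\b", line):
            print(f"{path}:{lineno}: {line.strip()}")
```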
cleong110 commented 4 days ago

huang2018video link: https://aaai.org/papers/11903-video-based-sign-language-recognition-without-temporal-segmentation/

> The CSL dataset in Tab. 1 is collected by us and released on our project web page

And the link is dead


REGARDLESS, this is not the same as CSL-Daily

cleong110 commented 4 days ago

Checked adaloglou2020comprehensive and xiao2020skeleton, and they both actually cite huang2018video correctly, so I think we're good now.