sign-language-processing / sign-language-processing.github.io

Documentation and background of sign language processing

Add a section on back-translation best practice, with respect to evaluation #83

Open cleong110 opened 1 week ago

cleong110 commented 1 week ago
          Possibly write what `ham2pose` suggests here.

Maybe also add a note about back-translation: people use it (Progressive Transformers, SignLLM), but the outputs are incoherent. This is because people train the back-translation models on the translation model's outputs, rather than independently, as one should.

_Originally posted by @AmitMY in https://github.com/sign-language-processing/sign-language-processing.github.io/pull/77#discussion_r1641255726_
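A minimal sketch of the pitfall described above, contrasting the two ways a back-translation (SLT) model can be trained. All names here (`train_slt`, `biased_training_data`, etc.) are illustrative stand-ins, not any project's actual API:

```python
# Sketch of the back-translation training pitfall. The "models" are
# toy stand-ins: train_slt "learns" a lookup from pose sequences to text.

def train_slt(pose_text_pairs):
    # Stand-in trainer for a sign-language-translation model.
    return {tuple(poses): text for poses, text in pose_text_pairs}

def biased_training_data(production_model, texts):
    # WRONG: the training pairs come from the production model itself,
    # so the SLT model learns to decode that model's artifacts and
    # scores it too favorably at evaluation time.
    return [(production_model(t), t) for t in texts]

def independent_training_data(gold_corpus):
    # RIGHT: the training pairs come from an independent gold corpus of
    # real signing, so the SLT model is a neutral judge of generated poses.
    return [(poses, text) for poses, text in gold_corpus]
```

The point is not the toy trainer but the data source: an evaluation model fit to the generator's own outputs cannot detect the generator's failure modes.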

cleong110 commented 1 week ago

Possibly relevant here:

cleong110 commented 1 week ago
@inproceedings{huangFastHighQualitySign2021,
  title = {Towards {{Fast}} and {{High-Quality Sign Language Production}}},
  booktitle = {Proceedings of the 29th {{ACM International Conference}} on {{Multimedia}}},
  author = {Huang, Wencan and Pan, Wenwen and Zhao, Zhou and Tian, Qi},
  year = {2021},
  month = oct,
  series = {{{MM}} '21},
  pages = {3172--3181},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  doi = {10.1145/3474085.3475463},
  url = {https://doi.org/10.1145/3474085.3475463},
  urldate = {2024-06-19},
  isbn = {978-1-4503-8651-7}
}

has this to say:

[screenshot of the paper's back-translation evaluation passage]

cleong110 commented 1 week ago

Which seems to suggest that the method is:

  1. Generate poses
  2. Run the code from "Sign Language Transformers: Joint End-to-End Sign Language Recognition and Translation" on them
  3. See the BLEU score, etc.

The theory being that as the generated poses improve, the SLT output should become more accurate.

They don't train a back-translation model themselves; the SLT model they use is trained independently of the production model.
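The evaluation loop sketched in those three steps might look like the following. This is a hedged sketch, not the paper's code: `generate_poses` and `translate_poses` are hypothetical stand-ins for the production model and a frozen, independently trained SLT model, and the metric is a toy unigram precision standing in for corpus BLEU (a real evaluation would use something like sacrebleu):

```python
# Sketch of back-translation evaluation: generate poses, back-translate
# with a frozen SLT model, score the round trip against the source text.

def generate_poses(text):
    # Stand-in for a sign language production model (text -> pose sequence).
    return [hash(w) % 100 for w in text.split()]

def translate_poses(poses):
    # Stand-in for a pretrained, frozen SLT model (poses -> text).
    # Crucially, it is trained independently of the production model,
    # never on that model's outputs.
    return " ".join(f"tok{p}" for p in poses)

def unigram_precision(hypothesis, reference):
    # Toy stand-in for BLEU: fraction of hypothesis tokens that also
    # appear in the reference, with clipped counts.
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    ref_counts = {}
    for t in ref:
        ref_counts[t] = ref_counts.get(t, 0) + 1
    matched = 0
    for t in hyp:
        if ref_counts.get(t, 0) > 0:
            ref_counts[t] -= 1
            matched += 1
    return matched / len(hyp)

def back_translation_score(source_texts):
    # Steps 1-3: generate, back-translate, score.
    scores = []
    for text in source_texts:
        poses = generate_poses(text)
        back_translated = translate_poses(poses)
        scores.append(unigram_precision(back_translated, text))
    return sum(scores) / len(scores)
```

As the production model improves, the round-trip score should rise, which is exactly why the SLT model must be trained on independent data: otherwise the score measures agreement with the generator's quirks rather than translation quality.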