sign / translate

Effortless Real-Time Sign Language Translation
https://sign.mt

Inquiries and Enhancements for Spoken to Signed Pipeline, HumanGAN, and Vietnamese Language Support #151

Closed nqkhanh2002 closed 2 months ago

nqkhanh2002 commented 3 months ago

Description

Hi team, I work as a Computer Vision Engineer, GenAI Engineer, and Data Scientist. I was looking for a project like this, and this repo looks like a great place for me to start. I have some questions, listed below:

  1. First, about the chart in the Spoken to Signed section of the wiki: I understand that the red line has not been implemented yet, so I don't know which method is used for SignWriting -> Pose Sequence (Picture 1).
  2. Second, I want to add Vietnamese to the Spoken Language Text → SignWriting and SignWriting → Pose Sequence stages using the SignBank+ dataset. After adding the Vietnamese data, can I access the source code for training the model for this part?
  3. In addition, I see there is ongoing work and room for improvement in the HumanGAN model, and I thank AmitMY for publishing and documenting everything about it in detail. Once I am able to go through the whole pipeline, I think I can continue with improving Pose Sequence for HumanGAN.

If you need any additional information, please let me know. It would also be more convenient if the team had a chat channel for communication. Thank you very much.

Screenshots

Picture 1: [image]

AmitMY commented 3 months ago

Happy to help. Please note the repository's license.

  1. We don't currently have a model for SignWriting to Pose (https://github.com/rotem-shalev/Ham2Pose is the closest project we did), so at the moment the interface uses the left, dashed pathway: text -> glosses -> pose sequence. This is a very simple implementation that can be found in https://github.com/sign-language-processing/spoken-to-signed-translation
  2. If you want to use the dictionary from SignPuddle for translating to SignWriting, that is already in the SignBank+ dataset. If you have your own data with SignWriting, great, please extend the dataset here: https://github.com/sign-language-processing/signbank-plus - But if you only have dictionary data with videos and words, please use https://github.com/sign-language-processing/spoken-to-signed-translation which does not require SignWriting. Ideally, when we release new models (training code here), Vietnamese should be better supported given that we cleaned the data, but not so much right now.
  3. We have very bad but very fast GANs, and also quite good but very slow diffusion models (https://github.com/sign-language-processing/pose-to-video) - feel free to use/improve any of them :) If you have a GAN that is just as fast but produces better output (for example, Pix2Pix w/ StyleGAN3 architecture), I'm happy to integrate it directly in the app.
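
To make the text -> glosses -> pose sequence pathway from point 1 more concrete, here is a minimal sketch of a dictionary-lookup approach. This is not the code from spoken-to-signed-translation; the lexicon, the frame placeholders, and the function names (`text_to_glosses`, `glosses_to_pose`) are all hypothetical and only illustrate the idea of mapping words to glosses and concatenating stored pose clips.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a pose frame; a real system would store keypoints.
Frame = str

# Hypothetical lexicon: gloss -> pose clip (a list of frames).
LEXICON: Dict[str, List[Frame]] = {
    "HELLO": ["hello_0", "hello_1"],
    "WORLD": ["world_0", "world_1", "world_2"],
}


def text_to_glosses(text: str) -> List[str]:
    """Naive glosser (assumption): split on whitespace, strip punctuation, uppercase."""
    words = (w.strip(".,!?;:") for w in text.split())
    return [w.upper() for w in words if w]


def glosses_to_pose(
    glosses: List[str], lexicon: Dict[str, List[Frame]]
) -> Tuple[List[Frame], List[str]]:
    """Concatenate the stored clip for each gloss; collect glosses with no entry."""
    frames: List[Frame] = []
    missing: List[str] = []
    for gloss in glosses:
        clip = lexicon.get(gloss)
        if clip is None:
            missing.append(gloss)  # no dictionary entry for this gloss
            continue
        frames.extend(clip)
    return frames, missing


if __name__ == "__main__":
    glosses = text_to_glosses("Hello, world!")
    frames, missing = glosses_to_pose(glosses, LEXICON)
    print(glosses, frames, missing)
```

Under this sketch, adding a language with only dictionary data (words plus videos) would mean filling a lexicon like `LEXICON` with that language's entries, which is why no SignWriting is needed for this pathway.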

Let me know if you have further questions