sign-language-processing / spoken-to-signed-translation

a text-to-gloss-to-pose-to-video pipeline for spoken to signed language translation
https://sign.mt/?sil=sgg&spl=de
MIT License

Add a new language or change an existing one #28

Open GLWmax opened 8 months ago

GLWmax commented 8 months ago

Error: download_lexicon.py --name 'vgt' --directory ./vgt
usage: download_lexicon.py [-h] --name {signsuisse} --directory DIRECTORY
download_lexicon.py: error: argument --name: invalid choice: 'vgt' (choose from 'signsuisse')

I have added 'vgt' to the spaCy language models:


from .types import Gloss
from .common import load_spacy_model

LANGUAGE_MODELS_SPACY = {
    "de": "de_core_news_lg",
    "fr": "fr_core_news_lg",
    "vgt": "vgt_core_news_lg",
    "en": "en_core_web_lg",
}


def text_to_gloss(text: str, language: str, ignore_punctuation: bool = False) -> Gloss:
    if language not in LANGUAGE_MODELS_SPACY:
        raise NotImplementedError("Don't know language '%s'." % language)

    model_name = LANGUAGE_MODELS_SPACY[language]

    # disable unnecessary components to make lemmatization faster
    spacy_model = load_spacy_model(model_name, disable=("parser", "ner"))
    doc = spacy_model(text)

    glosses = []  # type: Gloss

    for token in doc:
        if ignore_punctuation is True:
            if token.is_punct:
                continue

        gloss = (token.text, token.lemma_)
        glosses.append(gloss)

    return glosses

AmitMY commented 8 months ago

I see that you are trying to use this repository with VGT.

The download_lexicon script does not support any VGT dataset, so to support Flemish, you would have to go through the following process:

  1. Collect a lexicon (Download videos from https://vlaamsegebarentaal.be/signbank/signs/show_all/ or collect your own)
  2. Extract poses using this library and the command video_to_pose --format mediapipe -i example.mp4 -o example.pose
  3. Construct a lexicon CSV file that maps the words to their matching poses, for example https://github.com/sign-language-processing/spoken-to-signed-translation/blob/main/assets/dummy_lexicon/index.csv:
    path,spoken_language,signed_language,start,end,words,glosses,priority
    sgg/kleine.pose,de,sgg,0,0,kleine,Kleine,0
    sgg/kinder.pose,de,sgg,0,0,kinder,Kinder,0
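
Step 3 could be scripted along these lines. This is only a rough sketch, not part of the repository: `build_index` is a hypothetical helper, it assumes one .pose file per word stored under a subdirectory named after the signed language, and it assumes the file name is the word itself. The column order and the `0` values for start, end and priority are simply copied from the dummy example above.

```python
import csv
from pathlib import Path

# Column order copied from the dummy lexicon example above.
FIELDNAMES = ["path", "spoken_language", "signed_language",
              "start", "end", "words", "glosses", "priority"]


def build_index(lexicon_dir, spoken_language, signed_language):
    """Hypothetical helper: write <lexicon_dir>/index.csv with one row per
    .pose file found in <lexicon_dir>/<signed_language>/."""
    lexicon_dir = Path(lexicon_dir)
    pose_files = sorted((lexicon_dir / signed_language).glob("*.pose"))
    index_path = lexicon_dir / "index.csv"

    with index_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        for pose_file in pose_files:
            word = pose_file.stem  # assumption: the file name is the word
            writer.writerow({
                # path relative to the lexicon directory, e.g. vgt/hallo.pose
                "path": pose_file.relative_to(lexicon_dir).as_posix(),
                "spoken_language": spoken_language,
                "signed_language": signed_language,
                "start": 0,      # copied from the dummy example
                "end": 0,        # copied from the dummy example
                "words": word.lower(),
                "glosses": word.capitalize(),  # assumption: mirrors "Kleine"
                "priority": 0,   # copied from the dummy example
            })
    return index_path
```

For a VGT lexicon with Dutch words this would be called as `build_index("lexicon", "nl", "vgt")`, after the pose files have been placed in `lexicon/vgt/`.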

Once you have this index.csv in a directory called, say, lexicon, you can run, for example:

text_to_gloss_to_pose \
  --text "Hallo mijn naam is john." \
  --glosser "simple" \
  --lexicon "lexicon" \
  --spoken-language "nl" \
  --signed-language "vgt" \
  --pose "quick_test.pose"
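
To illustrate why the language codes in index.csv presumably need to match the --spoken-language and --signed-language flags, here is a simplified word lookup. It is a sketch only and not how text_to_gloss_to_pose is implemented: `load_lookup`, `find_poses` and the VGT rows in `EXAMPLE_INDEX` are all made up for illustration.

```python
import csv
import io

# Hypothetical rows in the index.csv layout shown earlier in this thread.
EXAMPLE_INDEX = """\
path,spoken_language,signed_language,start,end,words,glosses,priority
vgt/hallo.pose,nl,vgt,0,0,hallo,HALLO,0
vgt/naam.pose,nl,vgt,0,0,naam,NAAM,0
sgg/kinder.pose,de,sgg,0,0,kinder,Kinder,0
"""


def load_lookup(index_csv, spoken_language, signed_language):
    """Map each word to a pose path, keeping only rows for the language pair."""
    lookup = {}
    for row in csv.DictReader(io.StringIO(index_csv)):
        pair = (row["spoken_language"], row["signed_language"])
        if pair == (spoken_language, signed_language):
            # keep the first entry if a word appears more than once
            lookup.setdefault(row["words"].lower(), row["path"])
    return lookup


def find_poses(text, lookup):
    """Split on whitespace, strip basic punctuation, look each word up."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return [(w, lookup.get(w)) for w in words if w]


lookup = load_lookup(EXAMPLE_INDEX, "nl", "vgt")
for word, path in find_poses("Hallo mijn naam is john.", lookup):
    print(word, "->", path or "(no lexicon entry)")
```

In this sketch, words without a row for the nl/vgt pair come back with no path, which is why the lexicon has to cover the vocabulary you want to translate.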

KhayitboevElbekjon commented 7 months ago

Hello, I have a problem. Look at this command:

download_lexicon --name --directory

What should I put for "name" and "directory" in this command?

KhayitboevElbekjon commented 7 months ago

Which file should I run to use this program?

AmitMY commented 7 months ago

> Hello, I have one problem, look at.
>
> download_lexicon --name --directory
>
> What should I put in "name" and "directory" in this code?

The only dataset available in this repository is signsuisse. If you have a further question that is unrelated to this issue, please open a separate issue.
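
Judging only from the usage message quoted at the top of this thread, the argument handling behaves roughly like the sketch below. This is a reconstruction from that error output, not the actual download_lexicon.py source, and `build_parser` is a made-up name.

```python
import argparse


def build_parser():
    """Reconstruction of the CLI from its usage message (an assumption,
    not copied from the repository)."""
    parser = argparse.ArgumentParser(prog="download_lexicon.py")
    # --name only accepts the listed choices; anything else such as 'vgt'
    # is rejected by argparse with an "invalid choice" error.
    parser.add_argument("--name", required=True, choices=["signsuisse"])
    # --directory is where the lexicon should be stored.
    parser.add_argument("--directory", required=True)
    return parser


args = build_parser().parse_args(["--name", "signsuisse", "--directory", "./lexicon"])
print(args.name, args.directory)
```

So `--name` takes `signsuisse` and `--directory` takes any folder of your choice, `./lexicon` here being just an example path.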

cleong110 commented 5 months ago

https://www.corpusvgt.be/ might work

cleong110 commented 5 months ago

Or https://woordenboek.vlaamsegebarentaal.be/search, used by https://github.com/m-decoster/VGT-SL-Dictionary

KhayitboevElbekjon commented 5 months ago

Thanks for the tip!

On Wed, Jun 12, 2024 at 2:31 AM cleong110 wrote:

> Or https://woordenboek.vlaamsegebarentaal.be/search, used by https://github.com/m-decoster/VGT-SL-Dictionary
