sign-language-processing / spoken-to-signed-translation

A text-to-gloss-to-pose-to-video pipeline for spoken-to-signed language translation
https://sign.mt/?sil=sgg&spl=de
MIT License

German wordnet #37

Open Tortoise17 opened 4 months ago

Tortoise17 commented 4 months ago

Is the German (gsg) data source for the wordnet already available?

AmitMY commented 4 months ago

The resource is available: https://github.com/sign-language-processing/datasets/tree/master/sign_language_datasets/datasets/sign_wordnet

One just needs to make download_lexicon.py support this target.
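A minimal sketch of what "supporting this target" could look like in download_lexicon.py, modeled on the shape of the existing load_signsuisse loader. The field names, defaults, row shape, and config arguments below are assumptions for illustration, not the dataset's actual schema.

```python
# Hypothetical loader for sign_wordnet in download_lexicon.py.
# Field names and the (spoken, signed, text, url) row shape are assumptions.

def wordnet_row_to_lexicon(datum: dict) -> tuple:
    """Convert one sign_wordnet example into a (spoken, signed, text, url)
    row for the lexicon CSV. Purely illustrative field names and defaults."""
    return (
        datum.get("spoken_language", "de"),
        datum.get("sign_language", "sgg"),
        datum.get("text", ""),
        datum.get("video_url", ""),
    )

def load_sign_wordnet():
    # Heavy imports kept local so the module still imports without TensorFlow.
    import tensorflow_datasets as tfds
    import sign_language_datasets.datasets  # noqa: F401 -- registers the tfds builders
    from sign_language_datasets.datasets.config import SignDatasetConfig

    # The config name and flags here are guesses and may need adjusting.
    config = SignDatasetConfig(name="wordnet-lexicon", include_video=False)
    dataset = tfds.load(name="sign_wordnet", builder_kwargs={"config": config})
    for datum in dataset["train"]:
        yield wordnet_row_to_lexicon(datum)
```

The pure `wordnet_row_to_lexicon` helper keeps the tfds-dependent code in one place, so the row conversion can be tested without downloading anything.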

Tortoise17 commented 4 months ago

Comparison-wise, the gsg dataset CSV file looks very small compared to ssg, although both are German. Is there also a difference in accuracy? Do you have any experience comparing them?

Tortoise17 commented 4 months ago

@AmitMY thank you. Does this mean I need to build a completely new pipeline for the sign language dataset library, like the signsuisse one you created?

AmitMY commented 4 months ago

Both are German, but one is Swiss German Sign Language and the other is German Sign Language.

No need to make a "completely new pipeline"; just update https://github.com/sign-language-processing/spoken-to-signed-translation/blob/main/spoken_to_signed/download_lexicon.py to load sign_wordnet.

Tortoise17 commented 4 months ago

I tried to replace some lines and use the script to download. Below is the error:

0it [00:00, ?it/s]2024-06-28 16:13:34.737127: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-28 16:13:34.774483: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-28 16:13:34.774534: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-28 16:13:34.774563: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-28 16:13:34.781269: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-28 16:13:35.485184: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
WARNING:absl:Using custom data configuration 2024-06-28
2024-06-28 16:13:36.819648: W tensorflow/tsl/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "NOT_FOUND: Could not locate the credentials file.". Retrieving token from GCE failed with "FAILED_PRECONDITION: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata.google.internal".
Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to /home/onetimeuser/tensorflow_datasets/sign_wordnet/2024-06-28/1.0.0...
/work/anaconda3/envs/sign/lib/python3.10/site-packages/sign_language_datasets-0.2.1-py3.10.egg/sign_language_datasets/datasets/warning.py:5: UserWarning: This library provides access to data sets without claiming ownership over them or defining their licensing terms. Users who download data are responsible for checking the license of each individual data set.
  warnings.warn(
[nltk_data] Downloading package wordnet to
[nltk_data]     /home/onetimeuser/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     /home/onetimeuser/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[nltk_data] Downloading package extended_omw to
[nltk_data]     /home/onetimeuser/nltk_data...
[nltk_data]   Package extended_omw is already up-to-date!
Dl Size...: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3148467/3148467 [00:00<00:00, 751943271.38 MiB/s]
Dl Completed...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 680.53 url/s]
Dl Size...: 100%|██████████████████████████████████████████████████████████████| 3148467/3148467 [00:00<00:00, 835746328.21 MiB/s]
Translation Languages ['eng']
Generating splits...:   0%|                                                    | 0/1 [00:00<?, ? splits/s]
0it [00:05, ?it/s]
Traceback (most recent call last):                                                                                                                                                                                             
  File "/work/justcheckresult/sign_language/spoken-to-signed-translation-main/spoken-to-signed-translation-main/spoken_to_signed/download_lexicon.py", line 128, in <module>
    main()
  File "/work/justcheckresult/sign_language/spoken-to-signed-translation-main/spoken-to-signed-translation-main/spoken_to_signed/download_lexicon.py", line 124, in main
    add_data(data, args.directory)
  File "/work/justcheckresult/sign_language/spoken-to-signed-translation-main/spoken-to-signed-translation-main/spoken_to_signed/download_lexicon.py", line 110, in add_data
    for row in tqdm(data):
  File "/work/anaconda3/envs/sign/lib/python3.10/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/work/justcheckresult/sign_language/spoken-to-signed-translation-main/spoken-to-signed-translation-main/spoken_to_signed/download_lexicon.py", line 47, in load_signsuisse
    dataset = tfds.load(name='sign_wordnet', builder_kwargs={"config": config})
  File "/work/anaconda3/envs/sign/lib/python3.10/site-packages/tensorflow_datasets/core/logging/__init__.py", line 169, in __call__
    return function(*args, **kwargs)
  File "/work/anaconda3/envs/sign/lib/python3.10/site-packages/tensorflow_datasets/core/load.py", line 647, in load
    _download_and_prepare_builder(dbuilder, download, download_and_prepare_kwargs)
  File "/work/anaconda3/envs/sign/lib/python3.10/site-packages/tensorflow_datasets/core/load.py", line 506, in _download_and_prepare_builder
    dbuilder.download_and_prepare(**download_and_prepare_kwargs)
  File "/work/anaconda3/envs/sign/lib/python3.10/site-packages/tensorflow_datasets/core/logging/__init__.py", line 169, in __call__
    return function(*args, **kwargs)
  File "/work/anaconda3/envs/sign/lib/python3.10/site-packages/tensorflow_datasets/core/dataset_builder.py", line 699, in download_and_prepare
    self._download_and_prepare(
  File "/work/anaconda3/envs/sign/lib/python3.10/site-packages/tensorflow_datasets/core/dataset_builder.py", line 1669, in _download_and_prepare
    split_infos = self._generate_splits(dl_manager, download_config)
  File "/work/anaconda3/envs/sign/lib/python3.10/site-packages/tensorflow_datasets/core/dataset_builder.py", line 1644, in _generate_splits
    future = split_builder.submit_split_generation(
  File "/work/anaconda3/envs/sign/lib/python3.10/site-packages/tensorflow_datasets/core/split_builder.py", line 331, in submit_split_generation
    return self._build_from_generator(**build_kwargs)
  File "/work/anaconda3/envs/sign/lib/python3.10/site-packages/tensorflow_datasets/core/split_builder.py", line 391, in _build_from_generator
    for key, example in utils.tqdm(
  File "/work/anaconda3/envs/sign/lib/python3.10/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/work/anaconda3/envs/sign/lib/python3.10/site-packages/sign_language_datasets-0.2.1-py3.10.egg/sign_language_datasets/datasets/sign_wordnet/sign_wordnet.py", line 145, in _generate_examples
    "sign_language": IANA_MAP[row["language"]],
KeyError: 'dsgs'

Could you guide me on what the reason might be? It downloaded, but only English for some reason. I could not fix it for de, although I tried to change it.

AmitMY commented 4 months ago

It seems like they added some new languages. We need to add them to this dictionary: https://github.com/sign-language-processing/datasets/blob/master/sign_language_datasets/datasets/sign_wordnet/sign_wordnet.py#L26
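A sketch of the suggested fix: extend the IANA_MAP dictionary in sign_wordnet.py so the newly added language keys resolve to IANA sign language tags. Only the two mappings discussed in this thread are shown; the real dictionary in the repository contains more entries, and `sign_language_tag` below is an illustrative helper, not a function from the repository.

```python
# Excerpt-style sketch of IANA_MAP in sign_wordnet.py; only the entries
# discussed in this thread are shown.
IANA_MAP = {
    "dgs": "gsg",  # German Sign Language
}

# The dataset now emits rows with language "dsgs", which is missing from the
# map and raises the KeyError. DSGS (Swiss German Sign Language) maps to the
# IANA code "sgg":
IANA_MAP["dsgs"] = "sgg"

def sign_language_tag(row: dict) -> str:
    # Mirrors the failing line in _generate_examples:
    #   "sign_language": IANA_MAP[row["language"]]
    return IANA_MAP[row["language"]]
```

With the new entry in place, a row with `{"language": "dsgs"}` resolves to `"sgg"` instead of raising the KeyError shown in the traceback.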

Tortoise17 commented 4 months ago

@AmitMY Thank you. I changed it. "dgs": "gsg" is the language, isn't it? I made the remaining changes somewhere, but when it downloads, it downloads only the English wordnet. It looks like nltk has no access to the German sign wordnet.

AmitMY commented 4 months ago

No, DSGS is Swiss German Sign Language, which has the code SGG. Unfortunately, I can't really check for you why it only downloads the English wordnet, as I broke both my hands and am now only dictating. You can try to debug the dataset in the dataset repository.

Tortoise17 commented 4 months ago

@AmitMY Oh, get well soon, and recover fast and well. Best wishes to you. I will try to fix it.