Open Tortoise17 opened 4 months ago
The resource is available at https://github.com/sign-language-processing/datasets/tree/master/sign_language_datasets/datasets/sign_wordnet and one just needs to make download_lexicon.py support this target.
Comparison-wise, the gsg dataset CSV file looks very small compared to the sgg one, even though both are German. Is there also a difference in accuracy between them, if you have any experience comparing the two?
@AmitMY thank you. Does this mean one needs to build a completely new pipeline for the sign language dataset library, like the signsuisse one you created?
Both are German, but one is Swiss German Sign Language and the other is German Sign Language.
No need to make a "completely new pipeline"; you just need to update https://github.com/sign-language-processing/spoken-to-signed-translation/blob/main/spoken_to_signed/download_lexicon.py to load the sign_wordnet dataset.
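A minimal sketch of what such a loader could look like, modeled on the existing load_signsuisse function that the traceback below shows in download_lexicon.py. The function name load_sign_wordnet, the helper row_to_lexicon_entry, the config name, and the dataset field names ("video", "text", "spoken_language", "signed_language") are all assumptions for illustration; check them against the actual dataset features before using this.

```python
from typing import Dict, Iterator


def row_to_lexicon_entry(row: Dict[str, str]) -> Dict[str, str]:
    """Map one raw sign_wordnet row to the lexicon CSV schema.

    The source field names ("video", "text", ...) are assumptions about
    the dataset's features, not verified names.
    """
    return {
        "path": row.get("video", ""),
        "spoken_language": row.get("spoken_language", "de"),
        "signed_language": row.get("signed_language", "gsg"),
        "words": row.get("text", ""),
    }


def load_sign_wordnet() -> Iterator[Dict[str, str]]:
    # Local imports mirror the style of the existing loaders, so that
    # TensorFlow is only required when this lexicon is actually requested.
    import tensorflow_datasets as tfds
    import sign_language_datasets.datasets  # noqa: F401  (registers builders)
    from sign_language_datasets.datasets.config import SignDatasetConfig

    config = SignDatasetConfig(name="wordnet-lexicon", include_video=False)
    dataset = tfds.load(name="sign_wordnet", builder_kwargs={"config": config})
    for datum in dataset["train"]:
        # Decode the tf string tensors into plain Python strings.
        decoded = {k: v.numpy().decode("utf-8")
                   for k, v in datum.items() if hasattr(v, "numpy")}
        yield row_to_lexicon_entry(decoded)
```

The loader would then be registered alongside the existing ones so that the CLI can select sign_wordnet as a lexicon source.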
I tried replacing some lines and using the script to download; the error is below.
0it [00:00, ?it/s]2024-06-28 16:13:34.737127: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-28 16:13:34.774483: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-28 16:13:34.774534: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-28 16:13:34.774563: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-28 16:13:34.781269: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-28 16:13:35.485184: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
WARNING:absl:Using custom data configuration 2024-06-28
2024-06-28 16:13:36.819648: W tensorflow/tsl/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "NOT_FOUND: Could not locate the credentials file.". Retrieving token from GCE failed with "FAILED_PRECONDITION: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata.google.internal".
Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to /home/onetimeuser/tensorflow_datasets/sign_wordnet/2024-06-28/1.0.0...
/work/anaconda3/envs/sign/lib/python3.10/site-packages/sign_language_datasets-0.2.1-py3.10.egg/sign_language_datasets/datasets/warning.py:5: UserWarning: This library provides access to data sets without claiming ownership over them or defining their licensing terms. Users who download data are responsible for checking the license of each individual data set.
warnings.warn(
[nltk_data] Downloading package wordnet to
[nltk_data] /home/onetimeuser/nltk_data...
[nltk_data] Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data] /home/onetimeuser/nltk_data...
[nltk_data] Package omw-1.4 is already up-to-date!
[nltk_data] Downloading package extended_omw to
[nltk_data] /home/onetimeuser/nltk_data...
[nltk_data] Package extended_omw is already up-to-date!
Dl Size...: 100%| 3148467/3148467 [00:00<00:00, 751943271.38 MiB/s]
Dl Completed...: 100%| 3/3 [00:00<00:00, 680.53 url/s]
Translation Languages ['eng']
Generating splits...: 0%| 0/1 [00:00<?, ? splits/s]
Traceback (most recent call last):
File "/work/justcheckresult/sign_language/spoken-to-signed-translation-main/spoken-to-signed-translation-main/spoken_to_signed/download_lexicon.py", line 128, in <module>
main()
File "/work/justcheckresult/sign_language/spoken-to-signed-translation-main/spoken-to-signed-translation-main/spoken_to_signed/download_lexicon.py", line 124, in main
add_data(data, args.directory)
File "/work/justcheckresult/sign_language/spoken-to-signed-translation-main/spoken-to-signed-translation-main/spoken_to_signed/download_lexicon.py", line 110, in add_data
for row in tqdm(data):
File "/work/anaconda3/envs/sign/lib/python3.10/site-packages/tqdm/std.py", line 1181, in __iter__
for obj in iterable:
File "/work/justcheckresult/sign_language/spoken-to-signed-translation-main/spoken-to-signed-translation-main/spoken_to_signed/download_lexicon.py", line 47, in load_signsuisse
dataset = tfds.load(name='sign_wordnet', builder_kwargs={"config": config})
File "/work/anaconda3/envs/sign/lib/python3.10/site-packages/tensorflow_datasets/core/logging/__init__.py", line 169, in __call__
return function(*args, **kwargs)
File "/work/anaconda3/envs/sign/lib/python3.10/site-packages/tensorflow_datasets/core/load.py", line 647, in load
_download_and_prepare_builder(dbuilder, download, download_and_prepare_kwargs)
File "/work/anaconda3/envs/sign/lib/python3.10/site-packages/tensorflow_datasets/core/load.py", line 506, in _download_and_prepare_builder
dbuilder.download_and_prepare(**download_and_prepare_kwargs)
File "/work/anaconda3/envs/sign/lib/python3.10/site-packages/tensorflow_datasets/core/logging/__init__.py", line 169, in __call__
return function(*args, **kwargs)
File "/work/anaconda3/envs/sign/lib/python3.10/site-packages/tensorflow_datasets/core/dataset_builder.py", line 699, in download_and_prepare
self._download_and_prepare(
File "/work/anaconda3/envs/sign/lib/python3.10/site-packages/tensorflow_datasets/core/dataset_builder.py", line 1669, in _download_and_prepare
split_infos = self._generate_splits(dl_manager, download_config)
File "/work/anaconda3/envs/sign/lib/python3.10/site-packages/tensorflow_datasets/core/dataset_builder.py", line 1644, in _generate_splits
future = split_builder.submit_split_generation(
File "/work/anaconda3/envs/sign/lib/python3.10/site-packages/tensorflow_datasets/core/split_builder.py", line 331, in submit_split_generation
return self._build_from_generator(**build_kwargs)
File "/work/anaconda3/envs/sign/lib/python3.10/site-packages/tensorflow_datasets/core/split_builder.py", line 391, in _build_from_generator
for key, example in utils.tqdm(
File "/work/anaconda3/envs/sign/lib/python3.10/site-packages/tqdm/std.py", line 1181, in __iter__
for obj in iterable:
File "/work/anaconda3/envs/sign/lib/python3.10/site-packages/sign_language_datasets-0.2.1-py3.10.egg/sign_language_datasets/datasets/sign_wordnet/sign_wordnet.py", line 145, in _generate_examples
"sign_language": IANA_MAP[row["language"]],
KeyError: 'dsgs'
Could you guide me on what the reason might be? It downloaded, but only English for some reason. I could not fix it for German (de), although I tried to change it.
It seems like they added some new languages. We need to add them to this dictionary here. https://github.com/sign-language-processing/datasets/blob/master/sign_language_datasets/datasets/sign_wordnet/sign_wordnet.py#L26
@AmitMY Thank you. I changed it. "dgs": "gsg" is the language mapping, isn't it? I added the remaining ones as well, but when it downloads, it downloads only the English WordNet. It looks like NLTK has no access to the German sign_wordnet.
No, DSGS is Swiss German Sign Language, which has the code SGG. Unfortunately, I can't really check for you why it only downloads the English WordNet, as I broke both my hands and am currently only dictating. You can try to debug the dataset in the datasets repository.
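One way to start that debugging, sketched under the assumption that _generate_examples in sign_wordnet.py yields (key, example) pairs whose examples carry a "sign_language" field (as the traceback suggests), is to tally which sign languages actually survive generation:

```python
from collections import Counter


def count_sign_languages(examples):
    """Tally the "sign_language" field over an iterable of (key, example)
    pairs, e.g. the output of the builder's _generate_examples."""
    return Counter(example["sign_language"] for _, example in examples)
```

If the counter only ever contains "ase", the filtering likely happens before the language map is consulted, e.g. in how the WordNet/OMW data is downloaded or joined.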
@AmitMY Oh GET WELL SOON and recover fast and well. Best wishes for you. I will try to fix it.
Is a German (gsg) data source for the WordNet already available?