RongLirr opened this issue 1 week ago.
Thank you for making this a warning. My suspicion is that one of the videos is at 50 fps and the other at 25 fps, so they have different numbers of frames. This should be checked properly in the future; for now we have a warning :)
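The frame counts in the error message are consistent with that guess: 28254 is exactly twice 14127. A minimal sketch of such a check, assuming the pose data are NumPy arrays shaped (frames, people, points, dims) and that a 50 fps / 25 fps pair differs by an exact integer factor. The helper names `frame_rate_factor` and `downsample` are hypothetical, not part of sign_language_datasets:

```python
from typing import Optional

import numpy as np


def frame_rate_factor(long_frames: int, short_frames: int) -> Optional[int]:
    """Return k if long_frames is exactly k times short_frames, else None.

    Hypothetical helper: an exact integer ratio (e.g. 2 for 50 fps vs 25 fps)
    suggests the two recordings differ only in frame rate.
    """
    if short_frames > 0 and long_frames % short_frames == 0:
        return long_frames // short_frames
    return None


def downsample(data: np.ndarray, factor: int) -> np.ndarray:
    """Keep every `factor`-th frame along axis 0 (the frame axis)."""
    return data[::factor]


# The two frame counts reported for document 1177918
factor = frame_rate_factor(28254, 14127)
print(factor)  # 2
```

Dropping every other frame is the crudest possible resampling; it only lines the two streams up if both recordings start at the same moment.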
This happened to me as well, with the same document number, 1177918.
TFDS then tried to remove the incomplete directory and failed with OSError: [Errno 39] Directory not empty.
That OSError is not caught, so the entire load operation fails.
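This is why the real problem gets buried: when a cleanup step in a context manager's failure path raises, Python reports the cleanup error, and the original AssertionError only appears as context ("During handling of the above exception..."). The sketch below is a hypothetical stand-in for the temporary-directory pattern, not TFDS's actual `incomplete_dir` code; it logs a failed cleanup and re-raises the original exception instead of masking it:

```python
import contextlib
import logging
import shutil
import tempfile
from pathlib import Path


@contextlib.contextmanager
def incomplete_dir_sketch(final_dir: Path, cleanup=shutil.rmtree):
    """Hypothetical sketch: write into a temp dir, rename it on success,
    and remove it on failure without hiding the original exception."""
    tmp_dir = Path(tempfile.mkdtemp(prefix="incomplete.", dir=str(final_dir.parent)))
    try:
        yield tmp_dir
        tmp_dir.rename(final_dir)
    except BaseException:
        try:
            cleanup(tmp_dir)
        except OSError as cleanup_error:
            # Swallow (but log) the cleanup failure so that the caller sees
            # the original error, e.g. the AssertionError from dgs_corpus.py.
            logging.warning("Could not remove %s: %s", tmp_dir, cleanup_error)
        raise  # re-raises the original exception, not the OSError
```

With this pattern the user would see the AssertionError about document 1177918 directly, plus a logged warning that the temporary directory was left behind.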
@AmitMY Here's a stack trace. The failed assertion triggers an OSError while TFDS cleans up, and that OSError crashes the whole thing.
```
Traceback (most recent call last):
  File "/opt/home/cleong/envs/sldata/lib/python3.10/site-packages/tensorflow_datasets/core/utils/file_utils.py", line 125, in incomplete_dir
    yield tmp_dir
  File "/opt/home/cleong/envs/sldata/lib/python3.10/site-packages/tensorflow_datasets/core/dataset_builder.py", line 756, in download_and_prepare
    self._download_and_prepare(
  File "/opt/home/cleong/envs/sldata/lib/python3.10/site-packages/tensorflow_datasets/core/dataset_builder.py", line 1752, in _download_and_prepare
    split_infos = self._generate_splits(dl_manager, download_config)
  File "/opt/home/cleong/envs/sldata/lib/python3.10/site-packages/tensorflow_datasets/core/dataset_builder.py", line 1727, in _generate_splits
    future = split_builder.submit_split_generation(
  File "/opt/home/cleong/envs/sldata/lib/python3.10/site-packages/tensorflow_datasets/core/split_builder.py", line 436, in submit_split_generation
    return self._build_from_generator(**build_kwargs)
  File "/opt/home/cleong/envs/sldata/lib/python3.10/site-packages/tensorflow_datasets/core/split_builder.py", line 496, in _build_from_generator
    for key, example in utils.tqdm(
  File "/opt/home/cleong/envs/sldata/lib/python3.10/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/opt/home/cleong/envs/sldata/lib/python3.10/site-packages/sign_language_datasets/datasets/dgs_corpus/dgs_corpus.py", line 388, in _generate_examples
    assert all(
AssertionError: Document 1177918: The poses are not synchronized ([(28254, 1, 543, 3), (14127, 1, 543, 3)])

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/home/cleong/sldata_download.py", line 22, in <module>
    dataset, info = tfds.load(
  File "/opt/home/cleong/envs/sldata/lib/python3.10/site-packages/tensorflow_datasets/core/logging/__init__.py", line 176, in __call__
    return function(*args, **kwargs)
  File "/opt/home/cleong/envs/sldata/lib/python3.10/site-packages/tensorflow_datasets/core/load.py", line 661, in load
    _download_and_prepare_builder(dbuilder, download, download_and_prepare_kwargs)
  File "/opt/home/cleong/envs/sldata/lib/python3.10/site-packages/tensorflow_datasets/core/load.py", line 517, in _download_and_prepare_builder
    dbuilder.download_and_prepare(**download_and_prepare_kwargs)
  File "/opt/home/cleong/envs/sldata/lib/python3.10/site-packages/tensorflow_datasets/core/logging/__init__.py", line 176, in __call__
    return function(*args, **kwargs)
  File "/opt/home/cleong/envs/sldata/lib/python3.10/site-packages/tensorflow_datasets/core/dataset_builder.py", line 737, in download_and_prepare
    with utils.incomplete_dir(
  File "/opt/home/cleong/envs/sldata/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/opt/home/cleong/envs/sldata/lib/python3.10/site-packages/tensorflow_datasets/core/utils/file_utils.py", line 131, in incomplete_dir
    tmp_path.rmtree()
  File "/opt/home/cleong/envs/sldata/lib/python3.10/site-packages/etils/epath/gpath.py", line 220, in rmtree
    self._backend.rmtree(self._path_str)
  File "/opt/home/cleong/envs/sldata/lib/python3.10/site-packages/etils/epath/backend.py", line 193, in rmtree
    shutil.rmtree(path)
  File "/opt/home/cleong/envs/sldata/lib/python3.10/shutil.py", line 731, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "/opt/home/cleong/envs/sldata/lib/python3.10/shutil.py", line 729, in rmtree
    os.rmdir(path)
OSError: [Errno 39] Directory not empty: '/data/petabyte/cleong/data/tfds_sign_language_datasets/dgs_corpus/holistic/incomplete.9K8YL7_3.0.0'
```
@cleong110 can you share your exact command? It looks like one of the poses may be 25 fps and the other 50 fps, so I want to see how you are running it.
I am just calling tfds.load with "dgs_corpus/holistic" as the name. I'm trying to download some of the datasets locally, using the following script, which takes a dataset name as its only required argument.
Full script:
```python
# https://github.com/sign-language-processing/datasets/blob/master/sign_language_datasets/datasets/autsl/autsl.py
# /opt/home/cleong/envs/sldata/lib/python3.10/site-packages/sign_language_datasets/datasets/autsl/autsl.py
import tensorflow_datasets as tfds
import sign_language_datasets.datasets  # importing this registers the sign language datasets with TFDS
from sign_language_datasets.datasets.config import SignDatasetConfig
from pathlib import Path
import argparse
import itertools

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="attempt to download a dataset from sign-language-datasets, e.g. 'dgs_corpus/holistic'")
    parser.add_argument("dataset_name", help="something like 'dgs_corpus'")
    parser.add_argument("--data_dir", type=Path, default=Path("/data/petabyte/cleong/data/tfds_sign_language_datasets"))
    args = parser.parse_args()

    # Only used by the commented-out calls below; tfds.load uses args.data_dir
    data_dir = "/data/petabyte/cleong/data/tfds_sign_language_datasets"
    # config = SignDatasetConfig(name="only-annotations", version="1.0.0", include_video=False)
    # config = SignDatasetConfig(name="poses-please", include_pose="holistic")
    # autsl = tfds.load(name='autsl', data_dir=data_dir, builder_kwargs={"config": config})
    # autsl = tfds.load(name='autsl/holistic', data_dir=data_dir)

    # Downloads and prepares the dataset on first use, then loads it
    dataset, info = tfds.load(
        name=str(args.dataset_name),
        # builder_kwargs={"config": config},
        data_dir=args.data_dir,
        with_info=True)

    # Print the first two training examples as a smoke test
    for datum in itertools.islice(dataset["train"], 0, 2):
        print("datum")
        print(datum)
    print(info)
```
I called that script like this:

```shell
python sldata_download.py "dgs_corpus/holistic" 2>&1 | tee dgs_corpus_fails.txt
```
When loading the dgs_corpus dataset, an AssertionError occurs due to unsynchronized pose shapes within one of the documents.
Here is the error message:
```
AssertionError: Document 1177918: The poses are not synchronized ([(28254, 1, 543, 3), (14127, 1, 543, 3)])
```

Assertion code (datasets/sign_language_datasets/datasets/dgs_corpus/dgs_corpus.py):

```python
assert all(p.body.data.shape == first_pose.body.data.shape for p in poses_values), f"Document {document_id}: The poses are not synchronized ({[p.body.data.shape for p in poses_values]})"
```
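Earlier in the thread the assertion is described as having been turned into a warning. A minimal sketch of that idea, assuming plain NumPy arrays in place of pose_format `Pose` objects; the function name `check_poses_synchronized` and the dict-of-arrays input are hypothetical, not the library's actual code:

```python
import warnings

import numpy as np


def check_poses_synchronized(document_id: str, poses: dict) -> bool:
    """Warn, rather than assert, when a document's pose arrays differ in shape.

    `poses` maps a (hypothetical) camera name to an array shaped
    (frames, people, points, dims), mirroring p.body.data.shape above.
    Returns True when all shapes match.
    """
    shapes = [np.shape(data) for data in poses.values()]
    if all(shape == shapes[0] for shape in shapes):
        return True
    # Same message as the original assertion, but generation can continue
    warnings.warn(f"Document {document_id}: The poses are not synchronized ({shapes})")
    return False
```

A warning lets dataset generation finish, but any downstream code that assumes the two pose streams have equal frame counts still has to handle the mismatch.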