sign-language-processing / datasets

TFDS data loaders for sign language datasets.
https://sign-language-processing.github.io/#existing-datasets

dataset(asl_lex): add new dataset #33

Closed AmitMY closed 1 year ago

AmitMY commented 1 year ago

The initial dataset only includes the metadata rows, without videos, since those are not trivially available for download. Fixes #31

bricksdont commented 1 year ago

Doing this:

pip install git+https://github.com/sign-language-processing/datasets.git@asl-lex

import tensorflow_datasets as tfds
from sign_language_datasets.datasets.config import SignDatasetConfig
config = SignDatasetConfig(name="only-annotations", version="3.0.0", include_video=False)
asl_lex = tfds.load(name='asl_lex', builder_kwargs=dict(config=config))

I run into an error:

FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.9/dist-packages/sign_language_datasets/datasets/asl_lex/data-key.csv'

The above exception was the direct cause of the following exception:

FileNotFoundError                         Traceback (most recent call last)

/usr/local/lib/python3.9/dist-packages/tensorflow_datasets/core/utils/py_utils.py in reraise(e, prefix, suffix)
    382     else:
    383       exception = RuntimeError(f'{type(e).__name__}: {msg}')
--> 384     raise exception from e
    385   # Otherwise, modify the exception in-place
    386   elif len(e.args) <= 1:

FileNotFoundError: Failed to construct dataset asl_lex: [Errno 2] No such file or directory: '/usr/local/lib/python3.9/dist-packages/sign_language_datasets/datasets/asl_lex/data-key.csv'

It looks like this has to do with which additional files (i.e. package data) are distributed via setup.py, but I have not found the issue yet.

bricksdont commented 1 year ago

... the problem is that https://github.com/sign-language-processing/datasets/blob/asl-lex/MANIFEST.in does not cover CSV files
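A quick way to catch this kind of packaging gap before users hit it is to check, against the *installed* package, that the data files the loader reads were actually shipped. A minimal sketch using `importlib.resources.files` (Python 3.9+); the helper name `missing_package_data` is hypothetical, and the commented-out call assumes `asl_lex` is an importable subpackage containing `data-key.csv`, as the traceback above suggests.

```python
from importlib import resources


def missing_package_data(package, filenames):
    """Return the names in `filenames` that are not present as files
    inside the installed `package` (e.g. CSVs that MANIFEST.in did not cover).

    Note: resources.files() imports `package` to locate it."""
    root = resources.files(package)
    return [name for name in filenames if not (root / name).is_file()]


if __name__ == "__main__":
    # Hypothetical check for this repo (assumes the layout seen in the traceback):
    # print(missing_package_data("sign_language_datasets.datasets.asl_lex", ["data-key.csv"]))
    # Self-contained demo against a stdlib package:
    print(missing_package_data("json", ["__init__.py", "data-key.csv"]))
```

Running such a check in CI after `pip install .` (rather than from the source checkout) is what would surface a missing MANIFEST.in pattern, since the checkout always has the files.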

AmitMY commented 1 year ago

Nice catch! Added that

bricksdont commented 1 year ago

Now a recent change to pose-format leads to this:

---------------------------------------------------------------------------

ModuleNotFoundError                       Traceback (most recent call last)

<ipython-input-2-842dd6697810> in <module>
      1 import tensorflow_datasets as tfds
----> 2 import sign_language_datasets.datasets
      3 from sign_language_datasets.datasets.config import SignDatasetConfig
      4 
      5 import itertools

/usr/local/lib/python3.9/dist-packages/pose_format/pose.py in <module>
      5 import numpy.ma as ma
      6 
----> 7 from pose_format.numpy import NumPyPoseBody
      8 from pose_format.pose_body import PoseBody
      9 from pose_format.pose_header import PoseHeader, PoseHeaderDimensions, PoseNormalizationInfo, PoseHeaderComponent

ModuleNotFoundError: No module named 'pose_format.numpy'

The numpy subfolder is now in src/python (https://github.com/sign-language-processing/pose/tree/master/src/python/pose_format), so the import needs to change to:

from pose_format.python.numpy import NumPyPoseBody

AmitMY commented 1 year ago

That shouldn't be the case. I'll check what's up.

AmitMY commented 1 year ago

I think you might have a local installation of pose_format. It doesn't happen when pose_format is installed from pip; here is a reproduction: https://colab.research.google.com/drive/1BSPMMUaJre5U3XO2FKN5BzwaQe--LFY9?usp=sharing
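One way to test the local-installation theory is to ask Python where `pose_format` is loaded from and which of the two module layouts it exposes. A minimal sketch using `importlib.util.find_spec`; the helper name `describe_install` is hypothetical. If `origin` points into a source checkout rather than `site-packages`, a local installation is likely shadowing the pip one.

```python
import importlib.util


def describe_install(package, submodules=()):
    """Report where `package` is loaded from and which of its `submodules`
    can be found. Note: find_spec on "pkg.sub" imports the parent "pkg"."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        return {"origin": None, "submodules": {}}
    found = {}
    for name in submodules:
        try:
            found[name] = importlib.util.find_spec(f"{package}.{name}") is not None
        except ModuleNotFoundError:
            # An intermediate package (e.g. "pkg.python") does not exist.
            found[name] = False
    return {"origin": spec.origin, "submodules": found}


if __name__ == "__main__":
    # Compare the two layouts discussed in this thread.
    print(describe_install("pose_format", ["numpy", "python.numpy"]))
```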

bricksdont commented 1 year ago

ah yes, but it happens if I do this:

import sign_language_datasets.datasets

Am I not supposed to do this anymore? I remember it was necessary for TFDS to recognize local datasets.

bricksdont commented 1 year ago

@AmitMY could you bump the version, make a new release on PyPI (I don't remember whether we have automation for this, and if so what kind), and also mention asl-lex in README.md?