sign-language-processing / datasets

TFDS data loaders for sign language datasets.
https://sign-language-processing.github.io/#existing-datasets

RecursionError when downloading datasets with python 3.12: set requirements accordingly? #67

Open cleong110 opened 8 months ago

cleong110 commented 8 months ago


Steps to reproduce on my own machine:

conda create -n sign_language_datasets pip 
conda activate sign_language_datasets 
python --version # 3.12 by default
python -m pip install sign-language-datasets webvtt-py

# create a download_dgs_corpus.py file with the following contents
import tensorflow_datasets as tfds
import sign_language_datasets.datasets
from sign_language_datasets.datasets.config import SignDatasetConfig

import itertools
import sys
print(sys.getrecursionlimit())
# sys.setrecursionlimit(50)
# default settings include both pose and video
dgs_corpus = tfds.load('dgs_corpus')

# run it
python download_dgs_corpus.py 

It works in Colab (Python 3.10), but not on my machine in an env with Python 3.12. When I create a conda env with 3.10, it works without issue.

When I create an env with 3.11, I get "no module named lxml", but that's a different issue. Edit: I was installing in my base environment, so never mind this part.

Upstream issue, apparently: https://github.com/tensorflow/datasets/issues/4666

cleong110 commented 8 months ago

OK, installed lxml and now I'm getting "Failed to get url https://nlp.biu.ac.il/~amit/datasets/dgs.json. HTTP code: 404.", which seems new but unrelated to this. Edit: never mind, Python 3.11 seems to work fine; I was installing in my conda base env.

cleong110 commented 8 months ago

So it really does seem that Python 3.12 is the issue, as noted in https://github.com/tensorflow/datasets/issues/4666.
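A minimal sketch of a guard that could sit at the top of the reproduction script. It assumes only what this thread reports (3.10 and 3.11 work, 3.12 fails with the tfds version installed at the time). The helper name `is_reported_failing_python` is hypothetical, and the check says nothing about Python versions not tried here.

```python
import sys


def is_reported_failing_python(version_info=sys.version_info):
    """Return True for Python 3.12, the only version this thread reports as failing."""
    # Compare only (major, minor); the patch level is not discussed in the thread.
    return tuple(version_info[:2]) == (3, 12)


if is_reported_failing_python():
    print(
        "Python 3.12 detected: tfds.load may raise RecursionError "
        "(see tensorflow/datasets issue 4666)."
    )
```

The function takes the version as a parameter so it can be checked against other tuples, e.g. `is_reported_failing_python((3, 10, 12))`.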

cleong110 commented 8 months ago

Never mind the never mind: with Python 3.11 you need to install lxml manually, or downloading the DGS corpus crashes with the default config. But that's a DGS-corpus-specific issue, I suppose, so maybe never mind the never-minding of the never mind?
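To catch the missing lxml before the download starts, something like the following could be run first. This is only a sketch: `missing_modules` is a hypothetical helper, and the list of modules to check (just `lxml`) is taken from the error described above, not from the package's declared requirements.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["lxml"])
if missing:
    print("Install before loading dgs_corpus: " + ", ".join(missing))
```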

abir-g commented 6 months ago

Thanks for this.

cleong110 commented 5 months ago

According to https://github.com/tensorflow/datasets/issues/4666#issuecomment-2149200103, this is now fixed in the latest version of tfds.

If we can confirm that, we can close this issue.

cleong110 commented 5 months ago

Gave it a go. New conda env, python 3.12, pip install sign_language_datasets. Ended up with tfds-nightly-4.9.5.dev202406050044, not 4.9.6, the version of tfds which supposedly solves this.
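For anyone else checking which tfds build ended up in their env, here is a rough sketch using the standard library's `importlib.metadata`. The distribution names `tensorflow-datasets` and `tfds-nightly` and the 4.9.6 threshold come from this thread; the helper names are hypothetical, and the version parsing is deliberately simplified (it only looks at the leading numeric part, so a nightly built after the fix landed would still be reported as older than 4.9.6).

```python
import re
from importlib import metadata

# From the thread: the tfds release that supposedly contains the fix.
FIXED_RELEASE = (4, 9, 6)


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version):
    """Parse the leading numeric part, e.g. '4.9.5.dev2024' -> (4, 9, 5)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def has_reported_fix(version):
    """True if the release number is at least the one the thread names as fixed."""
    return release_tuple(version) >= FIXED_RELEASE


for dist in ("tensorflow-datasets", "tfds-nightly"):
    version = installed_version(dist)
    if version is None:
        print(f"{dist}: not installed")
    else:
        print(f"{dist}: {version} (has reported fix: {has_reported_fix(version)})")
```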

cleong110 commented 5 months ago

Did some shenanigans: uninstalled tfds-nightly, then ran pip install tensorflow-datasets, after which the import failed. Running pip install -U --force-reinstall tensorflow-datasets fixed that, and now it seems to work.