harvard-edge / dataperf-speech-example

Example workflow for our data-centric speech benchmark

setup instructions don't work for me: "mlcube is not a package" error #16

Open jwmueller opened 1 year ago

jwmueller commented 1 year ago

When running:

mlcube run --task=download -Pdocker.build_strategy=always

I get this error:

Traceback (most recent call last):
  File "/Users/jonas/virtual/dataperf/bin/mlcube", line 5, in <module>
    from mlcube.__main__ import cli
ModuleNotFoundError: No module named 'mlcube.__main__'; 'mlcube' is not a package
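
The wording of this error narrows things down somewhat: Python reports "'mlcube' is not a package" when the name `mlcube` resolves to a plain module (a single file) rather than a package directory, so `mlcube.__main__` cannot be looked up inside it. Below is a minimal sketch for checking what the name resolves to in the active environment. The helper name `describe_module` is made up for illustration, and nothing in this thread confirms what actually caused the error here.

```python
import importlib.util


def describe_module(name):
    """Return (origin, is_package) for a top-level module name.

    Returns None when the name cannot be found on the import path.
    `describe_module` is a hypothetical helper, not part of MLCube.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Packages expose a search path for their submodules; plain modules do not.
    is_package = spec.submodule_search_locations is not None
    return spec.origin, is_package


if __name__ == "__main__":
    # Run this with the same interpreter that the failing `mlcube` script uses.
    print(describe_module("mlcube"))
```

If the second value is `False`, the first value is the path of the file that Python picks up under the name `mlcube`. A stray `mlcube.py` ahead of the installed package on the import path is one possible explanation, though only an assumption in this case.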

I am using Python 3.9.15, and here is the output of pip freeze:

antlr4-python3-runtime==4.8
appdirs==1.4.4
arrow==1.2.3
attrs==22.1.0
beautifulsoup4==4.11.1
binaryornot==0.4.4
black==19.10b0
certifi==2022.9.24
chardet==5.1.0
charset-normalizer==2.1.1
cleanlab==2.0.0
click==7.1.2
colorama==0.4.6
coloredlogs==14.0
cookiecutter==1.7.2
coverage==6.5.0
docker==4.3.1
filelock==3.8.0
fire==0.4.0
flake8==3.9.2
gdown==4.5.4
halo==0.0.30
humanfriendly==10.0
idna==3.4
Jinja2==2.11.3
jinja2-time==0.2.0
joblib==1.2.0
log-symbols==0.0.14
MarkupSafe==1.1.1
mccabe==0.6.1
mlcube==0.0.8
mlcube-docker==0.0.8
more-itertools==9.0.0
numpy==1.23.5
omegaconf==2.1.0
packaging==21.3
pandas==1.5.2
pathspec==0.10.2
pluggy==0.13.1
poyo==0.5.0
protobuf==4.21.10
py==1.11.0
pyarrow==10.0.1
pycodestyle==2.7.0
pyflakes==2.3.1
pyparsing==3.0.9
PySocks==1.7.1
pytest==5.4.3
pytest-cov==2.12.1
pytest-mock==1.13.0
python-dateutil==2.8.2
python-slugify==7.0.0
pytz==2022.6
PyYAML==5.4.1
regex==2022.10.31
requests==2.28.1
scikit-learn==1.1.3
scikit-learn-extra==0.2.0
scipy==1.9.3
six==1.16.0
soupsieve==2.3.2.post1
spinners==0.0.24
termcolor==2.1.1
text-unidecode==1.3
threadpoolctl==3.1.0
toml==0.10.2
tqdm==4.64.1
typed-ast==1.5.4
typer==0.7.0
urllib3==1.26.13
wcwidth==0.2.5
websocket-client==1.4.2
wget==3.2

mmaz commented 1 year ago

cc @davidjurado @remg1997 - any thoughts on what might be causing this error? thanks!

jwmueller commented 1 year ago

I'm not sure whether MLCube will be a hard requirement for this challenge (the current readme makes it sound optional). If it is optional, the readme should probably include instructions for downloading the necessary files yourself (without MLCube) and setting up the right local directory structure.

Since MLCube is not working for me, I'm currently unable to produce a new selection algorithm.

davidjurado commented 1 year ago

Hello @jwmueller,

I don't remember seeing this error before. We recommend creating a new environment in which you install only the MLCube packages. You can use the following command:

virtualenv -p python3 ./env && source ./env/bin/activate && pip install mlcube-docker

Also, when running the mlcube command, please make sure that Docker is running. While developing a new selection algorithm, you can list the packages you need in the requirements.txt file, which will be used inside the MLCube container.

MLCube is optional. You can download the data as described in the readme; you will need both the MSWC metadata and the MSWC embeddings.

jwmueller commented 1 year ago

@davidjurado I did all that originally, repeated it, and still get the error.

Other things that could be clarified about MLCube usage in the readme include:

  1. Docker needs to be running
  2. you need to cd dataperf-speech-example before running, otherwise you get the error: No such file or directory: '/Users/.../mlcube.yaml'
  3. you need to run pip install -r requirements.txt beforehand, otherwise you get the error: ModuleNotFoundError: No module named 'typer'
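
The three prerequisites above could be checked up front with a small script along these lines. This is only a sketch: the function name `preflight` and its parameters are invented for illustration and are not part of MLCube or this repository, and it assumes that `docker info` exits with a non-zero status when the Docker daemon is not reachable.

```python
import importlib.util
import shutil
import subprocess
from pathlib import Path


def preflight(workdir=".", modules=("typer",), docker_cmd=("docker", "info")):
    """Return a list of problems found; an empty list means every check passed.

    Hypothetical helper, not part of MLCube.
    """
    problems = []

    # 1. Docker must be running. Assumption: `docker info` returns a
    #    non-zero exit code when the client cannot reach the daemon.
    if shutil.which(docker_cmd[0]) is None:
        problems.append(f"{docker_cmd[0]} was not found on PATH")
    else:
        result = subprocess.run(
            list(docker_cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            problems.append("Docker does not appear to be running")

    # 2. The command has to be run from the directory containing mlcube.yaml.
    if not (Path(workdir) / "mlcube.yaml").is_file():
        problems.append(f"no mlcube.yaml in {Path(workdir).resolve()}")

    # 3. The packages from requirements.txt (for example typer) must be importable.
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"module {name!r} is not installed")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("problem:", problem)
```

Run from inside dataperf-speech-example, it prints one line per failed check and nothing when all three pass.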

Thanks for the pointers to data, I'll try that!

davidjurado commented 1 year ago

Thanks for the feedback @jwmueller, I'll update the Readme file in a new PR.