udellgroup / oboe

An AutoML pipeline selection system to quickly select a promising pipeline for a new dataset.
BSD 3-Clause "New" or "Revised" License

Does not install properly through pip #14

Closed eddiebergman closed 3 years ago

eddiebergman commented 3 years ago

Hello,

As part of a project, I am trying to fix up some things with automlbenchmark, and in the process I realized that oboe does not install correctly.

Reproducible code (sample shown in README.md)

method = 'Oboe' 
problem_type = 'classification'

from auto_learner import AutoLearner
import numpy as np

m = AutoLearner(p_type=problem_type, runtime_limit=30, method=method, verbose=False)

Error log:

Traceback (most recent call last):
  File "test_case.py", line 6, in <module>
    m = AutoLearner(p_type=problem_type, runtime_limit=30, method=method, verbose=False)
  File "/home/skantify/code/oboe/.venv/lib/python3.8/site-packages/oboe/auto_learner.py", line 87, in __init__
    with open(os.path.join(DEFAULTS, p_type + '.json')) as file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/skantify/code/oboe/.venv/lib/python3.8/site-packages/oboe/defaults/Oboe/classification.json'

Looking for /home/skantify/code/oboe/.venv/lib/python3.8/site-packages/oboe/defaults/Oboe/classification.json, I can see that pip didn't include everything:

ls /home/skantify/code/oboe/.venv/lib/python3.8/site-packages/oboe
__pycache__  ..               convex_opt.py  experiment_design.py  __init__.py  model.py     preprocessing.py
.            auto_learner.py  ensemble.py    generate_vector.py    linalg.py    pipeline.py  util.py

As you can see, the defaults folder and everything below it are not included. Upon inspecting these folders, it seems this is because they are not Python modules (they lack an __init__.py file). Therefore your setup.py would have to be modified according to this Stack Overflow answer.
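For reference, here is a minimal sketch of that kind of change, assuming the data files live under `oboe/defaults` as the traceback suggests. The helper name `collect_package_data` is made up for illustration and is not part of oboe, and the `setup()` call is shown only as a comment because the project's actual setup.py is not reproduced here.

```python
import os


def collect_package_data(package_dir, data_subdir):
    """List every file under package_dir/data_subdir, relative to package_dir.

    Hypothetical helper: setuptools' `package_data` option expects paths
    (or glob patterns) relative to the package directory, written with
    forward slashes.
    """
    paths = []
    root = os.path.join(package_dir, data_subdir)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            relative = os.path.relpath(full_path, package_dir)
            paths.append(relative.replace(os.sep, "/"))
    return sorted(paths)


# In setup.py the result could be handed to setuptools roughly like this
# (assumed layout, not the project's real file):
#
# from setuptools import setup, find_packages
#
# setup(
#     name="oboe",
#     packages=find_packages(),
#     package_data={"oboe": collect_package_data("oboe", "defaults")},
# )
```

With something along these lines, non-Python files such as `defaults/Oboe/classification.json` would be copied into site-packages alongside the modules.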

chengrunyang commented 3 years ago

@eddiebergman Sorry for my late reply, and thanks for catching the bug! I have fixed it in the latest commit (76d322448e3850e9afd04f4d1da13ad389432719), and have verified that the installed library (pip installed from git cloned local source) works well on the demo in the root README.md.

The PyPI version hasn't been updated because of the 100MB size limit. I will try to find a workaround.

eddiebergman commented 3 years ago

Hi @chengrunyang,

That's no problem, and thanks for fixing that. I'll try it out once I can :) Do you mind me asking how the repo is more than 100MB?

chengrunyang commented 3 years ago

Great.

oboe uses meta-learning to pick models for a new dataset. The files in the defaults folder that were not included in the installation are the performance and runtime of models on meta-training datasets. Despite the disadvantage of the large size, including those meta-training data files makes it possible for users to tweak the matrix/tensor factorization ranks of the meta-training matrix/tensor. Nevertheless, I will look into whether including just the latent factors of these matrices/tensors in the installation would suffice for most use cases.

eddiebergman commented 3 years ago

You could try to compress those files and use setup.py to decompress them once dependencies are installed :) However, this is likely difficult due to the fact that you would have to assume the user has the decompression application you wish to use.
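One possible way around that assumption, as a rough sketch: Python's standard library ships `zipfile`, so the archive could be unpacked from Python itself, for example the first time the data is needed rather than during installation. The function name `ensure_defaults` and both paths are hypothetical and not part of oboe.

```python
import os
import zipfile


def ensure_defaults(archive_path, target_dir):
    """Unpack archive_path into target_dir unless target_dir already has content.

    Returns True if files were extracted, False if the directory was
    already populated. Hypothetical helper, shown for illustration only.
    """
    if os.path.isdir(target_dir) and os.listdir(target_dir):
        return False  # data already unpacked, nothing to do
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
    return True
```

Calling this before the defaults are first read would keep the shipped package small while relying only on the interpreter the user already has.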

chengrunyang commented 3 years ago

Now in the latest release (v0.2.0, and the same version number on PyPI), the library can be installed either by pip or from source, and the installation includes necessary files. Thanks again for reporting and discussing this bug! I'll close this issue for now. Feel free to reopen it or submit another issue for anything.