uncharted-distil / distil-auto-ml

Distil Automated Machine Learning Server
Apache License 2.0

Distil Auto ML

Distil Auto ML is an AutoML system that integrates with D3M.

More specifically, it is the TA2 system from Uncharted and Qntfy.

Main repo is https://github.com/uncharted-distil/distil-auto-ml

Quickstart using Docker

The TA2 system can be built and started via docker-compose; however, several files must be downloaded beforehand: datasets and static files.

Datasets to train on. These may be user-created, or many examples can be downloaded from https://datasets.datadrivendiscovery.org/d3m/datasets

To train using only the TA2, user-generated datasets must be formatted in the same way as the public datasets.

Static files. These may be pretrained weights of a neural network model or a simple dictionary mapping tokens to necessary IDs: essentially anything extra needed to run an ML model within the pipelines.

To bulk download all static files within the D3M universe (WARNING: this may be quite large):

docker-compose run distil bash
# then, inside the container:
cd /static && python3 -m d3m index download

One can also pick and choose which static files to download:

python3 -m d3m primitive download -p d3m.primitives.path.of.Primitive -o /static

For more info on how static files integrate within D3M: https://datadrivendiscovery.org/v2020.11.3/tutorial.html#advanced-primitive-with-static-files
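When only a handful of primitives are needed, the per-primitive download command above can be repeated for each one. The following is a minimal sketch, not part of the project: the function name print_download_cmds is hypothetical, and the primitive paths are placeholders in the same spirit as d3m.primitives.path.of.Primitive. It only prints the commands so they can be reviewed before anything is downloaded.

```shell
#!/bin/sh
# Print one "d3m primitive download" command per primitive path given as an
# argument. Nothing is executed here; the commands are only printed.
# (print_download_cmds is a hypothetical helper, not part of D3M or Distil.)
print_download_cmds() {
    for primitive in "$@"; do
        printf 'python3 -m d3m primitive download -p %s -o /static\n' "$primitive"
    done
}

# Placeholder paths: substitute the primitives your pipelines actually use.
print_download_cmds \
    d3m.primitives.path.of.FirstPrimitive \
    d3m.primitives.path.of.SecondPrimitive
```

Once the printed commands look right, they could be piped to sh inside the container started with docker-compose run distil bash.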

Once the static files and the dataset(s) you want to run on are downloaded:

# symlink your datasets directory 
ln -s ../datasets/seed_datasets_current seed_datasets_current

# choose the dataset you want to run 
export DATASET=185_baseball

# run it
docker-compose up distil
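Before starting the container, it can help to confirm that the symlink and the DATASET value actually point at something. The sketch below is an illustration only: check_dataset is a hypothetical helper, and it assumes each dataset is a direct subdirectory of seed_datasets_current (for example seed_datasets_current/185_baseball), which the steps above suggest but do not state.

```shell
#!/bin/sh
# Return 0 if the chosen dataset directory exists under the datasets root.
# Assumption: each dataset is a direct subdirectory of seed_datasets_current.
# (check_dataset is a hypothetical helper, not part of Distil.)
check_dataset() {
    root="${1:-seed_datasets_current}"
    name="${2:-${DATASET:-}}"
    if [ -z "$name" ]; then
        echo "DATASET is not set" >&2
        return 1
    fi
    # -d follows symlinks, so a symlinked datasets directory is fine.
    if [ ! -d "$root/$name" ]; then
        echo "dataset directory not found: $root/$name" >&2
        return 1
    fi
    echo "found $root/$name"
}
```

A possible use is check_dataset && docker-compose up distil, so the container is only started when the dataset directory is present.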

There are two testing TA3 systems also available via docker-compose:

# run the dummy-ta3 test suite
docker-compose up distil dummy-ta3

# run the simple-ta3 system, which will then be available in the browser at localhost:80
# this requires a directory named 'output' to exist, in addition to the seed_datasets_current directory
docker-compose up distil envoy simple-ta3
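The two directory requirements for simple-ta3 can be handled in one step. This is a minimal sketch under the assumption that both directories are expected in the directory from which docker-compose is run; the source does not say where they must live, and prepare_simple_ta3 is a hypothetical helper name.

```shell
#!/bin/sh
# Create the 'output' directory if it does not exist and confirm that the
# seed_datasets_current directory (or symlink) is present.
# Assumption: both live in the directory docker-compose is run from.
# (prepare_simple_ta3 is a hypothetical helper, not part of Distil.)
prepare_simple_ta3() {
    workdir="${1:-.}"
    mkdir -p "$workdir/output" || return 1
    if [ ! -d "$workdir/seed_datasets_current" ]; then
        echo "missing $workdir/seed_datasets_current" >&2
        return 1
    fi
}
```

For example, prepare_simple_ta3 && docker-compose up distil envoy simple-ta3 would only start the services once both directories are in place.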

Development

Running From Source

Requirements:

  1. Python 3.6
  2. Pip (Python 3.6 should come with it)
  3. virtualenv

Instructions on setting up to run from source:

CPU:

Building a docker image with CPU support is accomplished by invoking the docker_build.sh script:

MacOS/Linux

 sudo ./docker_build.sh

Windows

Run command prompt as administrator.

 ./docker_build.sh

GPU:

Building a docker image with GPU support is accomplished by adding the -g flag to the docker_build.sh call:

MacOS/Linux

 sudo ./docker_build.sh -g

Windows

Run command prompt as administrator.

 ./docker_build.sh -g

Troubleshooting Docker Image Failing to Build:

If building the docker image fails even though all of the above criteria have been met, invoke the docker_build.sh script again, this time adding the -f flag. The -f flag forces the download and reinstallation of all dependencies, regardless of whether they already meet the criteria. Note: when building for GPU support, remember to also pass the -g flag.

MacOS/Linux

 sudo ./docker_build.sh -f

Windows

Run command prompt as administrator.

 ./docker_build.sh -f
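The build-then-force-rebuild sequence described above can be wrapped in a small helper. The sketch below is illustrative only: build_with_retry and the BUILD_SCRIPT variable are hypothetical names, and it assumes docker_build.sh accepts -g and -f as separate arguments, which the note about the additional -g flag suggests but does not confirm. It does not add sudo, which the MacOS/Linux commands above use.

```shell
#!/bin/sh
# Run the build script with the given flags; if it fails, retry once with -f
# appended to force the download and reinstall of all dependencies.
# BUILD_SCRIPT is a hypothetical override used here for illustration.
# Assumption: the script accepts -g and -f as separate arguments.
build_with_retry() {
    script="${BUILD_SCRIPT:-./docker_build.sh}"
    if "$script" "$@"; then
        return 0
    fi
    echo "build failed, retrying with -f" >&2
    "$script" "$@" -f
}
```

Calling build_with_retry with no arguments would attempt a CPU build, while build_with_retry -g would attempt a GPU build and fall back to -g -f on failure.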