Distil Auto ML

Distil Auto ML is an AutoML system that integrates with D3M (DARPA's Data-Driven Discovery of Models program).

More specifically, it is the TA2 (automated pipeline construction) system from Uncharted and Qntfy.

The main repository is https://github.com/uncharted-distil/distil-auto-ml

Quickstart using Docker

The TA2 system can be built and started via docker-compose; however, the following must be downloaded beforehand.

Datasets to train on. These may be user-created, or many examples can be downloaded from https://datasets.datadrivendiscovery.org/d3m/datasets

To train using only the TA2, user-generated datasets must be formatted in the same way as the public datasets.
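As a rough sanity check before training on a user-created dataset, the sketch below verifies that a dataset directory contains the document files that the D3M CLI example later in this README points at: a <name>_problem/problemDoc.json plus the TRAIN, TEST and SCORE datasetDoc.json files. The function name check_d3m_layout is hypothetical, and the file list is an assumption drawn from those example paths. It is not a full validation of the D3M dataset schema.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a dataset directory has the D3M document
# files used by the fit-score example (this is NOT a full schema validation).
check_d3m_layout() {
  local dir="${1%/}"          # dataset directory, e.g. seed_datasets_current/185_baseball
  local name
  name="$(basename "$dir")"   # dataset name, e.g. 185_baseball
  local missing=0
  local f
  for f in \
    "${name}_problem/problemDoc.json" \
    "TRAIN/dataset_TRAIN/datasetDoc.json" \
    "TEST/dataset_TEST/datasetDoc.json" \
    "SCORE/dataset_SCORE/datasetDoc.json"; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"           # 0 if every file exists, 1 otherwise
}
```

For example, `check_d3m_layout seed_datasets_current/185_baseball && echo "layout looks OK"` lists any missing file on stderr.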

Static files may be pretrained weights of a neural network model, or a simple dictionary mapping tokens to necessary IDs. In short, they are anything extra needed to run an ML model within the pipelines.

To bulk download all static files within the D3M universe (WARNING: this may be quite large), open a shell in the container and run the index download from /static:
```
docker-compose run distil bash
# then, inside the container:
cd /static && python3 -m d3m index download
```

One can also pick and choose which primitives' static files to download:
```
python3 -m d3m primitive download -p d3m.primitives.path.of.Primitive -o /static
```
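To fetch static files for several primitives, the single-primitive command above can be looped over a list. This sketch assumes a plain-text file with one primitive path per line. The names download_static_for and DRY_RUN, and the list file itself, are hypothetical, and the loop passes only the d3m primitive download flags shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download static files for each primitive listed in a file
# (one Python path per line; '#' comments and blank lines are ignored).
#   $1      : list file
#   $2      : output directory (default /static)
#   DRY_RUN : if set to 1, print the commands instead of running them
download_static_for() {
  local list="$1" out="${2:-/static}"
  local prim
  while IFS= read -r prim || [ -n "$prim" ]; do
    prim="${prim%%#*}"             # drop a trailing comment
    prim="${prim//[[:space:]]/}"   # drop surrounding whitespace
    [ -z "$prim" ] && continue
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "python3 -m d3m primitive download -p $prim -o $out"
    else
      python3 -m d3m primitive download -p "$prim" -o "$out" || return 1
    fi
  done < "$list"
}
```

Running `DRY_RUN=1 download_static_for primitives.txt /static` first shows exactly which downloads would be attempted.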

For more information on how static files integrate with D3M, see https://datadrivendiscovery.org/v2020.11.3/tutorial.html#advanced-primitive-with-static-files

Once the static files and the dataset(s) you want to run on are downloaded:
```
# symlink your datasets directory
ln -s ../datasets/seed_datasets_current seed_datasets_current

# choose the dataset you want to run
export DATASET=185_baseball

# run it
docker-compose up distil
```

Two test TA3 (user interface) systems are also available via docker-compose:
```
# run the dummy-ta3 test suite
docker-compose up distil dummy-ta3

# run the simple-ta3 system, which will then be available in the browser at localhost:80
# this requires a directory named 'output' to exist, in addition to the seed_datasets_current directory
docker-compose up distil envoy simple-ta3
```
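Before running docker-compose, a small pre-flight check can catch a broken symlink or a misspelled DATASET early. It can also create the output directory that the simple-ta3 setup expects. This is a sketch. It assumes you run it from the directory holding docker-compose.yml and the seed_datasets_current symlink, and that DATASET names a folder inside that symlink. The function name compose_preflight is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the docker-compose quickstart.
# Assumes it runs next to docker-compose.yml, where the seed_datasets_current
# symlink was created, and that DATASET names a directory inside it.
compose_preflight() {
  if [ -z "${DATASET:-}" ]; then
    echo "DATASET is not set (e.g. export DATASET=185_baseball)" >&2
    return 1
  fi
  # -d follows symlinks, so this also catches a dangling seed_datasets_current link
  if [ ! -d "seed_datasets_current/$DATASET" ]; then
    echo "seed_datasets_current/$DATASET not found" >&2
    return 1
  fi
  # the simple-ta3 setup needs an 'output' directory; create it if it is missing
  mkdir -p output
}
```

Typical use is `compose_preflight && docker-compose up distil`.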

Development

Running From Source

Requirements:

Python 3.6

pip (bundled with Python 3.6)

virtualenv

Instructions on setting up to run from source:

Clone distil-auto-ml (the repositories below should be cloned next to it, in the same parent directory):
```
git clone https://github.com/uncharted-distil/distil-auto-ml
```

Install system libraries on Linux:
```
sudo apt-get install libsnappy-dev build-essential libopenblas-dev libcap-dev ffmpeg
```

Install system libraries on MacOS:
```
brew install snappy cmake openblas libpcap ffmpeg
```

Clone common-primitives
```
git clone https://gitlab.com/datadrivendiscovery/common-primitives.git
```

Clone d3m-primitives
```
git clone https://github.com/cdbethune/d3m-primitives
```

Clone d3m
```
git clone https://gitlab.com/datadrivendiscovery/d3m
```

Clone distil-primitives
```
git clone https://github.com/uncharted-distil/distil-primitives
```

Clone distil-primitives-contrib
```
git clone https://github.com/uncharted-distil/distil-primitives-contrib
```

Change into the distil-auto-ml directory
```
cd distil-auto-ml
```
To avoid package collisions, it is recommended to create a virtual environment.
If virtualenv is not installed, install it now:
```
python3 -m pip install virtualenv
```
Create the environment
```
python3 -m virtualenv env
```
Activate the environment
```
source env/bin/activate
```
Install the packages in server-requirements.txt on Linux:
```
pip install -r server-requirements.txt
```

Install the packages in server-requirements.txt on MacOS:
```
CPPFLAGS="-I/usr/local/include -L/usr/local/lib" pip install -r server-requirements.txt
```

Install the other repositories and remaining dependencies. IMPORTANT: if running on the CPU, replace [gpu] with [cpu]:
```
cd ..
cd d3m
pip install -e .\[gpu\]
cd ..
cd common-primitives
pip install -e .\[gpu\]
cd ..
cd distil-primitives
pip install -e .\[gpu\]
cd ..
cd d3m-primitives
pip install -e .\[gpu\]
cd ..
cd distil-primitives-contrib
pip install -e .\[gpu\]
pip install python-lzo hyppo==0.1.3 mxnet
pip install -e git+https://github.com/NewKnowledge/simon-d3m-wrapper.git#egg=SimonD3MWrapper
pip install -e git+https://gitlab.com/datadrivendiscovery/sklearn-wrap.git@dist#egg=sklearn_wrap
pip install -e git+https://github.com/usc-isi-i2/dsbox-primitives#egg=dsbox-primitives
pip install -e git+https://github.com/neurodata/primitives-interfaces#egg=jhu-primitives
# if you get an error involving enum and IntFlag, try: pip uninstall -y enum34
```
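The repeated cd/pip install steps for the five sibling repositories can also be written as a loop. This is a minimal sketch. It assumes you run it from inside distil-auto-ml with the other repositories cloned next to it. The names install_siblings and DRY_RUN are hypothetical, and the remaining pip install lines (python-lzo, the git+ wrappers) still need to be run separately.

```shell
#!/usr/bin/env bash
# Hypothetical helper: editable-install the sibling repositories in the same
# order as the manual steps. Run it from inside distil-auto-ml.
#   $1      : "gpu" (default) or "cpu", the extra to install
#   DRY_RUN : if set to 1, print the pip commands instead of running them
install_siblings() {
  local variant="${1:-gpu}"
  case "$variant" in
    cpu|gpu) ;;
    *) echo "variant must be 'cpu' or 'gpu', got '$variant'" >&2; return 1 ;;
  esac
  local repo
  for repo in d3m common-primitives distil-primitives d3m-primitives distil-primitives-contrib; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "pip install -e ../${repo}[${variant}]"
    else
      # quoting stops the shell from treating the [extra] suffix as a glob pattern
      pip install -e "../${repo}[${variant}]" || return 1
    fi
  done
}
```

For a CPU machine, `DRY_RUN=1 install_siblings cpu` previews the commands and `install_siblings cpu` runs them.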

MongoDB

Distil AutoML uses MongoDB as a backend store for its internal hyperparameter tuning. The official MongoDB docs have good installation instructions for each OS: https://docs.mongodb.com/manual/installation/
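Before launching ./run.sh below, it can help to confirm that MongoDB is accepting connections. The sketch below uses bash's built-in /dev/tcp redirection to poll a host and port. It assumes MongoDB listens on its default port 27017 on localhost, which may not match how distil-auto-ml is configured. The helper name wait_for_port is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until something accepts TCP connections on host:port.
# Relies on bash's /dev/tcp redirection (bash-only, not POSIX sh).
#   $1 host (default localhost)   $2 port (default 27017, MongoDB's default)
#   $3 attempts (default 10), one second apart
wait_for_port() {
  local host="${1:-localhost}" port="${2:-27017}" attempts="${3:-10}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # open (and immediately close) a connection inside a subshell
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    [ "$i" -lt "$attempts" ] && sleep 1
  done
  echo "nothing listening on ${host}:${port}" >&2
  return 1
}
```

Typical use is `wait_for_port && ./run.sh`. Against a firewalled remote host, a single attempt can block until the OS connect timeout.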
Once MongoDB is installed and running, distil-auto-ml is ready for use:
```
./run.sh
```

Generate pipelines:
```
mkdir pipelines
python3 export_pipelines.sh
```
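The pipeline file used in the CLI example below follows a <name>__<uuid>.json naming pattern. Assuming the export step writes files with that pattern into pipelines/, a helper can look up the generated JSON by its name prefix, so you do not have to copy the UUID by hand. The function find_pipeline below is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of the pipeline JSON whose name starts with
# "<prefix>__" in a pipelines directory (default: pipelines). Fails unless
# exactly one file matches.
find_pipeline() {
  local prefix="$1" dir="${2:-pipelines}"
  local matches=()
  local f
  for f in "$dir/${prefix}__"*.json; do
    [ -f "$f" ] && matches+=("$f")   # skip the literal pattern when nothing matches
  done
  if [ "${#matches[@]}" -ne 1 ]; then
    echo "expected exactly one ${prefix}__*.json in $dir, found ${#matches[@]}" >&2
    return 1
  fi
  printf '%s\n' "${matches[0]}"
}
```

For example, `PIPELINE="$(find_pipeline timeseries_rnn)"` stores the timeseries_rnn pipeline path for later use.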

Use the D3M CLI to interface with distil-auto-ml.

Running D3M CLI Example

This section assumes the source has been successfully installed and the datasets have been downloaded. Launch the d3m runtime with the following arguments (values in braces are placeholders for your local paths):
```
python3 -m d3m runtime -v {location/to/static_resources} -d {location/to/datasets/seed_datasets_current} fit-score \
  -r {..seed_datasets_current/LL1_PHEM_Monthly_Malnutrition_MIN_METADATA/LL1_PHEM_Monthly_Malnutrition_MIN_METADATA_problem/problemDoc.json} \
  -i {..seed_datasets_current/LL1_PHEM_Monthly_Malnutrition_MIN_METADATA/TRAIN/dataset_TRAIN/datasetDoc.json} \
  -t {..seed_datasets_current/LL1_PHEM_Monthly_Malnutrition_MIN_METADATA/TEST/dataset_TEST/datasetDoc.json} \
  -a {..seed_datasets_current/LL1_PHEM_Monthly_Malnutrition_MIN_METADATA/SCORE/dataset_SCORE/datasetDoc.json} \
  -p {..distil-auto-ml/pipelines/timeseries_rnn__a9cc5349-e328-401d-abb7-ada6b101e573.json} \
  -O {..distil-auto-ml/pipelines/timeseries_rnn__a9cc5349-e328-401d-abb7-ada6b101e573_run.yaml}
```
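Because every path in the fit-score call derives from the dataset name, a small wrapper can assemble them. The sketch below assumes the seed dataset layout shown in the example: a <name>_problem folder plus TRAIN, TEST and SCORE splits. The names run_fit_score, STATIC_DIR, DATASETS_DIR and DRY_RUN are hypothetical. The wrapper passes only the d3m runtime flags already used in the example.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the fit-score example.
#   STATIC_DIR   : static files directory (passed to -v)
#   DATASETS_DIR : seed_datasets_current directory (passed to -d)
#   $1           : dataset name, e.g. LL1_PHEM_Monthly_Malnutrition_MIN_METADATA
#   $2           : pipeline JSON path; the pipeline run is written next to it as *_run.yaml
#   DRY_RUN=1    : print the command instead of running it
run_fit_score() {
  local name="${1:-}" pipeline="${2:-}"
  if [ -z "$name" ] || [ -z "$pipeline" ] || [ -z "${STATIC_DIR:-}" ] || [ -z "${DATASETS_DIR:-}" ]; then
    echo "usage: STATIC_DIR=... DATASETS_DIR=... run_fit_score <dataset-name> <pipeline.json>" >&2
    return 1
  fi
  local ds="$DATASETS_DIR/$name"
  local cmd=(
    python3 -m d3m runtime
    -v "$STATIC_DIR"
    -d "$DATASETS_DIR"
    fit-score
    -r "$ds/${name}_problem/problemDoc.json"
    -i "$ds/TRAIN/dataset_TRAIN/datasetDoc.json"
    -t "$ds/TEST/dataset_TEST/datasetDoc.json"
    -a "$ds/SCORE/dataset_SCORE/datasetDoc.json"
    -p "$pipeline"
    -O "${pipeline%.json}_run.yaml"
  )
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Building the command as an array keeps paths with spaces intact. Setting DRY_RUN=1 lets you inspect the exact command before a long fit-score run.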

Building the Docker Container

CPU:

To build a Docker image with CPU support, invoke the docker_build.sh script:

MacOS/Linux
```
sudo ./docker_build.sh
```

Windows

Run the command prompt as administrator, then run:
```
./docker_build.sh
```

GPU:

To build a Docker image with GPU support, add the -g flag to the docker_build.sh call:

MacOS/Linux
```
sudo ./docker_build.sh -g
```

Windows

Run the command prompt as administrator, then run:
```
./docker_build.sh -g
```

Troubleshooting Docker Image Failing to Build:

If building the Docker image fails even though all of the above criteria have been met, invoke the docker_build.sh script again, this time adding the -f flag. The -f flag forces the download and reinstall of all dependencies, regardless of whether they already meet the criteria. Note: if building for GPU support, remember to add the -g flag as well (for example, sudo ./docker_build.sh -g -f).

MacOS/Linux
```
sudo ./docker_build.sh -f
```

Windows

Run the command prompt as administrator, then run:
```
./docker_build.sh -f
```
