profzelonka opened this issue 8 months ago
You can predownload the weights with a script; we do this in a Docker container:
import subprocess
import concurrent.futures

from audiocraft.models import MultiBandDiffusion
from demucs.pretrained import REMOTE_ROOT, _parse_remote_files
from demucs.repo import AnyModelRepo, BagOnlyRepo, RemoteRepo

model_version = "stereo-chord-large"


def download_file(url, dest):
    subprocess.check_call(["pget", url, dest], close_fds=False)


def download_model():
    # preload musicgen models and files
    url = f"https://weights.replicate.delivery/default/musicgen-chord/musicgen-{model_version}.th"
    dest = f"musicgen-{model_version}.th"
    with concurrent.futures.ThreadPoolExecutor() as executor:
        executor.submit(download_file, url, dest)
    MultiBandDiffusion.get_mbd_musicgen()

    # preload demucs models and files
    models = _parse_remote_files(REMOTE_ROOT / 'files.txt')
    model_repo = RemoteRepo(models)
    repo = AnyModelRepo(model_repo, BagOnlyRepo(REMOTE_ROOT, model_repo))
    with concurrent.futures.ThreadPoolExecutor() as executor:
        executor.map(repo.get_model, [model for model in models if model])


if __name__ == "__main__":
    download_model()
@yocontra Thanks! What would the setup be for this? Wouldn't I need to modify cog to support --volume in Docker? cog predict creates a container from the image, and once it finishes, the container is removed. How can we reuse the same container without it being removed?
When cog predict is run, this is what happens:
Starting Docker image r8.im/sakemin/musicgen-remixer@sha256:0b769f28e399c7c30e4f2360691b9b11c294183e9ab2fd9f3398127b556c86d7 and running setup()...
[downloads the files again in the temporary container]
You want to build a new image that uses the r8.im one as a base, and run that download.py during the build so the weights are baked into a layer. This is the Dockerfile we use to run this image on RunPod's serverless offering.
ARG COG_REPO
ARG COG_MODEL
ARG COG_VERSION

FROM r8.im/${COG_REPO}/${COG_MODEL}@sha256:${COG_VERSION} AS build

# Install necessary packages and Python 3.10
RUN apt-get update && apt-get upgrade -y && \
    apt-get install -y --no-install-recommends software-properties-common curl git openssh-server && \
    add-apt-repository ppa:deadsnakes/ppa -y && \
    apt-get update && apt-get install -y --no-install-recommends python3.10 python3.10-dev python3.10-distutils && \
    update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 1 && \
    curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \
    python3 get-pip.py

# Create a virtual environment
RUN python3 -m venv /opt/venv

FROM build AS with_deps

# Install runpod and base deps within the virtual environment
ADD requirements.txt requirements-base.txt
RUN /opt/venv/bin/pip3 install --no-cache-dir -r requirements-base.txt

# Preload musicgen-chord model
ADD download.py download.py
RUN /opt/venv/bin/python3 download.py

FROM with_deps AS final

ADD src/handler.py rp_handler.py
CMD ["/opt/venv/bin/python3", "-u", "rp_handler.py"]
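To build it, something like the following should work; the repo, model, and version values are taken from the r8.im reference used earlier in this thread, and the output tag is just an example:

# build the serverless image on top of the published cog image
docker build \
  --build-arg COG_REPO=sakemin \
  --build-arg COG_MODEL=musicgen-remixer \
  --build-arg COG_VERSION=0b769f28e399c7c30e4f2360691b9b11c294183e9ab2fd9f3398127b556c86d7 \
  -t musicgen-remixer-serverless .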
@yocontra Thanks! That makes sense, we're building a new image with the downloads baked in. The only problem is that it doesn't seem to use what was downloaded; everything gets re-downloaded at runtime. I parsed the Docker log, and these are the files it downloads on every run, regardless of the base image or the download.py preload:
compression_state_dict.bin (236 MB)
mbd_musicgen_32khz.th (1.65 GB)
musicgen-stereo-chord.th (2.9 GB)
955717e8-8726e21a.th (80.2 MB)
spiece.model (792 KB)
tokenizer.json (1.39 MB)
config.json (1.21 KB)
model.safetensors (892 MB)
config.json (758 bytes) (There are two files with this name, but they may be different based on their directories or contents)
model.safetensors (236 MB) (There are two files with this name as well)
harmonix-fold0-0vra4ys2.pth (1.40 MB)
harmonix-fold1-3ozjhtsj.pth (1.40 MB)
harmonix-fold2-gmgo0nsy.pth (1.40 MB)
harmonix-fold3-i92b7m8p.pth (1.40 MB)
harmonix-fold4-1bql5qo0.pth (1.40 MB)
harmonix-fold5-x4z5zeef.pth (1.40 MB)
harmonix-fold6-x7t226rq.pth (1.40 MB)
harmonix-fold7-qwwskhg6.pth (1.40 MB)
Any ideas? Attached are the Dockerfile and download.py I used, renamed to .txt files for easy upload/viewing.
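One thing worth checking (just a debugging sketch, not something established in this thread) is whether the weights fetched by download.py actually end up inside the built image, and under which paths, since the predictor may be looking in a different cache directory at runtime. The image tag assumes the build example above, and the size/name filters are guesses based on the file list:

# list the large weight files baked into the image built above
docker run --rm --entrypoint find musicgen-remixer-serverless \
  / -xdev -size +50M \( -name '*.th' -o -name '*.safetensors' -o -name '*.bin' \) 2>/dev/null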
Each time cog predict is run, a new container is created and 3 GB+ of weights are downloaded before it can run. Can we keep those downloaded files so that it doesn't have to fetch them on each and every run?
With Docker Desktop you can use a volume to retain the files it downloads, but then comes cog, and I'm oblivious as to how to make it all work together. For example, you could do:
docker run -v musicgen-remixer:/src/ --gpus all r8.im/sakemin/musicgen-remixer@sha256:0b769f28e399c7c30e4f2360691b9b11c294183e9ab2fd9f3398127b556c86d7
but then what? If I run: docker exec my-musicgen-remixer cog predict r8.im/sakemin/musicgen-remixer@sha256:0b769f28e399c7c30e4f2360691b9b11c294183e9ab2fd9f3398127b556c86d7 -i output_format="wav"
and so on, cog doesn't exist in the container but in WSL2 Ubuntu, so how do we execute it? (Not even positive this would work.) How can we get it to work offline?
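For what it's worth, here is a rough sketch of the volume approach without invoking cog on the host at all, using the HTTP server that cog images start by default (port 5000, POST /predictions). The volume name, the /src mount point, and the input fields are assumptions; an empty named volume is seeded from the image's /src on first use, but the weights only persist across runs if the model actually writes its downloads under that path:

# start the image once, keeping downloaded files in a named volume
docker run -d --name musicgen-remixer --gpus all -p 5000:5000 \
  -v musicgen-remixer-weights:/src \
  r8.im/sakemin/musicgen-remixer@sha256:0b769f28e399c7c30e4f2360691b9b11c294183e9ab2fd9f3398127b556c86d7

# then send predictions over HTTP instead of running cog predict
curl -s -X POST http://localhost:5000/predictions \
  -H "Content-Type: application/json" \
  -d '{"input": {"output_format": "wav"}}'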