replicate / cog

Containers for machine learning
https://cog.run
Apache License 2.0

Error when trying to run inference #1217

Closed kenhuang1964 closed 1 year ago

kenhuang1964 commented 1 year ago

Hey guys, I hope you are all doing well. I have my predict.py and cog.yaml ready in my project repository, but when I run "cog predict -i image=@https://cdn.shopify.com/s/files/1/0057/3728/3618/products/a-man-called-otto_ezrjr0pm_480x.progressive.jpg", the Docker image fails to build with this error:

#0 9.238 Processing triggers for libc-bin (2.35-0ubuntu3.1) ...
#0 9.240 /usr/sbin/ldconfig: 16: exec: /sbin/ldconfig.real: not found
#0 9.241 /usr/sbin/ldconfig: 16: exec: /sbin/ldconfig.real: not found
#0 9.241 dpkg: error processing package libc-bin (--configure):
#0 9.241  installed libc-bin package post-installation script subprocess returned error exit status 127
#0 9.246 Errors were encountered while processing:
#0 9.246  libc-bin
#0 9.264 E: Sub-process /usr/bin/dpkg returned an error code (1)
------
Dockerfile:13
--------------------
  12 |     ENV PATH="/root/.pyenv/shims:/root/.pyenv/bin:$PATH"
  13 | >>> RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy --no-install-recommends \
  14 | >>>  make \
  15 | >>>  build-essential \
  16 | >>>  libssl-dev \
  17 | >>>  zlib1g-dev \
  18 | >>>  libbz2-dev \
  19 | >>>  libreadline-dev \
  20 | >>>  libsqlite3-dev \
  21 | >>>  wget \
  22 | >>>  curl \
  23 | >>>  llvm \
  24 | >>>  libncurses5-dev \
  25 | >>>  libncursesw5-dev \
  26 | >>>  xz-utils \
  27 | >>>  tk-dev \
  28 | >>>  libffi-dev \
  29 | >>>  liblzma-dev \
  30 | >>>  git \
  31 | >>>  ca-certificates \
  32 | >>>  && rm -rf /var/lib/apt/lists/*
  33 |     RUN curl -s -S -L https://raw.githubusercontent.com/pyenv/pyenv-installer/master/bin/pyenv-installer | bash && \
--------------------
ERROR: failed to solve: process "/bin/sh -c apt-get update -qq && apt-get install -qqy --no-install-recommends \tmake \tbuild-essential \tlibssl-dev \tzlib1g-dev \tlibbz2-dev \tlibreadline-dev \tlibsqlite3-dev \twget \tcurl \tllvm \tlibncurses5-dev \tlibncursesw5-dev \txz-utils \ttk-dev \tlibffi-dev \tliblzma-dev \tgit \tca-certificates \t&& rm -rf /var/lib/apt/lists/*" did not complete successfully: exit code: 100
ⅹ Failed to build Docker image: exit status 1

I'm on an A100 instance from lambdalabs. Here is my predict.py code:

from cog import BasePredictor, Input, Path

import os
from io import BytesIO

import requests
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import CLIPVisionModel, CLIPImageProcessor, StoppingCriteria

from llava.conversation import conv_templates, SeparatorStyle
from llava.utils import disable_torch_init
from llava.model import *
from llava.model.utils import KeywordsStoppingCriteria

from PIL import Image, ImageOps

DEFAULT_IMAGE_TOKEN = "<image>"
DEFAULT_IMAGE_PATCH_TOKEN = "<im_patch>"
DEFAULT_IM_START_TOKEN = "<im_start>"
DEFAULT_IM_END_TOKEN = "<im_end>"

def load_image(image_file):
    if image_file.startswith('http') or image_file.startswith('https'):
        response = requests.get(image_file)
        image = Image.open(BytesIO(response.content)).convert('RGB')
    else:
        image = Image.open(image_file).convert('RGB')
    return image

def resize_image(image, target_size):
    width, height = image.size
    aspect_ratio = width / height
    if aspect_ratio > 1:
        new_width = target_size[0]
        new_height = int(new_width / aspect_ratio)
    else:
        new_height = target_size[1]
        new_width = int(new_height * aspect_ratio)
    image = image.resize((new_width, new_height))
    width_diff = target_size[0] - image.size[0]
    height_diff = target_size[1] - image.size[1]
    left_padding = 0
    top_padding = 0
    right_padding = width_diff - left_padding
    bottom_padding = height_diff - top_padding
    padded_image = ImageOps.expand(image, border=(left_padding, top_padding, right_padding, bottom_padding), fill=0)
    return padded_image

class Predictor(BasePredictor):
    def setup(self) -> None:
        """Load the model into memory to make running multiple predictions efficient"""
        self.model_name = "llava_v1"  # Predefined model name
        self.conv_mode = None  # Predefined conversation mode

        disable_torch_init()
        self.model_name = os.path.expanduser(self.model_name)
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)

        if "mpt" in self.model_name.lower():
            self.model = LlavaMPTForCausalLM.from_pretrained(self.model_name, low_cpu_mem_usage=True, torch_dtype=torch.float16, use_cache=True).cuda()
        else:
            self.model = LlavaLlamaForCausalLM.from_pretrained(self.model_name, low_cpu_mem_usage=True, torch_dtype=torch.float16, use_cache=True).cuda()
        self.image_processor = CLIPImageProcessor.from_pretrained(self.model.config.mm_vision_tower, torch_dtype=torch.float16)

        mm_use_im_start_end = getattr(self.model.config, "mm_use_im_start_end", False)
        self.tokenizer.add_tokens([DEFAULT_IMAGE_PATCH_TOKEN], special_tokens=True)
        if mm_use_im_start_end:
            self.tokenizer.add_tokens([DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN], special_tokens=True)

        self.vision_tower = self.model.get_model().vision_tower[0]
        if self.vision_tower.device.type == 'meta':
            self.vision_tower = CLIPVisionModel.from_pretrained(self.vision_tower.config._name_or_path, torch_dtype=torch.float16, low_cpu_mem_usage=True).cuda()
            self.model.get_model().vision_tower[0] = self.vision_tower
        else:
            self.vision_tower.to(device='cuda', dtype=torch.float16)
        self.vision_config = self.vision_tower.config
        self.vision_config.im_patch_token = self.tokenizer.convert_tokens_to_ids([DEFAULT_IMAGE_PATCH_TOKEN])[0]
        self.vision_config.use_im_start_end = mm_use_im_start_end
        if mm_use_im_start_end:
            self.vision_config.im_start_token, self.vision_config.im_end_token = self.tokenizer.convert_tokens_to_ids([DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN])

    def predict(self, image: Path = Input(description="Input image"), query: str = Input(description="Query")) -> str:
        """Run a single prediction on the model"""
        image_token_len = (self.vision_config.image_size // self.vision_config.patch_size) ** 2

        qs = query
        if self.vision_config.use_im_start_end:
            qs = qs + '\n' + DEFAULT_IM_START_TOKEN + DEFAULT_IMAGE_PATCH_TOKEN * image_token_len + DEFAULT_IM_END_TOKEN
        else:
            qs = qs + '\n' + DEFAULT_IMAGE_PATCH_TOKEN * image_token_len

        if "v1" in self.model_name.lower():
            conv_mode = "llava_v1"
        elif "mpt" in self.model_name.lower():
            conv_mode = "mpt_multimodal"
        else:
            conv_mode = "multimodal"

        if self.conv_mode is not None and conv_mode != self.conv_mode:
            print('[WARNING] the auto inferred conversation mode is {}, while `--conv-mode` is {}, using {}'.format(conv_mode, self.conv_mode, self.conv_mode))
        else:
            self.conv_mode = conv_mode

        conv = conv_templates[self.conv_mode].copy()
        conv.append_message(conv.roles[0], qs)
        conv.append_message(conv.roles[1], None)
        prompt = conv.get_prompt()

        inputs = self.tokenizer([prompt])

        # cog's Path is a pathlib.Path, so convert to str for the startswith checks in load_image
        image = load_image(str(image))
        image = resize_image(image, (336, 336))
        image_tensor = self.image_processor.preprocess(image, return_tensors='pt')['pixel_values'][0]

        input_ids = torch.as_tensor(inputs.input_ids).cuda()

        stop_str = conv.sep if conv.sep_style != SeparatorStyle.TWO else conv.sep2
        keywords = [stop_str]
        stopping_criteria = KeywordsStoppingCriteria(keywords, self.tokenizer, input_ids)

        with torch.inference_mode():
            output_ids = self.model.generate(
                input_ids,
                images=image_tensor.unsqueeze(0).half().cuda(),
                do_sample=True,
                temperature=0.2,
                max_new_tokens=1024,
                stopping_criteria=[stopping_criteria])

        input_token_len = input_ids.shape[1]
        n_diff_input_output = (input_ids != output_ids[:, :input_token_len]).sum().item()
        if n_diff_input_output > 0:
            print(f'[Warning] {n_diff_input_output} output_ids are not the same as the input_ids')
        outputs = self.tokenizer.batch_decode(output_ids[:, input_token_len:], skip_special_tokens=True)[0]

        outputs = outputs.strip()
        if outputs.endswith(stop_str):
            outputs = outputs[:-len(stop_str)]
        outputs = outputs.strip()
        return outputs

Here is my cog.yaml file:

# Configuration for Cog ⚙️
# Reference: https://github.com/replicate/cog/blob/main/docs/yaml.md

build:
  # set to true if your model requires a GPU
  gpu: true

  # a list of ubuntu apt packages to install
  system_packages:
    - "libgl1-mesa-glx"
    - "libglib2.0-0"

  # python version in the form '3.11' or '3.11.4'
  python_version: "3.10"

  # a list of packages in the format <package-name>==<version>
  python_requirements: requirements.txt

  # commands run after the environment is set up
  run:
    - "echo env is ready!"
    - "echo another command if needed"

# predict.py defines how predictions are run on your model
predict: "predict.py:Predictor"

I'm really hoping this can be fixed easily! Thanks in advance!

hongchaodeng commented 1 year ago

Please see https://github.com/replicate/cog/issues/1189

This is a known bug in v0.8.x. One quick fix is to use v0.7.2 for now. Or set gpu: false so the image builds, then run it with docker --gpus all ... yourself :(
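For anyone else hitting this, here is a rough sketch of both workarounds. The v0.7.2 release URL pattern, the image tag, and the port are my assumptions based on Cog's usual install and HTTP-serving setup, so adjust for your machine:

# Downgrade to Cog v0.7.2 (assumes the standard GitHub release install path)
sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/download/v0.7.2/cog_$(uname -s)_$(uname -m)"
sudo chmod +x /usr/local/bin/cog

# Or: set gpu: false in cog.yaml so the build succeeds, then run the built image
# with GPUs yourself ("llava-model" is a placeholder tag; 5000 is Cog's default HTTP port)
cog build -t llava-model
docker run --rm --gpus all -p 5000:5000 llava-model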

kenhuang1964 commented 1 year ago

Thanks @hongchaodeng !