vikhyat / moondream

tiny vision language model
https://moondream.ai
Apache License 2.0

Torch + CUDA on Windows 11 with the latest NVIDIA drivers + latest CUDA toolkit #10

Open · l33tm4st3r opened this issue 7 months ago

l33tm4st3r commented 7 months ago

FYI:

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121

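To confirm the nightly wheel actually installed a CUDA-enabled build, a quick sanity check can be run first (a generic sketch, not part of the repo):

```python
# Quick sanity check that the installed torch build can see the GPU.
try:
    import torch
    cuda_ok = torch.cuda.is_available()
    print("torch:", torch.__version__, "| CUDA available:", cuda_ok)
except ImportError:
    cuda_ok = False
    print("torch is not installed yet")
```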
Change the device_map to cuda in text_model.py.

From cpu:

class TextModel:
    def __init__(self, model_path: str = "model") -> None:
        super().__init__()
        self.tokenizer = Tokenizer.from_pretrained(f"{model_path}/tokenizer")
        phi_config = PhiConfig.from_pretrained(f"{model_path}/text_model_cfg.json")

        with init_empty_weights():
            self.model = PhiForCausalLM(phi_config)

        self.model = load_checkpoint_and_dispatch(
            self.model,
            f"{model_path}/text_model.pt",
            device_map={"": "cpu"},
        )

To cuda:

class TextModel:
    def __init__(self, model_path: str = "model") -> None:
        super().__init__()
        self.tokenizer = Tokenizer.from_pretrained(f"{model_path}/tokenizer")
        phi_config = PhiConfig.from_pretrained(f"{model_path}/text_model_cfg.json")

        with init_empty_weights():
            self.model = PhiForCausalLM(phi_config)

        self.model = load_checkpoint_and_dispatch(
            self.model,
            f"{model_path}/text_model.pt",
            device_map={"": "cuda"},
        )
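Rather than hard-coding cpu or cuda, the device could be selected at runtime so the same file works on CPU-only machines too; a minimal sketch (pick_device_map is a hypothetical helper, not in the repo):

```python
def pick_device_map():
    """Build a device_map for load_checkpoint_and_dispatch, preferring
    CUDA when a usable GPU is present and falling back to CPU."""
    try:
        import torch
        device = "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        device = "cpu"  # torch not installed: stay on CPU
    return {"": device}
```

Then the call site could pass device_map=pick_device_map() instead of a literal dict.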

Thanks for sharing your amazing work!

vikhyat commented 7 months ago

Thank you for trying it out! Is the ask here to use the GPU when available, or are you seeing a failure when you try this?

eointolster commented 6 months ago

I'm not sure myself, but I followed the instructions in the repo and mine defaults to CPU, while on Pinokio it runs on the GPU. So I tried this change, and it is a solution for me.

barleyj21 commented 6 months ago

On Win10, after installing requirements.txt, I installed the current stable torch build:

pip3 install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

I also changed float32 to float16 in vision_encoder.py, in addition to this change in text_model.py:

self.model = load_checkpoint_and_dispatch(
    self.model,
    f"{model_path}/text_model.pt",
    device_map={"": "cuda:0"},
    dtype=torch.float16,
)

It now runs in about 4-5 GB of VRAM, but for some reason I now have to downscale images to fit into the Gradio demo. On CPU I could use ~2 MB images; now only ~800 KB.
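As a workaround for the downscaling, the target dimensions can be computed once and applied with any image library; a pure-Python sketch (fit_within and the 1024-pixel cap are illustrative guesses, not values from the repo):

```python
def fit_within(width, height, max_side=1024):
    """Return (w, h) scaled so the longer edge fits within max_side,
    preserving aspect ratio; unchanged if already small enough."""
    scale = min(1.0, max_side / max(width, height))
    return round(width * scale), round(height * scale)
```

With Pillow, img.thumbnail((1024, 1024)) does the equivalent resize in place.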

vikhyat commented 6 months ago

I have to downscale images to fit into gradio demo after doing this

Interesting, we downscale the image pretty early in the pipeline so I'm not sure what's causing it. Will dig in later!

https://github.com/vikhyat/moondream/blob/main/moondream/vision_encoder.py#L22

Trimad commented 6 months ago

I wasn't able to get this working on Windows 10. Here's the error I get:

C:\Users\Tristan\Documents\!Hugging Face\moondream\env\lib\site-packages\transformers\utils\generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
C:\Users\Tristan\Documents\!Hugging Face\moondream\env\lib\site-packages\transformers\utils\generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
C:\Users\Tristan\Documents\!Hugging Face\moondream\env\lib\site-packages\transformers\utils\generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
Using device: cuda
If you run into issues, pass the --cpu flag to this script.

Traceback (most recent call last):
  File "C:\Users\Tristan\Documents\!Hugging Face\moondream\sample.py", line 35, in <module>
    moondream = Moondream.from_pretrained(model_id).to(device=device, dtype=dtype)
  File "C:\Users\Tristan\Documents\!Hugging Face\moondream\env\lib\site-packages\transformers\modeling_utils.py", line 3462, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "C:\Users\Tristan\Documents\!Hugging Face\moondream\moondream\moondream.py", line 15, in __init__
    self.text_model = TextModel(config)
  File "C:\Users\Tristan\Documents\!Hugging Face\moondream\moondream\text_model.py", line 12, in __init__
    self.tokenizer = Tokenizer.from_pretrained(f"{model_path}/tokenizer")
NameError: name 'Tokenizer' is not defined

(env) C:\Users\Tristan\Documents\!Hugging Face\moondream>

Here's my pip list:

(env) C:\Users\Tristan\Documents\!Hugging Face\moondream>pip list
Package                   Version
------------------------- ------------------------
accelerate                0.25.0
aiofiles                  23.2.1
altair                    5.2.0
annotated-types           0.6.0
anyio                     4.2.0
attrs                     23.2.0
certifi                   2023.11.17
charset-normalizer        3.3.2
click                     8.1.7
colorama                  0.4.6
contourpy                 1.2.0
cycler                    0.12.1
einops                    0.7.0
exceptiongroup            1.2.0
fastapi                   0.109.0
ffmpy                     0.3.1
filelock                  3.13.1
fonttools                 4.47.2
fsspec                    2023.12.2
gradio                    4.15.0
gradio_client             0.8.1
h11                       0.14.0
httpcore                  1.0.2
httpx                     0.26.0
huggingface-hub           0.20.1
idna                      3.6
importlib-resources       6.1.1
Jinja2                    3.1.3
jsonschema                4.21.1
jsonschema-specifications 2023.12.1
kiwisolver                1.4.5
markdown-it-py            3.0.0
MarkupSafe                2.1.4
matplotlib                3.8.2
mdurl                     0.1.2
mpmath                    1.3.0
networkx                  3.2.1
numpy                     1.26.3
orjson                    3.9.12
packaging                 23.2
pandas                    2.2.0
Pillow                    10.1.0
pip                       23.3.2
psutil                    5.9.8
pydantic                  2.6.0
pydantic_core             2.16.1
pydub                     0.25.1
Pygments                  2.17.2
pyparsing                 3.1.1
python-dateutil           2.8.2
python-multipart          0.0.6
pytz                      2023.4
PyYAML                    6.0.1
referencing               0.33.0
regex                     2023.12.25
requests                  2.31.0
rich                      13.7.0
rpds-py                   0.17.1
ruff                      0.1.15
safetensors               0.4.2
semantic-version          2.10.0
setuptools                63.2.0
shellingham               1.5.4
six                       1.16.0
sniffio                   1.3.0
starlette                 0.35.1
sympy                     1.12
timm                      0.9.12
tokenizer                 3.4.3
tomlkit                   0.12.0
toolz                     0.12.1
torch                     2.3.0.dev20240122+cu121
torchaudio                2.2.0.dev20240123+cu121
torchvision               0.18.0.dev20240123+cu121
tqdm                      4.66.1
transformers              4.36.2
typer                     0.9.0
typing_extensions         4.9.0
tzdata                    2023.4
urllib3                   2.2.0
uvicorn                   0.27.0.post1
websockets                11.0.3
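One thing worth noting in the pip list above: it contains "tokenizer 3.4.3", which is a different PyPI project from Hugging Face's "tokenizers" (an observation, not confirmed as the root cause of the NameError). Assuming the repo intends the Hugging Face package, a guarded import makes a failed dependency explicit instead of surfacing later as a bare NameError:

```python
# Guarded import: fail loudly at import time instead of with a
# NameError at call time (assumes the code wants Hugging Face's
# `tokenizers` package, i.e. `pip install tokenizers`).
try:
    from tokenizers import Tokenizer
    have_tokenizers = True
except ImportError:
    have_tokenizers = False
    print("Missing dependency: run `pip install tokenizers`")
```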
Trimad commented 6 months ago

[quotes l33tm4st3r's original comment above in full]

Did you change your imports in text_model.py to get this working?

barleyj21 commented 6 months ago

@Trimad, I didn't change the imports, since "import torch" is already there. Also, note that installing torch with --pre pulls the latest nightly development build, which changes daily, so that may not work for you. You also missed the part about changing fp32 to fp16; see earlier in this thread if you want it.
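The fp32-to-fp16 choice mentioned above can be tied to the device selection; a rule-of-thumb sketch (pick_dtype is a hypothetical helper, not the repo's logic): float16 roughly halves VRAM on CUDA GPUs, while CPU inference is usually faster and safer in float32.

```python
def pick_dtype(device: str) -> str:
    """Rule of thumb: float16 halves VRAM on CUDA GPUs; CPUs
    generally run float32 faster and with fewer surprises."""
    return "float16" if device.startswith("cuda") else "float32"
```

The returned name would be mapped to torch.float16 / torch.float32 when passed as dtype= to load_checkpoint_and_dispatch.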