vikhyat / moondream

tiny vision language model
https://moondream.ai
Apache License 2.0

Nvidia Jetson runtime failure #68

Open · Links17 opened this issue 6 months ago

Links17 commented 6 months ago
 python3 sample.py --image img.png --prompt "hi"
[screenshot of the error output attached]
whab commented 6 months ago

I get a different type of error on my Jetson Orin with JetPack 5.1.2:

Using device: cuda
If you run into issues, pass the --cpu flag to this script.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "sample.py", line 32, in <module>
    moondream = Moondream.from_pretrained(
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3462, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/home/theuser/Devs/GitHub/MoonDream/moondream/moondream/moondream.py", line 16, in __init__
    self.vision_encoder = VisionEncoder()
  File "/home/theuser/Devs/GitHub/MoonDream/moondream/moondream/vision_encoder.py", line 98, in __init__
    VisualHolder(timm.create_model("vit_so400m_patch14_siglip_384"))
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/models/_factory.py", line 117, in create_model
    model = create_fn(
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/models/vision_transformer.py", line 2598, in vit_so400m_patch14_siglip_384
    model = _create_vision_transformer(
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/models/vision_transformer.py", line 1764, in _create_vision_transformer
    return build_model_with_cfg(
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/models/_builder.py", line 385, in build_model_with_cfg
    model = model_cls(**kwargs)
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/models/vision_transformer.py", line 525, in __init__
    self.attn_pool = AttentionPoolLatent(
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/layers/attention_pool.py", line 63, in __init__
    self.init_weights()
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/layers/attention_pool.py", line 68, in init_weights
    trunc_normal_tf_(self.latent, std=self.latent_dim ** -0.5)
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/layers/weight_init.py", line 94, in trunc_normal_tf_
    _trunc_normal_(tensor, 0, 1.0, a, b)
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/layers/weight_init.py", line 32, in _trunc_normal_
    tensor.erfinv_()
RuntimeError: "erfinv_vml_cpu" not implemented for 'Half'

I made sure to run torch/torchvision builds that are compatible with CUDA 11.4: torch 2.1.0a0+41361538.nv23.6, torchvision 0.16.2.
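For context on where this comes from: timm initializes the SigLIP attention pool with trunc_normal_tf_, which calls erfinv_ on a tensor that is still half precision on the CPU (presumably because the model is constructed in float16 before being moved to the GPU), and this Jetson wheel apparently has no CPU float16 erfinv kernel. A minimal sketch of the failing op in isolation, assuming such a build:

    import torch

    t = torch.zeros(8, dtype=torch.float16)  # half-precision tensor on the CPU

    try:
        # On builds without a CPU float16 erfinv kernel this raises:
        #   RuntimeError: "erfinv_vml_cpu" not implemented for 'Half'
        t.erfinv_()
    except RuntimeError as e:
        print(e)

    # Upcasting to float32 first avoids the missing kernel
    t.float().erfinv_()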

whab commented 6 months ago

I fixed my issue with sample.py by modifying python3.8/site-packages/timm/layers/weight_init.py to convert the tensor to float32 before the incompatible operations and back to float16 afterwards (a sketch of that kind of change is below). This code path only runs during model initialization, before inference, so it does not appear to affect performance.

Running the webcam_gradio_demo.py demo (which, by the way, is not affected by the 'Half' issue above) on my Jetson Orin Dev Kit, I get an update roughly every 2 seconds or even a bit faster, slightly quicker than my Mac mini M2 Pro running MoonDream with MPS Torch ;)
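For anyone wanting to apply the same kind of workaround, here is a minimal sketch (not whab's exact diff) based on _trunc_normal_ in timm/layers/weight_init.py: run the in-place init ops on a float32 copy when the incoming tensor is float16, then copy the result back.

    import math
    import torch

    def _trunc_normal_(tensor, mean, std, a, b):
        # Same math as timm's helper: sample a truncated normal via inverse CDF.
        def norm_cdf(x):
            return (1. + math.erf(x / math.sqrt(2.))) / 2.

        l = norm_cdf((a - mean) / std)
        u = norm_cdf((b - mean) / std)

        # Workaround: erfinv_ has no CPU float16 kernel on this build, so run
        # the init on a float32 copy and cast the result back in place.
        work = tensor.to(torch.float32) if tensor.dtype == torch.float16 else tensor

        work.uniform_(2 * l - 1, 2 * u - 1)
        work.erfinv_()
        work.mul_(std * math.sqrt(2.))
        work.add_(mean)
        work.clamp_(min=a, max=b)

        if work is not tensor:
            tensor.copy_(work)  # copy_ casts back to float16
        return tensor

    # Example: initialize a half-precision CPU tensor without hitting the error
    _trunc_normal_(torch.empty(4, 4, dtype=torch.float16), 0.0, 1.0, -2.0, 2.0)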