jhc13 / taggui

Tag manager and captioner for image datasets
GNU General Public License v3.0
692 stars, 32 forks

Triton Error on Linux with THUDM/cogvlm2-llama3-chat-19B-int4 #177

Closed blu3nh closed 3 months ago

blu3nh commented 3 months ago

Trying to load THUDM/cogvlm2-llama3-chat-19B-int4 on Linux (Pop!_OS), where all the other models work just fine.

The first error is that it can't find backend.pyc. This can be avoided by simply duplicating backend.py as backend.pyc.

That gets things further, but loading still ultimately fails while trying to build Triton.
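For reference, the backend.pyc duplication workaround can be sketched as a small helper. The function name and the idea of passing in the bundle's triton/common directory are mine for illustration, not part of taggui itself:

```python
import shutil
from pathlib import Path


def duplicate_backend(common_dir: str) -> Path:
    """Copy backend.py to backend.pyc inside Triton's common directory.

    Triton's version-key code expects a backend.pyc file next to
    backend.py; in the bundled build only the .py file is present,
    so duplicating it satisfies the file lookup.
    """
    common = Path(common_dir)
    src = common / "backend.py"
    dst = common / "backend.pyc"
    shutil.copyfile(src, dst)
    return dst


# Example (path taken from the traceback below; adjust to your install):
# duplicate_backend("/mnt/md/0/AI/taggui-linux/taggui-v1.26.0-linux/_taggui/triton/common")
```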

On the OneTrainer Discord, I've seen someone install all the dependencies manually, and it worked that way. It's just the prepackaged version that has this issue.

  File "triton/common/backend.py", line 176, in get_cuda_version_key
  File "triton/common/backend.py", line 146, in compute_core_version_key
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/md/0/AI/taggui-linux/taggui-v1.26.0-linux/_taggui/triton/common/backend.pyc'
jhc13 commented 3 months ago

This should be fixed in v1.27.0. I included some files required by Triton that were missing in the bundle.

blu3nh commented 3 months ago

Hold on. I just got around to testing the new version (1.27), and I now get a new error; CogVLM2 is still not working for me. Here's the whole log for context:

Captioning... (device: cuda:0)
Traceback (most recent call last):
  File "auto_captioning/captioning_thread.py", line 450, in run
  File "torch/utils/_contextlib.py", line 115, in decorate_context
  File "transformers/generation/utils.py", line 1758, in generate
  File "transformers/generation/utils.py", line 2397, in _sample
  File "torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
  File "torch/nn/modules/module.py", line 1520, in _call_impl
  File "accelerate/hooks.py", line 166, in new_forward
  File "/home/caithy/.cache/huggingface/modules/transformers_modules/THUDM/cogvlm2-llama3-chat-19B-int4/119df232ab9fca4a1be87f95c239d7b9a765032e/modeling_cogvlm.py", line 620, in forward
    outputs = self.model(
  File "torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
  File "torch/nn/modules/module.py", line 1520, in _call_impl
  File "accelerate/hooks.py", line 166, in new_forward
  File "/home/caithy/.cache/huggingface/modules/transformers_modules/THUDM/cogvlm2-llama3-chat-19B-int4/119df232ab9fca4a1be87f95c239d7b9a765032e/modeling_cogvlm.py", line 402, in forward
    return self.llm_forward(
  File "/home/caithy/.cache/huggingface/modules/transformers_modules/THUDM/cogvlm2-llama3-chat-19B-int4/119df232ab9fca4a1be87f95c239d7b9a765032e/modeling_cogvlm.py", line 486, in llm_forward
    layer_outputs = decoder_layer(
  File "torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
  File "torch/nn/modules/module.py", line 1520, in _call_impl
  File "accelerate/hooks.py", line 166, in new_forward
  File "/home/caithy/.cache/huggingface/modules/transformers_modules/THUDM/cogvlm2-llama3-chat-19B-int4/119df232ab9fca4a1be87f95c239d7b9a765032e/modeling_cogvlm.py", line 261, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
  File "torch/nn/modules/module.py", line 1520, in _call_impl
  File "accelerate/hooks.py", line 166, in new_forward
  File "/home/caithy/.cache/huggingface/modules/transformers_modules/THUDM/cogvlm2-llama3-chat-19B-int4/119df232ab9fca4a1be87f95c239d7b9a765032e/modeling_cogvlm.py", line 204, in forward
    query_states, key_states = self.rotary_emb(query_states, key_states, position_ids=position_ids, max_seqlen=position_ids.max() + 1)
  File "torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
  File "torch/nn/modules/module.py", line 1520, in _call_impl
  File "accelerate/hooks.py", line 166, in new_forward
  File "/home/caithy/.cache/huggingface/modules/transformers_modules/THUDM/cogvlm2-llama3-chat-19B-int4/119df232ab9fca4a1be87f95c239d7b9a765032e/util.py", line 469, in forward
    q = apply_rotary_emb_func(
  File "/home/caithy/.cache/huggingface/modules/transformers_modules/THUDM/cogvlm2-llama3-chat-19B-int4/119df232ab9fca4a1be87f95c239d7b9a765032e/util.py", line 329, in apply_rotary_emb
    return ApplyRotaryEmb.apply(
  File "torch/autograd/function.py", line 553, in apply
  File "/home/caithy/.cache/huggingface/modules/transformers_modules/THUDM/cogvlm2-llama3-chat-19B-int4/119df232ab9fca4a1be87f95c239d7b9a765032e/util.py", line 255, in forward
    out = apply_rotary(
  File "/home/caithy/.cache/huggingface/modules/transformers_modules/THUDM/cogvlm2-llama3-chat-19B-int4/119df232ab9fca4a1be87f95c239d7b9a765032e/util.py", line 212, in apply_rotary
    rotary_kernel[grid](
  File "/mnt/md/0/AI/taggui-linux/taggui-v1.27.0-linux/_taggui/triton/runtime/jit.py", line 532, in run
    self.cache[device][key] = compile(
  File "/mnt/md/0/AI/taggui-linux/taggui-v1.27.0-linux/_taggui/triton/compiler/compiler.py", line 614, in compile
    so_path = make_stub(name, signature, constants, ids, enable_warp_specialization=enable_warp_specialization)
  File "/mnt/md/0/AI/taggui-linux/taggui-v1.27.0-linux/_taggui/triton/compiler/make_launcher.py", line 37, in make_stub
    so = _build(name, src_path, tmpdir)
  File "/mnt/md/0/AI/taggui-linux/taggui-v1.27.0-linux/_taggui/triton/common/build.py", line 106, in _build
    ret = subprocess.check_call(cc_cmd)
  File "subprocess.py", line 413, in check_call
subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpnyzcnhoi/main.c', '-O3', '-I/mnt/md/0/AI/taggui-linux/taggui-v1.27.0-linux/_taggui/triton/common/../third_party/cuda/include', '-I/mnt/md/0/AI/taggui-linux/taggui-v1.27.0-linux/_taggui/include/python3.11', '-I/tmp/tmpnyzcnhoi', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmpnyzcnhoi/rotary_kernel.cpython-311-x86_64-linux-gnu.so', '-L/lib/x86_64-linux-gnu', '-L/lib/i386-linux-gnu', '-L/lib/x86_64-linux-gnu', '-L/lib/i386-linux-gnu']' returned non-zero exit status 1.
jhc13 commented 3 months ago

Can you try sudo apt install python3.11-dev and check if it works?
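The gcc command in the traceback compiles a small CPython extension stub, which needs the Python development headers. A quick sketch for checking whether those headers are present on your system (on Debian/Ubuntu-family distributions, the python3.11-dev package provides Python.h):

```python
# Check for the CPython headers that Triton's runtime compile step needs.
# sysconfig reports where this interpreter's Python.h should live.
import os
import sysconfig

include_dir = sysconfig.get_path("include")
has_header = os.path.isfile(os.path.join(include_dir, "Python.h"))
print(f"Looking in {include_dir}: Python.h {'found' if has_header else 'missing'}")
```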

NuaghtySociety commented 3 months ago

Still having the same problem even with v1.27.0. Any tips?

jhc13 commented 3 months ago

Still having the same problem even with v1.27.0. Any tips?

Can you try what I wrote above?

Can you try sudo apt install python3.11-dev and check if it works?

NuaghtySociety commented 3 months ago

I’ve been trying it on Windows, which might be why it isn’t working as expected. Do you have any updates on when it will be available for Windows?

Thanks!

jhc13 commented 3 months ago

I’ve been trying it on Windows, which might be why it isn’t working as expected.

That is the expected behavior when trying to use CogVLM2 on Windows.

Do you have any updates on when it will be available for Windows?

It will not become available unless Triton adds Windows support, or the creators of CogVLM2 modify the model code to remove the Triton dependency.

You can also take a look at #164.
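Because CogVLM2's model code compiles its rotary-embedding kernel with Triton at runtime, the model can only load where Triton itself is importable, which is not the case on Windows. A minimal availability check (my own sketch, not part of taggui):

```python
# Probe for Triton without importing it; find_spec returns None when the
# package is not installed, which is the situation on Windows.
import importlib.util

triton_available = importlib.util.find_spec("triton") is not None
print("Triton importable:", triton_available)
```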

stepfunction83 commented 3 months ago

I'm having the exact same issue on Linux. It did work when installing from source, so it's probably something in the way it's being packaged.

When it does work though, wow is it good.

jhc13 commented 3 months ago

I'm having the exact same issue on Linux. It did work when installing from source, so it's probably something in the way it's being packaged.

Which error are you getting?

And did you try this?

Can you try sudo apt install python3.11-dev and check if it works?

stepfunction83 commented 3 months ago

Sorry, I'm not going to mess with my system's Python installation to test this.


blu3nh commented 3 months ago

Sorry, same here. My Linux server is a live environment with a lot of things running; I don't dare mess with Python on it.