unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

LLVM/Triton issue loading llama-3 #499

Open luke-lombardi opened 6 months ago

luke-lombardi commented 6 months ago

I've been trying to use unsloth with the following code:

    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/llama-3-8b-Instruct-bnb-4bit",
        max_seq_length=2048,
        dtype=None,
        load_in_4bit=True,
    )
    FastLanguageModel.for_inference(model)
    inputs = tokenizer(
        [
            alpaca_prompt.format(
                "Be concise.",  # Instruction
                "What is the weather?",  # Input
                "",  # Output - leave this blank for generation!
            )
        ],
        return_tensors="pt",
    ).to("cuda")

    # This is where the process crashes
    outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)
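For context, `alpaca_prompt` isn't defined in the snippet above. In the Unsloth example notebooks it is a plain format string with three slots; a minimal stand-in (assumed here for completeness, not taken from this thread) looks like:

```python
# Assumed definition of alpaca_prompt, modeled on the Alpaca template used in
# the Unsloth example notebooks; the exact wording may differ.
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

# The three positional slots map to instruction, input, and output; the output
# slot is left blank so the model fills it in during generation.
prompt = alpaca_prompt.format("Be concise.", "What is the weather?", "")
```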

But it always errors out during generation with the following cryptic LLVM error (Python 3.10) about a buffer that is not null-terminated:

==((====))==  Unsloth: Fast Llama patching release 2024.5
   \\   /|    GPU: NVIDIA A10G. Max memory: 21.988 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.0+cu121. CUDA = 8.6. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.26.post1. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
python3.10: /source/llvm-project/llvm/lib/Support/MemoryBuffer.cpp:54: void llvm::MemoryBuffer::init(const char *, const char *, bool): Assertion `(!RequiresNullTerminator || BufEnd[0] == 0) && "Buffer is not null
terminated!"' failed.

I've looked at a core dump of the process and found that it's failing inside Triton:

#1  0x00007f2648f15859 in __GI_abort () at abort.c:79
#2  0x00007f2648f15729 in __assert_fail_base (fmt=0x7f26490ab588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=0x7f254f05fe18 "(!RequiresNullTerminator || BufEnd[0] == 0) && \"Buffer is not null terminated!\"", file=0x7f254f05fe68 "/source/llvm-project/llvm/lib/Support/MemoryBuffer.cpp", line=54,
    function=<optimized out>) at assert.c:92
#3  0x00007f2648f26fd6 in __GI___assert_fail (assertion=0x7f254f05fe18 "(!RequiresNullTerminator || BufEnd[0] == 0) && \"Buffer is not null terminated!\"",
    file=0x7f254f05fe68 "/source/llvm-project/llvm/lib/Support/MemoryBuffer.cpp", line=54, function=0x7f254f05fe9f "void llvm::MemoryBuffer::init(const char *, const char *, bool)") at assert.c:101
#4  0x00007f254c74199d in llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> > > getOpenFileImpl<llvm::MemoryBuffer>(int, llvm::Twine const&, unsigned long, unsigned long, long, bool, bool, std::optional<llvm::Align>) () from /tmp/taskqueue-fc99b301-ab59-47eb-94f4-decf5446e564-557f0a50/layer-0/merged/usr/local/lib/python3.10/dist-packages/triton/_C/libtriton.so
#5  0x00007f254c7406c9 in llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> > > getFileAux<llvm::MemoryBuffer>(llvm::Twine const&, unsigned long, unsigned long, bool, bool, bool, std::optional<llvm::Align>) () from /tmp/taskqueue-fc99b301-ab59-47eb-94f4-decf5446e564-557f0a50/layer-0/merged/usr/local/lib/python3.10/dist-packages/triton/_C/libtriton.so
#6  0x00007f254c740518 in llvm::MemoryBuffer::getFileOrSTDIN(llvm::Twine const&, bool, bool, std::optional<llvm::Align>) ()
   from /tmp/taskqueue-fc99b301-ab59-47eb-94f4-decf5446e564-557f0a50/layer-0/merged/usr/local/lib/python3.10/dist-packages/triton/_C/libtriton.so
#7  0x00007f254b72eeec in llvm::parseIRFile(llvm::StringRef, llvm::SMDiagnostic&, llvm::LLVMContext&, llvm::ParserCallbacks) ()
   from /tmp/taskqueue-fc99b301-ab59-47eb-94f4-decf5446e564-557f0a50/layer-0/merged/usr/local/lib/python3.10/dist-packages/triton/_C/libtriton.so
#8  0x00007f254c8cb74c in mlir::triton::linkExternLib (module=..., name=..., path=..., target=target@entry=mlir::triton::NVVM) at /opt/rh/devtoolset-10/root/usr/include/c++/10/optional:691
#9  0x00007f254c8ceec7 in mlir::triton::translateLLVMToLLVMIR (llvmContext=llvmContext@entry=0x7ffe412597c0, module=..., module@entry=..., target=target@entry=mlir::triton::NVVM)

I appreciate that this may not be an issue on the unsloth side, but I thought someone here might have pointers on what's going on, or could point me in the right direction.

For what it's worth, this is how I installed unsloth:

    python3 -m pip install torch==2.3.0 "unsloth[cu121-torch230] @ git+https://github.com/unslothai/unsloth.git"
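Since the crash originates inside libtriton.so, a quick sanity check is confirming that the installed wheels actually match the cu121-torch230 combination (torch 2.3.0 built against CUDA 12.1). A minimal sketch of such a check, assuming the usual PyPI package names:

```python
# Hedged sanity check: mismatched torch/triton wheels are a common source of
# libtriton.so failures. Reports each package's installed version so it can be
# compared against the versions the cu121-torch230 extra expects.
from importlib.metadata import PackageNotFoundError, version


def installed_versions(pkgs=("torch", "triton", "xformers", "unsloth")):
    """Return a dict mapping package name -> installed version string,
    or 'not installed' when the package is missing."""
    out = {}
    for name in pkgs:
        try:
            out[name] = version(name)
        except PackageNotFoundError:
            out[name] = "not installed"
    return out


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver}")
```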

Thanks for your time!

danielhanchen commented 6 months ago

@luke-lombardi Yes, this is related to https://github.com/unslothai/unsloth/issues/501 and https://github.com/unslothai/unsloth/issues/504

raphaelbadawi commented 5 months ago

I have the same issue with the colab-new install template. I tried the is_bfloat16_supported utility from unsloth, as mentioned in the linked issue, but that didn't fix it for me. I'm not on a T4 but an A10G; could it be the same kind of problem?

Skeletonboi commented 3 months ago

Having the same issue when calling model.generate(...) after installing unsloth on Beam Cloud (serverless GPU).

danielhanchen commented 3 months ago

Oh no, that's usually not a good sign - it normally means something broke in the Triton installation.
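One frequent cause of broken-Triton symptoms like this (a common remedy, not something confirmed in this thread) is a stale or corrupted on-disk kernel cache; deleting it forces Triton to recompile every kernel on the next run. A hedged sketch, assuming Triton's default cache location of `~/.triton/cache` (overridable via the `TRITON_CACHE_DIR` environment variable):

```python
# Hedged sketch: clear the Triton kernel cache so the next model.generate()
# triggers a fresh compile instead of loading a possibly-corrupted artifact.
import os
import shutil
from pathlib import Path
from typing import Optional


def clear_triton_cache(cache_dir: Optional[str] = None) -> bool:
    """Delete the Triton kernel cache directory.

    Returns True if a cache directory existed and was removed,
    False if there was nothing to delete."""
    path = Path(
        cache_dir
        or os.environ.get("TRITON_CACHE_DIR", Path.home() / ".triton" / "cache")
    )
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False
```

If clearing the cache is not enough, reinstalling the triton wheel (matched to the installed torch version) is the usual next step before rerunning model.generate.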