chigkim opened this issue 1 year ago
Hi @chigkim
Are you testing from text-gen-webui or some other repo? (I'm not familiar with these.)
Not sure if this is due to the template; we use a prompt template similar to LLaMA-2 for the llava-llama-2-chat series (something like the prompt below), which is different from the Vicuna template.
[INST] <<SYS>>
You are a helpful language and vision assistant. You are able to understand the visual content that the user provides, and assist the user with a variety of tasks using natural language.
<</SYS>>
<image> [/INST]
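For illustration, here is a sketch of how that prompt could be assembled in Python. The exact whitespace and newline placement are an assumption; check the repo's conversation templates for the authoritative format.

```python
# System message from the llava-llama-2-chat template above.
SYSTEM = (
    "You are a helpful language and vision assistant. "
    "You are able to understand the visual content that the user provides, "
    "and assist the user with a variety of tasks using natural language."
)

def build_prompt(user_message: str) -> str:
    """Wrap a user message (with an <image> placeholder) in LLaMA-2 chat tags."""
    return f"[INST] <<SYS>>\n{SYSTEM}\n<</SYS>>\n\n<image>\n{user_message} [/INST]"

print(build_prompt("What is in this picture?"))
```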
Maybe I will have some bandwidth to try this out myself either Friday or this weekend. Any pointers on which repositories I should try first? Thanks.
Thanks for your response! I used this repo. https://github.com/oobabooga/text-generation-webui Here's my quantized model. https://drive.google.com/drive/folders/1-njjlAXE8JD_UnccZ15geFIMMBU5PZKC
After cloning, put the model folder inside text-generation-webui/models. You should end up with text-generation-webui/models/liuhaotian_llava-llama-2-13b-chat-lightning-preview/llava-llama-2-13b-chat-4bit-128g.safetensors.
Here's my setup code.
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
pip install deepspeed mpi4py xformers
python server.py --verbose --share --chat --model-dir models --model liuhaotian_llava-llama-2-13b-chat-lightning-preview --loader exllama --max_seq_len 4096 --xformers --no-stream --deepspeed --multimodal-pipeline llava-13b
Once you open the link from Gradio, go to Chat Settings > Instruction template and edit the Context. Then go back to Text generation and choose instruct mode.
Here's my Colab notebook that you can just run; it will automatically download and load my quantized model.
https://colab.research.google.com/drive/1n9Tq9XmmTElkRHazVKi2CUaVTqVWFjfP?usp=sharing
Hope this will get you started.
Also I opened an issue on oobabooga/text-generation-webui. https://github.com/oobabooga/text-generation-webui/issues/3293
Thanks for your response! I used this repo. https://github.com/oobabooga/text-generation-webui Here's my quantized model. https://drive.google.com/drive/folders/1-njjlAXE8JD_UnccZ15geFIMMBU5PZKC
After cloning, put the model folder inside text-generation-webui/models. You should end up with text-generation-webui/models/liuhaotian_llava-llama-2-13b-chat-lightning-preview/llava-llama-2-13b-chat-4bit-128g.safetensors.
Here's my setup code.
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
pip install deepspeed mpi4py
pip install -U num2words omegaconf xformers
python server.py --verbose --share --chat --model-dir models --model liuhaotian_llava-llama-2-13b-chat-lightning-preview --loader exllama_hf --max_seq_len 4096 --xformers --no-stream --deepspeed --multimodal-pipeline llava-13b
Once you open the link from Gradio, go to Chat Settings > Instruction template and edit the Context. Then go back to Text generation and choose instruct mode. Hope this will get you started.
I used the instructions above when loading the model. The setup works but when I pass in an image to the model, I get an error:
You: <img src="data:image/jpeg;base64,[base64 image data omitted]">
Assistant:
--------------------
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 427, in run_predict
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1323, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1067, in call_function
prediction = await utils.async_iteration(iterator)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 336, in async_iteration
return await iterator.__anext__()
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 329, in __anext__
return await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 312, in run_sync_iterator_async
return next(iterator)
File "/content/text-generation-webui/modules/chat.py", line 303, in generate_chat_reply_wrapper
for i, history in enumerate(generate_chat_reply(text, state, regenerate, _continue, loading_message=True)):
File "/content/text-generation-webui/modules/chat.py", line 288, in generate_chat_reply
for history in chatbot_wrapper(text, state, regenerate=regenerate, _continue=_continue, loading_message=loading_message):
File "/content/text-generation-webui/modules/chat.py", line 212, in chatbot_wrapper
for j, reply in enumerate(generate_reply(prompt + cumulative_reply, state, stopping_strings=stopping_strings, is_chat=True)):
File "/content/text-generation-webui/modules/text_generation.py", line 28, in generate_reply
for result in _generate_reply(*args, **kwargs):
File "/content/text-generation-webui/modules/text_generation.py", line 211, in _generate_reply
for reply in generate_func(question, original_question, seed, state, stopping_strings, is_chat=is_chat):
File "/content/text-generation-webui/modules/text_generation.py", line 252, in generate_reply_HF
question, input_ids, inputs_embeds = apply_extensions('tokenizer', state, question, input_ids, None)
File "/content/text-generation-webui/modules/extensions.py", line 207, in apply_extensions
return EXTENSION_MAP[typ](*args, **kwargs)
File "/content/text-generation-webui/modules/extensions.py", line 108, in _apply_tokenizer_extensions
prompt, input_ids, input_embeds = getattr(extension, function_name)(state, prompt, input_ids, input_embeds)
File "/content/text-generation-webui/extensions/multimodal/script.py", line 89, in tokenizer_modifier
prompt, input_ids, input_embeds, total_embedded = multimodal_embedder.forward(prompt, state, params)
File "/content/text-generation-webui/extensions/multimodal/multimodal_embedder.py", line 172, in forward
prompt_parts = self._embed(prompt_parts)
File "/content/text-generation-webui/extensions/multimodal/multimodal_embedder.py", line 148, in _embed
embedded = self.pipeline.embed_images([parts[i].image for i in image_indicies])
File "/content/text-generation-webui/extensions/multimodal/pipelines/llava/llava.py", line 75, in embed_images
image_forward_outs = self.vision_tower(images, output_hidden_states=True)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/clip/modeling_clip.py", line 941, in forward
return self.vision_model(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/clip/modeling_clip.py", line 866, in forward
hidden_states = self.embeddings(pixel_values)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/clip/modeling_clip.py", line 195, in forward
patch_embeds = self.patch_embedding(pixel_values) # shape = [*, width, grid, grid]
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: weight should have at least three dimensions
What could be the issue?
@AlvinKimata, my apologies! I copied the wrong line to launch the server. The exllama_hf loader throws the error you got; loading with exllama (without _hf) should work: --loader exllama
python server.py --verbose --share --chat --model-dir models --model liuhaotian_llava-llama-2-13b-chat-lightning-preview --loader exllama --max_seq_len 4096 --xformers --no-stream --deepspeed --multimodal-pipeline llava-13b
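For context, that final RuntimeError is what F.conv2d raises whenever CLIP's patch-embedding weight arrives with fewer than the four dimensions a conv weight needs (out_channels x in_channels x kH x kW), e.g. because a loader handed it a packed 2-D quantized tensor. A minimal, standalone reproduction (unrelated to the actual model weights):

```python
import torch
import torch.nn.functional as F

pixel_values = torch.randn(1, 3, 224, 224)   # dummy image batch
weight_4d = torch.randn(8, 3, 14, 14)        # proper conv weight: out, in, kH, kW
out = F.conv2d(pixel_values, weight_4d, stride=14)
print(out.shape)                             # a valid patch-embedding-style output

weight_2d = weight_4d.reshape(8, -1)         # flattened, like a packed/quantized tensor
try:
    F.conv2d(pixel_values, weight_2d, stride=14)
except RuntimeError as e:
    print(e)  # e.g. "weight should have at least three dimensions"
```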
@haotian-liu Here's my Colab notebook that you can just run.
https://colab.research.google.com/drive/1n9Tq9XmmTElkRHazVKi2CUaVTqVWFjfP?usp=sharing
Hi @chigkim @AlvinKimata
I have looked into the oobabooga text-gen-ui this weekend, and I got our LLaMA-2 checkpoint working with some modifications to the code. I am working on a PR and plan to finalize it when we release our new LLaMA-2-based checkpoints in the coming days.
To try it out with the text-gen-ui, you can for now use my fork, and download my quantized checkpoints here and put them under the models folder.
I have tested the following command and it works:
python server.py \
--chat \
--model-dir models \
--model llava-llama-2-13b-chat-lightning-4bit-128g \
--multimodal-pipeline llava-llama-2-13b
Two things I may still need to look into:
1. When loading with --loader exllama, the multimodal plugin is not properly loaded. I have submitted an issue: https://github.com/oobabooga/text-generation-webui/issues/3378. So for now, I need to resort to AutoGPTQ for this.
2. The checkpoint quantized with GPTQ-for-LLaMa is not compatible with AutoGPTQ, and it generates garbled responses. I tried some other checkpoints that are quantized by TheBloke, and some of them also generate garbled outputs (e.g. TheBloke/Llama-2-7b-Chat-GPTQ:gptq-4bit-32g-actorder_True). As a workaround, I now quantize the checkpoints using AutoGPTQ, and the checkpoint is uploaded to HF as provided above. I am not sure about the quantization quality yet.
What I may need help with: can a checkpoint quantized with GPTQ-for-LLaMa be made compatible with AutoGPTQ?

From what I know, AutoGPTQ does something smart with the data - working from the training prompts to do a high-quality compression, to offer 32-bit quality at 4-bit speed. Not sure GPTQ-for-LLaMa does the same.
Running with exllama would offer a great speed advantage; benchmarks come out at about double the speed. So that would be cool. I think it should be possible without major changes to the exllama engine and to text-generation-webui, since the visual analysis (BLIP/CLIP) is already separate from the language model engine, right?
@haotian-liu Would you please share the settings for the AutoGPTQ quantization, if any? I would like to see if I can get GPTQ-for-LLaMa working.
Hi @Don-Chad
Thank you for sharing the info and for offering the help!
Regarding exllama, it seems to be some compatibility issue with the text-gen-ui itself, as also confirmed by https://github.com/oobabooga/text-generation-webui/pull/3377#issuecomment-1658311157
Regarding the AutoGPTQ quantization I used to create this checkpoint: I basically used the sample script from the AutoGPTQ repo, which I attach here.
You need to make a few modifications to your checkpoint to properly run the script:
1. Download https://huggingface.co/liuhaotian/llava-llama-2-13b-chat-lightning-preview to a local folder.
2. Make the following edits to config.json: LlavaLlamaForCausalLM -> LlamaForCausalLM, and "model_type": "llava" -> "llama".
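The two config.json edits can also be scripted. A minimal sketch (the path in the usage comment is a placeholder for wherever you downloaded the checkpoint):

```python
import json
from pathlib import Path

def patch_config(cfg_path: Path) -> dict:
    """Apply the two config.json edits described above, in place."""
    cfg = json.loads(cfg_path.read_text())
    # Edit 1: LlavaLlamaForCausalLM -> LlamaForCausalLM
    cfg["architectures"] = ["LlamaForCausalLM"]
    # Edit 2: "model_type": "llava" -> "llama"
    cfg["model_type"] = "llama"
    cfg_path.write_text(json.dumps(cfg, indent=2))
    return cfg

# Usage (placeholder path):
# patch_config(Path("models/llava-llama-2-13b-chat-lightning-preview/config.json"))
```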
Note that I am quite new to these quantization techniques, so they are probably not the best parameters to use. Please share if you have any thoughts or findings to get better quality/compatibility.
Thanks.
For whatever it's worth, I got the model quantized with oobabooga/GPTQ-for-LLaMa to work with the AutoGPTQ loader. If you quantize using qwopqwop200/GPTQ-for-LLaMa, it doesn't work. Here's what I did:
python repositories/GPTQ-for-LLaMa/llama.py models/liuhaotian_llava-llama-2-13b-chat-lightning-preview c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors models/liuhaotian_llava-llama-2-13b-chat-lightning-preview/llava-llama-2-13b-chat-4bit-128g.safetensors
git fetch origin pull/3377/head:br3377
git checkout br3377
Hi, does anyone know the Python code to do inference without using the webui?
Question
I downloaded llava-llama-2-13b from: https://huggingface.co/liuhaotian/llava-llama-2-13b-chat-lightning-preview
Then I quantized the model to 4-bit using .
Then I loaded with --multimodal-pipeline llava-13b.
python server.py --verbose --share --chat --model-dir models --model liuhaotian_llava-llama-2-13b-chat-lightning-preview --loader exllama --max_seq_len 4096 --xformers --no-stream --deepspeed --multimodal-pipeline llava-13b
I tried many different images, but the descriptions are completely incorrect. E.g., I submitted a picture of a dog, and it said it was a woman holding an umbrella.
Since the model loads and outputs proper English, I assume it's quantized correctly.
I tried --load-4bit with model_worker, but it doesn't seem to support GPTQ format.
I'd appreciate any tip!
Thanks,