turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License
2.67k stars 214 forks source link

Very bad response #217

Closed pourfard closed 11 months ago

pourfard commented 11 months ago

Hi, I'm trying to test TheBloke/LLaMA2-70b-chat. But the responses are very different from main LLaMA2-70b-chat.. I'm running the model on dual 3090. For example my question is "how can i connect to a device using mac address?":

command: python example_chatbot.py -d ../text-generation-webui/models/TheBloke_Llama-2-70B-chat-GPTQ/ -un "Jeff" -p prompt_chatbort.txt -gs 17.2,24

exllama params:


 -- Sequence length: 2048
 -- Temperature: 0.10
 -- Top-K: 500
 -- Top-P: 0.90
 -- Min-P: 0.00
 -- Repetition penalty: 1.15
 -- Beams: 1 x 1
 -- Tokenizer: ../text-generation-webui/models/TheBloke_Llama-2-70B-chat-GPTQ/tokenizer.model
 -- Model config: ../text-generation-webui/models/TheBloke_Llama-2-70B-chat-GPTQ/config.json
 -- Model: ../text-generation-webui/models/TheBloke_Llama-2-70B-chat-GPTQ/gptq_model-4bit--1g.safetensors
 -- Sequence length: 2048
 -- Tuning:
 -- --sdp_thd: 8
 -- --matmul_recons_thd: 8
 -- --fused_mlp_thd: 2
 -- Options: ['gpu_split: 17.2,24']
 -- Groupsize (inferred): None
 -- Act-order (inferred): no

exllama response: Chatbort: I'm Sorry, I am unable to assist you with that. My apologies for any unspecified assistance. (MACant disconnections).

Original LLaMA2 response:

Connecting to a device using its MAC address is not a recommended or secure practice. MAC addresses are unique identifiers assigned to network interfaces, and they can be used to identify and track devices on a network. However, using a MAC address to connect to a device can be vulnerable to security risks, as it can be easily spoofed or intercepted.

Instead, it's recommended to use secure connection methods such as SSH or HTTPS to connect to devices. These protocols provide encryption and authentication mechanisms to ensure that data is transmitted securely and safely. Additionally, using secure connection methods can help protect against man-in-the-middle attacks and eavesdropping.

If you still want to connect to a device using its MAC address, you can use a tool such as 'macaddr' or 'arp' to resolve the MAC address to an IP address. Then, you can use the IP address to connect to the device using a secure connection method. However, this method is not recommended, as it can be vulnerable to security risks.

It's important to note that connecting to a device using its MAC address can also be a violation of privacy and security policies, as it can allow unauthorized access to the device and its data. Therefore, it's important to use secure and authorized connection methods to ensure the security and privacy of devices and their data.
pourfard commented 11 months ago

i changed prompt_chatbort.txt to the following and the problem was solved.


[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

{prompt} [/INST]
turboderp commented 11 months ago

For completeness, if you want to use the Llama-2 chat finetune precisely as it was intended, you need a little bit more than the chatbot example in the repo. The initial prompt includes the <<SYS>> ... <</SYS>> block inside the [INST] ... [/INST] block, then the model response follows without any tags, and the next user input follows inside another [INST] ... [/INST] block, but this time with the system prompt.

It's not a difficult thing to do for one model, but to support every possible model you'd need a comprehensive templating system, and I tried to keep it simple for the example. The simplest hack to get the chatbot example pretty close to the intended Llama-2 format would be to set a user name of [INST] and a bot name of [/INST]. And you'd want the --no-newline argument as well, probably.

pourfard commented 11 months ago

Thanks for reply, I did that but the response is not good:

prompt_chartbot.txt:


[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

{prompt} [/INST]

Command:

python example_chatbot.py -d models/TheBloke_Llama-2-70B-chat-GPTQ/ -un "[INST]" -bn "[/INST]" -p prompt_chatbort.txt --no_newline -gs 17.2,24 result:


[INST]: how can i connect to a device using its mac address?
[/INST]: 1.You can't connect to a device using its MAC address?] 

Connecting a device's MAC address, you need to know the device's IP address, and a network interface card for that connectivity. Here are the steps to connect devices over Ethernet or Wi-Fi networks. These are some steps to connect laptops, desktops, servers, routers, switches, and other network interface cards. ARP paltform LANs, switches, routers, hubsswitches, bridges, routers, and other network printers, network cables, network interface layers, controllers, terminal servers, disc drives, disc drives, SCAD, routers, display devices, display, touch screens, and other devices, displays, keyboards, and other electronic devices, scanners, digital cameras, TVs, media players, audio-how to connect devices wirelessly. Connecting them together with Ethernet cables is easy as follows:
Choose one of these steps:
MAC connecting devices, HDMI DVs VLANs, Bluetooth devices, Bluetooth devices, Bluetooth devices. Bluetooth devices, Bluetooth low power class BLEDS.
Bluetooth low level connections. Wireless network connections. Bluetooth devices. Discovered passwords. Passwords, wifi range include WIFI. Setting up Wlan perspective. Range includes WIF, endpoints include SSIDevices. Ender Will it works best practices for end helps you configure wireless networks. Still confused? Ask a question regarding setting up cable modem OBea sure which devices Compat the MAC address. Address Filter (MACable Ha Ha Ha Ha spotted. Addresses, IP addresses. Addresses. Addresses, IPv48. Addresses, DHya self contained here are two types of MAC addresses. Addresses supported by an MAC address. Addresses that can be programmed some times 
MAC address esS of 10 million) IPv6. Addresses, IPv458. Addresses, IPv6. Addresses. Addresses, IPv6 over IP version 83. Addresses, IPv6. Addresses 8. Addresses 2. Addresses, IPv4. Addresses. Addresses. Addresses. Addresses. Addresses. Addresses. Addresses. Addresses. Addresses. Addresses. Addresses. Addresses. Addresses Addresses. Addresses. Addresses. Addresses. Addressing IPv4. Addresses. Addressing IPv4. Addresses. Addressing IPv6. Addresses. Addressing IPv6. Addresses. Addresses. Addresses. Addresses. Addresses. Addresses. Addresses. Addresses. Addresses. Addresses. Addressing IPv4. Addressing. Addresses. Addressing IPv4. Addresses. Addresses. Addresses. Addresses. Addresses. Addressing IPv4. Addresses. Addresses. Addresses. Addresses. Addresses. Addresses. Addresses. Addresses. Addresses. Addresses. Addressing. Addresses. Addresses. Addresses. Addresses. Addresses.
SinanAkkoyun commented 11 months ago

For completeness, if you want to use the Llama-2 chat finetune precisely as it was intended, you need a little bit more than the chatbot example in the repo

@pourfard

The response was not good because in your example the format was not correct and there were no BOS and EOS tokens. The PR gives a correct LLama 2 chat implementation https://github.com/turboderp/exllama/pull/221

@turboderp I would greatly appreciate if you merged :)

pourfard commented 11 months ago

Thanks, I tested example_llama2chat.py with two models (13b and 70b). 13b is better and 70b model is really bad.

params:

-- Sequence length: 2048
 -- Temperature: 0.50
 -- Top-K: 250
 -- Top-P: 1.00
 -- Min-P: 0.00
 -- Repetition penalty: 1.00
 -- Beams: 1 x 1
 -- Tokenizer: /home/datamoon/1TB/Temp/text-generation-webui/models/TheBloke_Llama-2-13B-chat-GPTQ/tokenizer.model
 -- Model config: /home/datamoon/1TB/Temp/text-generation-webui/models/TheBloke_Llama-2-13B-chat-GPTQ/config.json
 -- Model: /home/datamoon/1TB/Temp/text-generation-webui/models/TheBloke_Llama-2-13B-chat-GPTQ/gptq_model-4bit-128g.safetensors
 -- Sequence length: 2048
 -- Tuning:
 -- --sdp_thd: 8
 -- --matmul_recons_thd: 8
 -- --fused_mlp_thd: 2
 -- Options: ['gpu_split: 17.2,24']
 -- Groupsize (inferred): 128
 -- Act-order (inferred): yes

TheBloke_Llama-2-13B-chat-GPTQ/

User: what are some applications of alpr software?
Assistant:  ALPR (Automatic License Plate Recognition) software has a wide range of applications in various industries, including:

1. Law Enforcement: ALPR software is widely used by law enforcement agencies to detect and track vehicles involved in criminal activities, such as theft, drug trafficking, and homicide. It helps police officers to identify and apprehend criminals more efficiently.
2. Toll Collection: ALPR software is used in toll collection systems to automatically detect and classify vehicles, and charge tolls accordingly. This helps to reduce wait times and improve traffic flow at toll booths.
3. Parking Enforcement: ALPR software is used to monitor parking violations, such as unauthorized parking, parking in designated handicap spots, and overstaying time limits. It helps to improve parking compliance and reduce traffic congestion.
4. Traffic Management: ALPR software is used to monitor and manage traffic flow, reduce congestion, and improve traffic safety. It can detect traffic violations, such as speeding, running red lights, and reckless driving.
5. Security and Surveillance: ALPR software is used in security and surveillance systems to detect and track vehicles, and to provide real-time video footage of vehicles entering or exiting secure areas.
6. Traffic Analytics: ALPR software is used to analyze traffic patterns and provide insights on traffic flow, volume, and speed. It helps urban planners and traffic managers to optimize traffic management and infrastructure development.
7. Electric Vehicle Charging: ALPR software is used to track and manage electric vehicle charging, and to provide real-time data on electric vehicle usage and charging patterns.
8. Public Transportation: ALPR software is used to monitor and manage public transportation systems, such as buses and trains, to improve efficiency and passenger experience.
9. Event Management: ALPR software is used to manage and monitor events, such as concerts, festivals, and sporting events, to improve crowd control, traffic flow, and public safety.
10. Smart Cities: ALPR software is used in smart cities initiatives to improve traffic management, public safety, and urban planning. It helps cities to collect and analyze data on traffic flow, parking, and public transportation to optimize infrastructure development and urban planning.

In summary, ALPR software has a wide range of applications in various industries, including law enforcement, toll collection, parking enforcement, traffic management, security and surveillance, traffic analytics, electric vehicle charging, public transportation, event management, and smart cities.

TheBloke_Llama-2-70B-chat-GPTQ/

User: What are some applications of alpr software?
Assistant:  ALPR software has various applications in various industries and sectors, including:

1. Transportation: ALPR software is utilized for automatic number plate recognition  recognition system is used in the security systems, such as:
1. Security systems surveillance cameras, license plate reader gathers images of cars, and license plate numbers, and compare them with those in a database.
2. It can be used in various industries like:
1. Law enforcement 
Traffic monitoring and public safety:

1. Law enforcement agencies
2. Toll road pricing
1. Toll road user fees fees
2. Insurance companies
3. Face recognition technology
Facial recognition
Facial recognition
Facial recognition
SinanAkkoyun commented 11 months ago

@turboderp Hm, interestingly it also seems to work for him for 13b

@pourfard At the moment I don't own 2 GPUs so I can't test 70B myself, but down below is code.

I highly suspect, given your 70B output, that my slicing is incorrect. Below is a code that doesn't stream the response but still outputs text, would you please give that a try with the 70B model?

(it's just draft code to see where the problem lies)

# EDIT: I removed the code, I will supply new code in a bit...
SinanAkkoyun commented 11 months ago

Nvm, this code produces even more nonsense... I am sorry, I am working on why that is

pourfard commented 11 months ago

I tried another 70b model (localmodels_Llama-2-70B-Chat-GPTQ) but the result is the same.

pourfard commented 11 months ago

I used text-generation-webui and it works fine.