mit-han-lab / llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
MIT License

Question about SFT #112

Open ccccj opened 9 months ago

ccccj commented 9 months ago

Hello, I have a LLaMA model that has been fine-tuned on my own dataset. Is it possible to compress the fine-tuned model with AWQ, and will the accuracy drop because of my fine-tuning?

ccccj commented 9 months ago

I mean, is it possible to re-run the AWQ search on the fine-tuned model, and would that give better performance?
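For context, AWQ's search uses calibration activations to pick a per-input-channel scaling factor before quantizing the weights, so re-running the search on the fine-tuned model (ideally with in-domain calibration data) is exactly what the method anticipates. Below is a minimal NumPy toy sketch of that activation-aware scale search, not the repo's actual API; `quantize` and `awq_scale_search` are illustrative names, and the quantizer is a deliberately simple per-tensor round-to-nearest:

```python
import numpy as np

def quantize(w, n_bits=4):
    # Toy per-tensor symmetric round-to-nearest quantization.
    scale = np.abs(w).max() / (2 ** (n_bits - 1) - 1)
    return np.round(w / scale) * scale

def awq_scale_search(W, X, n_bits=4, grid=20):
    """Grid-search a per-channel scale s = act_mean**alpha that minimizes
    the output error of the quantized matmul (toy version of AWQ's
    activation-aware scale search)."""
    act_mean = np.abs(X).mean(axis=0) + 1e-8    # per-channel activation magnitude
    ref = X @ W.T                               # full-precision reference output
    best_alpha, best_err = 0.0, np.inf
    for alpha in np.linspace(0.0, 1.0, grid):
        s = act_mean ** alpha
        Wq = quantize(W * s, n_bits)            # scale up salient channels, then quantize
        err = (((X / s) @ Wq.T - ref) ** 2).mean()  # fold inverse scale into activations
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha, best_err

rng = np.random.default_rng(0)
# Calibration activations with a few "salient" high-magnitude channels.
X = rng.normal(size=(128, 64)) * (1 + 9 * (rng.random(64) < 0.05))
W = rng.normal(size=(32, 64))
alpha, err = awq_scale_search(W, X)
```

Because the grid includes `alpha = 0` (plain round-to-nearest, no scaling), the searched error can only match or beat unscaled quantization on the calibration batch; that is the core reason re-running the search per model (and per dataset) is worthwhile.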

Hongbosherlock commented 9 months ago

> I mean, is it possible to re-run the AWQ search on the fine-tuned model, and would that give better performance?

Hi, I have a similar issue. Have you made any progress?

ccccj commented 9 months ago

> Hello, I have a LLaMA model that has been fine-tuned on my own dataset. Is it possible to compress the fine-tuned model with AWQ, and will the accuracy drop because of my fine-tuning?

No, I ended up using a different compression method (SpQR) instead, so I didn't use this one.

Hongbosherlock commented 9 months ago

> No, I ended up using a different compression method (SpQR) instead, so I didn't use this one.

Did SpQR achieve good results? By the way, I heard that SpQR doesn't have a corresponding CUDA kernel implementation.

ccccj commented 9 months ago

> Did SpQR achieve good results? By the way, I heard that SpQR doesn't have a corresponding CUDA kernel implementation.

SpQR looks OK going by the scores (my application scenario doesn't require a CUDA kernel). Maybe we can communicate by e-mail or WeChat.

Hongbosherlock commented 9 months ago

> SpQR looks OK going by the scores (my application scenario doesn't require a CUDA kernel). Maybe we can communicate by e-mail or WeChat.

Sure. My WeChat ID is my GitHub nickname in all lowercase.