surcyf123 / text_generation


Evaluate models #11

Open betogaona7 opened 10 months ago

betogaona7 commented 10 months ago

Run the evaluation and get final scores (500 prompts).

betogaona7 commented 10 months ago

open_llama_3b_4bit_128g: the group size should be 128, but the model's infeatures is 8640, and auto_gpt/nn_modules/qlinear/qlinear_exllama.py asserts infeatures % self.group_size == 0. Since 8640 % 128 = 64, that assert fails. To get the model to load you have to comment out the assert, but the answer is still garbage, which suggests the model is not being created correctly:

"answer": "\n Demonstrate a potential experiment while utilizing and enumerating the scientific method clearly and explain every step for a potential theory of the following context.\n ### USER: tell the best marketing plan to get peoples emails\n <\\s> \n\n ASSISTANT:CTOp win celebratedichCT BorisCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTatzCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTctCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTåCTsCTsCTsCTsCTsCTsCTsCTsCTsCTsCTsCTsCTsCTsCTsCTsss\ns\ns\ns\ns\ns\ns\ns\ns\ns\ns\ns\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nt\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
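The divisibility problem described above can be checked without loading the model. This is a minimal sketch: the helper name compatible_group_sizes and the candidate list are illustrative assumptions, not part of the library; only the values 8640 and 128 come from the report.

```python
def compatible_group_sizes(in_features, candidates=(32, 64, 128)):
    """Return the candidate group sizes that divide in_features evenly.

    Hypothetical helper for illustration; it is not part of any library.
    """
    return [g for g in candidates if in_features % g == 0]


in_features = 8640  # infeatures value reported for open_llama_3b
group_size = 128    # group size implied by the "128g" suffix

# The condition the assert enforces: the remainder must be 0.
remainder = in_features % group_size
print(remainder)  # 64, so the assert fails

# Group sizes from the candidate list that would satisfy the assert.
print(compatible_group_sizes(in_features))  # [32, 64]
```

A non-zero remainder means the last quantization group would be incomplete. Commenting out the assert skips this check but does not make the layer shape and the group size agree, which may be related to the corrupted output.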

betogaona7 commented 10 months ago

vicuna-7B-GPTQ-4bit-128g can't process the queries correctly; the answers seem corrupted and make no sense. Example:

"answer": "\n Demonstrate a potential experiment while utilizing and enumerating the scientific method clearly and explain every step for a potential theory of the following context.\n ### USER: The preceding message discusses the concept of photonic band gaps in dielectric structures and their potential applications. It highlights the challenges in calculating the spectrum of electromagnetic waves in three-dimensional dielectric lattices, particularly when the dielectric function changes discontinuously or has a large imaginary part. The traditional plane-wave method is found to be numerically unstable in such cases. A question that arises is: What are the limitations of using the plane-wave method for calculating the spectrum of electromagnetic waves in dielectric lattices? The limitations include numerical instability near spatial discontinuities and impracticality when dealing with large imaginary parts of the dielectric constant.\n\nAsk a single relevant and insightful question about the preceding context\n. 
Do not try to return an answer or a summary:\n <\\s> \n\n ASSISTANT:_novtomamerâarkado domainlognovnovnovnovnoves_ningnoves_ovnovnoves_oves_oves_oves_oves_oves_oves_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ Mark_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov_ov__________________________________________________________________________________________________ Rub________________________________________________________________________________________________________________________________________________________tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom__\\tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tom_tomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomto
mtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtomtom"
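Corrupted answers like the two above could be flagged automatically before scoring a larger run. This is a rough sketch, not something the evaluation in this issue is known to do: it assumes that highly repetitive text compresses far better than normal prose, and the function names, the 0.2 threshold, and the minimum length are arbitrary illustrative choices.

```python
import zlib


def compression_ratio(text):
    """Compressed size divided by raw size, both in bytes (UTF-8)."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) / len(raw)


def looks_degenerate(text, threshold=0.2, min_length=200):
    """Flag long outputs that compress suspiciously well.

    threshold and min_length are assumed values for illustration;
    short texts are never flagged because their ratio is not meaningful.
    """
    if len(text) < min_length:
        return False
    return compression_ratio(text) < threshold


# A repeated-token answer similar in shape to the one above.
corrupted = "tom" * 500
# An ordinary sentence, repeated a few times only to pass min_length.
normal = (
    "The plane-wave method becomes numerically unstable near spatial "
    "discontinuities of the dielectric function. It is also impractical "
    "when the dielectric constant has a large imaginary part, which limits "
    "its use for three-dimensional dielectric lattices."
)

print(looks_degenerate(corrupted))  # True
print(looks_degenerate(normal))     # False
```

In practice the check would be applied only to the text after the ASSISTANT: marker, so that the prompt itself does not affect the ratio.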

betogaona7 commented 10 months ago

Running a small run (latest results here), using 20 prompts for the 13B models and 5 prompts for the 30B models:

betogaona7 commented 10 months ago

Failed relevance score example:

betogaona7 commented 10 months ago

Impact of the system prompt:

System prompt: Demonstrate a potential experiment while utilizing and enumerating the scientific method clearly and explain every step for a potential theory of the following context.

betogaona7 commented 10 months ago

LLM limitation example:

betogaona7 commented 10 months ago

Answer with auto-chat example: