Evaluation, benchmarking, and scorecards, targeting performance (throughput and latency), accuracy on popular evaluation harnesses, safety, and hallucination
Apache License 2.0
22 stars, 40 forks
Restructure genAI Eval to address evaluation of multiple categories of metrics #75
Restructure genAI_Eval to categorize the various evaluation criteria. We need a separate repo for each category so that users can more easily find what they are looking for.
Example: move all code relating to perf and BKMS /scripts into the perf section.
Performance, Trustworthiness, Scalability, Safety, Security