h2oai / h2ogpt

Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/
http://h2o.ai
Apache License 2.0

eval #401

Open pseudotensor opened 1 year ago

pseudotensor commented 1 year ago
Can I propose a few more in the style of that YouTuber, but not the same ones?
1) Coding: Write the game of "pong" in python.
2) Integration: Write a poem about H2O.ai, Wells Fargo, and NVIDIA that highlights their business relationship.
3) Facts: Who was the vice president in 1975?
4) Planning: If we lay 10 pants out in the sun and it takes 5 hours to dry, how long would 30 pants take to dry?
5) Reasoning: Bob is faster than Sammy.  Sammy is faster than Bill.  Is Bill faster than Bob?
6) Easy Math: 107 - 6 * 3 + 3 = ?
7) Steps Math: Factor: 3x^4y^3 - 48y^3.
8) Planning: How many words are in your response to this question?
9) Logic: Bowling pins have numbers 13, 57, 8, 39, 48, 24, 47.  Which pins must be knocked over to score exactly 100 points?
10) Planning: What is the least number of months that add up to 120 days?
10b) Planning: What consecutive months add up to 120 Days?
11) Logic: Look at this series: 36, 34, 30, 28, …, 22. What number should fill in the blank space?
12) CA Bar Q: In response to a significant rise in diabetes among school-age children, and based upon
links between diabetes, exercise and diet, Congress has passed, and the President has
signed, the Childhood Physical Education Act (the Act). The Act, administered by the
Federal Department of Education, provides significant additional funds to states for public
schools with daily physical education classes for students. These funds are to be used
for the hiring of additional physical education teachers and purchase of physical education
equipment.
Testimony before Congress has revealed that, on average, public schools spend only
25% of their school lunch budgets on fresh fruits and vegetables. The Act requires that
states accepting the funds must enact legislation setting as a minimum that 50% of public
school lunch food budgets be allocated to the purchase of fresh fruits and vegetables.
Testimony has also revealed that rates of childhood diabetes tend to be highest in minority
and low-income communities. The Act has significant additional subsidies for public
schools where the majority of the student population is non-Caucasian.
Before the Act has gone into effect, State X, through its attorney general, has brought suit
in federal court seeking a declaratory judgment that the Act is unconstitutional. The
National Association of School Dieticians (NASD) is seeking to intervene in the attorney
general's lawsuit. According to NASD's charter, it seeks to promote healthy diets for
school-age children, especially through school lunch programs. The attorney general
opposes NASD's intervention.
1. What constitutional challenges can the attorney general make to the Act and are they
likely to succeed? Discuss.
2. Does NASD have standing to intervene? Discuss.
Answers for some:
3) Nelson Rockefeller
4) The answer is conditional. The simplest answer is 5 hours, since the pants can be laid out so they do not block each other. Another answer is 3*5=15 hours. Tests critical thinking on a vaguely posed question.
7) 3y^3(x^2 + 4)(x + 2)(x - 2). Tests understanding of LaTeX as well.
9) 13, 39, and 48.
10) 4 months
10b) Feb+March+April+May = 28+31+30+31 = 120 in a non-leap year.
11) 24
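
Several of these answers can be checked mechanically, which may help when grading model output. Below is a minimal sketch using only the Python standard library. The variable names, the non-leap-year month table, the restriction to runs inside one calendar year, and the alternating -2/-4 reading of the series in question 11 are assumptions made for illustration; this is not part of the h2oGPT eval code.

```python
from itertools import combinations

# Q6: plain operator precedence.
easy_math = 107 - 6 * 3 + 3

# Q7: compare the original polynomial with the factored form on an integer grid.
def original(x, y):
    return 3 * x**4 * y**3 - 48 * y**3

def factored(x, y):
    return 3 * y**3 * (x**2 + 4) * (x + 2) * (x - 2)

factoring_ok = all(
    original(x, y) == factored(x, y)
    for x in range(-5, 6)
    for y in range(-5, 6)
)

# Q9: every subset of pins whose values sum to exactly 100.
PINS = [13, 57, 8, 39, 48, 24, 47]
pin_solutions = [
    sorted(combo)
    for r in range(1, len(PINS) + 1)
    for combo in combinations(PINS, r)
    if sum(combo) == 100
]

# Q10 and Q10b: month lengths in a non-leap year (assumption).
DAYS = {
    "Jan": 31, "Feb": 28, "Mar": 31, "Apr": 30, "May": 31, "Jun": 30,
    "Jul": 31, "Aug": 31, "Sep": 30, "Oct": 31, "Nov": 30, "Dec": 31,
}
names = list(DAYS)

# Q10: smallest number of distinct months totalling exactly 120 days.
fewest_months = next(
    r for r in range(1, 13)
    if any(sum(c) == 120 for c in combinations(DAYS.values(), r))
)

# Q10b: consecutive runs within one calendar year totalling exactly 120 days.
consecutive_runs = [
    names[i:j]
    for i in range(12)
    for j in range(i + 1, 13)
    if sum(DAYS[m] for m in names[i:j]) == 120
]

# Q11: assume the differences alternate -2, -4.
series = [36]
for step in [2, 4, 2, 4, 2]:
    series.append(series[-1] - step)

if __name__ == "__main__":
    print(easy_math, factoring_ok, pin_solutions)
    print(fewest_months, consecutive_runs, series)
```

Under these assumptions the consecutive-month search also finds Jan through Apr, so a grader for 10b should probably accept more than one answer.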
pseudotensor commented 11 months ago

https://github.com/explodinggradients/ragas eval

pseudotensor commented 11 months ago

https://www.anyscale.com/blog/llama-2-is-about-as-factually-accurate-as-gpt-4-for-summaries-and-is-30x-cheaper

pseudotensor commented 9 months ago

https://arxiv.org/abs/2204.04991 TRUE

https://arxiv.org/abs/2111.09525 SummaC

https://github.com/vectara/hallucination-leaderboard https://huggingface.co/vectara/hallucination_evaluation_model

RAGAS: https://github.com/h2oai/h2ogpt/issues/1007

llama index: https://blog.llamaindex.ai/boosting-rag-picking-the-best-embedding-reranker-models-42d079022e83

pseudotensor commented 8 months ago

eval or interpret:

https://github.com/confident-ai/deepeval

https://github.com/jalammar/ecco

hallucinations:

https://huggingface.co/vectara/hallucination_evaluation_model https://vectara.com/cut-the-bull-detecting-hallucinations-in-large-language-models/

pseudotensor commented 8 months ago

https://github.com/cdpierse/transformers-interpret

pseudotensor commented 8 months ago

https://github.com/adlnlp/pdfvqa/tree/main