jcassady / llava-benchmark

LLaVA Benchmark evaluates image & audio processing capabilities of AI models with Ollama.
https://jordan.cassady.me
MIT License

Implement model accuracy Pytest for LicensePlateBenchmark class #7

Open jcassady opened 4 months ago

jcassady commented 4 months ago

Implement a pytest test focused on model accuracy for the LicensePlateBenchmark class. The test should use known license plate numbers as the source of truth when evaluating how well each model reads license plates. It should live in a new file called test_license_plate_accuracy.py.

The test case should validate model accuracy against the default license plate images under data/images in the llava-benchmark project directory.

Tasks:

  1. Create a new test file test_license_plate_accuracy.py.
  2. In this file, import the necessary modules and classes, including pytest, LicensePlateBenchmark, and Ollama.
  3. Define a dictionary known_license_plates with image file names as keys and corresponding known license plate numbers as values.
  4. Write a test function test_license_plate_accuracy() that does the following:
    • Mocks the Ollama.run_benchmark() method to return known license plate numbers.
    • Simulates storing the results for each model, prompt, and image.
    • Checks that the license plate numbers were stored correctly.
    • Ensures that the license plate read by the AI model through Ollama matches the known_license_plates data.
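
The tasks above could be sketched roughly as follows. This is only an illustration: the `Ollama` and `LicensePlateBenchmark` classes here are hypothetical stand-ins defined inline so the example runs on its own, and the `run_benchmark(model, prompt, image)` signature, the `store_result()` helper, the `results` attribute, and the model and prompt values are all assumptions, not the project's actual API. Only two entries of the known plates are used.

```python
from unittest.mock import patch


# Hypothetical stand-ins: the real Ollama and LicensePlateBenchmark classes
# live in the llava-benchmark project and their signatures may differ.
class Ollama:
    def run_benchmark(self, model, prompt, image):
        raise RuntimeError("would call a real model; mocked in tests")


class LicensePlateBenchmark:
    def __init__(self, client):
        self.client = client
        self.results = {}  # (model, prompt, image) -> plate text

    def store_result(self, model, prompt, image):
        # Ask the (mocked) client for a reading and record it.
        plate = self.client.run_benchmark(model, prompt, image)
        self.results[(model, prompt, image)] = plate
        return plate


# Two entries taken from the issue's known license plate data.
known_license_plates = {"1.jpg": "XFC77R", "2.jpg": "PAX 44"}


def test_license_plate_accuracy():
    model, prompt = "llava", "Read the license plate."  # assumed values

    # Mock Ollama.run_benchmark() so it returns the known plate per image.
    with patch.object(
        Ollama,
        "run_benchmark",
        side_effect=lambda model, prompt, image: known_license_plates[image],
    ) as mock_run:
        benchmark = LicensePlateBenchmark(Ollama())
        for image in known_license_plates:
            benchmark.store_result(model, prompt, image)

    # One call per image, and every stored result matches the source of truth.
    assert mock_run.call_count == len(known_license_plates)
    for image, expected in known_license_plates.items():
        assert benchmark.results[(model, prompt, image)] == expected
```

Because the mock returns the known values directly, a test shaped like this verifies the storage and comparison logic rather than real model accuracy; checking an actual model would mean running it unmocked against the images in data/images.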

Additional Information: Here is the structure of the known_license_plates dictionary:

# Known license plate numbers for testing
known_license_plates = {
    "1.jpg": "XFC77R",
    "2.jpg": "PAX 44",
    "3.jpg": "OPEC LOL",
    "4.jpg": "LYF1ZGD",
    "5.jpg": "YJX-4976",
    "6.jpg": "F1",
    "7.jpg": "REAP3R",
    "8.jpg": "CRAIG",
    "9.jpg": "OMG MOOV",
    "10.jpg": "1496BH"
}
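
Several of these plates contain spaces or hyphens ("PAX 44", "YJX-4976"), so an exact string comparison may count a correct reading as a miss if the model formats it differently. One possible approach, sketched below, is to normalize both sides before comparing. The `normalize_plate` and `plate_accuracy` helpers are hypothetical and not part of the project, and whether separators should be ignored at all is an assumption the test author would need to confirm.

```python
import re


def normalize_plate(text):
    """Uppercase the text and drop everything except A-Z and 0-9."""
    return re.sub(r"[^A-Z0-9]", "", text.upper())


def plate_accuracy(predictions, expected):
    """Return the fraction of images whose normalized reading matches."""
    if not expected:
        return 0.0
    hits = sum(
        normalize_plate(predictions.get(image, "")) == normalize_plate(plate)
        for image, plate in expected.items()
    )
    return hits / len(expected)


# Illustrative readings only; these are made-up model outputs, not results.
expected = {"2.jpg": "PAX 44", "5.jpg": "YJX-4976", "6.jpg": "F1"}
predictions = {"2.jpg": "pax44", "5.jpg": "YJX 4976", "6.jpg": "FI"}
accuracy = plate_accuracy(predictions, expected)  # two of three readings match
```

A pytest assertion could then compare `accuracy` against a chosen threshold instead of requiring every plate to match exactly.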