wesslen / bayesian-beagle

Bayesian beagle: Quarto blog of weekly LLM-generated summaries of Arxiv preprints
https://bayesian-beagle.netlify.app/
MIT License
6 stars 0 forks source link

Add more tests for summarizer and builder #5

Open wesslen opened 6 months ago

wesslen commented 6 months ago

However, in this answer, I'm going to focus on writing mock unit tests for these functionalities because we typically can't make network requests to external services, like arXiv or OpenAI, in our testing environments.

Here is the outline of the tests using pytest and unittest.mock:

import pytest
from unittest.mock import patch, Mock
from your_script import OpenAIAssistant, is_valid_arxiv_id, ...

# Test is_valid_arxiv_id function. Use mocking to simulate arxiv.Client behavior.
@patch("your_script.arxiv.Client")
def test_is_valid_arxiv_id(mock_client):
    # Adjust this to point to the actual location where arXiv ID check happens
    mock_client.return_value.results.return_value = iter([Mock()])
    assert is_valid_arxiv_id("valid_arxiv_id")[0] is True
    mock_client.assert_called_once()

    mock_client.reset_mock()
    mock_client.return_value.results.side_effect = StopIteration
    assert is_valid_arxiv_id("invalid_arxiv_id")[0] is False
    mock_client.assert_called_once()

# And continue to create tests for each function using similar mocking techniques, for example:
# - extract_text_from_html using a sample HTML content
# - remove_double_quotes using text known to contain double quotes
# - etc.

# For the OpenAI integration, you would mock the API call and any network-related functionalities
@patch("your_script.OpenAIAssistant.process_text")
def test_process_text(mock_process_text):
    mock_process_text.return_value = "summarized text"

    assistant = OpenAIAssistant()
    summary = assistant.process_text("A long academic text", "summarize")
    assert summary == "summarized text"
    mock_process_text.assert_called_once()

# Don't forget to include mocks for file system interactions when testing the summarize command function!

Remember, these examples are just templates—you will have to adapt them to fit the actual implementation details and to verify the expected outputs. Additionally, you should mock any network interactions to avoid performing actual HTTP requests during testing.

Please also ensure that you have installed pytest (pip install pytest) and have the necessary imports at the beginning of your test script. The ... dots in the imports comment mean you should import any other functions or classes you create tests for from your actual script.

When preparing your test environment, take care to set up any necessary mock responses and ensure that the function calls within your mock tests match the actual calls the functions make. This will mean examining the code and understanding how each unit interacts with other services.

wesslen commented 6 months ago

Here is a basic outline for unit tests you might consider:

import pytest
import json
import os
from your_script_name import create_qmd_file # Make sure to import your functions correctly
from pathlib import Path
from datetime import datetime

# Create a fixture for sample data
@pytest.fixture
def example_data():
    return {
        "meta": {
            "title": "Sample Article",
            "subtitle": "An overview of a sample article",
            "model": "Test Model",
            "publish_date": "2023-04-01",
            "url": "https://example.com/sample-article",
            "image": None,
            "categories": ["Research", "Sample"],
            "is_truncated": False,
            "word_count": 500
        },
        "text": "This is a summary of the sample article."
    }

def test_template_rendering(tmp_path, example_data):
    create_qmd_file(example_data, tmp_path)
    folder_name = example_data["meta"]["title"].replace(" ", "_")
    expected_file_name = "2023-04-01-Sample_Article.qmd"
    expected_file_path = tmp_path / folder_name / expected_file_name
    assert expected_file_path.is_file()
    with open(expected_file_path, 'r') as file:
        content = file.read()
        assert "title: \"Sample Article\"" in content
        assert "author: \"Test Model\"" in content
        assert "This is a summary of the sample article." in content

def test_output_folder_creation(tmp_path, example_data):
    create_qmd_file(example_data, tmp_path)
    folder_name = example_data["meta"]["title"].replace(" ", "_")
    folder_path = tmp_path / folder_name
    assert folder_path.is_dir()

def test_file_creation_without_image(tmp_path, example_data):
    example_data_no_image = json.loads(json.dumps(example_data)) # deep copy
    example_data_no_image["meta"]["image"] = None
    create_qmd_file(example_data_no_image, tmp_path)
    folder_name = example_data_no_image["meta"]["title"].replace(" ", "_")
    expected_file_name = "2023-04-01-Sample_Article.qmd"
    expected_file_path = tmp_path / folder_name / expected_file_name
    assert expected_file_path.is_file()
    with open(expected_file_path, 'r') as file:
        content = file.read()
        assert "../../../bayesian-beagle.png" in content

def test_file_already_exists(tmp_path, example_data):
    create_qmd_file(example_data, tmp_path)
    folder_name = example_data["meta"]["title"].replace(" ", "_")
    expected_file_name = "2023-04-01-Sample_Article.qmd"
    expected_file_path = tmp_path / folder_name / expected_file_name
    assert expected_file_path.is_file()

    # Check if the logging message is correct when the file already exists
    with pytest.raises(FileExistsError):
        create_qmd_file(example_data, tmp_path)

# Additional tests can be added here...

# Execute the tests with pytest from the terminal:
# pytest your_test_script_name.py

Here's what changed transitioning to pytest:

Remember to replace your_test_script_name.py with the actual file name of your tests, and replace your_script_name with the actual name of the module (Python script) that contains the create_qmd_file function.

Please note that pytest doesn't require a class-based structure for tests. This can lead to much simpler code, especially for more straightforward test cases.

wesslen commented 6 months ago