superduper-io / superduper

Superduper: Integrate AI models and machine learning workflows with your database to implement custom AI applications, without moving your data. Including streaming inference, scalable model hosting, training and vector search.
https://superduper.io
Apache License 2.0

Key pytest fixtures for setting up unittests #1170

Closed blythed closed 11 months ago

blythed commented 1 year ago

We need fixtures which provide monolithic setups for

Goal is to refactor all of the fixtures for the project to be:

IDEA

One single fixture which sets up Datalayer with:

jieguangzhou commented 12 months ago

We get failing test cases when using mongomock, as shown below:

test/integration/test_atlas.py::test_setup_atlas_vector_search SKIPPED (Only atlas deployments relevant.)                                [  0%]
test/integration/test_atlas.py::test_use_atlas_vector_search SKIPPED (Only atlas deployments relevant.)                                  [  1%]
test/integration/test_cdc.py::test_smoke PASSED                                                                                          [  1%]
test/integration/test_cdc.py::test_task_workflow[insert] FAILED                                                                          [  2%]
test/integration/test_cdc.py::test_vector_database_sync_with_delete FAILED                                                               [  3%]
test/integration/test_cdc.py::test_vector_database_sync FAILED                                                                           [  3%]
test/integration/test_cdc.py::test_single_insert FAILED                                                                                  [  4%]
test/integration/test_cdc.py::test_many_insert FAILED                                                                                    [  5%]
test/integration/test_cdc.py::test_delete_one FAILED                                                                                     [  5%]
test/integration/test_cdc.py::test_single_update FAILED                                                                                  [  6%]
test/integration/test_cdc.py::test_many_update FAILED                                                                                    [  6%]
test/integration/test_cdc.py::test_insert_without_cdc_handler PASSED                                                                     [  7%]
test/integration/test_cdc.py::test_cdc_stop PASSED                                                                                       [  8%]
test/integration/test_dask.py::test_taskgraph_futures_with_dask FAILED                                                                   [  8%]
test/integration/test_dask.py::test_insert_with_dask FAILED                                                                              [  9%]
test/integration/test_dask.py::test_dependencies_with_dask FAILED                                                                        [ 10%]
test/integration/test_ibis.py::test_nested_query PASSED                                                                                  [ 10%]
test/integration/test_ibis.py::test_end2end_sql PASSED                                                                                   [ 11%]
test/integration/test_ibis.py::test_end2end_duckdb PASSED                                                                                [ 11%]
test/integration/test_ibis.py::test_end2end_pandas PASSED                                                                                [ 12%]
test/integration/test_notebooks.py::test_notebooks[notebooks/mnist_clean.ipynb] SKIPPED (Notebook tests are disabled)                    [ 13%]
test/integration/test_server.py::test_add_load PASSED                                                                                    [ 13%]
test/integration/test_server.py::test_show FAILED                                                                                        [ 14%]
test/integration/test_server.py::test_insert PASSED                                                                                      [ 15%]
test/integration/test_server.py::test_remove FAILED                                                                                      [ 15%]
test/integration/test_server.py::test_update PASSED                                                                                      [ 16%]
test/integration/ext/anthropic/test_model_anthropic.py::test_completions SKIPPED (API is not publically available yet)                   [ 16%]
test/integration/ext/anthropic/test_model_anthropic.py::test_batch_completions SKIPPED (API is not publically available yet)             [ 17%]
test/integration/ext/anthropic/test_model_anthropic.py::test_completions_async SKIPPED (API is not publically available yet)             [ 18%]
test/integration/ext/anthropic/test_model_anthropic.py::test_batch_completions_async SKIPPED (API is not publically available yet)       [ 18%]
test/integration/ext/cohere/test_model_cohere.py::test_embed_one PASSED                                                                  [ 19%]
test/integration/ext/cohere/test_model_cohere.py::test_embed_batch PASSED                                                                [ 20%]
test/integration/ext/cohere/test_model_cohere.py::test_async_embed_one PASSED                                                            [ 20%]
test/integration/ext/cohere/test_model_cohere.py::test_async_embed_batch PASSED                                                          [ 21%]
test/integration/ext/cohere/test_model_cohere.py::test_generate PASSED                                                                   [ 21%]
test/integration/ext/cohere/test_model_cohere.py::test_batch_generate PASSED                                                             [ 22%]
test/integration/ext/cohere/test_model_cohere.py::test_chat_async PASSED                                                                 [ 23%]
test/integration/ext/cohere/test_model_cohere.py::test_batch_chat_async PASSED                                                           [ 23%]
test/integration/ext/openai/test_model_openai.py::test_embed PASSED                                                                      [ 24%]
test/integration/ext/openai/test_model_openai.py::test_batch_embed PASSED                                                                [ 25%]
test/integration/ext/openai/test_model_openai.py::test_embed_async PASSED                                                                [ 25%]
test/integration/ext/openai/test_model_openai.py::test_batch_embed_async PASSED                                                          [ 26%]
test/integration/ext/openai/test_model_openai.py::test_chat PASSED                                                                       [ 26%]
test/integration/ext/openai/test_model_openai.py::test_batch_chat PASSED                                                                 [ 27%]
test/integration/ext/openai/test_model_openai.py::test_chat_async PASSED                                                                 [ 28%]
test/integration/ext/openai/test_model_openai.py::test_batch_chat_async PASSED                                                           [ 28%]
test/integration/ext/openai/test_model_openai.py::test_create_url PASSED                                                                 [ 29%]
test/integration/ext/openai/test_model_openai.py::test_create_url_batch PASSED                                                           [ 30%]
test/integration/ext/openai/test_model_openai.py::test_create_async PASSED                                                               [ 30%]
test/integration/ext/openai/test_model_openai.py::test_create_url_async PASSED                                                           [ 31%]
test/integration/ext/openai/test_model_openai.py::test_create_url_async_batch PASSED                                                     [ 31%]
test/integration/ext/openai/test_model_openai.py::test_edit_url PASSED                                                                   [ 32%]
test/integration/ext/openai/test_model_openai.py::test_edit_url_batch PASSED                                                             [ 33%]
test/integration/ext/openai/test_model_openai.py::test_edit_async PASSED                                                                 [ 33%]
test/integration/ext/openai/test_model_openai.py::test_edit_url_async PASSED                                                             [ 34%]
test/integration/ext/openai/test_model_openai.py::test_edit_url_async_batch PASSED                                                       [ 35%]
test/integration/ext/openai/test_model_openai.py::test_transcribe PASSED                                                                 [ 35%]
test/integration/ext/openai/test_model_openai.py::test_batch_transcribe PASSED                                                           [ 36%]
test/integration/ext/openai/test_model_openai.py::test_transcribe_async PASSED                                                           [ 36%]
test/integration/ext/openai/test_model_openai.py::test_batch_transcribe_async PASSED                                                     [ 37%]
test/integration/ext/openai/test_model_openai.py::test_translate PASSED                                                                  [ 38%]
test/integration/ext/openai/test_model_openai.py::test_batch_translate PASSED                                                            [ 38%]
test/integration/ext/openai/test_model_openai.py::test_translate_async PASSED                                                            [ 39%]
test/integration/ext/openai/test_model_openai.py::test_batch_translate_async PASSED                                                      [ 40%]
test/unittest/test_impall.py::ImpAllTest::test_all PASSED                                                                                [ 40%]
test/unittest/test_quality.py::test_quality PASSED                                                                                       [ 41%]
test/unittest/base/test_config.py::test_unknown_name PASSED                                                                              [ 41%]
test/unittest/base/test_config.py::test_dict_names PASSED                                                                                [ 42%]
test/unittest/base/test_config.py::test_config_has_no_dupes PASSED                                                                       [ 43%]
test/unittest/base/test_config.py::test_find_dupes PASSED                                                                                [ 43%]
test/unittest/base/test_config_dicts.py::test_combine_config_dicts PASSED                                                                [ 44%]
test/unittest/base/test_config_dicts.py::test_environ_dict PASSED                                                                        [ 45%]
test/unittest/base/test_config_dicts.py::test_split_address[-expected0] PASSED                                                           [ 45%]
test/unittest/base/test_config_dicts.py::test_split_address[re-expected1] PASSED                                                         [ 46%]
test/unittest/base/test_config_dicts.py::test_split_address[red-expected2] PASSED                                                        [ 46%]
test/unittest/base/test_config_dicts.py::test_split_address[blue_green-expected3] PASSED                                                 [ 47%]
test/unittest/base/test_config_dicts.py::test_split_address[blue_green_orange-expected4] PASSED                                          [ 48%]
test/unittest/base/test_config_dicts.py::test_split_address[blue_green_puce-expected5] PASSED                                            [ 48%]
test/unittest/base/test_config_dicts.py::test_environ_to_config_dict_many PASSED                                                         [ 49%]
test/unittest/base/test_config_dicts.py::test_environ_to_config_dict_single PASSED                                                       [ 50%]
test/unittest/base/test_jsonization.py::test_jsonization1 PASSED                                                                         [ 50%]
test/unittest/base/test_jsonization.py::test_jsonization2 PASSED                                                                         [ 51%]
test/unittest/cli/test_cli.py::test_cli_info PASSED                                                                                      [ 51%]
test/unittest/component/test_documents.py::test_document_encoding PASSED                                                                 [ 52%]
test/unittest/component/test_documents.py::test_document_outputs PASSED                                                                  [ 53%]
test/unittest/component/test_documents.py::test_only_uri PASSED                                                                          [ 53%]
test/unittest/component/test_model.py::test_predict PASSED                                                                               [ 54%]
test/unittest/component/test_serialization.py::test_model PASSED                                                                         [ 55%]
test/unittest/component/test_serialization.py::test_sklearn PASSED                                                                       [ 55%]
test/unittest/component/test_vector_index.py::test_ibatch PASSED                                                                         [ 56%]
test/unittest/db/base/test_downloaders.py::test_s3_and_web PASSED                                                                        [ 56%]
test/unittest/db/base/test_downloaders.py::test_file_blobs PASSED                                                                        [ 57%]
test/unittest/db/base/test_query.py::test_execute_insert_and_find PASSED                                                                 [ 58%]
test/unittest/db/base/test_query.py::test_execute_complex_query PASSED                                                                   [ 58%]
test/unittest/db/base/test_query.py::test_execute_like_queries PASSED                                                                    [ 59%]
test/unittest/db/ibis/test_query.py::test_serialize_table PASSED                                                                         [ 60%]
test/unittest/db/mongodb/test_database.py::test_create_component PASSED                                                                  [ 60%]
test/unittest/db/mongodb/test_database.py::test_update_component PASSED                                                                  [ 61%]
test/unittest/db/mongodb/test_database.py::test_compound_component PASSED                                                                [ 61%]
test/unittest/db/mongodb/test_database.py::test_select_vanilla PASSED                                                                    [ 62%]
test/unittest/db/mongodb/test_database.py::test_select PASSED                                                                            [ 63%]
test/unittest/db/mongodb/test_database.py::test_reload_dataset PASSED                                                                    [ 63%]
test/unittest/db/mongodb/test_database.py::test_insert PASSED                                                                            [ 64%]
test/unittest/db/mongodb/test_database.py::test_insert_from_uris PASSED                                                                  [ 65%]
test/unittest/db/mongodb/test_database.py::test_update PASSED                                                                            [ 65%]
test/unittest/db/mongodb/test_database.py::test_listener PASSED                                                                          [ 66%]
test/unittest/db/mongodb/test_database.py::test_predict PASSED                                                                           [ 66%]
test/unittest/db/mongodb/test_database.py::test_delete PASSED                                                                            [ 67%]
test/unittest/db/mongodb/test_database.py::test_replace PASSED                                                                           [ 68%]
test/unittest/db/mongodb/test_database.py::test_dataset PASSED                                                                           [ 68%]
test/unittest/db/mongodb/test_pymongo.py::test_find PASSED                                                                               [ 69%]
test/unittest/db/mongodb/test_queries.py::test_delete_many PASSED                                                                        [ 70%]
test/unittest/db/mongodb/test_queries.py::test_replace PASSED                                                                            [ 70%]
test/unittest/db/mongodb/test_queries.py::test_insert_from_uris PASSED                                                                   [ 71%]
test/unittest/db/mongodb/test_queries.py::test_update_many PASSED                                                                        [ 71%]
test/unittest/db/mongodb/test_queries.py::test_insert_many PASSED                                                                        [ 72%]
test/unittest/db/mongodb/test_queries.py::test_like PASSED                                                                               [ 73%]
test/unittest/db/mongodb/test_queries.py::test_insert_one PASSED                                                                         [ 73%]
test/unittest/db/mongodb/test_queries.py::test_delete_one PASSED                                                                         [ 74%]
test/unittest/db/mongodb/test_queries.py::test_find PASSED                                                                               [ 75%]
test/unittest/db/mongodb/test_queries.py::test_find_one PASSED                                                                           [ 75%]
test/unittest/db/mongodb/test_queries.py::test_aggregate PASSED                                                                          [ 76%]
test/unittest/db/mongodb/test_queries.py::test_replace_one PASSED                                                                        [ 76%]
test/unittest/db/mongodb/test_query.py::test_select_missing_outputs PASSED                                                               [ 77%]
test/unittest/db/mongodb/test_query_dataset.py::test_query_dataset FAILED                                                                [ 78%]
test/unittest/db/mongodb/test_query_dataset.py::test_query_dataset_base FAILED                                                           [ 78%]
test/unittest/db/sqlalchemy/test_metadata.py::test PASSED                                                                                [ 79%]
test/unittest/misc/test_dataclasses.py::test_dataclasses PASSED                                                                          [ 80%]
test/unittest/misc/test_dataclasses.py::test_methods PASSED                                                                              [ 80%]
test/unittest/misc/test_superduper.py::test_sklearn_typer PASSED                                                                         [ 81%]
test/unittest/misc/test_superduper.py::test_torch_typer PASSED                                                                           [ 81%]
test/unittest/misc/test_superduper.py::test_superduper_model PASSED                                                                      [ 82%]
test/unittest/misc/test_superduper.py::test_superduper_raise PASSED                                                                      [ 83%]
test/unittest/misc/runnable/test_collection.py::test_thread_queue[1] PASSED                                                              [ 83%]
test/unittest/misc/runnable/test_collection.py::test_thread_queue[3] PASSED                                                              [ 84%]
test/unittest/misc/runnable/test_collection.py::test_thread_queue_error[1] PASSED                                                        [ 85%]
test/unittest/misc/runnable/test_collection.py::test_thread_queue_error[3] PASSED                                                        [ 85%]
test/unittest/misc/runnable/test_thread.py::test_is_thread PASSED                                                                        [ 86%]
test/unittest/misc/runnable/test_thread.py::test_has_thread PASSED                                                                       [ 86%]
test/unittest/misc/tree/test_for_each.py::test_for_each_breadth PASSED                                                                   [ 87%]
test/unittest/misc/tree/test_for_each.py::test_for_each_depth PASSED                                                                     [ 88%]
test/unittest/model/test_openai.py::test_retrieve_with_similar_context PASSED                                                            [ 88%]
test/unittest/model/test_sklearn.py::TestPipeline::test_fit_predict_classic PASSED                                                       [ 89%]
test/unittest/model/test_sklearn.py::TestPipeline::test_fit_db PASSED                                                                    [ 90%]
test/unittest/model/test_torch.py::test_fit PASSED                                                                                       [ 90%]
test/unittest/model/test_torch_utils.py::test_device_of_cpu PASSED                                                                       [ 91%]
test/unittest/model/test_torch_utils.py::test_device_of_cuda PASSED                                                                      [ 91%]
test/unittest/model/test_torch_utils.py::test_eval_context_manager PASSED                                                                [ 92%]
test/unittest/model/test_torch_utils.py::test_set_device_context_manager PASSED                                                          [ 93%]
test/unittest/model/test_torch_utils.py::test_to_device_tensor PASSED                                                                    [ 93%]
test/unittest/model/test_torch_utils.py::test_to_device_nested_list PASSED                                                               [ 94%]
test/unittest/model/test_torch_utils.py::test_to_device_nested_dict PASSED                                                               [ 95%]
test/unittest/model/test_transformers.py::test_transformer_predict PASSED                                                                [ 95%]
test/unittest/model/test_transformers.py::test_tranformers_fit PASSED                                                                    [ 96%]
test/unittest/model/test_vanilla.py::test_function_predict_one PASSED                                                                    [ 96%]
test/unittest/model/test_vanilla.py::test_function_predict PASSED                                                                        [ 97%]
test/unittest/model/test_vanilla.py::test_function_predict_with_document_embedded PASSED                                                 [ 98%]
test/unittest/model/test_vanilla.py::test_function_predict_without_document_embedded PASSED                                              [ 98%]
test/unittest/model/test_vanilla.py::test_function_predict_with_flatten_outputs PASSED                                                   [ 99%]
test/unittest/model/test_vanilla.py::test_function_predict_with_mix_flatten_outputs SKIPPED (unconditional skip)                         [100%]

I am working on fixing them.

blythed commented 12 months ago

@thejumpman2323 to sync with @jieguangzhou on how to reduce number of fixtures.

jieguangzhou commented 12 months ago

Implementation ideas for fixtures

Use the scope parameter to control the scope of fixtures

For example:

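from typing import Iterator

import pytest

# Import path assumed; adjust if Datalayer lives elsewhere in your version.
from superduperdb.base.datalayer import Datalayer


@pytest.fixture(scope='session')
def empty() -> Iterator[Datalayer]:
    from superduperdb import CFG
    from superduperdb.base.build import build_datalayer

    db = build_datalayer(CFG, data_backend='mongomock:///test_db')
    # Touch the connection so that connection problems surface during setup.
    db.databackend.conn.is_mongos
    yield db
    # Teardown: runs once, after the last test of the session.
    db.databackend.conn.close()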

scope can be one of "function", "class", "module", or "session".

With scope='session', the fixture is built once and the same instance is shared by every test in the run.
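
To make the difference between scopes concrete, here is a minimal sketch. It does not use the real Datalayer: build_fake_db is a hypothetical stand-in that only counts how often it is called, so the names below are assumptions for illustration.

```python
import itertools

import pytest

# Counts how many times the (hypothetical) database builder has run.
_build_counter = itertools.count(1)


def build_fake_db() -> dict:
    """Hypothetical stand-in for build_datalayer; tags each build with an id."""
    return {"build_id": next(_build_counter), "collections": {}}


@pytest.fixture(scope="session")
def session_db():
    # Built once for the whole run; every test receives the same object.
    db = build_fake_db()
    yield db
    # Teardown runs once, after the last test that used the fixture.
    db["collections"].clear()


@pytest.fixture  # default scope is "function"
def function_db():
    # Rebuilt for each test that requests it.
    return build_fake_db()


def test_scopes_give_distinct_objects(session_db, function_db):
    # The two fixtures come from separate builds.
    assert session_db["build_id"] != function_db["build_id"]
```

The trade-off is speed against isolation: the session-scoped object is cheap to reuse, but anything one test writes into it stays visible to later tests.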

Build multiple collection fixtures used by test case

For example

@pytest.fixture
def empty_collection() -> Collection:
    return Collection(str(uuid.uuid4()))

@pytest.fixture
def collection_with_random_data(empty) -> Collection:
    collection_name = 'random_data'
    add_random_data(empty, collection_name)
    return Collection(collection_name)

@pytest.fixture
def collection_with_model(empty) -> Collection:
    collection_name = 'model_data'
    add_model(empty, collection_name)
    return Collection(collection_name)

@pytest.fixture
def collection_with_vector_index(empty) -> Collection:
    collection_name = 'vector_index'
    add_vector_index(empty, collection_name)
    return Collection(collection_name)

@pytest.fixture
def collection_with_other_env(empty) -> Collection:
    collection_name = 'other_env'
    add_random_data(empty, collection_name)
    add_model(empty, collection_name)
    add_vector_index(empty, collection_name)
    return Collection(collection_name)

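We can also use scope to control the lifetime of these collection fixtures.

When multiple test cases conflict over a shared collection, for example when one of them deletes data, we can provide both a session-scoped and a function-scoped variant of the same fixture, and let each test pick the one it needs.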

@pytest.fixture(scope='session')
def collection_with_other_env_global(empty) -> Collection:
    collection_name = 'other_env'
    add_random_data(empty, collection_name)
    add_model(empty, collection_name)
    add_vector_index(empty, collection_name)
    return Collection(collection_name)

@pytest.fixture
def collection_with_other_env(empty) -> Collection:
    collection_name = 'other_env'
    add_random_data(empty, collection_name)
    add_model(empty, collection_name)
    add_vector_index(empty, collection_name)
    return Collection(collection_name)
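
One possible refinement, sketched here under assumptions: since both fixtures above write to the same 'other_env' name, a destructive test could still affect the shared data. Giving the function-scoped variant a unique name per test (the same uuid trick as empty_collection) avoids that. unique_collection_name and both fixture names are hypothetical.

```python
import uuid

import pytest


def unique_collection_name(prefix: str = "other_env") -> str:
    """Return a collection name that no other test will reuse."""
    return f"{prefix}_{uuid.uuid4().hex}"


@pytest.fixture(scope="session")
def shared_collection_name() -> str:
    # Read-only tests can share one fixed name for the whole session.
    return "other_env"


@pytest.fixture
def private_collection_name() -> str:
    # Destructive tests get a fresh name, so deletes cannot leak into
    # the shared collection.
    return unique_collection_name()
```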

Use the fixtures

def test_execute_insert_and_find(empty, empty_collection):
    from superduperdb.base.document import Document

    collection = empty_collection  # the fixture already provides a Collection instance
    collection.insert_many([Document({'this': 'is a test'})]).execute(empty)
    r = collection.find_one().execute(empty)
    print(r)

Run on MongoDB and SQLite

We could probably use pytest.mark.parametrize to run a single test case on both MongoDB and SQLite, if that's something we need.
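
A rough sketch of what that could look like. The mongomock URI is the one used earlier in this thread; the SQLite URI and the backend_kind helper are assumptions for illustration, not the project's actual configuration.

```python
import pytest

# 'mongomock:///test_db' comes from the fixture above; the SQLite URI is assumed.
BACKEND_URIS = {
    "mongodb": "mongomock:///test_db",
    "sqlite": "sqlite:///:memory:",
}


def backend_kind(uri: str) -> str:
    """Classify a URI by its scheme (hypothetical helper)."""
    scheme = uri.split("://", 1)[0]
    return "mongodb" if scheme.startswith("mongo") else "sql"


@pytest.mark.parametrize(
    "backend_uri",
    list(BACKEND_URIS.values()),
    ids=list(BACKEND_URIS),  # readable test ids: [mongodb], [sqlite]
)
def test_backend_uri_is_known(backend_uri):
    # pytest runs this test once per URI in the list.
    assert backend_kind(backend_uri) in {"mongodb", "sql"}
```

In a real setup the test body would build a Datalayer from backend_uri instead of only classifying it.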

This is the basic idea, what do you think? @blythed @thejumpman2323

thejumpman2323 commented 12 months ago

We should use a shared helper, so that the session-scoped and function-scoped fixtures build the same setup:

def _super_db(empty_db):
    collection_name = 'other_env'
    add_random_data(empty_db, collection_name)
    add_model(empty_db, collection_name)
    add_vector_index(empty_db, collection_name)
    return empty_db

@pytest.fixture(scope='session')
def global_superdb(empty) -> Datalayer:
    return _super_db(empty)

@pytest.fixture
def local_superdb(empty) -> Datalayer:
    return _super_db(empty)

@jieguangzhou

thejumpman2323 commented 12 months ago

@jieguangzhou also, the following is what I think we should do for the unit tests (UT):

Stage 1 - refactor existing fixtures
Stage 2 - refactor existing unit test cases
Stage 3 - extend existing unit test cases

NOTE: In stage 2, we should follow a standard structure across the unit test case classes or module files.

Just as an idea, an example structure could be:

def test_smoke():
    ...

def test_basic1():
    ...

# ... further basic cases ...

def test_basicN():
    ...

def test_advance1():
    ...

# ... further advanced cases ...

def test_advanceN():
    ...

NOTE2: Try to use parameterised tests as much as possible. This will increase coverage and reduce test code.
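
As a small illustration of the parameterised style, here is a sketch. normalise_name is a hypothetical function under test, not something from the code base; the point is that one test body covers several input cases.

```python
import pytest


def normalise_name(name: str) -> str:
    """Hypothetical helper under test: trim and lower-case a name."""
    return name.strip().lower()


@pytest.mark.parametrize(
    "raw, expected",
    [
        ("Docs", "docs"),
        ("  docs  ", "docs"),
        ("DOCS", "docs"),
        ("", ""),
    ],
)
def test_normalise_name(raw, expected):
    # Each tuple above becomes its own test case in the report.
    assert normalise_name(raw) == expected
```

Adding a new case is then a one-line change to the list rather than a new test function.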

@blythed @jieguangzhou

jieguangzhou commented 11 months ago