opensearch-project / skills-eval

Eval framework for evaluating the quality and performance of skills for the ml-commons agent framework
Apache License 2.0

Improve `SearchAnomalyDetectorsTool` test suite #18

Closed (ohltyler closed 6 months ago)

ohltyler commented 6 months ago

Description

Now that the dependencies on agent configuration and this repo have become more stable, this PR improves the SearchAnomalyDetectorsTool test suite in a few ways:

Below are the test results from running against an agent configured exclusively with SearchAnomalyDetectorsTool. NOTE: these results come from manually changing the matcher's subset setting to 1 instead of 0 (see here). I need further direction on what to tune here; perhaps the expected responses should be more verbose and provide more detail.

npm test --tests src/tests/anomaly-detection/search_anomaly_detectors.test.ts

> skills-eval@0.0.1 test
> NODE_TLS_REJECT_UNAUTHORIZED=0 jest --config ./configs/jest.config.ts --runInBand src/tests/anomaly-detection/search_anomaly_detectors.test.ts

deleted index group: anomaly-detection
created index .opendistro-anomaly-checkpoints
created index .opendistro-anomaly-detector-jobs
created index .opendistro-anomaly-detection-state
created index .opendistro-anomaly-detectors
created index test-index
created index .opendistro-anomaly-results-1-1-1
Running test: id-1
Received: Based on the output of the SearchAnomalyDetectorsTool, there are 4 anomaly detectors configured.
Expected: You have 4 detectors
Score: 1

Running test: id-2
Received: Based on running the SearchAnomalyDetectorsTool with the input of {'running': 'true'}, the tool response showed there are currently 0 running detectors.
Expected: You have 0 running detectors
Score: 1

Running test: id-3
Received: You have 0 failed anomaly detectors based on the results of running the SearchAnomalyDetectorsTool with an input to only return failed detectors.
Expected: You have 0 failed detectors
Score: 1

Running test: id-4
Received: Based on the output of the SearchAnomalyDetectorsTool, the anomaly detector test-detector-1 is detecting anomalies over the index called 'test-index'.
Expected: The detector test-detector-1 is detecting over the index test-index
Score: 1

Running test: id-5
Received: The description for anomaly detector test-detector-1 is 'A description for test-detector-1' as returned by the SearchAnomalyDetectorsTool.
Expected: The description for test-detector-1 is: A description for test-detector-1
Score: 1

Running test: id-6
Received: Based on the output of the SearchAnomalyDetectorsTool, test-detector-2 is detecting anomalies over the test-index.
Expected: The detector test-detector-2 is detecting over the index test-index
Score: 1

Running test: id-7
Received: The description for anomaly detector test-detector-2 is 'A description for test-detector-2' as returned by the SearchAnomalyDetectorsTool.
Expected: The description for test-detector-2 is: A description for test-detector-2
Score: 1

Running test: id-8
Received: The SearchAnomalyDetectorsTool found 4 anomaly detectors named test-detector-1, test-detector-2, test-detector-3, and test-detector-4 that are currently not running based on the 'running': 'false' filter.
Expected: Yes, you have 4 non-running detectors: test-detector-1, test-detector-2, test-detector-3, and test-detector-4
Score: 1

Running test: id-9
Received: Yes, you have one high-cardinality anomaly detector configured named 'test-detector-4' that monitors the 'test-index' index.
Expected: Yes, you have one high-cardinality detector: test-detector-4
Score: 1

Running test: id-10
Received: The total number of high cardinality detectors is 1 based on the output of the SearchAnomalyDetectorsTool.
Expected: You have one total high-cardinality detector: test-detector-4
Score: 1

Running test: id-11
Received: The anomaly detector test-detector-1 was last updated on January 1, 1970 at 12:34:00 AM UTC based on its lastUpdateTime of 1234 from the SearchAnomalyDetectorsTool output.
Expected: test-detector-1 was last updated on 1234
Score: 1

Running test: id-12
Received: The type of detector test-detector-1 is SINGLE_ENTITY based on the output of the SearchAnomalyDetectorsTool. The SINGLE_ENTITY type indicates it is a single-entity anomaly detector.
Expected: test-detector-1 is a single entity detector.
Score: 1

Running test: id-13
Received: Based on searching the anomaly detectors configured using the SearchAnomalyDetectorsTool, there are 4 detectors configured to detect anomalies over the 'test-index' index - test-detector-1, test-detector-2, test-detector-3 and test-detector-4.
Expected: Yes, you have 4 detectors detecting over the index test-index.
Score: 1

Running test: id-14
Received: Based on searching the anomaly detectors tool, there are no detectors configured to detect anomalies over the 'foo-index' index, as the tool returned an empty list and a total count of 0 detectors.
Expected: No, there are no detectors detecting over that index.
Score: 1

Running test: id-15
Received: The name of your high-cardinality anomaly detector is test-detector-4, which detects anomalies across multiple entities in the test-index.
Expected: You have one high-cardinality detector named test-detector-4
Score: 1

Running test: id-16
Received: The SearchAnomalyDetectorsTool did not return any high-cardinality failed anomaly detectors. To answer the user's question, there are no high-cardinality failed detectors.
Expected: No, you have no high-cardinality detectors in a failed state
Score: 1

Running test: id-17
Received: Yes, based on the output of the SearchAnomalyDetectorsTool, there are 4 detectors whose names start with 'test-': test-detector-1, test-detector-2, test-detector-3, and test-detector-4.
Expected: You have 4 matching detectors: test-detector-1, test-detector-2, test-detector-3, and test-detector-4
Score: 1

Running test: id-18
Received: Based on running the SearchAnomalyDetectorsTool with a name pattern of 'foo*', the tool response indicated there are 0 detectors matching that name pattern.
Expected: You have no matching detectors that start with that.
Score: 1

Running test: id-19
Received: The anomaly detector test-detector-4 is a multi-entity detector that detects anomalies across multiple entities in the test-index.
Expected: test-detector-4 is a multi-entity detector
Score: 1

Running test: id-20
Received: Yes, the tool response shows there is one non-running high-cardinality anomaly detector called test-detector-4 detecting anomalies on the test-index.
Expected: Yes, you have one high-cardinality detector in a non-running state: test-detector-4-id
Score: 1

Summary: 20 tests, average score: 1.00, range: 1.00 - 1.00. Pass rate: 100%, average execution time: 4785.00 ms
 PASS  src/tests/anomaly-detection/search_anomaly_detectors.test.ts (106.223 s)
  Cluster state anomaly-detection
    ✓ Test-id id-1 (4415 ms)
    ✓ Test-id id-2 (6178 ms)
    ✓ Test-id id-3 (4923 ms)
    ✓ Test-id id-4 (5004 ms)
    ✓ Test-id id-5 (4570 ms)
    ✓ Test-id id-6 (4583 ms)
    ✓ Test-id id-7 (5971 ms)
    ✓ Test-id id-8 (5249 ms)
    ✓ Test-id id-9 (5376 ms)
    ✓ Test-id id-10 (6415 ms)
    ✓ Test-id id-11 (4377 ms)
    ✓ Test-id id-12 (4825 ms)
    ✓ Test-id id-13 (5730 ms)
    ✓ Test-id id-14 (5159 ms)
    ✓ Test-id id-15 (4633 ms)
    ✓ Test-id id-16 (4744 ms)
    ✓ Test-id id-17 (5561 ms)
    ✓ Test-id id-18 (5521 ms)
    ✓ Test-id id-19 (5602 ms)
    ✓ Test-id id-20 (5134 ms)

📦 report is created on: <>
Test Suites: 1 passed, 1 total
Tests:       20 passed, 20 total
Snapshots:   0 total
Time:        106.278 s, estimated 135 s
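
For reference, the Summary line in the output above can be derived from the per-test scores roughly as follows. This is a minimal sketch, not the framework's actual code: the `TestResult` shape, the `summarize` and `formatSummary` helpers, and the rule that a test passes when its score reaches a threshold of 1 are all assumptions made for illustration.

```typescript
// Hypothetical shape of one evaluated test case; field names are assumptions.
interface TestResult {
  id: string;
  score: number;       // matcher score in [0, 1]
  executionMs: number; // time spent on the agent call, in milliseconds
}

interface Summary {
  count: number;
  averageScore: number;
  minScore: number;
  maxScore: number;
  passRate: number; // fraction in [0, 1]
  averageExecutionMs: number;
}

// Assumed pass rule: a test passes when its score is >= passThreshold.
function summarize(results: TestResult[], passThreshold = 1): Summary {
  if (results.length === 0) {
    throw new Error("cannot summarize an empty result set");
  }
  const total = (xs: number[]): number => xs.reduce((a, b) => a + b, 0);
  const scores = results.map((r) => r.score);
  const passed = results.filter((r) => r.score >= passThreshold).length;
  return {
    count: results.length,
    averageScore: total(scores) / results.length,
    minScore: Math.min(...scores),
    maxScore: Math.max(...scores),
    passRate: passed / results.length,
    averageExecutionMs: total(results.map((r) => r.executionMs)) / results.length,
  };
}

// Render the summary in the same layout as the line printed by the run.
function formatSummary(s: Summary): string {
  return (
    `Summary: ${s.count} tests, average score: ${s.averageScore.toFixed(2)}, ` +
    `range: ${s.minScore.toFixed(2)} - ${s.maxScore.toFixed(2)}. ` +
    `Pass rate: ${(s.passRate * 100).toFixed(0)}%, ` +
    `average execution time: ${s.averageExecutionMs.toFixed(2)} ms`
  );
}
```

With every score at 1, the average, minimum, and maximum collapse to 1.00, which is why the range carries no signal for this run and why the matcher setting matters more than the summary numbers.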

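One data point for the matcher tuning question: in id-11 the agent rendered a lastUpdateTime of 1234 as "January 1, 1970 at 12:34:00 AM UTC", and the test still scored 1. The quick check below, which assumes the field is an epoch timestamp (milliseconds is my assumption for the AD detector field; seconds is shown for comparison), suggests that neither reading produces 12:34:00 AM.

```typescript
// lastUpdateTime value taken from the id-11 test output.
const lastUpdateTime = 1234;

// Interpretation 1 (assumed): epoch milliseconds.
const asMillis = new Date(lastUpdateTime).toISOString();

// Interpretation 2 (for comparison): epoch seconds.
const asSeconds = new Date(lastUpdateTime * 1000).toISOString();

console.log(asMillis);  // 1970-01-01T00:00:01.234Z
console.log(asSeconds); // 1970-01-01T00:20:34.000Z
```

So the received answer appears to contain a hallucinated time that the current matcher configuration does not penalize, because the expected response only mentions the raw value 1234.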
Workflow template configuration: (after provisioning, pulled out IDs to my local .env file)

{
  "name": "search-ad-flow",
  "description": "Flow template",
  "use_case": "REGISTER_AGENT",
  "version": {
    "template": "1.0.0",
    "compatibility": [
      "2.12.0",
      "3.0.0"
    ]
  },
  "workflows": {
    "provision": {
      "user_params": {},
      "nodes": [
        {
          "id": "create_connector_1",
          "type": "create_connector",
          "previous_node_inputs": {},
          "user_inputs": {
            "version": "1",
            "name": "Claude instant runtime Connector",
            "protocol": "aws_sigv4",
            "description": "The connector to BedRock service for claude model",
            "actions": [
              {
                "headers": {
                  "x-amz-content-sha256": "required",
                  "content-type": "application/json"
                },
                "method": "POST",
                "request_body": "{\"prompt\":\"${parameters.prompt}\", \"max_tokens_to_sample\":${parameters.max_tokens_to_sample}, \"temperature\":${parameters.temperature},  \"anthropic_version\":\"${parameters.anthropic_version}\" }",
                "action_type": "predict",
                "url": "https://bedrock-runtime.us-west-2.amazonaws.com/model/anthropic.claude-instant-v1/invoke"
              }
            ],
            "credential": {
              "access_key": "<redacted>",
              "secret_key": "<redacted>"
            },
            "parameters": {
              "endpoint": "bedrock-runtime.us-west-2.amazonaws.com",
              "content_type": "application/json",
              "auth": "Sig_V4",
              "max_tokens_to_sample": "8000",
              "service_name": "bedrock",
              "temperature": "0.0001",
              "response_filter": "$.completion",
              "region": "us-west-2",
              "anthropic_version": "bedrock-2023-05-31"
            }
          }
        },
        {
          "id": "register_model_2",
          "type": "register_remote_model",
          "previous_node_inputs": {
            "create_connector_1": "connector_id"
          },
          "user_inputs": {
            "description": "test model",
            "deploy": true,
            "name": "claude-instant"
          }
        },
        {
          "id": "search_alerts_tool",
          "type": "create_tool",
          "user_inputs": {
            "type": "SearchAlertsTool",
            "name": "SearchAlertsTool",
            "parameters": {}
          }
        },
        {
          "id": "search_monitors_tool",
          "type": "create_tool",
          "user_inputs": {
            "type": "SearchMonitorsTool",
            "name": "SearchMonitorsTool",
            "parameters": {}
          }
        },
        {
          "id": "search_anomaly_detectors_tool",
          "type": "create_tool",
          "user_inputs": {
            "type": "SearchAnomalyDetectorsTool",
            "name": "SearchAnomalyDetectorsTool",
            "parameters": {}
          }
        },
        {
          "id": "search_anomaly_results_tool",
          "type": "create_tool",
          "user_inputs": {
            "type": "SearchAnomalyResultsTool",
            "name": "SearchAnomalyResultsTool",
            "parameters": {}
          }
        },
        {
          "id": "sub_agent",
          "type": "register_agent",
          "previous_node_inputs": {
            // keep any / all of the tool entries below to provision
            // an agent that uses any subset of the AD & alerting tools
            "search_alerts_tool": "tools",
            "search_monitors_tool": "tools",
            "search_anomaly_detectors_tool": "tools",
            "search_anomaly_results_tool": "tools",
            "register_model_2": "model_id"
          },
          "user_inputs": {
            "parameters": {},
            "app_type": "chatbot",
            "name": "Sub Agent",
            "description": "this is a test agent",
            "llm.parameters": {
              "max_iteration": "5",
              "stop_when_no_tool_found": "true",
              "response_filter": "$.completion"
            },
            "memory": {
              "type": "conversation_index"
            },
            "type": "conversational"
          }
        },
        {
          "id": "agent_tool",
          "type": "create_tool",
          "previous_node_inputs": {
            "sub_agent": "agent_id"
          },
          "user_inputs": {
            "description": "Agent Tool",
            "include_output_in_agent_response": true,
            "type": "AgentTool",
            "parameters": {
              "max_iteration": "5"
            },
            "name": "AgentTool"
          }
        },
        {
          "id": "ml_model_tool",
          "type": "create_tool",
          "previous_node_inputs": {
            "register_model_2": "model_id"
          },
          "user_inputs": {
            "parameters": {
              "prompt": "\n\nHuman:\" turn\" You are an AI that only speaks JSON. Do not write normal text. Output should follow example JSON format: \n\n {\"response\": [\"question1\", \"question2\"]}\n\n. \n\nHuman:\" turn\":You will be given a chat history between OpenSearch Assistant and a Human.\nUse the context provided to generate follow up questions the Human would ask to the Assistant.\nThe Assistant can answer general questions about logs, traces and metrics.\nAssistant can access a set of tools listed below to answer questions given by the Human:\nQuestion suggestions generator tool\nHere's the chat history between the human and the Assistant.\n${parameters.AgentTool.output}\nUse the following steps to generate follow up questions Human may ask after the response of the Assistant:\nStep 1. Use the chat history to understand what human is trying to search and explore.\nStep 2. Understand what capabilities the assistant has with the set of tools it has access to.\nStep 3. Use the above context and generate follow up questions.Step4:You are an AI that only speaks JSON. Do not write normal text. Output should follow example JSON format: \n\n {\"response\": [\"question1\", \"question2\"]} \n \n----------------\n\nAssistant:"
            },
            "description": "A general tool to answer any question.",
            "alias": "language_model_tool",
            "include_output_in_agent_response": true,
            "name": "QuestionSuggestor",
            "type": "MLModelTool"
          }
        },
        {
          "id": "root_agent",
          "type": "register_agent",
          "previous_node_inputs": {
            "agent_tool": "tools",
            "register_model_2": "model_id",
            "ml_model_tool": "tools"
          },
          "user_inputs": {
            "parameters": {
              "prompt": "Answer the question as best you can."
            },
            "app_type": "chatbot",
            "name": "Root agent - search AD tool",
            "description": "this is the root agent",
            "tools_order": [
              "agent_tool",
              "ml_model_tool"
            ],
            "memory": {
              "type": "conversation_index"
            },
            "type": "flow"
          }
        }
      ],
      "edges": []
    }
  }
}
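
To provision an agent that uses only a subset of the tools (for example, the SearchAnomalyDetectorsTool-only agent used for the results above), the `previous_node_inputs` of the `sub_agent` node can be trimmed before submitting the template. The sketch below shows one way to do that programmatically. The `WorkflowNode` type and the `restrictTools` helper are hypothetical and not part of this repo or of the Flow Framework; only the node IDs and the `"tools"` / `"model_id"` values are taken from the template.

```typescript
type NodeInputs = Record<string, string>;

// Minimal view of a workflow node, covering only the fields used here.
interface WorkflowNode {
  id: string;
  type: string;
  previous_node_inputs?: NodeInputs;
  user_inputs?: Record<string, unknown>;
}

// Return a copy of `node` whose "tools" inputs are limited to the node IDs in
// `keep`. Non-tool inputs (such as the model_id wiring) are always preserved.
function restrictTools(node: WorkflowNode, keep: string[]): WorkflowNode {
  const inputs = node.previous_node_inputs ?? {};
  const filtered: NodeInputs = {};
  for (const [sourceId, kind] of Object.entries(inputs)) {
    if (kind !== "tools" || keep.includes(sourceId)) {
      filtered[sourceId] = kind;
    }
  }
  return { ...node, previous_node_inputs: filtered };
}

// The sub_agent wiring as it appears in the template.
const subAgent: WorkflowNode = {
  id: "sub_agent",
  type: "register_agent",
  previous_node_inputs: {
    search_alerts_tool: "tools",
    search_monitors_tool: "tools",
    search_anomaly_detectors_tool: "tools",
    search_anomaly_results_tool: "tools",
    register_model_2: "model_id",
  },
};

// Keep only the anomaly detector search tool.
const onlyDetectors = restrictTools(subAgent, ["search_anomaly_detectors_tool"]);
console.log(JSON.stringify(onlyDetectors.previous_node_inputs));
// {"search_anomaly_detectors_tool":"tools","register_model_2":"model_id"}
```

The helper returns a new node instead of mutating the template, so the same base template can be reused to provision agents for the other tool test suites.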

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.