microsoft / promptflow

Build high-quality LLM apps - from prototyping, testing to production deployment and monitoring.
https://microsoft.github.io/promptflow/
MIT License

[BUG] Promptflow-evals raises '(TypeError) cannot pickle '_thread.RLock' object.' #3672

Status: Open. Wix97 opened this issue 1 month ago.

Wix97 commented 1 month ago

Versions: promptflow (PF) 1.14.0, promptflow-evals 0.3.1


from promptflow.evals.synthetic import AdversarialScenario, AdversarialSimulator
from promptflow.evals.evaluate import evaluate
from promptflow.evals.evaluators import (
    ContentSafetyEvaluator,
)
from typing import Dict, Any
from azure.identity import DefaultAzureCredential
from pathlib import Path
import pandas as pd
import json

ai_studio_resource = "<Your resource ID here>"

def parse_resource_id(resource_id: str) -> dict[str, str]:
    resource_id_parts = resource_id.strip("/").split("/")
    return {
        str(resource_id_parts[i]).lower(): resource_id_parts[i + 1]
        for i in range(0, len(resource_id_parts), 2)
    }

ai_studio_parsed_id = parse_resource_id(ai_studio_resource)

azure_ai_project = {
    "subscription_id": ai_studio_parsed_id["subscriptions"],
    "resource_group_name": ai_studio_parsed_id["resourcegroups"],
    "project_name": ai_studio_parsed_id["workspaces"],
    "credential": DefaultAzureCredential(),
}

async def callback(
    messages: Dict,
    stream: bool = False,
    session_state: Any = None,
) -> dict:
    query = messages["messages"][0]["content"]

    response = "I don't know"

    messages["messages"].append(
        {
            "content": response,
            "role": "assistant",
        }
    )

    return {
        "messages": messages["messages"],
        "stream": stream,
        "session_state": session_state,
    }

simulator = AdversarialSimulator(azure_ai_project=azure_ai_project)
outputs = await simulator(
    scenario=AdversarialScenario.ADVERSARIAL_QA,  # required adversarial scenario to simulate
    target=callback,
    max_simulation_results=1,  # optional
    jailbreak=True,  # optional
    concurrent_async_task=4,
    max_conversation_turns=1,
)

df_outputs = pd.DataFrame(
    [json.loads(x) for x in outputs.to_eval_qa_json_lines().splitlines()]
)

tmp_path = Path("test.jsonl")

df_outputs.to_json(tmp_path, orient="records", lines=True)

result = evaluate(
    data=str(tmp_path),
    evaluators={
        "cs": ContentSafetyEvaluator(
            project_scope=azure_ai_project,
            credential=DefaultAzureCredential(),
            parallel=True,
        ),
    },
    evaluator_config={
        "cs": {"answer": "${data.answer}", "question": "${data.question}"},
    },
)

When run in a Jupyter notebook, the code above fails at the evaluate() call with:

UnexpectedError: Unexpected error occurred while executing the batch run. Error: (TypeError) cannot pickle '_thread.RLock' object.
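For context, this TypeError is what Python's pickle module raises whenever an object graph contains a threading.RLock. The sketch below reproduces the same kind of failure with the standard library only. It assumes, without confirmation from this thread, that something passed into the batch run (for example the credential stored in azure_ai_project) holds such a lock internally; the `credential` stand-in and the `try_pickle` helper are hypothetical and are not part of promptflow or azure-identity.

```python
import pickle
import threading
import types

# Hypothetical stand-in for a credential-like object that keeps a lock
# internally. This is NOT the real DefaultAzureCredential.
credential = types.SimpleNamespace(_lock=threading.RLock())

# Mirrors the shape of azure_ai_project from the report, with dummy values.
project = {
    "subscription_id": "dummy-subscription",
    "resource_group_name": "dummy-group",
    "project_name": "dummy-project",
    "credential": credential,
}


def try_pickle(obj):
    """Return None if obj pickles cleanly, else the TypeError message."""
    try:
        pickle.dumps(obj)
        return None
    except TypeError as exc:
        return str(exc)


# Pickling fails because the nested RLock cannot be serialized.
error = try_pickle(project)
print(error)

# The same dict without the lock-holding entry pickles without error.
serializable = {k: v for k, v in project.items() if k != "credential"}
print(try_pickle(serializable))
```

If this is indeed the cause, the unpicklable object would have to be kept out of whatever gets serialized for the batch run. Whether promptflow-evals 0.3.1 accepts a project scope without the "credential" key is not established in this thread.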

github-actions[bot] commented 1 day ago

Hi, we're sending this friendly reminder because we haven't heard back from you in 30 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 7 days of this comment, the issue will be automatically closed. Thank you!

Wix97 commented 19 hours ago

The issue is not resolved yet; I am still tracking it.