Closed Kevinlee49 closed 6 months ago
Hi, please provide the tool name if you are using one of our built-in tools; semantic_search and hybrid_search appear to be the names of your flow nodes. (BTW, GitHub issues mostly target the open-source promptflow version. If you are using promptflow inside an Azure Machine Learning workspace, opening an OCV (as below) is highly recommended.)
You can find the built-in tool list under 'More tools':
@brynn-code Hello, thanks for the reply. Oh yes, I didn't clarify the tool. These two nodes are index lookup tools. As you recommended, I left feedback in OCV.
@dans-msft and @Adarsh-Ramanathan, could you please help with the index lookup tool issue? Thanks!
@Kevinlee49, can you share a minimal flow yaml that results in a repro so we can look into this?
@Adarsh-Ramanathan Here it is,
environment:
  python_requirements_txt: requirements.txt
inputs:
  question:
    type: string
    is_chat_input: false
    default: blah blah blah
outputs:
  output:
    type: string
    reference: ${answer_the_question_with_context.output}
    evaluation_only: false
    is_chat_output: true
nodes:
- name: answer_the_question_with_context
  type: llm
  source:
    type: code
    path: answer_the_question_with_context.jinja2
  inputs:
    deployment_name: gpt-35-turbo-16k-v0613
    temperature: 0.7
    top_p: 1
    max_tokens: 1000
    response_format:
      type: text
    presence_penalty: 0
    frequency_penalty: 0
    final_prompt: ${final_prompt.output}
  provider: AzureOpenAI
  connection: test-connection
  api: chat
  module: promptflow.tools.aoai
  aggregation: false
  use_variants: false
- name: input_classify_and_rephrase
  type: prompt
  source:
    type: code
    path: input_classify_and_rephrase.jinja2
  inputs:
    question: ${inputs.question}
  use_variants: false
- name: semantic_search
  type: python
  source:
    type: package
    tool: promptflow_vectordb.tool.common_index_lookup.search
  inputs:
    mlindex_content: >
      embeddings:
        api_base: https://
        api_type: azure
        api_version: 2023-07-01-preview
        batch_size: '16'
        connection:
          id: /subscriptions/
        connection_type: workspace_connection
        deployment: text-embedding-ada-002
        dimension: 1536
        file_format_version: '2'
        kind: open_ai
        model: text-embedding-ada-002
        schema_version: '2'
      index:
        api_version: 2023-07-01-preview
        connection:
          id: /subscriptions/
        connection_type: workspace_connection
        endpoint: https://
        engine: azure-sdk
        field_mapping:
          content: content
          embedding: contentVector
          filename: filepath
          metadata: meta_json_string
          title: title
          url: url
        index: pf-rpindex
        kind: acs
        semantic_configuration_name: azureml-default
    queries: ${python_query_analyze_and_rephrase.output}
    query_type: Semantic
    top_k: 3
  use_variants: false
- name: hybrid_search
  type: python
  source:
    type: package
    tool: promptflow_vectordb.tool.common_index_lookup.search
  inputs:
    mlindex_content: >
      embeddings:
        api_base: https://
        api_type: azure
        api_version: 2023-07-01-preview
        batch_size: '16'
        connection:
          id: /subscriptions/
        connection_type: workspace_connection
        deployment: text-embedding-ada-002
        dimension: 1536
        file_format_version: '2'
        kind: open_ai
        model: text-embedding-ada-002
        schema_version: '2'
      index:
        api_version: 2023-07-01-preview
        connection:
          id: /subscriptions/
        connection_type: workspace_connection
        endpoint: https://
        engine: azure-sdk
        field_mapping:
          content: content
          embedding: contentVector
          filename: filepath
          metadata: meta_json_string
          title: title
          url: url
        index: pf-rpindex
        kind: acs
        semantic_configuration_name: azureml-default
    queries: ${input_classify_and_rephrase.output}
    query_type: Hybrid (vector + keyword)
    top_k: 3
  use_variants: false
- name: generate_context
  type: python
  source:
    type: code
    path: generate_context.py
  inputs:
    hybrid_search_output: ${hybrid_search.output}
    semantic_search_output: ${semantic_search.output}
  use_variants: false
- name: final_prompt
  type: prompt
  source:
    type: code
    path: final_prompt.jinja2
  inputs:
    context: ${generate_context.output}
    question: ${inputs.question}
  use_variants: false
- name: python_query_analyze_and_rephrase
  type: python
  source:
    type: code
    path: python_query_analyze_and_rephrase.py
  inputs:
    question: ${inputs.question}
  use_variants: false
@Adarsh-Ramanathan Hello, did you get the solution for this issue?
@Kevinlee49 , not yet. I'll post an update once I've investigated.
@Kevinlee49, I'm assuming you have actual URLs etc. in the various mlindex fields (like api_base and similar), and you've just redacted them before posting here. I'm having a hard time reproing this issue; can you capture the outputs of the python_query_analyze_and_rephrase and input_classify_and_rephrase steps when executing your flow?
@Adarsh-Ramanathan Of course, I have my actual URLs etc. in those fields. One thing I realized is that only the hybrid search part is problematic. It keeps saying that I need vector fields.
The output of python_query_analyze_and_rephrase is [topic_keyword] [question], like weather [how's the weather today?]. The output of input_classify_and_rephrase is just the prompt, because it is a jinja2 file.
@Adarsh-Ramanathan Or do you think it is because I am using index lookup twice in one flow? The semantic_search and hybrid_search nodes differ only in queries (one comes from the Python output, the other from the jinja output) and in query type (semantic vs. hybrid (keyword + vector)).
Run failed: Execution failure in 'hybrid_search': (HttpResponseError) (InvalidRequestParameter) At least one vector field needs to be selected explicitly using the 'vector.fields' parameter.
Parameter name: vector.fields
Code: InvalidRequestParameter
Message: At least one vector field needs to be selected explicitly using the 'vector.fields' parameter.
Parameter name: vector.fields
Exception Details: (InvalidVectorQuery) At least one vector field needs to be selected explicitly using the 'vector.fields' parameter.
Code: InvalidVectorQuery
Message: At least one vector field needs to be selected explicitly using the 'vector.fields' parameter.
hybrid_search : Execution failure in 'hybrid_search': (HttpResponseError) (InvalidRequestParameter) At least one vector field needs to be selected explicitly using the 'vector.fields' parameter.
Parameter name: vector.fields
Code: InvalidRequestParameter
Message: At least one vector field needs to be selected explicitly using the 'vector.fields' parameter.
Parameter name: vector.fields
Exception Details: (InvalidVectorQuery) At least one vector field needs to be selected explicitly using the 'vector.fields' parameter.
Code: InvalidVectorQuery
Message: At least one vector field needs to be selected explicitly using the 'vector.fields' parameter.
Can you share a run's example output for input_classify_and_rephrase? I'm not able to repro this issue; our best bet is to eliminate variables, isolate the problem down to the actual lookup nodes, and run them with a configuration that's as close as possible to the one you're running, inputs and all.
Can you also provide info about the runtime version you're using, and whether you've installed/updated/overridden any packages? A pip freeze dump would be useful.
To answer your question, no - introducing multiple instances of index lookup is definitely supported - the reason hybrid is failing and semantic is not is that ACS doesn't need a vector input for semantic search, while hybrid does. The issue is that your MLIndex looks like it's configured correctly, so we should have been able to produce a vector to send to ACS - this is what we need to get to the bottom of.
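The difference described above can be illustrated with a small sketch. This is not the index lookup tool's code and not the service's real request format: `build_search_request` and the dictionary layout are hypothetical, and only the query type names and the `contentVector` field come from the flow posted earlier. It shows why a missing embedding only hurts the hybrid node.

```python
from typing import List, Optional


def build_search_request(
    query_type: str,
    text: str,
    embedding: Optional[List[float]] = None,
    vector_field: Optional[str] = None,
    top_k: int = 3,
) -> dict:
    """Build an illustrative (hypothetical) request for one lookup."""
    request = {"search": text, "top": top_k}
    if query_type == "Semantic":
        # Semantic search works on the text query alone; no vector is needed.
        request["queryType"] = "semantic"
        return request
    if query_type == "Hybrid (vector + keyword)":
        # Hybrid search needs the embedding and the name of the vector field.
        if embedding is None or not vector_field:
            raise ValueError("hybrid search needs an embedding and a vector field")
        request["vector"] = {"value": embedding, "fields": vector_field, "k": top_k}
        return request
    raise ValueError(f"unsupported query type: {query_type}")
```

Under these assumptions, `build_search_request("Semantic", "weather")` succeeds without an embedding, while the hybrid variant raises unless it is given both an embedding and a field name such as `contentVector`.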
@Adarsh-Ramanathan On VScode the errors look like this
@Adarsh-Ramanathan
"Can you share a run's example output for input_classify_and_rephrase?" ->
assistant: ~~~ system: ~~~~ conversation: ~~~ user: ~~~
"Can you also provide info about the runtime version you're using, and if you've installed/updated/overriden any packages?" -> How can I show this?
assistant: ~~~
system: ~~~~
conversation: ~~~
user: ~~~
Is this a string? A list of strings? An object?
Can you also provide info about the runtime version you're using, and if you've installed/updated/overriden any packages? -> how can I show this?
You could add a Python step with these contents to your flow, and grab its stdout:
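from promptflow import tool
from pip._internal.operations import freeze

@tool
def my_python_tool(input1: str) -> str:
    # Print every installed package in pip-freeze format to stdout.
    pkgs = list(freeze.freeze())
    for pkg in pkgs:
        print(pkg)
    return "\n".join(pkgs)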
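As a possible alternative to the snippet above, which relies on pip's internal modules, the same kind of listing can be sketched with the standard library's `importlib.metadata`. The function name `list_installed_packages` is made up for this example, and the output is only an approximation of `pip freeze` (for instance, it does not reproduce editable or VCS install lines).

```python
from importlib import metadata
from typing import List


def list_installed_packages() -> List[str]:
    """Return 'name==version' strings for every installed distribution."""
    pkgs = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that report no name
            pkgs.add(f"{name}=={dist.version}")
    return sorted(pkgs, key=str.lower)
```

Inside a flow, the body of a Python tool could print `"\n".join(list_installed_packages())` so that the list shows up in the node's stdout.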
@Adarsh-Ramanathan "Is this a string? A list of strings? An object?" -> It's a string.
Can you also provide info about the runtime version you're using, and if you've installed/updated/overriden any packages?
@Kevinlee49 , I'm still unable to reproduce your error.
Here's the minimal flow I built off of your example:
Flow yaml:
inputs:
  question:
    type: string
    default: blah blah blah
    is_chat_input: false
outputs:
  output:
    type: string
    reference: ${generate_context.output}
    evaluation_only: false
    is_chat_output: true
nodes:
- name: input_classify_and_rephrase
  type: prompt
  source:
    type: code
    path: input_classify_and_rephrase.jinja2
  inputs:
    question: ${inputs.question}
  use_variants: false
- name: hybrid_search
  type: python
  source:
    type: package
    tool: promptflow_vectordb.tool.common_index_lookup.search
  inputs:
    mlindex_content: >
      embeddings:
        api_base: ****
        api_type: azure
        api_version: 2023-07-01-preview
        batch_size: '16'
        connection:
          id: ****
        connection_type: workspace_connection
        deployment: text-embedding-ada-002
        dimension: 1536
        file_format_version: '2'
        kind: open_ai
        model: text-embedding-ada-002
        schema_version: '2'
      index:
        api_version: 2023-07-01-preview
        connection:
          id: ****
        connection_type: workspace_connection
        endpoint: ****
        engine: azure-sdk
        field_mapping:
          content: content
          embedding: contentVector
          filename: filepath
          metadata: meta_json_string
          title: title
          url: url
        index: ****
        kind: acs
        semantic_configuration_name: azureml-default
    queries: ${input_classify_and_rephrase.output}
    query_type: Hybrid (vector + keyword)
    top_k: 3
  use_variants: false
- name: generate_context
  type: python
  source:
    type: code
    path: generate_context.py
  inputs:
    search_result: ${hybrid_search.output}
  use_variants: false
node_variants: {}
environment:
  python_requirements_txt: requirements.txt
requirements.txt:
promptflow_vectordb[azure]
generate_context.py:
import json
from typing import List

from pip._internal.operations import freeze
from promptflow import tool


@tool
def generate_prompt_context(search_result: List[dict]) -> str:
    # Return the installed packages (pip-freeze format) as a JSON list.
    return json.dumps(list(freeze.freeze()))
input_classify_and_rephrase.jinja2:
system: You are a helpful bot that finds answers to questions.
user: {{ question }}
assistant:
I'm running this with an automatic runtime in westus2. Can you try running this flow and see if your issue still persists?
Your issue in vscode is unrelated, IIRC, you need to configure a number of azure defaults beforehand to get things to play nice: https://microsoft.github.io/promptflow/how-to-guides/develop-a-tool/create-dynamic-list-tool-input.html#faqs
@Adarsh-Ramanathan I tested it already. It works when I use hybrid search alone, but when I use two index lookup nodes together, it doesn't. Can you add one more node for semantic search and test it again?
@Adarsh-Ramanathan Interestingly, when the error occurs, running each node individually works all the way to the end of the flow. But if I click the run button for the whole flow, it reports an error in the index lookup nodes. The two search nodes did work together several times (about 4 times in a row), but after the 4th run it stopped working again. So it sometimes works, but most attempts cause errors.
If you don't mind, can we set up a quick call or meeting? I want to show you this error in person.
Alright, I finally managed to get a repro going! The key was to have more than one lookup node, and to run the flow several times, since it doesn't repro deterministically!
I'll investigate further and post updates as I have them.
If you have additional info you want to share over a call, then sure, feel free to set up some time.
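Since the failure is intermittent, a repro attempt along the lines described above could be wrapped in a small harness that runs the whole flow several times and counts failures. This is only a sketch: `run_repeatedly` is a hypothetical helper, and `run_once` stands in for whatever call actually executes the full flow in your setup.

```python
from typing import Any, Callable, Dict, List


def run_repeatedly(run_once: Callable[[], Any], attempts: int = 10) -> Dict[str, Any]:
    """Call run_once several times and record which attempts raised."""
    failures: List[str] = []
    for attempt in range(1, attempts + 1):
        try:
            run_once()
        except Exception as exc:  # record the failure and keep going
            failures.append(f"attempt {attempt}: {exc}")
    return {"attempts": attempts, "failed": len(failures), "failures": failures}
```

A result with `failed` greater than zero but smaller than `attempts` would match the "sometimes works, mostly fails" behavior reported in this thread.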
@Adarsh-Ramanathan That's correct. I left feedback requests a few weeks ago on the Azure ML studio promptflow page, but I never got a response. You're the fastest person to reply to my question.
@Kevinlee49, just posting an update. I spent some time investigating yesterday, and I think I understand why this bug occurs; we'll start working on a patch soon. Unfortunately, the real issue is a couple of links down the dependency chain, but I think we can work around it in the tools package.
If you'd like to test a release candidate to help verify (when we have one, that is), please reach out.
@Adarsh-Ramanathan Sure, thank you! I will look forward to your message and new update!
@Kevinlee49, I have a candidate runtime image for you to test. I've run it several times through the flow I used to repro, and as far as I can tell, the issue is fixed. Would you be willing to test on your flow to confirm?
You can pull the image from adramapfdev.azurecr.io/promptflow-runtime:20240314.
You'll need to a) pull this image and re-push it to your workspace ACR, b) create a custom environment in your workspace with the image from your workspace ACR and an empty conda file, and c) update your CI runtime, choose the custom environment option, and pick the environment you created in (b).
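Step (a) could look roughly like the sketch below, which only builds the command lines so they can be reviewed before running. The registry name `myworkspaceacr` is a placeholder, `repush_commands` is a hypothetical helper, and it assumes the Docker and Azure CLIs are installed and that you are already signed in to Azure. Steps (b) and (c) are not covered here.

```python
from typing import List


def repush_commands(source_image: str, workspace_acr: str) -> List[List[str]]:
    """Return the CLI commands that copy source_image into workspace_acr."""
    # Keep the repository name and tag; swap only the registry host.
    repo_and_tag = source_image.split("/", 1)[1]
    target_image = f"{workspace_acr}.azurecr.io/{repo_and_tag}"
    return [
        ["docker", "pull", source_image],
        ["docker", "tag", source_image, target_image],
        ["az", "acr", "login", "--name", workspace_acr],
        ["docker", "push", target_image],
    ]
```

Each returned command can be passed to `subprocess.run(cmd, check=True)` in order, for example with `repush_commands("adramapfdev.azurecr.io/promptflow-runtime:20240314", "myworkspaceacr")`.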
Issue #2026 pertains to the same question as this issue.
@Adarsh-Ramanathan Thank you, I will try it today!
@Adarsh-Ramanathan I was trying to create a new runtime following your instructions, but I got this error. Do you know how to handle this?
Runtime pf-test2-runtime create failed. FlowRuntime pf-test2-runtime in compute instance mylee-compute2 is not ready: runtime starting timeout. Please try to create a new compute instance to hold runtime
Hi, we're sending this friendly reminder because we haven't heard back from you in 30 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 7 days of this comment, the issue will be automatically closed. Thank you!
I'm using hybrid search and semantic search simultaneously and got this error. Do you happen to know what the solution is?
This is the output from semantic search node.
This is the output from hybrid search node.